Re: Questions about the hash function transition
On Tue, Aug 28, 2018 at 2:50 PM, Ævar Arnfjörð Bjarmason wrote: > If we instead had something like clean/smudge filters: > > [extensions] > objectFilter = sha256-to-sha1 > compatObjectFormat = sha1 > [objectFilter "sha256-to-sha1"] > clean = ... > smudge = ... > > We could apply arbitrary transformations on objects through filters > which would accept/return some simple format requesting them to > translate such-and-such objects, and would either return object > names/types under which to store them, or "nothing to do". If I'm understanding you correctly, then on the libgit2 side, I'm very much opposed to this proposal. We never execute commands, nor do I want to start thinking that we can do so arbitrarily. We run in environments where that's a non-starter At present, in libgit2, users can provide their own mechanism for running clean/smudge filters. But hash transformation / compatibility is going to be a crucial compatibility component. So this is not something that we could simply opt out of or require users to implement themselves. -ed
Re: [RFC PATCH 1/1] recover: restoration of deleted worktree files
On Sat, Aug 04, 2018 at 08:54:49AM -0700, Junio C Hamano wrote: > > My first reaction was to say that I am not going to take a new > command written only for bash with full bashism, even if it came > with docs, tests nor Makefile integration, for Git itself. Then I > reconsidered, as not everything related to Git is git-core, and all > of the above traits are sign of this patch _not_ meant for git-core. Yes, obviously I was not suggesting that this would be mergeable with the bashims, as I mentioned in my cover letter. In any case, it sounds like you're not particularly interested in this, although I certainly appreciate you taking the time to suggest improvements despite that. There's some good feedback there. Cheers- -ed
[RFC PATCH 1/1] recover: restoration of deleted worktree files
Introduce git-recover, a simple script to aide in restoration of deleted worktree files. This will look for unreachable blobs in the object database and prompt users to restore them to disk, either interactively or on the command-line. --- git-recover.sh | 311 + 1 file changed, 311 insertions(+) create mode 100755 git-recover.sh diff --git a/git-recover.sh b/git-recover.sh new file mode 100755 index 0..651d4116f --- /dev/null +++ b/git-recover.sh @@ -0,0 +1,311 @@ +#!/usr/bin/env bash +# +# This program helps recover files in your repository that were deleted +# from the working tree. +# +# Copyright (c) 2017-2018 Edward Thomson. + +set -e + +IFS=$'\n' + +PROGNAME=$(echo "$0" | sed -e 's/.*\///') +GIT_DIR=$(git rev-parse --git-dir) + +DO_RECOVER=0 +DO_FULL=0 +DO_INTERACTIVE=0 +BLOBS=() +FILENAMES=() + +function die_usage { + echo "usage: $PROGNAME [-a] [-i] [--full] [ [-f ] ...]" >&2 + exit 1 +} + +while [[ $# -gt 0 ]]; do + case "$1" in + -a|--all) + DO_RECOVER=1 + ;; + -i|--interactive) + DO_INTERACTIVE=1 + ;; + --full) + DO_FULL=1 + ;; + *) + if [ "${1:0:1}" == "-" ]; then + echo "$PROGNAME: unknown argument: $1" >&2 + die_usage + fi + BLOBS+=("$1") + + shift + if [ "$1" == "-f" ] || [ "$1" == "--filename" ]; then + shift + if [ $# == 0 ]; then + die_usage + fi + FILENAMES+=("$1") + shift + else + FILENAMES+=("") + fi + continue + ;; + esac + shift +done + +if [ ${#BLOBS[@]} != 0 ] && [ $DO_RECOVER == 1 ]; then + die_usage +elif [ ${#BLOBS[@]} != 0 ]; then + DO_RECOVER=1 +fi + +case "$OSTYPE" in + darwin*|freebsd*) IS_BSD=1 ;; + *) IS_BSD=0 ;; +esac + +function expand_given_blobs() { + for i in "${!BLOBS[@]}"; do + ID=$(git rev-parse --verify "${BLOBS[$i]}" 2>/dev/null || true) + + if [ -z "$ID" ]; then + echo "$PROGNAME: ${BLOBS[$i]} is not a valid object." 1>&2 + exit 1 + fi + + TYPE=$(git cat-file -t "${ID}" 2>/dev/null || true) + + if [ "$TYPE" != "blob" ]; then + echo "$PROGNAME: ${BLOBS[$i]} is not a blob." 1>&2 + exit + fi + + BLOBS[$i]=$ID + done +} + +# find all the unreachable blobs +function find_unreachable() { + FULLNESS="--no-full" + + if [ $DO_FULL == 1 ]; then FULLNESS="--full"; fi + + BLOBS=($(git fsck --unreachable --no-reflogs \ + "${FULLNESS}" --no-progress | sed -ne 's/^unreachable blob //p')) +} + +function read_one_file { + BLOB=$1 + FILTER_NAME=$2 + ARGS=() + + if [ -z "$FILTER_NAME" ]; then + ARGS+=("blob") + else + ARGS+=("--filters" "--path=$FILTER_NAME") + fi + + git cat-file "${ARGS[@]}" "$BLOB" +} + +function write_one_file { + BLOB=$1 + FILTER_NAME=$2 + OUTPUT_NAME=$3 + + ABBREV=$(git rev-parse --short "${BLOB}") + + echo -n "Writing $ABBREV: " + read_one_file "$BLOB" "$FILTER_NAME" > "$OUTPUT_NAME" + echo "$OUTPUT_NAME." +} + +function unique_filename { + if [ ! -f "${BLOB}" ]; then + echo "$BLOB" + else + cnt=1 + while true + do + fn="${BLOB}~${cnt}" + if [ ! -f "${fn}" ]; then + echo "${fn}" + break + fi + cnt=$((cnt+1)) + done + fi +} + +function write_recoverable { + for i in "${!BLOBS[@]}"; do + BLOB=${BLOBS[$i]} + FILTER_NAME=${FILENAMES[$i]} + OUTPUT_NAME=${FILENAMES[$i]:-$(unique_filename)} + + write_one_file "$BLOB" "$FILTER_NAME" "$OUTPUT_NAME" + done +} + +function file_time { + if [ $IS_BSD == 1 ]; then + stat -f %c "$1" + else +
[RFC PATCH 0/1] Introduce git-recover
Hello- I created a simple shell script a while back to help people recover files that they deleted from their working directory (but had been added to the repository), which looks for unreachable blobs in the object database and places them in the working directory (either en masse, interactively, or via command-line arguments). This has been available at https://github.com/ethomson/git-recover for about a year, and in that time, someone has suggested that I propose this as part of git itself. So I thought I'd see if there's any interest in this. If there is, I'd like to get a sense of the amount of work required to make this suitable for inclusion. There are some larger pieces of work required -- at a minimum, I think this requires: - Tests -- there are none, which is fine with me but probably less fine for inclusion here. - Documentation -- the current README is below but it will need proper documentation that can be rendered into manpages, etc, by the tools. - Remove bashisms -- there are many. Again, this may not be particularly interesting, but I thought I'd send it along in case it is. Cheers- -ed --- git-recover allows you to recover some files that you've accidentally deleted from your working directory. It helps you find files that exist in the repository's object database - because you ran git add - but were never committed. Getting Started --- The simplest way to use git-recover is in interactive mode - simply run `git-recover -i` and it will show you all the files that you can recover and prompt you to act. Using git-recover - Running git-recover without any arguments will list all the files (git "blobs") that were recently orphaned, by their ID. (Their filename is not known.) You can examine these blobs by running `git show `. If you find one that you want to recover, you can provide the ID as the argument to git-recover. You can specify the `--filename` option to write the file out and apply any filters that are set up in the repository. For example: git-recover 38762cf7f55934b34d179ae6a4c80cadccbb7f0a \ --filename shattered.pdf You can also specify multiple files to recover, each with an optional output filename: git-recover 38762c --filename one.txt cafebae --filename bae.txt If you want to recover _all_ the orphaned blobs in your repository, run `git-recover --all`. This will write all the orphaned files to the current working directory, so it's best to run this inside a temporary directory beneath your working directory. For example: mkdir _tmp && cd _tmp && git-recover --all By default, git-recover limits itself to recently created orphaned blobs. If you want to see _all_ orphaned files that have been created in your repository (but haven't yet been garbage collected), you can run: git-recover --full Options --- git-recover [-a] [-i] [--full] [ [-f ] ...] -a, --all Write all orphaned blobs to the current working directory. Each file will be named using its 40 character object ID. -i, --interactive Display information about each orphaned blob and prompt to recover it. --full List or recover all orphaned blobs, even those that are in packfiles. By default, `git-recover` will only look at loose object files, which limits it to the most recently created files. Examining packfiles may be slow, especially in large repositories. The object ID (or its abbreviation) to recover. The file will be written to the current working directory and named using its 40 character object ID, unless the `-f` option is specified. -f , --filename When specified after an object ID, the file written will use this filename. In addition, any filters (for example: CRLF conversion or Git-LFS) will be run according to the `gitattributes` configuration. Edward Thomson (1): recover: restoration of deleted worktree files git-recover.sh | 311 + 1 file changed, 311 insertions(+) create mode 100755 git-recover.sh -- 2.0.0 (libgit2)
Re: Hash algorithm analysis
On Fri, Jul 20, 2018 at 09:52:20PM +, brian m. carlson wrote: > > To summarize the discussion that's been had in addition to the above, > Ævar has also stated a preference for SHA-256 and I would prefer BLAKE2b > over SHA-256 over SHA3-256, although any of them would be fine. > > Are there other contributors who have a strong opinion? Are there > things I can do to help us coalesce around an option? Overall, I prefer SHA-256. I mentioned this at the contributor summit - so this may have been captured in the notes. But if not, when I look at this from the perspective of my day job at Notorious Big Software Company, we would prefer SHA-256 due to its performance characteristics and the availability of hardware acceleration. We think about git object ids in a few different ways: Obviously we use git as a version control system - we have a significant investment in hosting repositories (for both internal Microsoft teams and our external customers). What may be less obvious is that often, git blob ids are used as fingerprints: on a typical Windows machine, you don't have the command-line hash functions (md5sum and friends), but every developer has git installed. So we end up calculating git object ids in places within the development pipeline that are beyond the scope of just version control. Not to dwell too much on implementation details, but this is especially advantageous for us in (say) labs where we can ensure that particular hardware is available to speed this up as necessary. Switching gears, if I look at this from the perspective of the libgit2 project, I would also prefer SHA-256 or SHA3 over blake2b. To support blake2b, we'd have to include - and support - that code ourselves. But to support SHA-256, we would simply use the system's crypto libraries that we already take a dependecy on (OpenSSL, mbedTLS, CryptoNG, or SecureTransport). All of those support SHA-256 and none of them include support for blake2b. That means if there's a problem with (say) OpenSSL's SHA-256 implementation, then it will be fixed by their vendor. If there's a problem with libb2, then that's now my responsibility. This is not to suggest that one library is of higher or lower quality than another. And surely we would try to use the same blake2b library that git itself is using to minimize some of this risk (so that at least we're all in the same boat and can leverage each other's communications to users) but even then, there will be inevitable drift between our vendored dependencies and the upstream code. You can see this in action in xdiff: git's xdiff has deviated from upstream, and libgit2 has taken git's and ours has deviated from that. Cheers- -ed
Re: [PATCH] checkout files in-place
On Tue, Jun 12, 2018 at 09:13:54AM +0300, Orgad Shaneh wrote: > Some of my colleagues use an ancient version of Source Insight, which also > locks files for write. If that application is locking files for writing (that is to say, it did not specify the `FILE_SHARE_WRITE` bit in the sharing modes during `CreateFile`) then this patch would not help. Applications, generally speaking, should be locking files for write. It's the default in Win32 and .NET's file open APIs because few applications are prepared to detect and support a file changing out from underneath them in the middle of a read. > It's less important than it was before those fixes, but it is still needed > for users of Qt Creator 4.6 (previous versions just avoided mmap, 4.7 uses > mmap only for system headers). Other tools on Windows might as well > misbehave. I don't understand what mmap'ing via `CreateFileMapping` has to do with this. It takes an existing `HANDLE` that was opened with `CreateFile`, which is where the sharing mode was supplied. I would be surprised if there are other tools on Windows that have specified `FILE_SHARE_WRITE` but not `FILE_SHARE_DELETE`. Generally speaking, if you don't care about another process changing a file underneath you then you should specify both. If you do then you should specify neither. I'm not saying that git shouldn't work around a bug in QT Creator - that's not my call, though I would be loathe to support this configuration option in libgit2. But I am saying that it seems like this patch doesn't have broad applicability beyond that particular tool. -ed
Re: [PATCH] checkout files in-place
On Sun, Jun 10, 2018 at 09:44:45PM +0200, Clemens Buchacher wrote: > > It is safe to do this on Linux file systems, even if open file handles > still exist, because unlink only removes the directory reference to the > file. On Windows, however, a file cannot be deleted until all handles to > it are closed. If a file cannot be deleted, its name cannot be reused. I'm nervous about this proposed change, since it feels like it's addressing an issue that only exists in QT Creator. You've accurately described the default semantics in Win32. A file cannot be deleted until all handles to it are closed, unless it was opened with `FILE_SHARE_DELETE` as their sharing mode. This is not the default sharing mode in either Win32 or .NET. However, for your patch to have an effect, all processes with a handle open must have specified `FILE_SHARE_WRITE`. This is rather uncommon, since it's also not included in the default Win32 or .NET sharing mode. This is because it's uncommon that you would want other processes to change the data underneath you in between ReadFile() calls. So your patch will benefit people who have processes that have `FILE_SHARE_WRITE` set but not `FILE_SHARE_DELETE` set, which I think is generally an uncommon scenario to want to support. Generally if you're willing to accept files changing underneath you, then you probably want to allow them to be deleted, too. So this feels like something that's very specific to QT Creator. Or are there other IDEs or development tools that use these open semantics that I'm not aware of? Cheers- -ed
Re: [RFC 0/1] Tolerate broken headers in `packed-refs` files
On Mon, Mar 26, 2018 at 2:08 PM, Derrick Stoleewrote: > Since most heavily-used tools that didn't spawn Git processes use > LibGit2 to interact with Git repos, I added Ed Thomson to CC to see > if libgit2 could ever write these bad header comments. We added the `sorted` capability to our `packed-refs` header relatively recently (approximately two months ago, v0.27.0 will be the first release to include it as of today). So, at the moment, libgit2 writes: "# pack-refs with: peeled fully-peeled sorted " Prior to this change, libgit2's header was stable for the last five years as: "# pack-refs with: peeled fully-peeled " And prior to that, we advertised only `peeled`: "# pack-refs with: peeled " Thanks for thinking of us! -ed
Contributor Summit Dinner
Hello- Microsoft would like to invite the contributors summit to dinner tonight; it will take place at 8pm: La Tagliatella (Placa Catalunya) Ronda de la Universitat, 31 It's conveniently right around the corner from the Beers with Bitbucket event, in case you're going to that. Hope to see you there. Thanks- -ed
[ANNOUNCE] libgit2 summit: 9 March 2018, Barcelona
The libgit2 community is hosting the libgit2 Summit 2018 on March 9th 2018 in Barcelona. As peff previously announced, Git Merge 2018 is happening in Barcelona on March 8th (and the Git Contributor's Summit is planned for March 7.) We will follow the Git Merge event and have our summit on March 9th so that people can attend both summits. We ask that you please register ahead of time. https://goo.gl/forms/3HdRBeBWixxQE3rf1 Like the Git Contributor's Summit, we will likely start with a business update on libgit2 and the overall community. After that, we will open the floor for discussions and/or presentations. We will also provide some time to provide a getting started discussion: if you think you might want to use or contribute to libgit2, then we'll let you know how easy it is to get started working with the library. We also hope that we'll have some time to do some hacking. If you're a prior contributor to libgit2, you use the library, or you think that you might be interested, we encourage you to attend. Looking forward to seeing you in Barcelona. -ed
Re: git merge-tree: bug report and some feature requests
On Tue, Jan 23, 2018 at 7:08 AM, Josh Bleecher Snyderwrote: > Looking over your list above, at a minimum, libgit2 might not have a > particularly good way to represent submodule/file or > submodule/directory conflicts, because is-a-submodule is defined > external to a git_index_entry. libgit2 should detect submodule/file and submodule/directory conflicts. While, yes, some metadata about the submodule is located outside the index, you can look at the mode to determine that this _is_ a submodule. You should be able to reliably detect submodule/file conflicts because one will be a regular or executable file (mode 0100644 or 0100755), while the other entry at the same path will be a gitlink (mode 016). Similarly, submodule/directory conflict detection works just like regular file/directory conflict detection. If some path `foo` is a submodule (or a regular file) then some path `foo/bar` existing in the other side of the merge causes a conflict. > Cataloging or special-casing all possible conflict types does seem > unwise because of the sheer number of kinds of conflicts. > > But the alternative appears to be punting entirely, as libgit2 does, > and merely providing something akin to three index entries. Indeed, when I added merge to libgit2, we put the higher-level conflict analysis into application code because there was not much interest in it at the time. I've been meaning to add this to `git_status` in libgit2, but it's not been a high priority. > This which > leaves it unclear what exactly the conflict was, at which point any > user (read: porcelain developer) will end up having to recreate some > merge logic to figure out what went wrong. And if merge-tree starts > doing rename detection, the user might then have to emulate that as > well. That's not a good idea. Trying to figure out what merge did would be painful at best, and likely impossible, since a rename conflict is recorded in the main index without any way to piece it together. eg: 100644 deadbeefdeadbeefdeadbeefdeadbeefdeadbeef 1 bar.c 100644 cafec4fecafec4fecafec4fecafec4fecafec4fe 2 bar.c 100644 c4cc188a892898e13927dc4a02e7f68814b874b2 1 foo.c 100644 71f5af150b25e3d2d67ff46759311401036f 2 foo.c 100644 351cfbdd55d656edd2c5c995aae3caafb9ec11fa 3 rename1.c 100644 e407c7d138fb457674c3b114fcf47748169ab0c5 3 rename2.c This is the main index that results when bar.c has a rename/edit conflict, and foo.c also has a rename/edit conflict. One was renamed to rename1.c and the other to rename2.c. Trying to determine which is which _after the fact_ would be regrettable. Especially since rename detection is not static - you would need to know the thresholds that were configured at the time merge was performed and try to replay the rename detection with that. libgit2 records a `NAME` section in the index that pairs the rename detection decisions that it performed so that you can analyze them and display them after the fact. -ed
[ANNOUNCE] Git London User Group: 16 January 2018
Git London User Group: 16 January, 2018 === I'm pleased to announce the formation of the Git London User Group, where Git users and experts from throughout the UK can get together to share tips, experience and assistance for using Git successfully. The first meeting takes place Tuesday, 16 January, 2018 at 19:00. Agenda -- Extending Git through Scripting: Charles Bailey Git is the most popular version control system in use today; it is highly flexible and supports many different workflows. One of its strengths is its openness to scripting. This talk looks at the basic principles that support best practice for scripting Git and how to avoid some common pitfalls. Building Git Tools with libgit2: Edward Thomson Edward introduces the libgit2 framework (http://libgit2.org), which is a portable, implementation of Git as a library. If you're looking for more advanced programmatic access to working with Git repositories, libgit2 is a good option, which is why it's used by many Git servers like GitHub and VSTS and clients like gmaster and GitKraken. Edward will introduce libgit2 and some of the language bindings like LibGit2Sharp (for .NET) and Rugged (for Ruby). Location General Assembly. The Relay Building, 1st floor. 114 Whitechapel High Street London, E1 7PT. We ask that you please RSVP at http://londongit.org/. Sponsors A big thank you to the sponsors of the Git London User Group. Bloomberg has been kind enough to sponsor the meeting space for us to use. Microsoft has sponsored food for dinner. And All Things Git (the Podcast about Git) has sponsored meetup and registration fees. Thanks also to Henry Kleynhans, the other organizer of the group. Contact --- Follow us on Meetup by visiting http://londongit.org/, or on Twitter at @londongit. If you're in or around London, we hope that you'll join us next Tuesday! Sincerely, Edward Thomson (ethom...@edwardthomson.com)
Re: [PATCH] add: add --chmod=+x / --chmod=-x options
On Wed, Jun 01, 2016 at 09:00:34AM -0700, Junio C Hamano wrote: > > Unlike the "something like this" we saw earlier, this draws the > boundary of responsibility between the caller and the API at a much > more sensible place. This makes sense to me - Junio, are you taking (or have you already taken) dscho's patch, or would you like me to squash it and resend? Thanks- -ed -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] add: add --chmod=+x / --chmod=-x options
The executable bit will not be detected (and therefore will not be set) for paths in a repository with `core.filemode` set to false, though the users may still wish to add files as executable for compatibility with other users who _do_ have `core.filemode` functionality. For example, Windows users adding shell scripts may wish to add them as executable for compatibility with users on non-Windows. Although this can be done with a plumbing command (`git update-index --add --chmod=+x foo`), teaching the `git-add` command allows users to set a file executable with a command that they're already familiar with. Signed-off-by: Edward Thomson <ethom...@edwardthomson.com> --- builtin/add.c | 12 +++- cache.h| 2 ++ read-cache.c | 11 +-- t/t3700-add.sh | 30 ++ 4 files changed, 52 insertions(+), 3 deletions(-) diff --git a/builtin/add.c b/builtin/add.c index 145f06e..44b6c97 100644 --- a/builtin/add.c +++ b/builtin/add.c @@ -238,6 +238,8 @@ static int ignore_add_errors, intent_to_add, ignore_missing; static int addremove = ADDREMOVE_DEFAULT; static int addremove_explicit = -1; /* unspecified */ +static char *chmod_arg = NULL; + static int ignore_removal_cb(const struct option *opt, const char *arg, int unset) { /* if we are told to ignore, we are not adding removals */ @@ -263,6 +265,7 @@ static struct option builtin_add_options[] = { OPT_BOOL( 0 , "refresh", _only, N_("don't add, only refresh the index")), OPT_BOOL( 0 , "ignore-errors", _add_errors, N_("just skip files which cannot be added because of errors")), OPT_BOOL( 0 , "ignore-missing", _missing, N_("check if - even missing - files are ignored in dry run")), + OPT_STRING( 0 , "chmod", _arg, N_("(+/-)x"), N_("override the executable bit of the listed files")), OPT_END(), }; @@ -336,6 +339,11 @@ int cmd_add(int argc, const char **argv, const char *prefix) if (!show_only && ignore_missing) die(_("Option --ignore-missing can only be used together with --dry-run")); + if (chmod_arg) { + if (strcmp(chmod_arg, "-x") && strcmp(chmod_arg, "+x")) + die(_("--chmod param must be either -x or +x")); + } + add_new_files = !take_worktree_changes && !refresh_only; require_pathspec = !(take_worktree_changes || (0 < addremove_explicit)); @@ -346,7 +354,9 @@ int cmd_add(int argc, const char **argv, const char *prefix) (intent_to_add ? ADD_CACHE_INTENT : 0) | (ignore_add_errors ? ADD_CACHE_IGNORE_ERRORS : 0) | (!(addremove || take_worktree_changes) - ? ADD_CACHE_IGNORE_REMOVAL : 0)); + ? ADD_CACHE_IGNORE_REMOVAL : 0)) | +(chmod_arg && *chmod_arg == '+' ? ADD_CACHE_FORCE_EXECUTABLE : 0) | +(chmod_arg && *chmod_arg == '-' ? ADD_CACHE_FORCE_NOTEXECUTABLE : 0); if (require_pathspec && argc == 0) { fprintf(stderr, _("Nothing specified, nothing added.\n")); diff --git a/cache.h b/cache.h index 6049f86..da03cd9 100644 --- a/cache.h +++ b/cache.h @@ -581,6 +581,8 @@ extern int remove_file_from_index(struct index_state *, const char *path); #define ADD_CACHE_IGNORE_ERRORS4 #define ADD_CACHE_IGNORE_REMOVAL 8 #define ADD_CACHE_INTENT 16 +#define ADD_CACHE_FORCE_EXECUTABLE 32 +#define ADD_CACHE_FORCE_NOTEXECUTABLE 64 extern int add_to_index(struct index_state *, const char *path, struct stat *, int flags); extern int add_file_to_index(struct index_state *, const char *path, int flags); extern struct cache_entry *make_cache_entry(unsigned int mode, const unsigned char *sha1, const char *path, int stage, unsigned int refresh_options); diff --git a/read-cache.c b/read-cache.c index d9fb78b..d12d143 100644 --- a/read-cache.c +++ b/read-cache.c @@ -641,6 +641,8 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st, int intent_only = flags & ADD_CACHE_INTENT; int add_option = (ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE| (intent_only ? ADD_CACHE_NEW_ONLY : 0)); + int force_executable = flags & ADD_CACHE_FORCE_EXECUTABLE; + int force_notexecutable = flags & ADD_CACHE_FORCE_NOTEXECUTABLE; if (!S_ISREG(st_mode) && !S_ISLNK(st_mode) && !S_ISDIR(st_mode)) return error("%s: can only add regular files, symbolic links or git-directories", path); @@ -659,9 +661,14 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st, else ce->ce_flags |= CE_INTENT_TO_ADD; - if (trust_executable_bit && has_symlinks) + if (S_ISREG(st_m
Re: [PATCH] add: add --chmod=+x / --chmod=-x options
On Fri, May 27, 2016 at 11:30:40AM -0700, Junio C Hamano wrote: > > Oh, having said all of that, the comments on the implementation > still stand. Certainly; sorry for the delay. I've squashed in your tests and applied your recommendations. Resending the patch momentarily. Thanks- -ed -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] add: add --chmod=+x / --chmod=-x options
On Wed, May 25, 2016 at 12:36:55AM -0700, Junio C Hamano wrote: > > At the design level, I have a few comments. Thanks, I will submit a new patch that incorporates your (and dscho's) comments. > * This is about a repository with core.filemode=0; I wonder if >something for a repository with core.symlinks=0 would also help? >That is, would it be a big help to users if they can prepare a >text file that holds symbolic link contents and add it as if it >were a symlink with "git add", instead of having to run two >commands, "hash-objects && update-index --cacheinfo"? I think that this is much less common and - speaking only from personal experience - nobody has ever asked me how to stage a symlink on a Windows machine. I think that this is due to the fact that symlinks on Windows are basically impossible to use, so people doing cross-platform development wouldn't even try. On the other hand, it's quite common for cross-platform teams to use some scripting language since those do work across platforms, and Windows users would want to add new scripts as executable for the benefit of their brethren on platforms with an executable bit. > * I am not familiar with life on filesystems with core.filemode=0; >do files people would want to be able to "add --chmod=+x" share >common trait that can be expressed with .gitattributes mechanism? Perhaps... It would not be things like `*.bat` or `*.exe` - Windows gets those as executable "for free" and would not care about adding the execute bit on those files (since they're not executable anywhere else). It would be items like `*.sh` or `*.rb` that should be executable on POSIX platforms. However I do not think that this is a common enough action that it needs to be made automatic such that when I `git add foo.rb` it is automatically made executable. I think that the reduced complexity of having a single mechanism to control executability (that being the execute mode in the index or a tree) is preferable to a gitattributes based mechanism, at least until somebody else makes a cogent argument that the gitattributes approach would be helpful for them. :) Thanks again for the comments, an updated patch is forthcoming. -ed -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] format_commit_message: honor `color=auto` for `%C(auto)`
On Wed, May 25, 2016 at 05:39:04PM -0500, Jeff King wrote: > Looks like we didn't have any tests at all for %C(auto). And the tests > for %C(auto,...) were labeled as %C(auto), making it all the more > confusing. Perhaps it is worth squashing this in: Thanks, peff. Indeed I did squash that into my updated patch. -ed -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] format_commit_message: honor `color=auto` for `%C(auto)`
git-log(1) documents that when specifying the `%C(auto)` format placeholder will "turn on auto coloring on the next %placeholders until the color is switched again." However, when `%C(auto)` is used, the present implementation will turn colors on unconditionally (even if the color configuration is turned off for the current context - for example, `--no-color` was specified or the color is `auto` and the output is not a tty). Update `format_commit_one` to examine the current context when a format string of `%C(auto)` is specified, which ensures that we will not unconditionally write colors. This brings that behavior in line with the behavior of `%C(auto,)`, and allows the user the ability to specify that color should be displayed only when the output is a tty. Additionally, add a test for `%C(auto)` and update the existing tests for `%C(auto,...)` as they were misidentified as being applicable to `%C(auto)`. Signed-off-by: Edward Thomson <ethom...@edwardthomson.com> Tests from Jeff King. Signed-off-by: Jeff King <p...@peff.net> --- pretty.c | 2 +- t/t6006-rev-list-format.sh | 26 +++--- 2 files changed, 20 insertions(+), 8 deletions(-) diff --git a/pretty.c b/pretty.c index 87c4497..c3ec430 100644 --- a/pretty.c +++ b/pretty.c @@ -1063,7 +1063,7 @@ static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */ switch (placeholder[0]) { case 'C': if (starts_with(placeholder + 1, "(auto)")) { - c->auto_color = 1; + c->auto_color = want_color(c->pretty_ctx->color); return 7; /* consumed 7 bytes, "C(auto)" */ } else { int ret = parse_color(sb, placeholder, c); diff --git a/t/t6006-rev-list-format.sh b/t/t6006-rev-list-format.sh index b77d4c9..a1dcdb8 100755 --- a/t/t6006-rev-list-format.sh +++ b/t/t6006-rev-list-format.sh @@ -184,38 +184,38 @@ commit $head1 [1;31;43mfoo[m EOF -test_expect_success '%C(auto) does not enable color by default' ' +test_expect_success '%C(auto,...) does not enable color by default' ' git log --format=$AUTO_COLOR -1 >actual && has_no_color actual ' -test_expect_success '%C(auto) enables colors for color.diff' ' +test_expect_success '%C(auto,...) enables colors for color.diff' ' git -c color.diff=always log --format=$AUTO_COLOR -1 >actual && has_color actual ' -test_expect_success '%C(auto) enables colors for color.ui' ' +test_expect_success '%C(auto,...) enables colors for color.ui' ' git -c color.ui=always log --format=$AUTO_COLOR -1 >actual && has_color actual ' -test_expect_success '%C(auto) respects --color' ' +test_expect_success '%C(auto,...) respects --color' ' git log --format=$AUTO_COLOR -1 --color >actual && has_color actual ' -test_expect_success '%C(auto) respects --no-color' ' +test_expect_success '%C(auto,...) respects --no-color' ' git -c color.ui=always log --format=$AUTO_COLOR -1 --no-color >actual && has_no_color actual ' -test_expect_success TTY '%C(auto) respects --color=auto (stdout is tty)' ' +test_expect_success TTY '%C(auto,...) respects --color=auto (stdout is tty)' ' test_terminal env TERM=vt100 \ git log --format=$AUTO_COLOR -1 --color=auto >actual && has_color actual ' -test_expect_success '%C(auto) respects --color=auto (stdout not tty)' ' +test_expect_success '%C(auto,...) respects --color=auto (stdout not tty)' ' ( TERM=vt100 && export TERM && git log --format=$AUTO_COLOR -1 --color=auto >actual && @@ -223,6 +223,18 @@ test_expect_success '%C(auto) respects --color=auto (stdout not tty)' ' ) ' +test_expect_success '%C(auto) respects --color' ' + git log --color --format="%C(auto)%H" -1 >actual && + printf "\\033[33m%s\\033[m\\n" $(git rev-parse HEAD) >expect && + test_cmp expect actual +' + +test_expect_success '%C(auto) respects --no-color' ' + git log --no-color --format="%C(auto)%H" -1 >actual && + git rev-parse HEAD >expect && + test_cmp expect actual +' + iconv -f utf-8 -t $test_encoding > commit-msg <http://vger.kernel.org/majordomo-info.html
[PATCH] add: add --chmod=+x / --chmod=-x options
Users on deficient filesystems that lack an execute bit may still wish to add files to the repository with the appropriate execute bit set (or not). Although this can be done in two steps (`git add foo && git update-index --chmod=+x foo`), providing the `--chmod=+x` option to the add command allows users to set a file executable in a single command that they're already familiar with. Signed-off-by: Edward Thomson <ethom...@edwardthomson.com> --- builtin/add.c | 18 +- cache.h| 2 ++ read-cache.c | 6 ++ t/t3700-add.sh | 19 +++ 4 files changed, 44 insertions(+), 1 deletion(-) diff --git a/builtin/add.c b/builtin/add.c index 145f06e..2a9abf7 100644 --- a/builtin/add.c +++ b/builtin/add.c @@ -238,6 +238,8 @@ static int ignore_add_errors, intent_to_add, ignore_missing; static int addremove = ADDREMOVE_DEFAULT; static int addremove_explicit = -1; /* unspecified */ +static char should_chmod = 0; + static int ignore_removal_cb(const struct option *opt, const char *arg, int unset) { /* if we are told to ignore, we are not adding removals */ @@ -245,6 +247,15 @@ static int ignore_removal_cb(const struct option *opt, const char *arg, int unse return 0; } +static int chmod_cb(const struct option *opt, const char *arg, int unset) +{ + char *flip = opt->value; + if ((arg[0] != '-' && arg[0] != '+') || arg[1] != 'x' || arg[2]) + return error("option 'chmod' expects \"+x\" or \"-x\""); + *flip = arg[0]; + return 0; +} + static struct option builtin_add_options[] = { OPT__DRY_RUN(_only, N_("dry run")), OPT__VERBOSE(, N_("be verbose")), @@ -263,6 +274,9 @@ static struct option builtin_add_options[] = { OPT_BOOL( 0 , "refresh", _only, N_("don't add, only refresh the index")), OPT_BOOL( 0 , "ignore-errors", _add_errors, N_("just skip files which cannot be added because of errors")), OPT_BOOL( 0 , "ignore-missing", _missing, N_("check if - even missing - files are ignored in dry run")), + { OPTION_CALLBACK, 0, "chmod", _chmod, N_("(+/-)x"), + N_("override the executable bit of the listed files"), + PARSE_OPT_NONEG | PARSE_OPT_LITERAL_ARGHELP, chmod_cb}, OPT_END(), }; @@ -346,7 +360,9 @@ int cmd_add(int argc, const char **argv, const char *prefix) (intent_to_add ? ADD_CACHE_INTENT : 0) | (ignore_add_errors ? ADD_CACHE_IGNORE_ERRORS : 0) | (!(addremove || take_worktree_changes) - ? ADD_CACHE_IGNORE_REMOVAL : 0)); + ? ADD_CACHE_IGNORE_REMOVAL : 0)) | +(should_chmod == '+' ? ADD_CACHE_FORCE_EXECUTABLE : 0) | +(should_chmod == '-' ? ADD_CACHE_FORCE_NOTEXECUTABLE : 0); if (require_pathspec && argc == 0) { fprintf(stderr, _("Nothing specified, nothing added.\n")); diff --git a/cache.h b/cache.h index 6049f86..da03cd9 100644 --- a/cache.h +++ b/cache.h @@ -581,6 +581,8 @@ extern int remove_file_from_index(struct index_state *, const char *path); #define ADD_CACHE_IGNORE_ERRORS4 #define ADD_CACHE_IGNORE_REMOVAL 8 #define ADD_CACHE_INTENT 16 +#define ADD_CACHE_FORCE_EXECUTABLE 32 +#define ADD_CACHE_FORCE_NOTEXECUTABLE 64 extern int add_to_index(struct index_state *, const char *path, struct stat *, int flags); extern int add_file_to_index(struct index_state *, const char *path, int flags); extern struct cache_entry *make_cache_entry(unsigned int mode, const unsigned char *sha1, const char *path, int stage, unsigned int refresh_options); diff --git a/read-cache.c b/read-cache.c index d9fb78b..81bf186 100644 --- a/read-cache.c +++ b/read-cache.c @@ -641,6 +641,8 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st, int intent_only = flags & ADD_CACHE_INTENT; int add_option = (ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE| (intent_only ? ADD_CACHE_NEW_ONLY : 0)); + int force_executable = flags & ADD_CACHE_FORCE_EXECUTABLE; + int force_notexecutable = flags & ADD_CACHE_FORCE_NOTEXECUTABLE; if (!S_ISREG(st_mode) && !S_ISLNK(st_mode) && !S_ISDIR(st_mode)) return error("%s: can only add regular files, symbolic links or git-directories", path); @@ -661,6 +663,10 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st, if (trust_executable_bit && has_symlinks) ce->ce_mode = create_ce_mode(st_mode); + else if (force_executable) + ce->ce_mode = create_ce_mode(0777); + else if (force_notexecutable) + ce->ce_mode = create_ce_mode(0666); else {
[PATCH] format_commit_message: honor `color=auto` for `%C(auto)`
Check that we are configured to display colors in the given context when the user specifies a format string of `%C(auto)`. This brings that behavior in line with the behavior of `%C(auto,)`, which will display the given color only when the configuration specifies to do so. This allows the user the ability to specify that color should be displayed only when the output is a tty, and to use the default color for the given context (instead of a hardcoded color value). Signed-off-by: Edward Thomson <ethom...@edwardthomson.com> --- pretty.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pretty.c b/pretty.c index 87c4497..c3ec430 100644 --- a/pretty.c +++ b/pretty.c @@ -1063,7 +1063,7 @@ static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */ switch (placeholder[0]) { case 'C': if (starts_with(placeholder + 1, "(auto)")) { - c->auto_color = 1; + c->auto_color = want_color(c->pretty_ctx->color); return 7; /* consumed 7 bytes, "C(auto)" */ } else { int ret = parse_color(sb, placeholder, c); -- 2.6.4 (Apple Git-63) -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] format_commit_message: honor `color=auto` for `%C(auto)`
On Tue, May 24, 2016 at 05:55:02PM -0700, Junio C Hamano wrote: > Looks obviously the right thing to do from a cursory read. > Missing sign-off? Yes, apologies for the oversight; will re-send. Thanks- -ed -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] format_commit_message: honor `color=auto` for `%C(auto)`
Check that we are configured to display colors in the given context when the user specifies a format string of `%C(auto)`, instead of always displaying the default color for the given context. This makes `%C(auto)` obey the `color=auto` setting and brings its behavior in line with the behavior of `%C(auto,)`. This allows the user the ability to specify that color should be displayed for a string only when the output is a tty, and to use the default color for the given context without having to hardcode a color value. --- pretty.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pretty.c b/pretty.c index 87c4497..c3ec430 100644 --- a/pretty.c +++ b/pretty.c @@ -1063,7 +1063,7 @@ static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */ switch (placeholder[0]) { case 'C': if (starts_with(placeholder + 1, "(auto)")) { - c->auto_color = 1; + c->auto_color = want_color(c->pretty_ctx->color); return 7; /* consumed 7 bytes, "C(auto)" */ } else { int ret = parse_color(sb, placeholder, c); -- 2.6.4 (Apple Git-63) -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH] fsck: do not canonicalize modes in trees we are checking
On Tue, Sep 23, 2014 at 11:47:51AM -0400, Jeff King wrote: As far as I can tell, fsck's mode-checking has been totally broken basically forever. Which makes me a little nervous to fix it. :) linux.git does have some bogus modes, but they are 100664, which is specifically ignored here unless fsck --strict is in effect. I'm in favor of checking the mode in fsck, at least when --strict. But I would suggest we be lax (by default) about other likely-to-exist but strictly invalid modes to prevent peoples previously workable repositories from being now broken. I have, for example, encountered 100775 in the wild, and would argue that like 100644, it should probably not fail unless we are in --strict mode. Thanks- -ed -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] upload-pack: keep poll(2)'s timeout to -1
Keep poll's timeout at -1 when uploadpack.keepalive = 0, instead of setting it to -1000, since some pedantic old systems (eg HP-UX) and the gnulib compat/poll will treat only -1 as the valid value for an infinite timeout. Signed-off-by: Edward Thomson ethom...@microsoft.com --- upload-pack.c |4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/upload-pack.c b/upload-pack.c index 01de944..433211a 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -167,7 +167,9 @@ static void create_pack_file(void) if (!pollsize) break; - ret = poll(pfd, pollsize, 1000 * keepalive); + ret = poll(pfd, pollsize, + keepalive 0 ? -1 : 1000 * keepalive); + if (ret 0) { if (errno != EINTR) { error(poll failed, resuming: %s, -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] upload-pack: keep poll(2)'s timeout to -1
On Fri, Aug 22, 2014 at 12:03:34PM -0400, Jeff King wrote: Yeah, I wasn't thinking we would get negative values from the user (we don't document them at all), but we should probably do something sensible. Let's just leave it at Ed's patch. Thanks, both. Apologies for the dumb question: is there anything additional that I need to do (repost with your Acked-by, for example) or is this adequate as-is? Thanks- -ed -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Rename conflicts in the index
Edward Thomson [ethom...@microsoft.com] wrote: Junio C Hamano [mailto:gis...@pobox.com] wrote: * Path A may have only stage #1, while path B and C has only stage #2 and stage #3 (the user would have to notice these three correspond to each other, and resolve manually). You would want to annotate B at stage #2 seems to have been at A in the original (similarly for C#3) if you choose to do so. If we're going to make changes to the way conflicts are recorded in the main index, then I would prefer this approach. It is unambiguous and all data about all sides are recorded, including the names that items had in their respective branches. Junio, did you have additional thoughts on this? What would you like from me to proceed? If the aforementioned seems reasonable, I can update Documentation/technical/index-format.txt and we can iron out the details in that fashion? Thanks- -ed -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Rename conflicts in the index
Junio C Hamano [mailto:gits...@pobox.com] wrote: Edward Thomson ethom...@microsoft.com writes: Junio, did you have additional thoughts on this? Not at this moment. I think we have covered the principles (do not unnecessarily duplicate information, do not break existing implementations unnecessarily, etc.) already, and we know how we want to record one side renamed A to B, the other side renamed A to C case, but I do not think the discussion covered all cases yet. Sorry, I'm not sure what you're asking for - would you just like some more examples of what this looks like with aforementioned exotic conflict types? Or are you looking for something more strict - BNF format, for example? -ed -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Rename conflicts in the index
Junio C Hamano [mailto:gis...@pobox.com] wrote: As long as the format will be backward compatible to allow existing users use existing tools to deal with cases the existing tools can handle, then that is OK. I didn't get that impression which is where my non starter came from. I see now. Thank you for the clarification. I apologize if I was not clear about this; indeed, the duplication of data in my proposed extension was specifically to avoid any compatibility problems amongst clients. In particular, when we have a rename in ours, edit in theirs conflict, we store the conflict at the new (ours) path. If, for example, I rename a-b in my branch and merge a branch that edits a: mode hash 1 b mode hash 2 b mode hash 3 b This prohibits us from storing anything else in the theirs side at that path, so if I were to have added b in their branch in addition to modifying b, I cannot record it. I was assuming that any change to this behavior would be a breaking one, which is where the new section came from. * Path A may have only stage #1, while path B and C has only stage #2 and stage #3 (the user would have to notice these three correspond to each other, and resolve manually). You would want to annotate B at stage #2 seems to have been at A in the original (similarly for C#3) if you choose to do so. If we're going to make changes to the way conflicts are recorded in the main index, then I would prefer this approach. It is unambiguous and all data about all sides are recorded, including the names that items had in their respective branches. I would think that this might be a burden on current tools, however. Now if I rename a-b my just my branch, my conflict will be recorded as: mode hash 1 a mode hash 2 b mode hash 3 a And current git-status will not look at any rename annotations to know how to report this. However, maybe this is not as big a problem as I'm concerned it would be. * You can choose to favor our choice, and have path B with three stages (if we guessed wrong and the user wants to move it to C, the user can resolve and then git mv the path). I think this approach suffers from the drawback that the current approach has, wherein this conflicts when they had path B, also, as noted above. I think that if you were to put both B and C with all three stages, this would be problematic for the same reason. *1* Instead of a three-way merge that inspects only the endpoints, you might get a better rename trail if you looked at the histories of both branches. It would be a lot more expensive than the simple three-way, but burning CPU cycles is better than burning human neurons. For the record, I like this approach very much. It's not something that libgit2 will be able to tackle in the near future; we're in a sort of walk-before-you-can-run situation with merge at the moment, as you can probably see. But any improvement that avoids burning neurons is a valuable one. Thanks- -ed -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Rename conflicts in the index
Junio C Hamano [mailto:gits...@pobox.com] wrote: Edward Thomson ethom...@microsoft.com writes: I would propose that this not simply track rename conflicts, but all conflicts. That is a no starter. So. Can you explain to me why this would be a non starter? Can you suggest some alternate strategy here? Maybe there's something I'm fundamentally misunderstanding. It seems that at present, git will: 1. Detect rename conflicts when performing a merge (at least, git-merge-recursive will, which is the default.) 2. If the rename itself caused a conflict (eg, renamed in one side, added in the other) then the merge cannot succeed. 3. The resultant index is written as if renames were not detected, which means - at best - records the files that went in to the name conflict and git status reports an added in ours conflict, which is a pretty disappointing conflict. Often, though, many of the files will not exist at higher stage entries, since without rename detection, they would have not been conflicts. At worst, one side is staged, there are no conflicts in the index and the user can commit (and thus lose the other side.) Thus it's not like we could add some extension that merely records the names that produced the rename conflicts and point them at the higher stage entries in the index. That would require that they actually be in the index. Thanks- -ed -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Rename conflicts in the index
Junio C Hamano [mailto:gits...@pobox.com] wrote: We do not gratuitously break existing implementations. If no conflict is stored as higher-stage index entries in an index that has your index extension, no existing implementation can read a conflicted index written by your implementation and have users resolve conflicts. I'm not suggesting that anybody stop writing 0 stage entries. When a path originally at A is moved to B on only one branch, and there are content-level conflicts between the changes made by one branch (while going from A to B) and by the other branch (while keeping it at A), we would end up having three stages for path B without any trace of path A. I do not offhand know how much it helps to learn A in such a situation in the real life, but we are indeed losing information, and I do not have any problem with an extension that records in the index the fact that in the two (of the three) commits involved in the merge, the path was at A. What you've described is true only for a certain class of rename conflicts, for example the rename/edit conflict you've described above. It's also true if you were to rename some item 'a' to 'b' in both branches. But when 'b' is sufficiently dissimilar to become a rewrite, then I end up with a rename of a-b on one side and deleting a and adding b on the other. The result is a mysterious added by us conflict: 100644 e2dd530c9f31550a2b0c90773ccde056929d6d66 2 b Worse yet is if I don't do the rename in my side, but I just add a new b so that in theirs I've renamed a to b and in mine I have both a and b. When I do the merge, I'm told I have conflicts, except that I don't: 100644 08d4f831774aed5d4c6cb496affefd4020dce40c 0 b The other branch's b is long gone and exists only as a dirty file in the workdir. But people have been successfully using existing versions of Git without that information to merge branches with renames, and resolving the content-level conflicts. But you aren't afforded the option to resolve content-level conflicts if you don't know where the conflict came from. For example, in a rename 1-2 conflict, we dutifully detect that a was renamed to both b and c and fail, but that fact is never given to the index. This conflict could be fed into a merge tool or, better, automerged, with the user only needing to pick a path: 100644 421c9102b8562ad227ba773ab1cf6bbed7b7496d 1 a 100644 421c9102b8562ad227ba773ab1cf6bbed7b7496d 3 b 100644 421c9102b8562ad227ba773ab1cf6bbed7b7496d 2 c I hate to sound like a broken record here, but without some more data in the index - anywhere, really - any tool that doesn't have the luxury of emitting data about what happened to stdout certainly can't infer anything about what happened in the merge. -ed -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Rename conflicts in the index
Junio C Hamano [mailto:gits...@pobox.com] wrote: Edward Thomson ethom...@microsoft.com writes: I would propose that we store the data about the file in conflict as it occurred through the renames. For example, in a rename 1-2 conflict where A was renamed to both B and C, you would have a single conflict entry containing the data for A, B and C. This would allow us to provide more detailed information to the user - and allow them to (say) choose a single name to proceed with. Is this something that has value to core git as well? Alternately, is there something particularly stupid about this proposal? I do not offhand see anything particularly stupid; a new optional index extension section CACHE_EXT_RENAME_CONFLICT might be a good addition. Is one side moves A to B while the other side moves it to C the only case, or is it just an example? Off the top of my head, one side moves A to x while the other side moves B to x/y would also be something we would want to know. I am sure there are other cases that need to be considered. I do not think we can discuss the design at the concrete level until the proposal spells out to cover all interesting cases in order for implementations to agree on the common semantics. Sorry about the delay here: besides getting busy with some other things, I wanted both a complete writeup and to have taken a pass at a test implementation this in libgit2 to make sure seemed like a reasonably sensible approach. I would propose a new extension, 'CONF', to handle conflict data, differing from the stage 0 entries in the index in that this extension tracks the conflicting file across names if the underlying merge engine has support for renames. I made an attempt to keep the entry data similar to other entries in the index. I would propose that entries in the conflict are as follows: Flags Four octets that describe the conflict. Data includes: 0x01 HAS_ANCESTOR There is a file in the common ancestor branch that contributes to this conflict. Its data will follow. 0x02 HAS_OURS There is a file in our branch that contributes to this conflict. Its data will follow. 0x04 HAS_THEIRS There is a file in their branch that contributes to this conflict. Its data will follow. 0x08 NAME_CONFLICT_OURS This item has a path in our branch that overlaps a different item in their branch. (Eg, this conflict represents the our side of a rename/add conflict.) 0x10 NAME_CONFLICT_THEIRS This item has a path in their branch that overlaps a different item in our branch. (Eg, this conflict represents the theirs side of a rename/add conflict.) 0x20 DF_CONFLICT_FILE This is the file involved in a directory/file conflict. 0x40 DF_CONFLICT_CHILD This is a child of a directory involved in a directory/file conflict. Other bits are reserved. Conflict Sides The data about one side of a conflict will contain: mode (ASCII string representation of octal, null-terminated) path (null terminated) sha1 (raw bytes) The conflict sides will be written in this order: Ancestor (if HAS_ANCESTOR is set) Ours (if HAS_OURS is set) Theirs (if HAS_THEIRS is set) I would propose that this not simply track rename conflicts, but all conflicts. Having a single canonical location is preferable - if the index contains a CONF section (and the client supports it), it would use that. Otherwise, the client would look at stage 0 entries. I would propose that another extension, 'RSVD', track these conflicts once they are resolved. The format would be the same - when a conflict is resolved from the CONF the entry will be placed as-is in the RSVD. Examples are not an exhaustive list, but should help elucidate the name and d/f conflicts: Normal edit / edit conflict, where A is edited in ours and theirs: Conflict one: Flags = HAS_ANCESTOR|HAS_OURS|HAS_THEIRS Entry 1 = A [Ancestor] Entry 2 = B [Ancestor] Entry 3 = C [Ancestor] Rename / add conflict, where A is renamed to B in ours and B is added in theirs: Conflict one: Flags = HAS_ANCESTOR|HAS_OURS|NAME_CONFLICT_OURS Entry 1 = A [Ancestor] Entry 2 = B [Ours] Entry 3 = A [Theirs] Conflict two: Flags = HAS_THEIRS|NAME_CONFLICT_THEIRS Entry 1 = File B [Theirs] D/F conflict, where some file A is deleted in theirs, and a directory A is created with file child: Conflict one: Flags = HAS_ANCESTOR|HAS_OURS|HAS_THEIRS|DF_CONFLICT_FILE Entry 1 = A [Ancestor] Entry 2 = A [Ours] Conflict two: Flags = HAS_THEIRS|DF_CONFLICT_CHILD Entry 1 = A/child [Theirs] Thanks for your input on this. -ed -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Rename conflicts in the index
Junio C Hamano [mailto:gits...@pobox.com] wrote: Edward Thomson ethom...@microsoft.com writes: I would propose a new extension, 'CONF', to handle conflict data, differing from the stage 0 entries in the index in that this extension tracks the conflicting file across names if the underlying merge engine has support for renames. I made an attempt to keep the entry data similar to other entries in the index. I would propose that entries in the conflict are as follows: Flags Four octets that describe the conflict. Data includes: 0x01 HAS_ANCESTOR There is a file in the common ancestor branch that contributes to this conflict. Its data will follow. 0x02 HAS_OURS There is a file in our branch that contributes to this conflict. Its data will follow. 0x04 HAS_THEIRS There is a file in their branch that contributes to this conflict. Its data will follow. 0x08 NAME_CONFLICT_OURS This item has a path in our branch that overlaps a different item in their branch. (Eg, this conflict represents the our side of a rename/add conflict.) 0x10 NAME_CONFLICT_THEIRS This item has a path in their branch that overlaps a different item in our branch. (Eg, this conflict represents the theirs side of a rename/add conflict.) 0x20 DF_CONFLICT_FILE This is the file involved in a directory/file conflict. 0x40 DF_CONFLICT_CHILD This is a child of a directory involved in a directory/file conflict. Other bits are reserved. Conflict Sides The data about one side of a conflict will contain: mode (ASCII string representation of octal, null-terminated) path (null terminated) sha1 (raw bytes) The conflict sides will be written in this order: Ancestor (if HAS_ANCESTOR is set) Ours (if HAS_OURS is set) Theirs (if HAS_THEIRS is set) Puzzled. Most of the above, except NAME_CONFLICT_{OURS,THEIRS} bits, look totally pointless duplication. Obviously HAS_ANCESTOR / HAS_OURS / HAS_THEIRS is to indicate to a reader whether there is data to be read or not. Similar to how a mode of 0 in the REUC indicates that the rest of the record should not be read.) When you are working with Git, you have to be prepared to read from the datafile like the index that other people (and your previous version) created, and you also have to make sure you do not make what you write out unusable by other people without a good reason. I'm acutely aware that you need to be able to read an index that other people created - that's the problem at hand. git does not produce an index that allows anyone (including itself) to reason about rename conflicts. It doesn't even bother to write high-stage conflict entries in many instances, so you can have an instance where git tells you that a conflict occurred but one of those files is staged anyway, the other is just dirty in the workdir and you can commit immediately thereafter. While obviously it's possible to handle this situation (is file A in conflict? Look in the rename conflict extension. Not there? Okay, look in the index.) That's not exactly elegant. My goal here was to have a single source for conflicts. (by the way CONF sounds as if it is some sort of configuration data). There's only four letters, and not everything's as easy as TREE. REUC, for example, sounds like a donkey, though I suppose it depends on the language in question. -ed -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Rename conflicts in the index
When a rename conflict occurs, the information about the conflict is written to stdout and the index is updated as if the conflict were a simpler conflict that did not involve renames. This doesn't give a lot of information to users after the fact - a status of added in ours does not provide a lot of details about what occurred. In reimplementations, we face a similar challenge. Unfortunately, we can't simply print the output to the console, we would like to provide this information back to the caller by some mechanism. My preference would be to return this information in the index - so that the results of unpack trees (if you will) was a single data structure (the index) and so that we could persist this information to disk. I've been considering the idea of an extension that contains more detailed data about conflicts, but I certainly wouldn't want to proceed without discussing this here further. I would propose that we store the data about the file in conflict as it occurred through the renames. For example, in a rename 1-2 conflict where A was renamed to both B and C, you would have a single conflict entry containing the data for A, B and C. This would allow us to provide more detailed information to the user - and allow them to (say) choose a single name to proceed with. Is this something that has value to core git as well? Alternately, is there something particularly stupid about this proposal? Thanks- -ed -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Rename conflicts in the index
Junio C Hamano [mailto:gits...@pobox.com] wrote: I do not offhand see anything particularly stupid; a new optional index extension section CACHE_EXT_RENAME_CONFLICT might be a good addition. Is one side moves A to B while the other side moves it to C the only case, or is it just an example? Off the top of my head, one side moves A to x while the other side moves B to x/y would also be something we would want to know. I am sure there are other cases that need to be considered. Yes, that was just an example. Certainly I was intending that all conflicts that arose from renames would end up here since one can't really reason why the merge tool created a conflict by looking at the index alone - even knowing the merge tool's similarity algorithms, this would be awfully expensive to piece back together, even if the index did contain non-zero stage entries for all the items that were involved in the conflicts. That said, my rather naive initial thought was that we could repeat *all* conflicts in this area. This would give tools that knew how to understand this the ability to go to a single place for conflict data, rather than producing some merge of high-stage entries that comprise non-rename conflicts and data from the rename conflict area for rename conflicts. Thanks- -ed -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Merge with staged and unstaged changes
Hi- I've been investigating the cases where merge is allowed to proceed when there are staged changes in the index or unstaged files in the working directory. There are cases where I find the behavior surprising and I hope I can get clarification. There are also two cases that I will report as bugs, where it appears that the unstaged file contents are deleted. For these cases below, please consider the contents of a single path. In the tables below, we will show the contents of a file across each input and output of the merge - consider that we're merging a single file in some branch theirs into the current branch ours and that these two branches have a common ancestor anc. The state of that file in our index and workdir are represented by idx and wd, respectively. Unless otherwise noted, these cases are true for both git-merge-resolve and git-merge-recursive. For completeness and illustration purposes, I'll included the cases where there are no changes staged or unstaged. These succeed, as expected: input result anc ours theirs idx wd merge result idx wd 1 A AB A A take BB B 2 A BA B B take AA A Merge is also expected to proceed if the contents of our branch are the merge result, and there are unstaged changes for that file in the workdir. In this case, the file remains unstaged: input result anc ours theirs idx wd merge result idx wd 3 A BA B C take BB C What was surprising to me was that my merge can proceed if I stage a change that is identical to the merge result. That is, if my merge result would be to take the contents from theirs, then my merge can proceed if I've already staged the same contents: input result anc ours theirs idx wd merge result idx wd 4 A AB B B take BB B 5 A AB B C take BB C This seems unexpected - is there a use-case that this enables or is this accidental? Another surprising result was that if I have deleted a file (and staged the deletion or not) then the merge will proceed and the file in question will be recreated. Consider X to be a missing file: input result anc ours theirs idx wd merge result idx wd 6 A AB A X take BB B 7 A AB X X take BB B I wouldn't have expected a file I deleted to be recreated with the other branch's contents. Is this behavior also intentional? Finally, there are cases when you have staged a deletion of the file and you have unstaged changes in your workdir where the merge will silently delete the unstaged data. If there is a conflict, the xdiff output will overwrite the unstaged file: input result anc ours theirs idx wd merge result idx wd 8 A BC X D conflict X diff3_file And similarly, while git-merge-recursive (only) will also remove my untracked file when there are no changes in our branch but the file was deleted in their branch: input result anc ours theirs idx wd merge result idx wd 9 A AX X B delete file X X I trust the last two cases, where data is lost, are bugs to report, but could I get clarification on the other situations? Thanks- -ed -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Merge with staged and unstaged changes
On 2/20/13 2:21 PM, Junio C Hamano gits...@pobox.com wrote: Both are very much on purpose. The integrator may have seen the patch on the list, ran git apply [--index] on it to contemplate on it, and before commiting the result, saw a pull request for a branch that contains the change. The above two allow the pull from such a state to succeed without losing any information. I think we have a similar table in Documentation/technical area that explains these things, by the way. I believe you are referring to trivial-merge.txt which has been exceptionally helpful in understanding what unpack trees does. I appreciate this detailed explanation in providing the why. I also appreciate your explanation of the affect of the workdir, and that makes sense. I would have expected that the default was to presume the workdir files were existent, rather than the other way around, but we can agree that is an implementation detail. My biggest concern, of course, was having the unstaged files in my workdir overwritten or deleted. Thanks again- -ed -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html