Re: Questions about the hash function transition

2018-08-28 Thread Edward Thomson
On Tue, Aug 28, 2018 at 2:50 PM, Ævar Arnfjörð Bjarmason
 wrote:
> If we instead had something like clean/smudge filters:
>
> [extensions]
> objectFilter = sha256-to-sha1
> compatObjectFormat = sha1
> [objectFilter "sha256-to-sha1"]
> clean  = ...
> smudge = ...
>
> We could apply arbitrary transformations on objects through filters
> which would accept/return some simple format requesting them to
> translate such-and-such objects, and would either return object
> names/types under which to store them, or "nothing to do".

If I'm understanding you correctly, then on the libgit2 side, I'm very much
opposed to this proposal.  We never execute commands, nor do I want to start
thinking that we can do so arbitrarily.  We run in environments where that's
a non-starter

At present, in libgit2, users can provide their own mechanism for running
clean/smudge filters.  But hash transformation / compatibility is going to
be a crucial compatibility component.  So this is not something that we
could simply opt out of or require users to implement themselves.

-ed


Re: [RFC PATCH 1/1] recover: restoration of deleted worktree files

2018-08-04 Thread Edward Thomson
On Sat, Aug 04, 2018 at 08:54:49AM -0700, Junio C Hamano wrote:
> 
> My first reaction was to say that I am not going to take a new
> command written only for bash with full bashism, even if it came
> with docs, tests nor Makefile integration, for Git itself.  Then I
> reconsidered, as not everything related to Git is git-core, and all
> of the above traits are sign of this patch _not_ meant for git-core.

Yes, obviously I was not suggesting that this would be mergeable with
the bashims, as I mentioned in my cover letter.

In any case, it sounds like you're not particularly interested in
this, although I certainly appreciate you taking the time to suggest
improvements despite that.  There's some good feedback there.

Cheers-
-ed


[RFC PATCH 1/1] recover: restoration of deleted worktree files

2018-08-04 Thread Edward Thomson
Introduce git-recover, a simple script to aide in restoration of deleted
worktree files.  This will look for unreachable blobs in the object
database and prompt users to restore them to disk, either interactively
or on the command-line.
---
 git-recover.sh | 311 +
 1 file changed, 311 insertions(+)
 create mode 100755 git-recover.sh

diff --git a/git-recover.sh b/git-recover.sh
new file mode 100755
index 0..651d4116f
--- /dev/null
+++ b/git-recover.sh
@@ -0,0 +1,311 @@
+#!/usr/bin/env bash
+#
+# This program helps recover files in your repository that were deleted
+# from the working tree.
+#
+# Copyright (c) 2017-2018 Edward Thomson.
+
+set -e
+
+IFS=$'\n'
+
+PROGNAME=$(echo "$0" | sed -e 's/.*\///')
+GIT_DIR=$(git rev-parse --git-dir)
+
+DO_RECOVER=0
+DO_FULL=0
+DO_INTERACTIVE=0
+BLOBS=()
+FILENAMES=()
+
+function die_usage {
+   echo "usage: $PROGNAME [-a] [-i] [--full] [ [-f ] ...]" 
>&2
+   exit 1
+}
+
+while [[ $# -gt 0 ]]; do
+   case "$1" in
+   -a|--all)
+   DO_RECOVER=1
+   ;;
+   -i|--interactive)
+   DO_INTERACTIVE=1
+   ;;
+   --full)
+   DO_FULL=1
+   ;;
+   *)
+   if [ "${1:0:1}" == "-" ]; then
+   echo "$PROGNAME: unknown argument: $1" >&2
+   die_usage
+   fi
+   BLOBS+=("$1")
+
+   shift
+   if [ "$1" == "-f" ] || [ "$1" == "--filename" ]; then
+   shift
+   if [ $# == 0 ]; then
+   die_usage
+   fi
+   FILENAMES+=("$1")
+   shift
+   else
+   FILENAMES+=("")
+   fi
+   continue
+   ;;
+   esac
+   shift
+done
+
+if [ ${#BLOBS[@]} != 0 ] && [ $DO_RECOVER == 1 ]; then
+   die_usage
+elif [ ${#BLOBS[@]} != 0 ]; then
+   DO_RECOVER=1
+fi
+
+case "$OSTYPE" in
+   darwin*|freebsd*) IS_BSD=1 ;;
+   *) IS_BSD=0 ;;
+esac
+
+function expand_given_blobs() {
+   for i in "${!BLOBS[@]}"; do
+   ID=$(git rev-parse --verify "${BLOBS[$i]}" 2>/dev/null || true)
+
+   if [ -z "$ID" ]; then
+   echo "$PROGNAME: ${BLOBS[$i]} is not a valid object." 
1>&2
+   exit 1
+   fi
+
+   TYPE=$(git cat-file -t "${ID}" 2>/dev/null || true)
+
+   if [ "$TYPE" != "blob" ]; then
+   echo "$PROGNAME: ${BLOBS[$i]} is not a blob." 1>&2
+   exit
+   fi
+
+   BLOBS[$i]=$ID
+   done
+}
+
+# find all the unreachable blobs
+function find_unreachable() {
+   FULLNESS="--no-full"
+
+   if [ $DO_FULL == 1 ]; then FULLNESS="--full"; fi
+
+   BLOBS=($(git fsck --unreachable --no-reflogs \
+   "${FULLNESS}" --no-progress | sed -ne 's/^unreachable blob 
//p'))
+}
+
+function read_one_file {
+   BLOB=$1
+   FILTER_NAME=$2
+   ARGS=()
+
+   if [ -z "$FILTER_NAME" ]; then
+   ARGS+=("blob")
+   else
+   ARGS+=("--filters" "--path=$FILTER_NAME")
+   fi
+
+   git cat-file "${ARGS[@]}" "$BLOB"
+}
+
+function write_one_file {
+   BLOB=$1
+   FILTER_NAME=$2
+   OUTPUT_NAME=$3
+
+   ABBREV=$(git rev-parse --short "${BLOB}")
+
+   echo -n "Writing $ABBREV: "
+   read_one_file "$BLOB" "$FILTER_NAME" > "$OUTPUT_NAME"
+   echo "$OUTPUT_NAME."
+}
+
+function unique_filename {
+   if [ ! -f "${BLOB}" ]; then
+   echo "$BLOB"
+   else
+   cnt=1
+   while true
+   do
+   fn="${BLOB}~${cnt}"
+   if [ ! -f "${fn}" ]; then
+   echo "${fn}"
+   break
+   fi
+   cnt=$((cnt+1))
+   done
+   fi
+}
+
+function write_recoverable {
+   for i in "${!BLOBS[@]}"; do
+   BLOB=${BLOBS[$i]}
+   FILTER_NAME=${FILENAMES[$i]}
+   OUTPUT_NAME=${FILENAMES[$i]:-$(unique_filename)}
+
+   write_one_file "$BLOB" "$FILTER_NAME" "$OUTPUT_NAME"
+   done
+}
+
+function file_time {
+   if [ $IS_BSD == 1 ]; then
+   stat -f %c "$1"
+   else
+   

[RFC PATCH 0/1] Introduce git-recover

2018-08-04 Thread Edward Thomson
Hello-

I created a simple shell script a while back to help people recover
files that they deleted from their working directory (but had been added
to the repository), which looks for unreachable blobs in the object
database and places them in the working directory (either en masse,
interactively, or via command-line arguments).

This has been available at https://github.com/ethomson/git-recover for
about a year, and in that time, someone has suggested that I propose
this as part of git itself.  So I thought I'd see if there's any
interest in this.

If there is, I'd like to get a sense of the amount of work required to
make this suitable for inclusion.  There are some larger pieces of work
required -- at a minimum, I think this requires:

- Tests -- there are none, which is fine with me but probably less fine
  for inclusion here.
- Documentation -- the current README is below but it will need proper
  documentation that can be rendered into manpages, etc, by the tools.
- Remove bashisms -- there are many.

Again, this may not be particularly interesting, but I thought I'd send
it along in case it is.

Cheers-
-ed

---

git-recover allows you to recover some files that you've accidentally
deleted from your working directory.  It helps you find files that exist
in the repository's object database - because you ran git add - but were
never committed.

Getting Started
---
The simplest way to use git-recover is in interactive mode - simply run
`git-recover -i` and it will show you all the files that you can recover
and prompt you to act.

Using git-recover
-
Running git-recover without any arguments will list all the files (git
"blobs") that were recently orphaned, by their ID.  (Their filename is not 
known.)

You can examine these blobs by running `git show `.  If you
find one that you want to recover, you can provide the ID as the argument
to git-recover.  You can specify the `--filename` option to write the
file out and apply any filters that are set up in the repository.  For
example:

git-recover 38762cf7f55934b34d179ae6a4c80cadccbb7f0a \
--filename shattered.pdf

You can also specify multiple files to recover, each with an optional
output filename:

git-recover 38762c --filename one.txt cafebae --filename bae.txt

If you want to recover _all_ the orphaned blobs in your repository, run
`git-recover --all`.  This will write all the orphaned files to the
current working directory, so it's best to run this inside a temporary
directory beneath your working directory.  For example:

mkdir _tmp && cd _tmp && git-recover --all

By default, git-recover limits itself to recently created orphaned blobs.
If you want to see _all_ orphaned files that have been created in your
repository (but haven't yet been garbage collected), you can run:

git-recover --full

Options
---
git-recover [-a] [-i] [--full] [ [-f ] ...]

-a, --all
Write all orphaned blobs to the current working directory.  Each file will
be named using its 40 character object ID.

-i, --interactive
Display information about each orphaned blob and prompt to recover it.

--full
List or recover all orphaned blobs, even those that are in packfiles.  By 
default, `git-recover` will only look at loose object files, which limits
it to the most recently created files.  Examining packfiles may be slow,
especially in large repositories.


The object ID (or its abbreviation) to recover.  The file will be written to
the current working directory and named using its 40 character object ID,
unless the `-f` option is specified.

-f , --filename 
When specified after an object ID, the file written will use this filename.
In addition, any filters (for example: CRLF conversion or Git-LFS) will be
run according to the `gitattributes` configuration.


Edward Thomson (1):
  recover: restoration of deleted worktree files

 git-recover.sh | 311 +
 1 file changed, 311 insertions(+)
 create mode 100755 git-recover.sh

-- 
2.0.0 (libgit2)



Re: Hash algorithm analysis

2018-07-24 Thread Edward Thomson
On Fri, Jul 20, 2018 at 09:52:20PM +, brian m. carlson wrote:
> 
> To summarize the discussion that's been had in addition to the above,
> Ævar has also stated a preference for SHA-256 and I would prefer BLAKE2b
> over SHA-256 over SHA3-256, although any of them would be fine.
> 
> Are there other contributors who have a strong opinion?  Are there
> things I can do to help us coalesce around an option?

Overall, I prefer SHA-256.

I mentioned this at the contributor summit - so this may have been
captured in the notes.  But if not, when I look at this from the
perspective of my day job at Notorious Big Software Company, we would
prefer SHA-256 due to its performance characteristics and the
availability of hardware acceleration.  We think about git object ids
in a few different ways:

Obviously we use git as a version control system - we have a significant
investment in hosting repositories (for both internal Microsoft teams
and our external customers).  What may be less obvious is that often,
git blob ids are used as fingerprints: on a typical Windows machine,
you don't have the command-line hash functions (md5sum and friends),
but every developer has git installed.  So we end up calculating git
object ids in places within the development pipeline that are beyond the
scope of just version control.  Not to dwell too much on implementation
details, but this is especially advantageous for us in (say) labs where
we can ensure that particular hardware is available to speed this up as
necessary.

Switching gears, if I look at this from the perspective of the libgit2
project, I would also prefer SHA-256 or SHA3 over blake2b.  To support
blake2b, we'd have to include - and support - that code ourselves.  But
to support SHA-256, we would simply use the system's crypto libraries
that we already take a dependecy on (OpenSSL, mbedTLS, CryptoNG, or
SecureTransport).  All of those support SHA-256 and none of them include
support for blake2b.  That means if there's a problem with (say)
OpenSSL's SHA-256 implementation, then it will be fixed by their vendor.
If there's a problem with libb2, then that's now my responsibility.

This is not to suggest that one library is of higher or lower quality
than another.  And surely we would try to use the same blake2b library
that git itself is using to minimize some of this risk (so that at least
we're all in the same boat and can leverage each other's communications
to users) but even then, there will be inevitable drift between our
vendored dependencies and the upstream code.  You can see this in action
in xdiff: git's xdiff has deviated from upstream, and libgit2 has taken
git's and ours has deviated from that.

Cheers-
-ed


Re: [PATCH] checkout files in-place

2018-06-12 Thread Edward Thomson
On Tue, Jun 12, 2018 at 09:13:54AM +0300, Orgad Shaneh wrote:
> Some of my colleagues use an ancient version of Source Insight, which also
> locks files for write.

If that application is locking files for writing (that is to say, it did
not specify the `FILE_SHARE_WRITE` bit in the sharing modes during
`CreateFile`) then this patch would not help.

Applications, generally speaking, should be locking files for write.
It's the default in Win32 and .NET's file open APIs because few
applications are prepared to detect and support a file changing out from
underneath them in the middle of a read.

> It's less important than it was before those fixes, but it is still needed
> for users of Qt Creator 4.6 (previous versions just avoided mmap, 4.7 uses
> mmap only for system headers). Other tools on Windows might as well
> misbehave.

I don't understand what mmap'ing via `CreateFileMapping` has to do with
this.  It takes an existing `HANDLE` that was opened with `CreateFile`,
which is where the sharing mode was supplied.

I would be surprised if there are other tools on Windows that have
specified `FILE_SHARE_WRITE` but not `FILE_SHARE_DELETE`.  Generally
speaking, if you don't care about another process changing a file
underneath you then you should specify both.  If you do then you should
specify neither.

I'm not saying that git shouldn't work around a bug in QT Creator -
that's not my call, though I would be loathe to support this
configuration option in libgit2.  But I am saying that it seems like
this patch doesn't have broad applicability beyond that particular tool.

-ed


Re: [PATCH] checkout files in-place

2018-06-11 Thread Edward Thomson
On Sun, Jun 10, 2018 at 09:44:45PM +0200, Clemens Buchacher wrote:
> 
> It is safe to do this on Linux file systems, even if open file handles
> still exist, because unlink only removes the directory reference to the
> file. On Windows, however, a file cannot be deleted until all handles to
> it are closed. If a file cannot be deleted, its name cannot be reused.

I'm nervous about this proposed change, since it feels like it's
addressing an issue that only exists in QT Creator.

You've accurately described the default semantics in Win32.  A file
cannot be deleted until all handles to it are closed, unless it was
opened with `FILE_SHARE_DELETE` as their sharing mode.  This is not the
default sharing mode in either Win32 or .NET.

However, for your patch to have an effect, all processes with a handle
open must have specified `FILE_SHARE_WRITE`.  This is rather uncommon,
since it's also not included in the default Win32 or .NET sharing mode.
This is because it's uncommon that you would want other processes to
change the data underneath you in between ReadFile() calls.

So your patch will benefit people who have processes that have
`FILE_SHARE_WRITE` set but not `FILE_SHARE_DELETE` set, which I think is
generally an uncommon scenario to want to support.

Generally if you're willing to accept files changing underneath you,
then you probably want to allow them to be deleted, too.  So this feels
like something that's very specific to QT Creator.  Or are there other
IDEs or development tools that use these open semantics that I'm not
aware of?

Cheers-
-ed


Re: [RFC 0/1] Tolerate broken headers in `packed-refs` files

2018-03-26 Thread Edward Thomson
On Mon, Mar 26, 2018 at 2:08 PM, Derrick Stolee  wrote:
> Since most heavily-used tools that didn't spawn Git processes use
> LibGit2 to interact with Git repos, I added Ed Thomson to CC to see
> if libgit2 could ever write these bad header comments.

We added the `sorted` capability to our `packed-refs` header relatively
recently (approximately two months ago, v0.27.0 will be the first release
to include it as of today).  So, at the moment, libgit2 writes:

  "# pack-refs with: peeled fully-peeled sorted "

Prior to this change, libgit2's header was stable for the last five years
as:

  "# pack-refs with: peeled fully-peeled "

And prior to that, we advertised only `peeled`:

  "# pack-refs with: peeled "

Thanks for thinking of us!

-ed


Contributor Summit Dinner

2018-03-07 Thread Edward Thomson
Hello-

Microsoft would like to invite the contributors summit to dinner tonight;
it will take place at 8pm:

  La Tagliatella (Placa Catalunya)
  Ronda de la Universitat, 31

It's conveniently right around the corner from the Beers with Bitbucket
event, in case you're going to that.

Hope to see you there.

Thanks-
-ed


[ANNOUNCE] libgit2 summit: 9 March 2018, Barcelona

2018-01-24 Thread Edward Thomson
The libgit2 community is hosting the libgit2 Summit 2018 on March 9th 2018
in Barcelona.

As peff previously announced, Git Merge 2018 is happening in Barcelona on
March 8th (and the Git Contributor's Summit is planned for March 7.)  We
will follow the Git Merge event and have our summit on March 9th so that
people can attend both summits.

We ask that you please register ahead of time.

https://goo.gl/forms/3HdRBeBWixxQE3rf1

Like the Git Contributor's Summit, we will likely start with a business
update on libgit2 and the overall community.  After that, we will open the
floor for discussions and/or presentations.

We will also provide some time to provide a getting started discussion:
if you think you might want to use or contribute to libgit2, then we'll
let you know how easy it is to get started working with the library.

We also hope that we'll have some time to do some hacking.

If you're a prior contributor to libgit2, you use the library, or you think
that you might be interested, we encourage you to attend.

Looking forward to seeing you in Barcelona.

-ed


Re: git merge-tree: bug report and some feature requests

2018-01-23 Thread Edward Thomson
On Tue, Jan 23, 2018 at 7:08 AM, Josh Bleecher Snyder
 wrote:
> Looking over your list above, at a minimum, libgit2 might not have a
> particularly good way to represent submodule/file or
> submodule/directory conflicts, because is-a-submodule is defined
> external to a git_index_entry.

libgit2 should detect submodule/file and submodule/directory
conflicts.

While, yes, some metadata about the submodule is located outside the
index, you can look at the mode to determine that this _is_ a
submodule.  You should be able to reliably detect submodule/file
conflicts because one will be a regular or executable file (mode
0100644 or 0100755), while the other entry at the same path will
be a gitlink (mode 016).

Similarly, submodule/directory conflict detection works just like
regular file/directory conflict detection.  If some path `foo` is
a submodule (or a regular file) then some path `foo/bar` existing
in the other side of the merge causes a conflict.

> Cataloging or special-casing all possible conflict types does seem
> unwise because of the sheer number of kinds of conflicts.
>
> But the alternative appears to be punting entirely, as libgit2 does,
> and merely providing something akin to three index entries.

Indeed, when I added merge to libgit2, we put the higher-level conflict
analysis into application code because there was not much interest in it
at the time.  I've been meaning to add this to `git_status` in libgit2,
but it's not been a high priority.

> This which
> leaves it unclear what exactly the conflict was, at which point any
> user (read: porcelain developer) will end up having to recreate some
> merge logic to figure out what went wrong. And if merge-tree starts
> doing rename detection, the user might then have to emulate that as
> well.

That's not a good idea.  Trying to figure out what merge did would be
painful at best, and likely impossible, since a rename conflict is
recorded in the main index without any way to piece it together.  eg:

100644 deadbeefdeadbeefdeadbeefdeadbeefdeadbeef 1   bar.c
100644 cafec4fecafec4fecafec4fecafec4fecafec4fe 2   bar.c
100644 c4cc188a892898e13927dc4a02e7f68814b874b2 1   foo.c
100644 71f5af150b25e3d2d67ff46759311401036f 2   foo.c
100644 351cfbdd55d656edd2c5c995aae3caafb9ec11fa 3   rename1.c
100644 e407c7d138fb457674c3b114fcf47748169ab0c5 3   rename2.c

This is the main index that results when bar.c has a rename/edit
conflict, and foo.c also has a rename/edit conflict.  One was renamed
to rename1.c and the other to rename2.c.  Trying to determine which is
which _after the fact_ would be regrettable.  Especially since rename
detection is not static - you would need to know the thresholds that
were configured at the time merge was performed and try to replay
the rename detection with that.

libgit2 records a `NAME` section in the index that pairs the rename
detection decisions that it performed so that you can analyze them
and display them after the fact.

-ed


[ANNOUNCE] Git London User Group: 16 January 2018

2018-01-11 Thread Edward Thomson
Git London User Group: 16 January, 2018
===

I'm pleased to announce the formation of the Git London User Group, where
Git users and experts from throughout the UK can get together to share
tips, experience and assistance for using Git successfully.

The first meeting takes place Tuesday, 16 January, 2018 at 19:00.

Agenda
--
Extending Git through Scripting: Charles Bailey

Git is the most popular version control system in use today; it is highly
flexible and supports many different workflows.  One of its strengths is
its openness to scripting.  This talk looks at the basic principles that
support best practice for scripting Git and how to avoid some common
pitfalls.

Building Git Tools with libgit2: Edward Thomson

Edward introduces the libgit2 framework (http://libgit2.org), which is a
portable, implementation of Git as a library.  If you're looking for more
advanced programmatic access to working with Git repositories, libgit2 is
a good option, which is why it's used by many Git servers like GitHub and
VSTS and clients like gmaster and GitKraken.  Edward will introduce libgit2
and some of the language bindings like LibGit2Sharp (for .NET) and Rugged
(for Ruby).

Location

General Assembly.  The Relay Building, 1st floor.
114 Whitechapel High Street London, E1 7PT.

We ask that you please RSVP at http://londongit.org/.

Sponsors

A big thank you to the sponsors of the Git London User Group.  Bloomberg
has been kind enough to sponsor the meeting space for us to use.  Microsoft
has sponsored food for dinner.  And All Things Git (the Podcast about Git)
has sponsored meetup and registration fees.

Thanks also to Henry Kleynhans, the other organizer of the group.

Contact
---
Follow us on Meetup by visiting http://londongit.org/, or on Twitter at
@londongit.

If you're in or around London, we hope that you'll join us next Tuesday!

Sincerely,
Edward Thomson (ethom...@edwardthomson.com)


Re: [PATCH] add: add --chmod=+x / --chmod=-x options

2016-06-07 Thread Edward Thomson
On Wed, Jun 01, 2016 at 09:00:34AM -0700, Junio C Hamano wrote:
> 
> Unlike the "something like this" we saw earlier, this draws the
> boundary of responsibility between the caller and the API at a much
> more sensible place.

This makes sense to me - Junio, are you taking (or have you already
taken) dscho's patch, or would you like me to squash it and resend?

Thanks-
-ed
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] add: add --chmod=+x / --chmod=-x options

2016-05-31 Thread Edward Thomson
The executable bit will not be detected (and therefore will not be
set) for paths in a repository with `core.filemode` set to false,
though the users may still wish to add files as executable for
compatibility with other users who _do_ have `core.filemode`
functionality.  For example, Windows users adding shell scripts may
wish to add them as executable for compatibility with users on
non-Windows.

Although this can be done with a plumbing command
(`git update-index --add --chmod=+x foo`), teaching the `git-add`
command allows users to set a file executable with a command that
they're already familiar with.

Signed-off-by: Edward Thomson <ethom...@edwardthomson.com>
---
 builtin/add.c  | 12 +++-
 cache.h|  2 ++
 read-cache.c   | 11 +--
 t/t3700-add.sh | 30 ++
 4 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index 145f06e..44b6c97 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -238,6 +238,8 @@ static int ignore_add_errors, intent_to_add, ignore_missing;
 static int addremove = ADDREMOVE_DEFAULT;
 static int addremove_explicit = -1; /* unspecified */
 
+static char *chmod_arg = NULL;
+
 static int ignore_removal_cb(const struct option *opt, const char *arg, int 
unset)
 {
/* if we are told to ignore, we are not adding removals */
@@ -263,6 +265,7 @@ static struct option builtin_add_options[] = {
OPT_BOOL( 0 , "refresh", _only, N_("don't add, only refresh the 
index")),
OPT_BOOL( 0 , "ignore-errors", _add_errors, N_("just skip files 
which cannot be added because of errors")),
OPT_BOOL( 0 , "ignore-missing", _missing, N_("check if - even 
missing - files are ignored in dry run")),
+   OPT_STRING( 0 , "chmod", _arg, N_("(+/-)x"), N_("override the 
executable bit of the listed files")),
OPT_END(),
 };
 
@@ -336,6 +339,11 @@ int cmd_add(int argc, const char **argv, const char 
*prefix)
if (!show_only && ignore_missing)
die(_("Option --ignore-missing can only be used together with 
--dry-run"));
 
+   if (chmod_arg) {
+   if (strcmp(chmod_arg, "-x") && strcmp(chmod_arg, "+x"))
+   die(_("--chmod param must be either -x or +x"));
+   }
+
add_new_files = !take_worktree_changes && !refresh_only;
require_pathspec = !(take_worktree_changes || (0 < addremove_explicit));
 
@@ -346,7 +354,9 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 (intent_to_add ? ADD_CACHE_INTENT : 0) |
 (ignore_add_errors ? ADD_CACHE_IGNORE_ERRORS : 0) |
 (!(addremove || take_worktree_changes)
- ? ADD_CACHE_IGNORE_REMOVAL : 0));
+ ? ADD_CACHE_IGNORE_REMOVAL : 0)) |
+(chmod_arg && *chmod_arg == '+' ? ADD_CACHE_FORCE_EXECUTABLE : 
0) |
+(chmod_arg && *chmod_arg == '-' ? 
ADD_CACHE_FORCE_NOTEXECUTABLE : 0);
 
if (require_pathspec && argc == 0) {
fprintf(stderr, _("Nothing specified, nothing added.\n"));
diff --git a/cache.h b/cache.h
index 6049f86..da03cd9 100644
--- a/cache.h
+++ b/cache.h
@@ -581,6 +581,8 @@ extern int remove_file_from_index(struct index_state *, 
const char *path);
 #define ADD_CACHE_IGNORE_ERRORS4
 #define ADD_CACHE_IGNORE_REMOVAL 8
 #define ADD_CACHE_INTENT 16
+#define ADD_CACHE_FORCE_EXECUTABLE 32
+#define ADD_CACHE_FORCE_NOTEXECUTABLE 64
 extern int add_to_index(struct index_state *, const char *path, struct stat *, 
int flags);
 extern int add_file_to_index(struct index_state *, const char *path, int 
flags);
 extern struct cache_entry *make_cache_entry(unsigned int mode, const unsigned 
char *sha1, const char *path, int stage, unsigned int refresh_options);
diff --git a/read-cache.c b/read-cache.c
index d9fb78b..d12d143 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -641,6 +641,8 @@ int add_to_index(struct index_state *istate, const char 
*path, struct stat *st,
int intent_only = flags & ADD_CACHE_INTENT;
int add_option = (ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE|
  (intent_only ? ADD_CACHE_NEW_ONLY : 0));
+   int force_executable = flags & ADD_CACHE_FORCE_EXECUTABLE;
+   int force_notexecutable = flags & ADD_CACHE_FORCE_NOTEXECUTABLE;
 
if (!S_ISREG(st_mode) && !S_ISLNK(st_mode) && !S_ISDIR(st_mode))
return error("%s: can only add regular files, symbolic links or 
git-directories", path);
@@ -659,9 +661,14 @@ int add_to_index(struct index_state *istate, const char 
*path, struct stat *st,
else
ce->ce_flags |= CE_INTENT_TO_ADD;
 
-   if (trust_executable_bit && has_symlinks)
+   if (S_ISREG(st_m

Re: [PATCH] add: add --chmod=+x / --chmod=-x options

2016-05-31 Thread Edward Thomson
On Fri, May 27, 2016 at 11:30:40AM -0700, Junio C Hamano wrote:
> 
> Oh, having said all of that, the comments on the implementation
> still stand.

Certainly; sorry for the delay.  I've squashed in your tests and applied
your recommendations.  Resending the patch momentarily.

Thanks-

-ed
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] add: add --chmod=+x / --chmod=-x options

2016-05-26 Thread Edward Thomson
On Wed, May 25, 2016 at 12:36:55AM -0700, Junio C Hamano wrote:
> 
> At the design level, I have a few comments.

Thanks, I will submit a new patch that incorporates your (and dscho's)
comments.

>  * This is about a repository with core.filemode=0; I wonder if
>something for a repository with core.symlinks=0 would also help?
>That is, would it be a big help to users if they can prepare a
>text file that holds symbolic link contents and add it as if it
>were a symlink with "git add", instead of having to run two
>commands, "hash-objects && update-index --cacheinfo"?

I think that this is much less common and - speaking only from personal
experience - nobody has ever asked me how to stage a symlink on a
Windows machine.  I think that this is due to the fact that symlinks on
Windows are basically impossible to use, so people doing cross-platform
development wouldn't even try.

On the other hand, it's quite common for cross-platform teams to use
some scripting language since those do work across platforms, and
Windows users would want to add new scripts as executable for the
benefit of their brethren on platforms with an executable bit.

>  * I am not familiar with life on filesystems with core.filemode=0;
>do files people would want to be able to "add --chmod=+x" share
>common trait that can be expressed with .gitattributes mechanism?

Perhaps...  It would not be things like `*.bat` or `*.exe` - Windows
gets those as executable "for free" and would not care about adding the
execute bit on those files (since they're not executable anywhere else).
It would be items like `*.sh` or `*.rb` that should be executable on
POSIX platforms.

However I do not think that this is a common enough action that it needs
to be made automatic such that when I `git add foo.rb` it is
automatically made executable.  I think that the reduced complexity of
having a single mechanism to control executability (that being the
execute mode in the index or a tree) is preferable to a gitattributes
based mechanism, at least until somebody else makes a cogent argument
that the gitattributes approach would be helpful for them.  :)

Thanks again for the comments, an updated patch is forthcoming.

-ed
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] format_commit_message: honor `color=auto` for `%C(auto)`

2016-05-26 Thread Edward Thomson
On Wed, May 25, 2016 at 05:39:04PM -0500, Jeff King wrote:
> Looks like we didn't have any tests at all for %C(auto). And the tests
> for %C(auto,...) were labeled as %C(auto), making it all the more
> confusing. Perhaps it is worth squashing this in:

Thanks, peff.  Indeed I did squash that into my updated patch.

-ed
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] format_commit_message: honor `color=auto` for `%C(auto)`

2016-05-26 Thread Edward Thomson
git-log(1) documents that when specifying the `%C(auto)` format
placeholder will "turn on auto coloring on the next %placeholders
until the color is switched again."

However, when `%C(auto)` is used, the present implementation will turn
colors on unconditionally (even if the color configuration is turned off
for the current context - for example, `--no-color` was specified or the
color is `auto` and the output is not a tty).

Update `format_commit_one` to examine the current context when a format
string of `%C(auto)` is specified, which ensures that we will not
unconditionally write colors.  This brings that behavior in line with
the behavior of `%C(auto,)`, and allows the user the ability
to specify that color should be displayed only when the output is a
tty.

Additionally, add a test for `%C(auto)` and update the existing tests
for `%C(auto,...)` as they were misidentified as being applicable to
`%C(auto)`.

Signed-off-by: Edward Thomson <ethom...@edwardthomson.com>

Tests from Jeff King.

Signed-off-by: Jeff King <p...@peff.net>
---
 pretty.c   |  2 +-
 t/t6006-rev-list-format.sh | 26 +++---
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/pretty.c b/pretty.c
index 87c4497..c3ec430 100644
--- a/pretty.c
+++ b/pretty.c
@@ -1063,7 +1063,7 @@ static size_t format_commit_one(struct strbuf *sb, /* in 
UTF-8 */
switch (placeholder[0]) {
case 'C':
if (starts_with(placeholder + 1, "(auto)")) {
-   c->auto_color = 1;
+   c->auto_color = want_color(c->pretty_ctx->color);
return 7; /* consumed 7 bytes, "C(auto)" */
} else {
int ret = parse_color(sb, placeholder, c);
diff --git a/t/t6006-rev-list-format.sh b/t/t6006-rev-list-format.sh
index b77d4c9..a1dcdb8 100755
--- a/t/t6006-rev-list-format.sh
+++ b/t/t6006-rev-list-format.sh
@@ -184,38 +184,38 @@ commit $head1
 foo
 EOF
 
-test_expect_success '%C(auto) does not enable color by default' '
+test_expect_success '%C(auto,...) does not enable color by default' '
git log --format=$AUTO_COLOR -1 >actual &&
has_no_color actual
 '
 
-test_expect_success '%C(auto) enables colors for color.diff' '
+test_expect_success '%C(auto,...) enables colors for color.diff' '
git -c color.diff=always log --format=$AUTO_COLOR -1 >actual &&
has_color actual
 '
 
-test_expect_success '%C(auto) enables colors for color.ui' '
+test_expect_success '%C(auto,...) enables colors for color.ui' '
git -c color.ui=always log --format=$AUTO_COLOR -1 >actual &&
has_color actual
 '
 
-test_expect_success '%C(auto) respects --color' '
+test_expect_success '%C(auto,...) respects --color' '
git log --format=$AUTO_COLOR -1 --color >actual &&
has_color actual
 '
 
-test_expect_success '%C(auto) respects --no-color' '
+test_expect_success '%C(auto,...) respects --no-color' '
git -c color.ui=always log --format=$AUTO_COLOR -1 --no-color >actual &&
has_no_color actual
 '
 
-test_expect_success TTY '%C(auto) respects --color=auto (stdout is tty)' '
+test_expect_success TTY '%C(auto,...) respects --color=auto (stdout is tty)' '
test_terminal env TERM=vt100 \
git log --format=$AUTO_COLOR -1 --color=auto >actual &&
has_color actual
 '
 
-test_expect_success '%C(auto) respects --color=auto (stdout not tty)' '
+test_expect_success '%C(auto,...) respects --color=auto (stdout not tty)' '
(
TERM=vt100 && export TERM &&
git log --format=$AUTO_COLOR -1 --color=auto >actual &&
@@ -223,6 +223,18 @@ test_expect_success '%C(auto) respects --color=auto 
(stdout not tty)' '
)
 '
 
+test_expect_success '%C(auto) respects --color' '
+   git log --color --format="%C(auto)%H" -1 >actual &&
+   printf "\\033[33m%s\\033[m\\n" $(git rev-parse HEAD) >expect &&
+   test_cmp expect actual
+'
+
+test_expect_success '%C(auto) respects --no-color' '
+   git log --no-color --format="%C(auto)%H" -1 >actual &&
+   git rev-parse HEAD >expect &&
+   test_cmp expect actual
+'
+
 iconv -f utf-8 -t $test_encoding > commit-msg <http://vger.kernel.org/majordomo-info.html


[PATCH] add: add --chmod=+x / --chmod=-x options

2016-05-24 Thread Edward Thomson
Users on deficient filesystems that lack an execute bit may still
wish to add files to the repository with the appropriate execute
bit set (or not).  Although this can be done in two steps
(`git add foo && git update-index --chmod=+x foo`), providing the
`--chmod=+x` option to the add command allows users to set a file
executable in a single command that they're already familiar with.

Signed-off-by: Edward Thomson <ethom...@edwardthomson.com>
---
 builtin/add.c  | 18 +-
 cache.h|  2 ++
 read-cache.c   |  6 ++
 t/t3700-add.sh | 19 +++
 4 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/builtin/add.c b/builtin/add.c
index 145f06e..2a9abf7 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -238,6 +238,8 @@ static int ignore_add_errors, intent_to_add, ignore_missing;
 static int addremove = ADDREMOVE_DEFAULT;
 static int addremove_explicit = -1; /* unspecified */
 
+static char should_chmod = 0;
+
 static int ignore_removal_cb(const struct option *opt, const char *arg, int 
unset)
 {
/* if we are told to ignore, we are not adding removals */
@@ -245,6 +247,15 @@ static int ignore_removal_cb(const struct option *opt, 
const char *arg, int unse
return 0;
 }
 
+static int chmod_cb(const struct option *opt, const char *arg, int unset)
+{
+   char *flip = opt->value;
+   if ((arg[0] != '-' && arg[0] != '+') || arg[1] != 'x' || arg[2])
+   return error("option 'chmod' expects \"+x\" or \"-x\"");
+   *flip = arg[0];
+   return 0;
+}
+
 static struct option builtin_add_options[] = {
OPT__DRY_RUN(_only, N_("dry run")),
OPT__VERBOSE(, N_("be verbose")),
@@ -263,6 +274,9 @@ static struct option builtin_add_options[] = {
OPT_BOOL( 0 , "refresh", _only, N_("don't add, only refresh the 
index")),
OPT_BOOL( 0 , "ignore-errors", _add_errors, N_("just skip files 
which cannot be added because of errors")),
OPT_BOOL( 0 , "ignore-missing", _missing, N_("check if - even 
missing - files are ignored in dry run")),
+   { OPTION_CALLBACK, 0, "chmod", _chmod, N_("(+/-)x"),
+ N_("override the executable bit of the listed files"),
+ PARSE_OPT_NONEG | PARSE_OPT_LITERAL_ARGHELP, chmod_cb},
OPT_END(),
 };
 
@@ -346,7 +360,9 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 (intent_to_add ? ADD_CACHE_INTENT : 0) |
 (ignore_add_errors ? ADD_CACHE_IGNORE_ERRORS : 0) |
 (!(addremove || take_worktree_changes)
- ? ADD_CACHE_IGNORE_REMOVAL : 0));
+ ? ADD_CACHE_IGNORE_REMOVAL : 0)) |
+(should_chmod == '+' ? ADD_CACHE_FORCE_EXECUTABLE : 0) |
+(should_chmod == '-' ? ADD_CACHE_FORCE_NOTEXECUTABLE : 0);
 
if (require_pathspec && argc == 0) {
fprintf(stderr, _("Nothing specified, nothing added.\n"));
diff --git a/cache.h b/cache.h
index 6049f86..da03cd9 100644
--- a/cache.h
+++ b/cache.h
@@ -581,6 +581,8 @@ extern int remove_file_from_index(struct index_state *, 
const char *path);
 #define ADD_CACHE_IGNORE_ERRORS4
 #define ADD_CACHE_IGNORE_REMOVAL 8
 #define ADD_CACHE_INTENT 16
+#define ADD_CACHE_FORCE_EXECUTABLE 32
+#define ADD_CACHE_FORCE_NOTEXECUTABLE 64
 extern int add_to_index(struct index_state *, const char *path, struct stat *, 
int flags);
 extern int add_file_to_index(struct index_state *, const char *path, int 
flags);
 extern struct cache_entry *make_cache_entry(unsigned int mode, const unsigned 
char *sha1, const char *path, int stage, unsigned int refresh_options);
diff --git a/read-cache.c b/read-cache.c
index d9fb78b..81bf186 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -641,6 +641,8 @@ int add_to_index(struct index_state *istate, const char 
*path, struct stat *st,
int intent_only = flags & ADD_CACHE_INTENT;
int add_option = (ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE|
  (intent_only ? ADD_CACHE_NEW_ONLY : 0));
+   int force_executable = flags & ADD_CACHE_FORCE_EXECUTABLE;
+   int force_notexecutable = flags & ADD_CACHE_FORCE_NOTEXECUTABLE;
 
if (!S_ISREG(st_mode) && !S_ISLNK(st_mode) && !S_ISDIR(st_mode))
return error("%s: can only add regular files, symbolic links or 
git-directories", path);
@@ -661,6 +663,10 @@ int add_to_index(struct index_state *istate, const char 
*path, struct stat *st,
 
if (trust_executable_bit && has_symlinks)
ce->ce_mode = create_ce_mode(st_mode);
+   else if (force_executable)
+   ce->ce_mode = create_ce_mode(0777);
+   else if (force_notexecutable)
+   ce->ce_mode = create_ce_mode(0666);
else {
   

[PATCH] format_commit_message: honor `color=auto` for `%C(auto)`

2016-05-24 Thread Edward Thomson
Check that we are configured to display colors in the given context when
the user specifies a format string of `%C(auto)`.  This brings that
behavior in line with the behavior of `%C(auto,)`, which will
display the given color only when the configuration specifies to do so.

This allows the user the ability to specify that color should be
displayed only when the output is a tty, and to use the default color
for the given context (instead of a hardcoded color value).

Signed-off-by: Edward Thomson <ethom...@edwardthomson.com>
---
 pretty.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pretty.c b/pretty.c
index 87c4497..c3ec430 100644
--- a/pretty.c
+++ b/pretty.c
@@ -1063,7 +1063,7 @@ static size_t format_commit_one(struct strbuf *sb, /* in 
UTF-8 */
switch (placeholder[0]) {
case 'C':
if (starts_with(placeholder + 1, "(auto)")) {
-   c->auto_color = 1;
+   c->auto_color = want_color(c->pretty_ctx->color);
return 7; /* consumed 7 bytes, "C(auto)" */
} else {
int ret = parse_color(sb, placeholder, c);
-- 
2.6.4 (Apple Git-63)

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] format_commit_message: honor `color=auto` for `%C(auto)`

2016-05-24 Thread Edward Thomson
On Tue, May 24, 2016 at 05:55:02PM -0700, Junio C Hamano wrote:
> Looks obviously the right thing to do from a cursory read.
> Missing sign-off?

Yes, apologies for the oversight; will re-send.

Thanks-
-ed
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] format_commit_message: honor `color=auto` for `%C(auto)`

2016-05-24 Thread Edward Thomson
Check that we are configured to display colors in the given context when
the user specifies a format string of `%C(auto)`, instead of always
displaying the default color for the given context.  This makes
`%C(auto)` obey the `color=auto` setting and brings its behavior in line
with the behavior of `%C(auto,)`.

This allows the user the ability to specify that color should be
displayed for a string only when the output is a tty, and to use the
default color for the given context without having to hardcode a
color value.
---
 pretty.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pretty.c b/pretty.c
index 87c4497..c3ec430 100644
--- a/pretty.c
+++ b/pretty.c
@@ -1063,7 +1063,7 @@ static size_t format_commit_one(struct strbuf *sb, /* in 
UTF-8 */
switch (placeholder[0]) {
case 'C':
if (starts_with(placeholder + 1, "(auto)")) {
-   c->auto_color = 1;
+   c->auto_color = want_color(c->pretty_ctx->color);
return 7; /* consumed 7 bytes, "C(auto)" */
} else {
int ret = parse_color(sb, placeholder, c);
-- 
2.6.4 (Apple Git-63)

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH] fsck: do not canonicalize modes in trees we are checking

2014-09-23 Thread Edward Thomson
On Tue, Sep 23, 2014 at 11:47:51AM -0400, Jeff King wrote:
 As far as I can tell, fsck's mode-checking has been totally broken
 basically forever. Which makes me a little nervous to fix it. :)
 linux.git does have some bogus modes, but they are 100664, which is
 specifically ignored here unless fsck --strict is in effect.

I'm in favor of checking the mode in fsck, at least when --strict.  
But I would suggest we be lax (by default) about other likely-to-exist
but strictly invalid modes to prevent peoples previously workable
repositories from being now broken.

I have, for example, encountered 100775 in the wild, and would argue that
like 100644, it should probably not fail unless we are in --strict mode.

Thanks-
-ed
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] upload-pack: keep poll(2)'s timeout to -1

2014-08-22 Thread Edward Thomson
Keep poll's timeout at -1 when uploadpack.keepalive = 0, instead of
setting it to -1000, since some pedantic old systems (eg HP-UX) and
the gnulib compat/poll will treat only -1 as the valid value for
an infinite timeout.

Signed-off-by: Edward Thomson ethom...@microsoft.com
---
 upload-pack.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/upload-pack.c b/upload-pack.c
index 01de944..433211a 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -167,7 +167,9 @@ static void create_pack_file(void)
if (!pollsize)
break;
 
-   ret = poll(pfd, pollsize, 1000 * keepalive);
+   ret = poll(pfd, pollsize,
+   keepalive  0 ? -1 : 1000 * keepalive);
+
if (ret  0) {
if (errno != EINTR) {
error(poll failed, resuming: %s,
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] upload-pack: keep poll(2)'s timeout to -1

2014-08-22 Thread Edward Thomson
On Fri, Aug 22, 2014 at 12:03:34PM -0400, Jeff King wrote:
 
 Yeah, I wasn't thinking we would get negative values from the user (we
 don't document them at all), but we should probably do something
 sensible. Let's just leave it at Ed's patch.

Thanks, both.  Apologies for the dumb question: is there anything
additional that I need to do (repost with your Acked-by, for example)
or is this adequate as-is?

Thanks-
-ed
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Rename conflicts in the index

2013-04-02 Thread Edward Thomson
Edward Thomson [ethom...@microsoft.com] wrote:
 Junio C Hamano [mailto:gis...@pobox.com] wrote:
   * Path A may have only stage #1, while path B and C has only stage
 #2 and stage #3 (the user would have to notice these three
 correspond to each other, and resolve manually).
 
 You would want to annotate B at stage #2 seems to have been at A
 in the original (similarly for C#3) if you choose to do so.
 
 If we're going to make changes to the way conflicts are recorded in the main
 index, then I would prefer this approach.  It is unambiguous and all data 
 about
 all sides are recorded, including the names that items had in their respective
 branches.

Junio, did you have additional thoughts on this?

What would you like from me to proceed?  If the aforementioned seems
reasonable, I can update Documentation/technical/index-format.txt and
we can iron out the details in that fashion?

Thanks-
-ed
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Rename conflicts in the index

2013-04-02 Thread Edward Thomson
Junio C Hamano [mailto:gits...@pobox.com] wrote:
 Edward Thomson ethom...@microsoft.com writes:
  Junio, did you have additional thoughts on this?
 
 Not at this moment.
 
 I think we have covered the principles (do not unnecessarily duplicate
 information, do not break existing implementations unnecessarily, etc.) 
 already,
 and we know how we want to record one side renamed A to B, the other side
 renamed A to C case, but I do not think the discussion covered all cases yet.

Sorry, I'm not sure what you're asking for - would you just like some more
examples of what this looks like with aforementioned exotic conflict types?
Or are you looking for something more strict - BNF format, for example?

-ed
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Rename conflicts in the index

2013-04-01 Thread Edward Thomson
Junio C Hamano [mailto:gis...@pobox.com] wrote:
 As long as
 the format will be backward compatible to allow existing users use existing 
 tools
 to deal with cases the existing tools can handle, then that is OK.  I didn't 
 get that
 impression which is where my non starter came from.

I see now.  Thank you for the clarification.  I apologize if I was not clear
about this; indeed, the duplication of data in my proposed extension was
specifically to avoid any compatibility problems amongst clients.

In particular, when we have a rename in ours, edit in theirs conflict, we
store the conflict at the new (ours) path.  If, for example, I rename a-b
in my branch and merge a branch that edits a:

mode hash 1 b
mode hash 2 b
mode hash 3 b

This prohibits us from storing anything else in the theirs side at that
path, so if I were to have added b in their branch in addition to modifying
b, I cannot record it.

I was assuming that any change to this behavior would be a breaking one,
which is where the new section came from.

  * Path A may have only stage #1, while path B and C has only stage
#2 and stage #3 (the user would have to notice these three
correspond to each other, and resolve manually).
 
You would want to annotate B at stage #2 seems to have been at A
in the original (similarly for C#3) if you choose to do so.

If we're going to make changes to the way conflicts are recorded in the
main index, then I would prefer this approach.  It is unambiguous and all
data about all sides are recorded, including the names that items had in
their respective branches.

I would think that this might be a burden on current tools, however.
Now if I rename a-b my just my branch, my conflict will be recorded as:

mode hash 1 a
mode hash 2 b
mode hash 3 a

And current git-status will not look at any rename annotations to know
how to report this.

However, maybe this is not as big a problem as I'm concerned it would be.

  * You can choose to favor our choice, and have path B with three
stages (if we guessed wrong and the user wants to move it to C,
the user can resolve and then git mv the path).

I think this approach suffers from the drawback that the current approach
has, wherein this conflicts when they had path B, also, as noted above.

I think that if you were to put both B and C with all three stages, this
would be problematic for the same reason.

 *1* Instead of a three-way merge that inspects only the endpoints,
 you might get a better rename trail if you looked at the
 histories of both branches.  It would be a lot more expensive
 than the simple three-way, but burning CPU cycles is better than
 burning human neurons.

For the record, I like this approach very much.  It's not something that
libgit2 will be able to tackle in the near future; we're in a sort of
walk-before-you-can-run situation with merge at the moment, as you can
probably see.  But any improvement that avoids burning neurons is a
valuable one.

Thanks-

-ed
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Rename conflicts in the index

2013-03-27 Thread Edward Thomson
Junio C Hamano [mailto:gits...@pobox.com] wrote:
 Edward Thomson ethom...@microsoft.com writes:
  I would propose that this not simply track rename conflicts, but all
  conflicts.
 
 That is a no starter.

So.  Can you explain to me why this would be a non starter?  Can you suggest
some alternate strategy here?

Maybe there's something I'm fundamentally misunderstanding.  It seems that
at present, git will:

1. Detect rename conflicts when performing a merge (at least,
   git-merge-recursive will, which is the default.)

2. If the rename itself caused a conflict (eg, renamed in one side, added in
   the other) then the merge cannot succeed.

3. The resultant index is written as if renames were not detected, which
   means - at best - records the files that went in to the name conflict
   and git status reports an added in ours conflict, which is a pretty
   disappointing conflict.  Often, though, many of the files will not
   exist at higher stage entries, since without rename detection, they
   would have not been conflicts.  At worst, one side is staged, there are
   no conflicts in the index and the user can commit (and thus lose the
   other side.)

Thus it's not like we could add some extension that merely records the names
that produced the rename conflicts and point them at the higher stage entries
in the index.  That would require that they actually be in the index.

Thanks-
-ed
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Rename conflicts in the index

2013-03-27 Thread Edward Thomson
Junio C Hamano [mailto:gits...@pobox.com] wrote:
 We do not gratuitously break existing implementations.  If no conflict is 
 stored
 as higher-stage index entries in an index that has your index extension, no
 existing implementation can read a conflicted index written by your
 implementation and have users resolve conflicts.

I'm not suggesting that anybody stop writing 0 stage entries.

 When a path originally at A is moved to B on only one branch, and there are
 content-level conflicts between the changes made by one branch (while going
 from A to B) and by the other branch (while keeping it at A), we would end up
 having three stages for path B without any trace of path A.  I do not offhand
 know how much it helps to learn A in such a situation in the real life, but 
 we are
 indeed losing information, and I do not have any problem with an extension 
 that
 records in the index the fact that in the two (of the
 three) commits involved in the merge, the path was at A.

What you've described is true only for a certain class of rename conflicts,
for example the rename/edit conflict you've described above.

It's also true if you were to rename some item 'a' to 'b' in both branches.
But when 'b' is sufficiently dissimilar to become a rewrite, then I end up
with a rename of a-b on one side and deleting a and adding b on the other.
The result is a mysterious added by us conflict:

100644 e2dd530c9f31550a2b0c90773ccde056929d6d66 2   b

Worse yet is if I don't do the rename in my side, but I just add a new b so
that in theirs I've renamed a to b and in mine I have both a and b.  When I
do the merge, I'm told I have conflicts, except that I don't:

100644 08d4f831774aed5d4c6cb496affefd4020dce40c 0   b

The other branch's b is long gone and exists only as a dirty file in the
workdir.

 But people have been successfully using existing versions of Git without that
 information to merge branches with renames, and resolving the content-level
 conflicts.

But you aren't afforded the option to resolve content-level conflicts if you
don't know where the conflict came from.  For example, in a rename 1-2
conflict, we dutifully detect that a was renamed to both b and c and fail,
but that fact is never given to the index.  This conflict could be fed into
a merge tool or, better, automerged, with the user only needing to pick a
path:

100644 421c9102b8562ad227ba773ab1cf6bbed7b7496d 1   a
100644 421c9102b8562ad227ba773ab1cf6bbed7b7496d 3   b
100644 421c9102b8562ad227ba773ab1cf6bbed7b7496d 2   c

I hate to sound like a broken record here, but without some more data in the
index - anywhere, really - any tool that doesn't have the luxury of emitting
data about what happened to stdout certainly can't infer anything about what
happened in the merge.

-ed
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Rename conflicts in the index

2013-03-26 Thread Edward Thomson
Junio C Hamano [mailto:gits...@pobox.com] wrote:
 Edward Thomson ethom...@microsoft.com writes:
 
  I would propose that we store the data about the file in conflict as
  it occurred through the renames.  For example, in a rename 1-2
  conflict where A was renamed to both B and C, you would have a single
  conflict entry containing the data for A, B and C.  This would allow
  us to provide more detailed information to the user - and allow them
  to (say) choose a single name to proceed with.
 
  Is this something that has value to core git as well?  Alternately, is
  there something particularly stupid about this proposal?
 
 I do not offhand see anything particularly stupid; a new optional index 
 extension
 section CACHE_EXT_RENAME_CONFLICT might be a good addition.
 
 Is one side moves A to B while the other side moves it to C the only case, 
 or is
 it just an example?  Off the top of my head, one side moves A to x while the
 other side moves B to x/y would also be something we would want to know.  I
 am sure there are other cases that need to be considered.
 
 I do not think we can discuss the design at the concrete level until the 
 proposal
 spells out to cover all interesting cases in order for implementations to 
 agree on
 the common semantics.

Sorry about the delay here:  besides getting busy with some other things,
I wanted both a complete writeup and to have taken a pass at a test
implementation this in libgit2 to make sure seemed like a reasonably sensible
approach.

I would propose a new extension, 'CONF', to handle conflict data, differing
from the stage 0 entries in the index in that this extension tracks the
conflicting file across names if the underlying merge engine has support
for renames.

I made an attempt to keep the entry data similar to other entries in the
index.  I would propose that entries in the conflict are as follows:

Flags
  Four octets that describe the conflict.  Data includes:

  0x01  HAS_ANCESTOR
There is a file in the common ancestor branch that contributes
to this conflict.  Its data will follow.
  0x02  HAS_OURS
There is a file in our branch that contributes to this conflict.
Its data will follow.
  0x04  HAS_THEIRS
There is a file in their branch that contributes to this conflict.
Its data will follow.

  0x08  NAME_CONFLICT_OURS
This item has a path in our branch that overlaps a different
item in their branch.  (Eg, this conflict represents the our
side of a rename/add conflict.)
  0x10  NAME_CONFLICT_THEIRS
This item has a path in their branch that overlaps a different
item in our branch.  (Eg, this conflict represents the theirs
side of a rename/add conflict.)

  0x20  DF_CONFLICT_FILE
This is the file involved in a directory/file conflict.
  0x40  DF_CONFLICT_CHILD
This is a child of a directory involved in a directory/file conflict.

  Other bits are reserved.

Conflict Sides
  The data about one side of a conflict will contain:
  mode (ASCII string representation of octal, null-terminated)
  path (null terminated)
  sha1 (raw bytes)

The conflict sides will be written in this order:
  Ancestor (if HAS_ANCESTOR is set)
  Ours (if HAS_OURS is set)
  Theirs (if HAS_THEIRS is set)

I would propose that this not simply track rename conflicts, but all
conflicts.  Having a single canonical location is preferable - if the index
contains a CONF section (and the client supports it), it would use that.
Otherwise, the client would look at stage 0 entries.

I would propose that another extension, 'RSVD', track these conflicts once
they are resolved.  The format would be the same - when a conflict is
resolved from the CONF the entry will be placed as-is in the RSVD.

Examples are not an exhaustive list, but should help elucidate the name
and d/f conflicts:

Normal edit / edit conflict, where A is edited in ours and theirs:

  Conflict one:
Flags = HAS_ANCESTOR|HAS_OURS|HAS_THEIRS
Entry 1 = A [Ancestor]
Entry 2 = B [Ancestor]
Entry 3 = C [Ancestor]

Rename / add conflict, where A is renamed to B in ours and B is added in
theirs:

  Conflict one:
Flags = HAS_ANCESTOR|HAS_OURS|NAME_CONFLICT_OURS
Entry 1 = A [Ancestor]
Entry 2 = B [Ours]
Entry 3 = A [Theirs]
  Conflict two:
Flags = HAS_THEIRS|NAME_CONFLICT_THEIRS
Entry 1 = File B [Theirs]

D/F conflict, where some file A is deleted in theirs, and a directory
A is created with file child:

  Conflict one:
Flags = HAS_ANCESTOR|HAS_OURS|HAS_THEIRS|DF_CONFLICT_FILE
Entry 1 = A [Ancestor]
Entry 2 = A [Ours]
  Conflict two:
Flags = HAS_THEIRS|DF_CONFLICT_CHILD
Entry 1 = A/child [Theirs]

Thanks for your input on this.

-ed
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Rename conflicts in the index

2013-03-26 Thread Edward Thomson

Junio C Hamano [mailto:gits...@pobox.com] wrote:
 Edward Thomson ethom...@microsoft.com writes:
  I would propose a new extension, 'CONF', to handle conflict data,
  differing from the stage 0 entries in the index in that this
  extension tracks the conflicting file across names if the underlying
  merge engine has support for renames.
 
  I made an attempt to keep the entry data similar to other entries in
  the index.  I would propose that entries in the conflict are as follows:
 
  Flags
Four octets that describe the conflict.  Data includes:
 
0x01  HAS_ANCESTOR
  There is a file in the common ancestor branch that contributes
  to this conflict.  Its data will follow.
0x02  HAS_OURS
  There is a file in our branch that contributes to this conflict.
  Its data will follow.
0x04  HAS_THEIRS
  There is a file in their branch that contributes to this conflict.
  Its data will follow.
 
0x08  NAME_CONFLICT_OURS
  This item has a path in our branch that overlaps a different
  item in their branch.  (Eg, this conflict represents the our
  side of a rename/add conflict.)
0x10  NAME_CONFLICT_THEIRS
  This item has a path in their branch that overlaps a different
  item in our branch.  (Eg, this conflict represents the theirs
  side of a rename/add conflict.)
 
0x20  DF_CONFLICT_FILE
  This is the file involved in a directory/file conflict.
0x40  DF_CONFLICT_CHILD
  This is a child of a directory involved in a directory/file conflict.
 
Other bits are reserved.
 
  Conflict Sides
The data about one side of a conflict will contain:
mode (ASCII string representation of octal, null-terminated)
path (null terminated)
sha1 (raw bytes)
 
  The conflict sides will be written in this order:
Ancestor (if HAS_ANCESTOR is set)
Ours (if HAS_OURS is set)
Theirs (if HAS_THEIRS is set)
 
 Puzzled.  Most of the above, except NAME_CONFLICT_{OURS,THEIRS} bits, look
 totally pointless duplication.

Obviously HAS_ANCESTOR / HAS_OURS / HAS_THEIRS is to indicate to a reader
whether there is data to be read or not.  Similar to how a mode of 0
in the REUC indicates that the rest of the record should not be read.)

 When you are working with Git, you have to be prepared to read from the
 datafile like the index that other people (and your previous
 version) created, and you also have to make sure you do not make what you
 write out unusable by other people without a good reason.

I'm acutely aware that you need to be able to read an index that other
people created - that's the problem at hand.  git does not produce an
index that allows anyone (including itself) to reason about rename
conflicts.  It doesn't even bother to write high-stage conflict entries
in many instances, so you can have an instance where git tells you that
a conflict occurred but one of those files is staged anyway, the other
is just dirty in the workdir and you can commit immediately thereafter.

While obviously it's possible to handle this situation (is file A
in conflict?  Look in the rename conflict extension.  Not there?  Okay,
look in the index.)  That's not exactly elegant.  My goal here was
to have a single source for conflicts.

 (by the way CONF sounds as if it is some sort of configuration data).

There's only four letters, and not everything's as easy as TREE.  REUC,
for example, sounds like a donkey, though I suppose it depends on the
language in question.

-ed
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Rename conflicts in the index

2013-03-13 Thread Edward Thomson
When a rename conflict occurs, the information about the conflict is
written to stdout and the index is updated as if the conflict were a
simpler conflict that did not involve renames.  This doesn't give a lot of
information to users after the fact - a status of added in ours does not
provide a lot of details about what occurred.

In reimplementations, we face a similar challenge.  Unfortunately, we can't
simply print the output to the console, we would like to provide this
information back to the caller by some mechanism.  My preference would be to
return this information in the index - so that the results of unpack trees
(if you will) was a single data structure (the index) and so that we could
persist this information to disk.  I've been considering the idea of an
extension that contains more detailed data about conflicts, but I certainly
wouldn't want to proceed without discussing this here further.

I would propose that we store the data about the file in conflict as it
occurred through the renames.  For example, in a rename 1-2 conflict where
A was renamed to both B and C, you would have a single conflict entry
containing the data for A, B and C.  This would allow us to provide more
detailed information to the user - and allow them to (say) choose a single
name to proceed with.

Is this something that has value to core git as well?  Alternately, is
there something particularly stupid about this proposal?

Thanks-
-ed
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Rename conflicts in the index

2013-03-13 Thread Edward Thomson
Junio C Hamano [mailto:gits...@pobox.com] wrote:
 I do not offhand see anything particularly stupid; a new optional index 
 extension
 section CACHE_EXT_RENAME_CONFLICT might be a good addition.
 
 Is one side moves A to B while the other side moves it to C the only case, 
 or is
 it just an example?  Off the top of my head, one side moves A to x while the
 other side moves B to x/y would also be something we would want to know.  I
 am sure there are other cases that need to be considered.

Yes, that was just an example.  Certainly I was intending that all conflicts
that arose from renames would end up here since one can't really reason
why the merge tool created a conflict by looking at the index alone - even
knowing the merge tool's similarity algorithms, this would be awfully
expensive to piece back together, even if the index did contain non-zero
stage entries for all the items that were involved in the conflicts.

That said, my rather naive initial thought was that we could repeat *all*
conflicts in this area.  This would give tools that knew how to understand
this the ability to go to a single place for conflict data, rather than
producing some merge of high-stage entries that comprise non-rename
conflicts and data from the rename conflict area for rename conflicts.

Thanks-
-ed
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Merge with staged and unstaged changes

2013-02-20 Thread Edward Thomson
Hi-

I've been investigating the cases where merge is allowed to proceed when
there are staged changes in the index or unstaged files in the working
directory.  There are cases where I find the behavior surprising and I
hope I can get clarification.  There are also two cases that I will report
as bugs, where it appears that the unstaged file contents are deleted.

For these cases below, please consider the contents of a single path.
In the tables below, we will show the contents of a file across each input
and output of the merge - consider that we're merging a single file in
some branch theirs into the current branch ours and that these two
branches have a common ancestor anc.  The state of that file in our
index and workdir are represented by idx and wd, respectively.
Unless otherwise noted, these cases are true for both git-merge-resolve
and git-merge-recursive.


For completeness and illustration purposes, I'll included the cases where
there are no changes staged or unstaged.  These succeed, as expected:

   input result
   anc ours theirs idx wd  merge result  idx wd
1  A   AB  A   A   take BB   B
2  A   BA  B   B   take AA   A

Merge is also expected to proceed if the contents of our branch are the
merge result, and there are unstaged changes for that file in the workdir.
In this case, the file remains unstaged:

   input result
   anc ours theirs idx wd  merge result  idx wd
3  A   BA  B   C   take BB   C


What was surprising to me was that my merge can proceed if I stage a change
that is identical to the merge result.  That is, if my merge result would
be to take the contents from theirs, then my merge can proceed if I've
already staged the same contents:

   input result
   anc ours theirs idx wd  merge result  idx wd
4  A   AB  B   B   take BB   B
5  A   AB  B   C   take BB   C

This seems unexpected - is there a use-case that this enables or is
this accidental?


Another surprising result was that if I have deleted a file (and staged
the deletion or not) then the merge will proceed and the file in question
will be recreated.  Consider X to be a missing file:

   input result
   anc ours theirs idx wd  merge result  idx wd
6  A   AB  A   X   take BB   B
7  A   AB  X   X   take BB   B

I wouldn't have expected a file I deleted to be recreated with the other
branch's contents.  Is this behavior also intentional?


Finally, there are cases when you have staged a deletion of the file and
you have unstaged changes in your workdir where the merge will silently
delete the unstaged data.  If there is a conflict, the xdiff output will
overwrite the unstaged file:

   input result
   anc ours theirs idx wd  merge result  idx wd
8  A   BC  X   D   conflict  X   diff3_file

And similarly, while git-merge-recursive (only) will also remove my
untracked file when there are no changes in our branch but the file was
deleted in their branch:

   input result
   anc ours theirs idx wd  merge result  idx wd
9  A   AX  X   B   delete file   X   X


I trust the last two cases, where data is lost, are bugs to report, but
could I get clarification on the other situations?

Thanks-

-ed

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Merge with staged and unstaged changes

2013-02-20 Thread Edward Thomson
On 2/20/13 2:21 PM, Junio C Hamano gits...@pobox.com wrote:

Both are very much on purpose. The integrator may have seen the
patch on the list, ran git apply [--index] on it to contemplate on
it, and before commiting the result, saw a pull request for a branch
that contains the change.  The above two allow the pull from such a
state to succeed without losing any information.

I think we have a similar table in Documentation/technical area that
explains these things, by the way.

I believe you are referring to trivial-merge.txt which has been
exceptionally helpful in understanding what unpack trees does.
I appreciate this detailed explanation in providing the why.

I also appreciate your explanation of the affect of the workdir,
and that makes sense.  I would have expected that the default was
to presume the workdir files were existent, rather than the
other way around, but we can agree that is an implementation detail.

My biggest concern, of course, was having the unstaged files in my
workdir overwritten or deleted.

Thanks again-

-ed

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html