Re: [PATCH] diff-tree: obey the color.ui configuration

2017-12-29 Thread Jeff King
On Fri, Dec 29, 2017 at 06:16:31PM -0500, Todd Zullinger wrote:

> Ævar Arnfjörð Bjarmason wrote:
> > No idea how to test this, in particular trying to pipe the output of
> > color.ui=never v.s. color.ui=auto to a file as "auto" will disable
> > coloring when it detects a pipe, but this fixes the issue.
> 
> You might be able to use similar methods as those Jeff used
> in the series merged from jk/ui-color-always-to-auto:
> 
> https://github.com/gitster/git/tree/jk/ui-color-always-to-auto

Yeah, test_terminal is the solution to testing. But...

> He may also have some ideas about this issue in general.
> (Or they could be tramatic memories, depending on how
> painful it was to dig into the color code.)

Yep. If we make diff-tree support color.ui, it's going to break a bunch
of other stuff (like add--interactive) for people who set color.ui=always.
I know this empirically, because we did that in v2.13, and a bunch of
people complained. ;)

The root of the problem is that the plumbing diff-tree defaults its
internal color variable to "auto" in the first place. In theory the best
way forward is fixing that, but it's likely to have a bunch of fallouts
itself (scripts which use plumbing and where the user _does_ want color
will stop showing it). This bug has been around since v1.8.4, I think,
so it's hard to say how many people are depending on it at this point.

A hackier option which would probably make most people happy would be to
have plumbing respect "color.ui=never", but not any other values.

I think the history of the back and forth is:

  - 4c7f1819b3 (make color.ui default to 'auto', 2013-06-10) introduced
the problem of plumbing defaulting to "auto". This was in v1.8.4.

  - we did something similar to Ævar's patch in 136c8c8b8f (color: check
color.ui in git_default_config(), 2017-07-13). That shipped in
v2.14.2, and people with color.ui=always complained, because things
like add--interactive broke for them.

  - we tried fixing it with 6be4595edb (color: make "always" the same as
"auto" in config, 2017-10-03), but that broke people doing "git -c
color.ui=always" as an equivalent of "--color". We talked about
making the "-c" config behave differently from on-disk config, but
got pretty disgusted at the weird hacks. And so...

  - we ended up with 33c643bb08 (Revert "color: check color.ui in
git_default_config()", 2017-10-13), which just reverts the whole
mess back to the pre-v2.14 state. This shipped in v2.15.

So I don't think we want to go down that road again. If anything, we
want to either fix the original sin from 4c7f1819b3, or we want to do
the "respect only never" hack.

-Peff


Re: [PATCH] Add shell completion for git remote rm

2017-12-29 Thread Todd Zullinger
Ævar Arnfjörð Bjarmason wrote:
>> I think adding 'rm' to completion definitely counts as advertisement.
>> It doesn't have much practical use, after all: typing 'rm' with
>> completion is actually one more keystroke than without (rm vs. rm).
> 
> This is only one use of the completion interface, maybe you only use it
> like that, but not everyone does.
> 
> The completion interface has two uses, one is to actually save you
> typing, the other is subcommand discovery, and maybe someone who has a
> use neither you or I have thought of is about to point out a third.
> 
> I'll type "git $whatever $subcommand" as *validation* that I've
> found the right command, not to complete it for me. This is a thing
> that's in my muscle memory for everything.

Is that meant to be in favor of including rm in the
completion results or against? :)

> Since I've been typing "git remote rm" for a while (started before
> this deprecation happened) I've actually been meaning to submit
> completion for "rm" since it works, not knowing about Duy's patch until
> now.
> 
> Now, even if someone disagrees that we should have "rm" at all I think
> that in general we should not conflate two different things, one is
> whether:
> 
> git remote 
> 
> shows both "rm" and "remove" in the list, and the other is whether:
> 
> git remote rm
> 
> Should yield:
> 
> git remote rm
> 
> Or, as now happens:
> 
> git remote rm
> 
> I can see why we'd, in general, we'd like to not advertise certain
> options for completion (due to deprecation, or just to avoid verbosity),
> but complete them once they're unambiguously typed out.
> 
> I don't know whether the bash completion interface supports making that
> distinction, but it would be useful.

It can be done, though I think that it's probably better to
subtly lead people to use 'git remote remove' going forward,
to keep things consistent.  I don't have a strong preference
for or against the rm -> remove change, but since that's
been done I think there's a benefit to keeping things
consistent in the UI.  And I think that should also apply to
not offering completion for commands/subcommands/options
which are only kept for backward compatibility.

Here's one way to make 'git remote rm ' work without
including it in the output of 'git remote ':

diff --git i/contrib/completion/git-completion.bash 
w/contrib/completion/git-completion.bash
index 3683c772c5..aa63f028ab 100644
--- i/contrib/completion/git-completion.bash
+++ w/contrib/completion/git-completion.bash
@@ -2668,7 +2668,9 @@ _git_remote ()
add rename remove set-head set-branches
get-url set-url show prune update
"
-   local subcommand="$(__git_find_on_cmdline "$subcommands")"
+   # Don't advertise rm by including it in subcommands, but complete
+   # remotes if it is used.
+   local subcommand="$(__git_find_on_cmdline "$subcommands rm")"
if [ -z "$subcommand" ]; then
case "$cur" in
--*)

Keeping 'git remote rm' working to avoid breaking scripts is
one thing, but having it in the completion code makes it
more likely that it will continue to be seen as a
recommended subcommand.

This leads to patches like this one, where it's presumed
that the lack of completion is simply an oversight or a bug.
Of course, the lack of completion hasn't caused everyone to
forget that 'remote rm' was changed to 'remote remove', so
that reasoning may be full of hot air (or worse). ;)

The current result of 'git remote rm ' isn't so great.
It's arguably worse to have it pretend that no subcommand
was given than to list the remotes.

$ git remote rm 
addremove set-head   update
get-urlrename set-url
prune  set-branches   show

I think completing nothing or completing the remotes
(without offering rm in the subcommand list) would be
better, after looking at it a bit.

I don't know how to disable file completion, but I'm not
intimately familiar with the git completion script (thanks
to it working so damn well).  I'm guessing there's a way to
do that, if there's a strong desire to not complete the
remotes at all.

I don't think we should include rm in 'git remote '
completion, but I don't care much either way what 'git
remote rm ' includes.  But it should be better than
including the other subcommands.

-- 
Todd
~~
There, I've gone and soiled myself, are you happy now?!
-- Stewie Griffin



possible completion bug with --set-upstream-to=

2017-12-29 Thread Ernesto Alfonso


Whenever I type the last  to complete origin/master, as in below:

> git branch --set-upstream-to=orig

what I get is:

> git branch origin/master

instead of the expected:

> git branch --set-upstream-to=origin/master

git version and OS:

>git version 2.1.4
>
>Distributor ID:Debian
>Description:   Debian GNU/Linux 8.7 (jessie)
>Release:   8.7
>Codename:  jessie
>

Thanks,

Ernesto


Re: [PATCH] Add shell completion for git remote rm

2017-12-29 Thread Ævar Arnfjörð Bjarmason

On Fri, Dec 29 2017, SZEDER Gábor jotted:

>> Keith Smiley wrote:
>> > It looks like that was just about preferring remove in documentation
>> > and the like, I think it would still make sense to have both for
>> > completion since rm is still supported.
>>
>> I read it as a first step in a long process to eventually
>> remove 'remote rm', but if that's never intended, then sure,
>> restoring completion for it seems reasonable.
>>
>> It would be good to hear from those who know or recall the
>> intention.
>>
>> I think we should only complete the preferred subcommand.
>> That encourages use of 'remote remove' even if 'remote rm'
>> will stay forever to avoid breaking existing scripts.
>
> Quoting from the commit message of e17dba8fe1 ("remote: prefer
> subcommand name 'remove' to 'rm'", 2012-09-06):
>
>   'rm' is still supported and used in the test suite. It's just not
>   widely advertised.

I think we made the wrong choice in standardizing on remove instead of
rm, it should be rm for consistency with git-rm, and "remote rename"
should be "remote mv" etc., just like we have git-mv.

Maybe if we didn't have the Unix legacy we'd pick rename and remove to
be consitent for both, but since that's not going to happen this bit of
inconsistency is worse.

Now with that showing of my biases out of the way...

> I think adding 'rm' to completion definitely counts as advertisement.
> It doesn't have much practical use, after all: typing 'rm' with
> completion is actually one more keystroke than without (rm vs. rm).

This is only one use of the completion interface, maybe you only use it
like that, but not everyone does.

The completion interface has two uses, one is to actually save you
typing, the other is subcommand discovery, and maybe someone who has a
use neither you or I have thought of is about to point out a third.

I'll type "git $whatever $subcommand" as *validation* that I've
found the right command, not to complete it for me. This is a thing
that's in my muscle memory for everything.

The post-hoc reason is because if you're a fast typist you don't
actually save time on typing the command (usually I just use reverse
shell search anyway), but rather on not screwing up the command itself
via a typo.

Sometimes commands take a while to fail, and even if it's immediate
re-editing them takes longer than getting it right in the first place.

Since I've been typing "git remote rm" for a while (started before
this deprecation happened) I've actually been meaning to submit
completion for "rm" since it works, not knowing about Duy's patch until
now.

Now, even if someone disagrees that we should have "rm" at all I think
that in general we should not conflate two different things, one is
whether:

git remote 

shows both "rm" and "remove" in the list, and the other is whether:

git remote rm

Should yield:

git remote rm

Or, as now happens:

git remote rm

I can see why we'd, in general, we'd like to not advertise certain
options for completion (due to deprecation, or just to avoid verbosity),
but complete them once they're unambiguously typed out.

I don't know whether the bash completion interface supports making that
distinction, but it would be useful.


Re: [PATCH] Add shell completion for git remote rm

2017-12-29 Thread Keith Smiley
The goal of this fix isn't to complete rm itself (although that is a 
side effect), it's to complete the remote names after you type rm.


Without this patch doing this:

git remote rm 

Attempts to complete the options for `git remote` instead of the remote 
names.


--
Keith Smiley

On 12/29, SZEDER Gábor wrote:

Keith Smiley wrote:
> It looks like that was just about preferring remove in documentation
> and the like, I think it would still make sense to have both for
> completion since rm is still supported.

I read it as a first step in a long process to eventually
remove 'remote rm', but if that's never intended, then sure,
restoring completion for it seems reasonable.

It would be good to hear from those who know or recall the
intention.

I think we should only complete the preferred subcommand.
That encourages use of 'remote remove' even if 'remote rm'
will stay forever to avoid breaking existing scripts.


Quoting from the commit message of e17dba8fe1 ("remote: prefer
subcommand name 'remove' to 'rm'", 2012-09-06):

 'rm' is still supported and used in the test suite. It's just not
 widely advertised.

I think adding 'rm' to completion definitely counts as advertisement.
It doesn't have much practical use, after all: typing 'rm' with
completion is actually one more keystroke than without (rm vs. rm).


Gábor



Re: [PATCH] diff-tree: obey the color.ui configuration

2017-12-29 Thread Todd Zullinger
Ævar Arnfjörð Bjarmason wrote:
> No idea how to test this, in particular trying to pipe the output of
> color.ui=never v.s. color.ui=auto to a file as "auto" will disable
> coloring when it detects a pipe, but this fixes the issue.

You might be able to use similar methods as those Jeff used
in the series merged from jk/ui-color-always-to-auto:

https://github.com/gitster/git/tree/jk/ui-color-always-to-auto

He may also have some ideas about this issue in general.
(Or they could be tramatic memories, depending on how
painful it was to dig into the color code.)

-- 
Todd
~~
Subtlety is the art of saying what you think and getting out of the
way before it is understood.
-- Anonymous



[PATCH] perf: amend the grep tests to test grep.threads

2017-12-29 Thread Ævar Arnfjörð Bjarmason
Ever since 5b594f457a ("Threaded grep", 2010-01-25) the number of
threads git-grep uses under PTHREADS has been hardcoded to 8, but
there's no performance test to check whether this is an optimal
setting.

Amend the existing tests for the grep engines to support a mode where
this can be tested, e.g.:

GIT_PERF_GREP_THREADS='1 8 16' GIT_PERF_LARGE_REPO=~/g/linux ./run p782*

Signed-off-by: Ævar Arnfjörð Bjarmason 
---

This looks less scary under diff -w.

 t/perf/p7820-grep-engines.sh   | 52 ---
 t/perf/p7821-grep-engines-fixed.sh | 55 ++
 2 files changed, 86 insertions(+), 21 deletions(-)

diff --git a/t/perf/p7820-grep-engines.sh b/t/perf/p7820-grep-engines.sh
index 62aba19e76..8b09c5bf32 100755
--- a/t/perf/p7820-grep-engines.sh
+++ b/t/perf/p7820-grep-engines.sh
@@ -12,6 +12,9 @@ e.g. GIT_PERF_7820_GREP_OPTS=' -i'. Some options to try:
-vi
-vw
-viw
+
+If GIT_PERF_GREP_THREADS is set to a list of threads (e.g. '1 4 8'
+etc.) we will test the patterns under those numbers of threads.
 "
 
 . ./perf-lib.sh
@@ -19,6 +22,11 @@ e.g. GIT_PERF_7820_GREP_OPTS=' -i'. Some options to try:
 test_perf_large_repo
 test_checkout_worktree
 
+if test -n "$GIT_PERF_GREP_THREADS"
+then
+   test_set_prereq PERF_GREP_ENGINES_THREADS
+fi
+
 for pattern in \
'how.to' \
'^how to' \
@@ -39,18 +47,42 @@ do
else
prereq=""
fi
-   test_perf $prereq "$engine grep$GIT_PERF_7820_GREP_OPTS 
'$pattern'" "
-   git -c grep.patternType=$engine 
grep$GIT_PERF_7820_GREP_OPTS -- '$pattern' >'out.$engine' || :
-   "
-   done
-
-   test_expect_success "assert that all engines found the same 
for$GIT_PERF_7820_GREP_OPTS '$pattern'" '
-   test_cmp out.basic out.extended &&
-   if test_have_prereq PCRE
+   if ! test_have_prereq PERF_GREP_ENGINES_THREADS
then
-   test_cmp out.basic out.perl
+   test_perf $prereq "$engine grep$GIT_PERF_7820_GREP_OPTS 
'$pattern'" "
+   git -c grep.patternType=$engine 
grep$GIT_PERF_7820_GREP_OPTS -- '$pattern' >'out.$engine' || :
+   "
+   else
+   for threads in $GIT_PERF_GREP_THREADS
+   do
+   test_perf PTHREADS,$prereq "$engine 
grep$GIT_PERF_7820_GREP_OPTS '$pattern' with $threads threads" "
+   git -c grep.patternType=$engine -c 
grep.threads=$threads grep$GIT_PERF_7820_GREP_OPTS -- '$pattern' 
>'out.$engine.$threads' || :
+   "
+   done
fi
-   '
+   done
+
+   if ! test_have_prereq PERF_GREP_ENGINES_THREADS
+   then
+   test_expect_success "assert that all engines found the same 
for$GIT_PERF_7820_GREP_OPTS '$pattern'" '
+   test_cmp out.basic out.extended &&
+   if test_have_prereq PCRE
+   then
+   test_cmp out.basic out.perl
+   fi
+   '
+   else
+   for threads in $GIT_PERF_GREP_THREADS
+   do
+   test_expect_success PTHREADS "assert that all engines 
found the same for$GIT_PERF_7820_GREP_OPTS '$pattern' under threading" "
+   test_cmp out.basic.$threads 
out.extended.$threads &&
+   if test_have_prereq PCRE
+   then
+   test_cmp out.basic.$threads 
out.perl.$threads
+   fi
+   "
+   done
+   fi
 done
 
 test_done
diff --git a/t/perf/p7821-grep-engines-fixed.sh 
b/t/perf/p7821-grep-engines-fixed.sh
index c7ef1e198f..61e41b82cf 100755
--- a/t/perf/p7821-grep-engines-fixed.sh
+++ b/t/perf/p7821-grep-engines-fixed.sh
@@ -6,6 +6,9 @@ Set GIT_PERF_7821_GREP_OPTS in the environment to pass options 
to
 git-grep. Make sure to include a leading space,
 e.g. GIT_PERF_7821_GREP_OPTS=' -w'. See p7820-grep-engines.sh for more
 options to try.
+
+If GIT_PERF_7821_THREADS is set to a list of threads (e.g. '1 4 8'
+etc.) we will test the patterns under those numbers of threads.
 "
 
 . ./perf-lib.sh
@@ -13,6 +16,11 @@ options to try.
 test_perf_large_repo
 test_checkout_worktree
 
+if test -n "$GIT_PERF_GREP_THREADS"
+then
+   test_set_prereq PERF_GREP_ENGINES_THREADS
+fi
+
 for pattern in 'int' 'uncommon' 'æ'
 do
for engine in fixed basic extended perl
@@ -23,19 +31,44 @@ do
else
prereq=""
fi
-   test_perf $prereq "$engine grep$GIT_PERF_7821_GREP_OPTS 
$pattern" "
-   git -c 

[PATCH] diff-tree: obey the color.ui configuration

2017-12-29 Thread Ævar Arnfjörð Bjarmason
Before git-bisect exits it calls `diff-tree --pretty --stat $commit`
on the bad commit. This would always print the "commit" line with
coloring despite color.ui being set to "never".

Teach diff-tree to look at the git_color_config() configuration. I
initially tried to add this to git_diff_basic_config itself, but it
makes other unrelated things fail, and this is a more isolated change
that solves the issue.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---

No idea how to test this, in particular trying to pipe the output of
color.ui=never v.s. color.ui=auto to a file as "auto" will disable
coloring when it detects a pipe, but this fixes the issue.

 builtin/diff-tree.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/builtin/diff-tree.c b/builtin/diff-tree.c
index b775a75647..0311c01a87 100644
--- a/builtin/diff-tree.c
+++ b/builtin/diff-tree.c
@@ -97,6 +97,15 @@ static void diff_tree_tweak_rev(struct rev_info *rev, struct 
setup_revision_opt
}
 }
 
+
+static int diff_tree_config(const char *var, const char *value, void *cb)
+{
+   if (git_color_config(var, value, cb) < 0)
+   return -1;
+
+   return git_diff_basic_config(var, value, cb);
+}
+
 int cmd_diff_tree(int argc, const char **argv, const char *prefix)
 {
char line[1000];
@@ -108,7 +117,7 @@ int cmd_diff_tree(int argc, const char **argv, const char 
*prefix)
if (argc == 2 && !strcmp(argv[1], "-h"))
usage(diff_tree_usage);
 
-   git_config(git_diff_basic_config, NULL); /* no "diff" UI options */
+   git_config(diff_tree_config, NULL); /* no "diff" UI options */
init_revisions(opt, prefix);
if (read_cache() < 0)
die(_("index file corrupt"));
-- 
2.15.1.424.g9478a66081



Re: [PATCH] Add shell completion for git remote rm

2017-12-29 Thread SZEDER Gábor
> Keith Smiley wrote:
> > It looks like that was just about preferring remove in documentation
> > and the like, I think it would still make sense to have both for
> > completion since rm is still supported.
> 
> I read it as a first step in a long process to eventually
> remove 'remote rm', but if that's never intended, then sure,
> restoring completion for it seems reasonable.
> 
> It would be good to hear from those who know or recall the
> intention.
> 
> I think we should only complete the preferred subcommand.
> That encourages use of 'remote remove' even if 'remote rm'
> will stay forever to avoid breaking existing scripts.

Quoting from the commit message of e17dba8fe1 ("remote: prefer
subcommand name 'remove' to 'rm'", 2012-09-06):

  'rm' is still supported and used in the test suite. It's just not
  widely advertised.

I think adding 'rm' to completion definitely counts as advertisement.
It doesn't have much practical use, after all: typing 'rm' with
completion is actually one more keystroke than without (rm vs. rm).


Gábor



Re: [BUG] git bisect colour output contrary to configuration

2017-12-29 Thread Ævar Arnfjörð Bjarmason

On Fri, Dec 29 2017, zef...@fysh.org jotted:

> My ~/.gitconfig sets color.ui=never, which should prevent attempts
> at colouring output from all git commands.  I do not have any git
> configuration enabling colour in any situation (such as for specific
> commands).  But when a git bisect completes, the output identifying
> the first bad commit includes escape sequences to colour the "commit
> 3e6..." line yellow.  Excerpt of strace output (with many irrelevant
> lines omitted):
>
> 23851 write(1, "3e6fc602e433dbd76941ac0ef7a438a77fbe9a05 is the first bad 
> commit\n", 65) = 65
> 23851 open("/home/zefram/.gitconfig", O_RDONLY) = 3
> 23851 read(3, "[user]\n\tname = Zefram\n\temail = 
> zef...@fysh.org\n\tsigningkey = 0x8E1E1EC1\n\n[color]\n\tui = never\n", 4096) 
> = 93
> 23851 write(1, "\33[33mcommit 
> 3e6fc602e433dbd76941ac0ef7a438a77fbe9a05\33[m\n", 56) = 56
>
> Given the configuration, that line should be free of escape sequences.
>
> I'm mainly using git 2.1.4 via Debian, but I've also
> reproduced this problem with the latest from git.git (commit
> 1eaabe34fc6f486367a176207420378f587d3b48, tagged v2.16.0-rc0).

This issue is a bug, but has nothing do do with bisect per-se, but is a
bug in diff-tree, compare these two:

git -c color.ui=never diff-tree --pretty --stat HEAD
git -c color.ui=never show --pretty --stat HEAD

diff-tree will incorrectly show colored output here despite
ui.color=never.


Re: [PATCH 2/2] travis-ci: record and skip successfully built trees

2017-12-29 Thread SZEDER Gábor
On Fri, Dec 29, 2017 at 9:03 PM, SZEDER Gábor  wrote:

 Or print it in a different color? Maybe red?

 See: https://travis-ci.org/szeder/git/jobs/322247836#L622-L625
>>>
>>> I considered using color for the first line, but then didn't do it,
>>> because I didn't want to decide the color :)
>>> Anyway, red is the general error/failure color, but this is neither.  It
>>> could either be green, to match the color of the build job's result, or
>>> something neutral like blue or yellow.
>>
>> You are right about red. I think I like yellow to express "warning".
>> But this is just a nit.
>
> OK, yellow it will be then.

Oh, hang on!  Have a look:

  https://travis-ci.org/szeder/git/jobs/322841285#L629

That yellow is barely different from default text color.
Bold stands out much better, but notice how Travis CI uses bold for all
its messages.  Therefore I think we should not use bold and leave it
exclusive to Travis CI's messages.

Green looks good:

  https://travis-ci.org/szeder/git/jobs/322372326#L629


Gábor


[BUG] git bisect colour output contrary to configuration

2017-12-29 Thread Zefram
My ~/.gitconfig sets color.ui=never, which should prevent attempts
at colouring output from all git commands.  I do not have any git
configuration enabling colour in any situation (such as for specific
commands).  But when a git bisect completes, the output identifying
the first bad commit includes escape sequences to colour the "commit
3e6..." line yellow.  Excerpt of strace output (with many irrelevant
lines omitted):

23851 write(1, "3e6fc602e433dbd76941ac0ef7a438a77fbe9a05 is the first bad 
commit\n", 65) = 65
23851 open("/home/zefram/.gitconfig", O_RDONLY) = 3
23851 read(3, "[user]\n\tname = Zefram\n\temail = zef...@fysh.org\n\tsigningkey 
= 0x8E1E1EC1\n\n[color]\n\tui = never\n", 4096) = 93
23851 write(1, "\33[33mcommit 3e6fc602e433dbd76941ac0ef7a438a77fbe9a05\33[m\n", 
56) = 56

Given the configuration, that line should be free of escape sequences.

I'm mainly using git 2.1.4 via Debian, but I've also
reproduced this problem with the latest from git.git (commit
1eaabe34fc6f486367a176207420378f587d3b48, tagged v2.16.0-rc0).

-zefram


Re: [PATCH 2/2] travis-ci: record and skip successfully built trees

2017-12-29 Thread SZEDER Gábor
On Thu, Dec 28, 2017 at 12:16 PM, Lars Schneider
 wrote:
>
>> On 28 Dec 2017, at 00:00, SZEDER Gábor  wrote:
>>
>> On Wed, Dec 27, 2017 at 8:15 PM, Lars Schneider
>>  wrote:
>>>
 On 27 Dec 2017, at 17:49, SZEDER Gábor  wrote:
 +# Skip the build job if the same tree has already been built and tested
 +# successfully before (e.g. because the branch got rebased, changing only
 +# the commit messages).
 +skip_good_tree () {
 + if ! good_tree_info="$(grep "^$(git rev-parse $TRAVIS_COMMIT^{tree}) 
 " "$good_trees_file")"
 + then
 + # haven't seen this tree yet; continue the build job
 + return
 + fi
 +
 + echo "$good_tree_info" | {
 + read tree prev_good_commit prev_good_job_number 
 prev_good_job_id
 +
 + if test "$TRAVIS_JOB_ID" =  "$prev_good_job_id"
>>>
>>> Under what circumstances would that be true?
>>
>> When the user hits 'Restart job' on the Travis CI web interface,
>> $TRAVI_JOB_NUMBER and _ID remain the same in the restarted build job as
>> they were in the original.
>> So the condition is true when the user hits 'Restart job' on a build job
>> that was the first to successfully build and test the current tree.
>
> I think I would prefer it if Travis would rerun all jobs if I hit the
> "refresh" button. What is your intention here?

I considered that and don't think it's worth the effort.

First, I think it's a rather rare use case.  I don't know what others
are doing, but I only hit 'Restart job' to restart timeouted OSX build
jobs (and to test this patch :), and that already works with this patch.
I don't really see any reason to restart old successful build jobs,
except perhaps when a new version of one of the dependencies becomes
available (e.g. P4 and Git LFS versions are not hardcoded on OSX, we use
whatever homebrew delivers), to see that an older version still works
with the new dependencies.  Has anyone ever done something like that? :)

Second, we need to know when a build job is run after the user hit
'Restart job'.  Unless I overlooked something, Travis CI doesn't
indicate this.  I'm not sure this is documented explicitly, but it seems
that a restarted build job gets the same $TRAVIS_JOB_{NUMBER,ID}
variables as the original.  We could use this to identify restarted
build jobs, but to do that we would have to save this information at the
end of every successful build, too, and add additional checks to this
function, of course.

 + then
 + cat <<-EOF
 + Skipping build job for commit $TRAVIS_COMMIT.
 + This commit has already been built and tested 
 successfully by this build job.
 + To force a re-build delete the branch's cache and 
 then hit 'Restart job'.
 + EOF
 + else
 + cat <<-EOF
 + Skipping build job for commit $TRAVIS_COMMIT.
 + This commit's tree has already been built and tested 
 successfully in build job $prev_good_job_number for commit 
 $prev_good_commit.
 + The log of that build job is available at 
 https://travis-ci.org/$TRAVIS_REPO_SLUG/jobs/$prev_good_job_id
 + To force a re-build delete the branch's cache and 
 then hit 'Restart job'.
 + EOF
>>>
>>> Maybe add a few newlines before and after EOF to make the text more stand 
>>> out?
>>> Or print it in a different color? Maybe red?
>>>
>>> See: https://travis-ci.org/szeder/git/jobs/322247836#L622-L625
>>
>> I considered using color for the first line, but then didn't do it,
>> because I didn't want to decide the color :)
>> Anyway, red is the general error/failure color, but this is neither.  It
>> could either be green, to match the color of the build job's result, or
>> something neutral like blue or yellow.
>
> You are right about red. I think I like yellow to express "warning".
> But this is just a nit.

OK, yellow it will be then.


> "skip_branch_tip_with_tag" could print its output yellow, too.
>
> - Lars


Re: [PATCH v2 4/5] convert: add support for 'checkout-encoding' attribute

2017-12-29 Thread Torsten Bögershausen
I probably need to look at convert.c more closer, some other comments inline.


On Fri, Dec 29, 2017 at 04:22:21PM +0100, lars.schnei...@autodesk.com wrote:
> From: Lars Schneider 
> 
> Git and its tools (e.g. git diff) expect all text files in UTF-8
> encoding. Git will happily accept content in all other encodings, too,
> but it might not be able to process the text (e.g. viewing diffs or
> changing line endings).

UTF-8 is too strict, the text from below is more correct:
 +Git recognizes files encoded with ASCII or one of its supersets (e.g.
 +UTF-8 or ISO-8859-1) as text files.  All other encodings are usually
 +interpreted as binary and consequently built-in Git text processing
 +tools (e.g. 'git diff') as well as most Git web front ends do not
 +visualize the content.


> 
> Add an attribute to tell Git what encoding the user has defined for a
> given file. If the content is added to the index, then Git converts the
> content to a canonical UTF-8 representation. On checkout Git will

Minor question about "canonical":
Would this mean the same ?
 ...then Git converts the content into  UTF-8.


> reverse the conversion.
> 
> Signed-off-by: Lars Schneider 
> ---
>  Documentation/gitattributes.txt |  59 
>  apply.c |   2 +-
>  blame.c |   2 +-
>  combine-diff.c  |   2 +-
>  convert.c   | 196 ++-
>  convert.h   |   8 +-
>  diff.c  |   2 +-
>  sha1_file.c |   5 +-
>  t/t0028-checkout-encoding.sh| 197 
> 
>  9 files changed, 460 insertions(+), 13 deletions(-)
>  create mode 100755 t/t0028-checkout-encoding.sh
> 
> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index 30687de81a..0039bd38c3 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -272,6 +272,65 @@ few exceptions.  Even though...
>catch potential problems early, safety triggers.
>  
>  
> +`checkout-encoding`
> +^^^
> +
> +Git recognizes files encoded with ASCII or one of its supersets (e.g.
> +UTF-8 or ISO-8859-1) as text files.  All other encodings are usually
> +interpreted as binary and consequently built-in Git text processing
> +tools (e.g. 'git diff') as well as most Git web front ends do not
> +visualize the content.
> +

> +In these cases you can teach Git the encoding of a file in the working
  teach ? tell ? 
> +directory with the `checkout-encoding` attribute. If a file with this
> +attributes is added to Git, then Git reencodes the content from the
> +specified encoding to UTF-8 and stores the result in its internal data
> +structure.

Minor Q:
> its internal data structure.
Should we simply write "stores the result in the index" ?

> On checkout the content is encoded back to the specified
> +encoding.

> +
> +Please note that using the `checkout-encoding` attribute has a number
> +of drawbacks:
> +
> +- Reencoding content to non-UTF encodings (e.g. SHIFT-JIS) can cause
> +  errors as the conversion might not be round trip safe.
> +
> +- Reencoding content requires resources that might slow down certain
> +  Git operations (e.g 'git checkout' or 'git add').
> +
> +- Git clients that do not support the `checkout-encoding` attribute or
> +  the used encoding will checkout the respective files as UTF-8 encoded.
> +  That means the content appears to be different which could cause
> +  trouble. Affected clients are older Git versions and alternative Git
> +  implementations such as JGit or libgit2 (as of January 2018).
> +
> +Use the `checkout-encoding` attribute only if you cannot store a file in
> +UTF-8 encoding and if you want Git to be able to process the content as
> +text.
> +

I would maybe rephrase a little bit (first things first):

Please note that using the `checkout-encoding` attribute may have a number
of pitfalls:

- Git clients that do not support the `checkout-encoding` attribute
  will checkout the respective files as UTF-8 encoded,  which typically
  causes trouble.
  Escpecialy when older Git versions are used or alternative Git
  implementations such as JGit or libgit2 (as of January 2018).

- Reencoding content to non-UTF encodings (e.g. SHIFT-JIS) can cause
  errors as the conversion might not be round trip safe.

- Reencoding content requires resources that might slow down certain
  Git operations (e.g 'git checkout' or 'git add').

Use the `checkout-encoding` attribute only if you cannot store a file in
UTF-8 encoding and if you want Git to be able to process the content as
text.

-
Side question: What happens if "the used encoding" is not supported by
the underlying iconv lib ?
Will Git fail, delete the file from the working tree ?
That may be worth to mention. (And I need to do the code-review)

> +Use the 

Re: [ANNOUNCE] Git v2.16.0-rc0

2017-12-29 Thread Paul Smith
On Thu, 2017-12-28 at 20:30 -0800, Junio C Hamano wrote:
>  * The way "git worktree add" determines what branch to create from
>where and checkout in the new worktree has been updated a bit.

Does this include the enhancements published a few weeks ago to allow
worktrees to be created directly from remote branches without first
checking out the branch locally? I'm really looking forward to that
change...

Thanks!


Re: [PATCH v2 1/5] strbuf: add xstrdup_toupper()

2017-12-29 Thread Lars Schneider

> On 29 Dec 2017, at 16:56, Torsten Bögershausen  wrote:
> 
> On Fri, Dec 29, 2017 at 04:22:18PM +0100, lars.schnei...@autodesk.com wrote:
>> From: Lars Schneider 
>> 
>> Create a copy of an existing string and make all characters upper case.
>> Similar xstrdup_tolower().
>> 
>> This function is used in a subsequent commit.
>> 
>> Signed-off-by: Lars Schneider 
>> ---
>> strbuf.c | 13 +
>> strbuf.h |  1 +
>> 2 files changed, 14 insertions(+)
>> 
>> diff --git a/strbuf.c b/strbuf.c
>> index 323c49ceb3..54276e96e7 100644
>> --- a/strbuf.c
>> +++ b/strbuf.c
>> @@ -760,6 +760,19 @@ char *xstrdup_tolower(const char *string)
>>  return result;
>> }
>> 
>> +char *xstrdup_toupper(const char *string)
>> +{
>> +char *result;
>> +size_t len, i;
>> +
>> +len = strlen(string);
>> +result = xmallocz(len);
>> +for (i = 0; i < len; i++)
>> +result[i] = toupper(string[i]);
>> +result[i] = '\0';
>
>   Isn't this already done by xmallocz()

I copied that code from xstrdup_tolower().

The original implementation [1] and its refactored version [2]
used xmalloc(). Later on xmallocz [3] was introduced.

[3] states "we can stop manually placing NUL at the end of the
allocated buffer. But that's only safe if it's clear that
the contents will always fill the buffer."

As far as I understand it, the content should always fill the
buffer in the upper/lower case conversion. Therefore, I agree 
with you that the assignment is not necessary.

- Lars


[1] d4770964d5 (config: "git config --get-urlmatch" parses section..key, 
2013-07-31)
[2] 88d5a6f6cd (daemon/config: factor out duplicate xstrdup_tolower, 2014-05-22)
[3] 3733e69464 (use xmallocz to avoid size arithmetic, 2016-02-22)



Re: [PATCH] completion: restore 'remote rm'

2017-12-29 Thread Keith Smiley

Updated:

e17dba8fe1 ("remote: prefer subcommand name 'remove' to
'rm'", 2012-09-06) removed the 'rm' subcommand from
completion.  The 'remote rm' subcommand is still supported
and not planned to be removed.  Offer completions for it.

Signed-off-by: Keith Smiley 
---
contrib/completion/git-completion.bash | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index 3683c772c5586..3e9044087e6ba 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -2665,7 +2665,7 @@ _git_config ()
_git_remote ()
{
local subcommands="
-   add rename remove set-head set-branches
+   add rename remove rm set-head set-branches
get-url set-url show prune update
"
local subcommand="$(__git_find_on_cmdline "$subcommands")"

--
Keith Smiley

On 12/29, Todd Zullinger wrote:

Keith Smiley wrote:

It looks like that was just about preferring remove in documentation
and the like, I think it would still make sense to have both for
completion since rm is still supported.


I read it as a first step in a long process to eventually
remove 'remote rm', but if that's never intended, then sure,
restoring completion for it seems reasonable.

It would be good to hear from those who know or recall the
intention.

I think we should only complete the preferred subcommand.
That encourages use of 'remote remove' even if 'remote rm'
will stay forever to avoid breaking existing scripts.

If it does make sense to restore completion, adding a link
back to e17dba8fe1 and explaining why the completion is
being restored would be good.  Reading the commit message
now makes it sound like 'remote rm' was never present and is
being added to correct an oversight.

Maybe something like:

   completion: restore 'remote rm'

   e17dba8fe1 ("remote: prefer subcommand name 'remove' to
   'rm'", 2012-09-06) removed the 'rm' subcommand from
   completion.  The 'remote rm' subcommand is still supported
   and not planned to be removed.  Offer completions for it.

I also noticed that in your original commit that you say
"list of removes as remove does." That should be "remotes"
instead of "removes" there. -- I've made that typo myself
quite often. :)

--
Todd
~~
There are no stupid questions, but there are a LOT of inquisitive
idiots.
   -- Demotivators (www.despair.com)



Re: [PATCH v2 1/5] strbuf: add xstrdup_toupper()

2017-12-29 Thread Torsten Bögershausen
On Fri, Dec 29, 2017 at 04:22:18PM +0100, lars.schnei...@autodesk.com wrote:
> From: Lars Schneider 
> 
> Create a copy of an existing string and make all characters upper case.
> Similar xstrdup_tolower().
> 
> This function is used in a subsequent commit.
> 
> Signed-off-by: Lars Schneider 
> ---
>  strbuf.c | 13 +
>  strbuf.h |  1 +
>  2 files changed, 14 insertions(+)
> 
> diff --git a/strbuf.c b/strbuf.c
> index 323c49ceb3..54276e96e7 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -760,6 +760,19 @@ char *xstrdup_tolower(const char *string)
>   return result;
>  }
>  
> +char *xstrdup_toupper(const char *string)
> +{
> + char *result;
> + size_t len, i;
> +
> + len = strlen(string);
> + result = xmallocz(len);
> + for (i = 0; i < len; i++)
> + result[i] = toupper(string[i]);
> + result[i] = '\0';

Isn't this already done by xmallocz()

> + return result;
> +}
> +
>  char *xstrvfmt(const char *fmt, va_list ap)
>  {
>   struct strbuf buf = STRBUF_INIT;
> diff --git a/strbuf.h b/strbuf.h
> index 0a74acb236..2bc148526f 100644
> --- a/strbuf.h
> +++ b/strbuf.h
> @@ -616,6 +616,7 @@ __attribute__((format (printf,2,3)))
>  extern int fprintf_ln(FILE *fp, const char *fmt, ...);
>  
>  char *xstrdup_tolower(const char *);
> +char *xstrdup_toupper(const char *);
>  
>  /**
>   * Create a newly allocated string using printf format. You can do this 
> easily
> -- 
> 2.15.1
> 


[PATCH v2 1/5] strbuf: add xstrdup_toupper()

2017-12-29 Thread lars . schneider
From: Lars Schneider 

Create a copy of an existing string and make all characters upper case.
Similar xstrdup_tolower().

This function is used in a subsequent commit.

Signed-off-by: Lars Schneider 
---
 strbuf.c | 13 +
 strbuf.h |  1 +
 2 files changed, 14 insertions(+)

diff --git a/strbuf.c b/strbuf.c
index 323c49ceb3..54276e96e7 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -760,6 +760,19 @@ char *xstrdup_tolower(const char *string)
return result;
 }
 
+char *xstrdup_toupper(const char *string)
+{
+   char *result;
+   size_t len, i;
+
+   len = strlen(string);
+   result = xmallocz(len);
+   for (i = 0; i < len; i++)
+   result[i] = toupper(string[i]);
+   result[i] = '\0';
+   return result;
+}
+
 char *xstrvfmt(const char *fmt, va_list ap)
 {
struct strbuf buf = STRBUF_INIT;
diff --git a/strbuf.h b/strbuf.h
index 0a74acb236..2bc148526f 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -616,6 +616,7 @@ __attribute__((format (printf,2,3)))
 extern int fprintf_ln(FILE *fp, const char *fmt, ...);
 
 char *xstrdup_tolower(const char *);
+char *xstrdup_toupper(const char *);
 
 /**
  * Create a newly allocated string using printf format. You can do this easily
-- 
2.15.1



[PATCH v2 2/5] utf8: add function to detect prohibited UTF-16/32 BOM

2017-12-29 Thread lars . schneider
From: Lars Schneider 

Whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE
or UTF-32LE a BOM must not be used [1]. The function returns true if
this is the case.

This function is used in a subsequent commit.

[1] http://unicode.org/faq/utf_bom.html#bom10

Signed-off-by: Lars Schneider 
---
 utf8.c | 24 
 utf8.h |  9 +
 2 files changed, 33 insertions(+)

diff --git a/utf8.c b/utf8.c
index 2c27ce0137..776660ee12 100644
--- a/utf8.c
+++ b/utf8.c
@@ -538,6 +538,30 @@ char *reencode_string_len(const char *in, int insz,
 }
 #endif
 
+static int has_bom_prefix(const char *data, size_t len,
+ const char *bom, size_t bom_len)
+{
+   return (len >= bom_len) && !memcmp(data, bom, bom_len);
+}
+
+const char utf16_be_bom[] = {0xFE, 0xFF};
+const char utf16_le_bom[] = {0xFF, 0xFE};
+const char utf32_be_bom[] = {0x00, 0x00, 0xFE, 0xFF};
+const char utf32_le_bom[] = {0xFF, 0xFE, 0x00, 0x00};
+
+int has_prohibited_utf_bom(const char *enc, const char *data, size_t len)
+{
+   return (
+ (!strcmp(enc, "UTF-16BE") || !strcmp(enc, "UTF-16LE")) &&
+ (has_bom_prefix(data, len, utf16_be_bom, sizeof(utf16_be_bom)) ||
+  has_bom_prefix(data, len, utf16_le_bom, sizeof(utf16_le_bom)))
+   ) || (
+ (!strcmp(enc, "UTF-32BE") || !strcmp(enc, "UTF-32LE")) &&
+ (has_bom_prefix(data, len, utf32_be_bom, sizeof(utf32_be_bom)) ||
+  has_bom_prefix(data, len, utf32_le_bom, sizeof(utf32_le_bom)))
+   );
+}
+
 /*
  * Returns first character length in bytes for multi-byte `text` according to
  * `encoding`.
diff --git a/utf8.h b/utf8.h
index 6bbcf31a83..4711429af9 100644
--- a/utf8.h
+++ b/utf8.h
@@ -70,4 +70,13 @@ typedef enum {
 void strbuf_utf8_align(struct strbuf *buf, align_type position, unsigned int 
width,
   const char *s);
 
+/*
+ * Whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE
+ * or UTF-32LE a BOM must not be used [1]. The function returns true if
+ * this is the case.
+ *
+ * [1] http://unicode.org/faq/utf_bom.html#bom10
+ */
+int has_prohibited_utf_bom(const char *enc, const char *data, size_t len);
+
 #endif
-- 
2.15.1



[PATCH v2 5/5] convert: add tracing for checkout-encoding

2017-12-29 Thread lars . schneider
From: Lars Schneider 

Add the GIT_TRACE_CHECKOUT_ENCODING environment variable to enable
tracing for content that is reencoded with the checkout-encoding
attribute.

Signed-off-by: Lars Schneider 
---
 convert.c| 28 
 t/t0028-checkout-encoding.sh |  2 ++
 2 files changed, 30 insertions(+)

diff --git a/convert.c b/convert.c
index fc8c96b670..ca7b2f3e5c 100644
--- a/convert.c
+++ b/convert.c
@@ -257,6 +257,29 @@ static int will_convert_lf_to_crlf(size_t len, struct 
text_stat *stats,
 
 }
 
+static void trace_encoding(const char *context, const char *path,
+  const char *encoding, const char *buf, size_t len)
+{
+   static struct trace_key coe = TRACE_KEY_INIT(CHECKOUT_ENCODING);
+   struct strbuf trace = STRBUF_INIT;
+   int i;
+
+   strbuf_addf(, "%s (%s, considered %s):\n", context, path, 
encoding);
+   for (i = 0; i < len && buf; ++i) {
+   strbuf_addf(
+   ,"| \e[2m%2i:\e[0m %2x \e[2m%c\e[0m%c",
+   i,
+   (unsigned char) buf[i],
+   (buf[i] > 32 && buf[i] < 127 ? buf[i] : ' '),
+   ((i+1) % 8 && (i+1) < len ? ' ' : '\n')
+   );
+   }
+   strbuf_addchars(, '\n', 1);
+
+   trace_strbuf(, );
+   strbuf_release();
+}
+
 static struct encoding {
const char *name;
struct encoding *next;
@@ -316,6 +339,7 @@ static int encode_to_git(const char *path, const char *src, 
size_t src_len,
error(error_msg, path, enc->name);
}
 
+   trace_encoding("source", path, enc->name, src, src_len);
dst = reencode_string_len(src, src_len, default_encoding, enc->name,
  _len);
if (!dst) {
@@ -331,6 +355,7 @@ static int encode_to_git(const char *path, const char *src, 
size_t src_len,
else
error(msg, path, enc->name, default_encoding);
}
+   trace_encoding("destination", path, default_encoding, dst, dst_len);
 
/*
 * UTF supports lossless round tripping [1]. UTF to other encoding are
@@ -356,6 +381,9 @@ static int encode_to_git(const char *path, const char *src, 
size_t src_len,
 enc->name, default_encoding,
 _src_len);
 
+   trace_encoding("reencoded source", path, enc->name,
+  re_src, re_src_len);
+
if (!re_src || src_len != re_src_len ||
memcmp(src, re_src, src_len)) {
const char* msg = _("encoding '%s' from %s to %s and "
diff --git a/t/t0028-checkout-encoding.sh b/t/t0028-checkout-encoding.sh
index 1a329ab933..df3cc91269 100755
--- a/t/t0028-checkout-encoding.sh
+++ b/t/t0028-checkout-encoding.sh
@@ -4,6 +4,8 @@ test_description='checkout-encoding conversion via 
gitattributes'
 
 . ./test-lib.sh
 
+GIT_TRACE_CHECKOUT_ENCODING=1 && export GIT_TRACE_CHECKOUT_ENCODING
+
 test_expect_success 'setup test repo' '
 
text="hallo there!\ncan you read me?" &&
-- 
2.15.1



[PATCH v2 4/5] convert: add support for 'checkout-encoding' attribute

2017-12-29 Thread lars . schneider
From: Lars Schneider 

Git and its tools (e.g. git diff) expect all text files in UTF-8
encoding. Git will happily accept content in all other encodings, too,
but it might not be able to process the text (e.g. viewing diffs or
changing line endings).

Add an attribute to tell Git what encoding the user has defined for a
given file. If the content is added to the index, then Git converts the
content to a canonical UTF-8 representation. On checkout Git will
reverse the conversion.

Signed-off-by: Lars Schneider 
---
 Documentation/gitattributes.txt |  59 
 apply.c |   2 +-
 blame.c |   2 +-
 combine-diff.c  |   2 +-
 convert.c   | 196 ++-
 convert.h   |   8 +-
 diff.c  |   2 +-
 sha1_file.c |   5 +-
 t/t0028-checkout-encoding.sh| 197 
 9 files changed, 460 insertions(+), 13 deletions(-)
 create mode 100755 t/t0028-checkout-encoding.sh

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 30687de81a..0039bd38c3 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -272,6 +272,65 @@ few exceptions.  Even though...
   catch potential problems early, safety triggers.
 
 
+`checkout-encoding`
+^^^
+
+Git recognizes files encoded with ASCII or one of its supersets (e.g.
+UTF-8 or ISO-8859-1) as text files.  All other encodings are usually
+interpreted as binary and consequently built-in Git text processing
+tools (e.g. 'git diff') as well as most Git web front ends do not
+visualize the content.
+
+In these cases you can teach Git the encoding of a file in the working
+directory with the `checkout-encoding` attribute. If a file with this
+attributes is added to Git, then Git reencodes the content from the
+specified encoding to UTF-8 and stores the result in its internal data
+structure. On checkout the content is encoded back to the specified
+encoding.
+
+Please note that using the `checkout-encoding` attribute has a number
+of drawbacks:
+
+- Reencoding content to non-UTF encodings (e.g. SHIFT-JIS) can cause
+  errors as the conversion might not be round trip safe.
+
+- Reencoding content requires resources that might slow down certain
+  Git operations (e.g 'git checkout' or 'git add').
+
+- Git clients that do not support the `checkout-encoding` attribute or
+  the used encoding will checkout the respective files as UTF-8 encoded.
+  That means the content appears to be different which could cause
+  trouble. Affected clients are older Git versions and alternative Git
+  implementations such as JGit or libgit2 (as of January 2018).
+
+Use the `checkout-encoding` attribute only if you cannot store a file in
+UTF-8 encoding and if you want Git to be able to process the content as
+text.
+
+Use the following attributes if your '*.txt' files are UTF-16 encoded
+with byte order mark (BOM) and you want Git to perform automatic line
+ending conversion based on your platform.
+
+
+*.txt  text checkout-encoding=UTF-16
+
+
+Use the following attributes if your '*.txt' files are UTF-16 little
+endian encoded without BOM and you want Git to use Windows line endings
+in the working directory.
+
+
+*.txt  checkout-encoding=UTF-16LE text eol=CRLF
+
+
+You can get a list of all available encodings on your platform with the
+following command:
+
+
+iconv --list
+
+
+
 `ident`
 ^^^
 
diff --git a/apply.c b/apply.c
index 321a9fa68d..c4bd5cf1f2 100644
--- a/apply.c
+++ b/apply.c
@@ -2281,7 +2281,7 @@ static int read_old_data(struct stat *st, struct patch 
*patch,
 * should never look at the index when explicit crlf option
 * is given.
 */
-   convert_to_git(NULL, path, buf->buf, buf->len, buf, safe_crlf);
+   convert_to_git(NULL, path, buf->buf, buf->len, buf, safe_crlf, 
0);
return 0;
default:
return -1;
diff --git a/blame.c b/blame.c
index 2893f3c103..388b66897b 100644
--- a/blame.c
+++ b/blame.c
@@ -229,7 +229,7 @@ static struct commit *fake_working_tree_commit(struct 
diff_options *opt,
if (strbuf_read(, 0, 0) < 0)
die_errno("failed to read from stdin");
}
-   convert_to_git(_index, path, buf.buf, buf.len, , 0);
+   convert_to_git(_index, path, buf.buf, buf.len, , 0, 0);
origin->file.ptr = buf.buf;
origin->file.size = buf.len;
pretend_sha1_file(buf.buf, buf.len, OBJ_BLOB, origin->blob_oid.hash);
diff --git a/combine-diff.c b/combine-diff.c
index 2505de119a..4555e49b5f 100644
--- a/combine-diff.c
+++ 

[PATCH v2 3/5] utf8: add function to detect a missing UTF-16/32 BOM

2017-12-29 Thread lars . schneider
From: Lars Schneider 

If the endianness is not defined in the encoding name, then let's
be strict and require a BOM to avoid any encoding confusion. The
has_missing_utf_bom() function returns true if a required BOM is
missing.

The Unicode standard instructs to assume big-endian if there in no BOM
for UTF-16/32 [1][2]. However, the W3C/WHATWG encoding standard used
in HTML5 recommends to assume little-endian to "deal with deployed
content" [3]. Strictly requiring a BOM seems to be the safest option
for content in Git.

This function is used in a subsequent commit.

[1] http://unicode.org/faq/utf_bom.html#gen6
[2] http://www.unicode.org/versions/Unicode10.0.0/ch03.pdf
 Section 3.10, D98, page 132
[3] https://encoding.spec.whatwg.org/#utf-16le

Signed-off-by: Lars Schneider 
---
 utf8.c | 13 +
 utf8.h | 16 
 2 files changed, 29 insertions(+)

diff --git a/utf8.c b/utf8.c
index 776660ee12..1978d6c42a 100644
--- a/utf8.c
+++ b/utf8.c
@@ -562,6 +562,19 @@ int has_prohibited_utf_bom(const char *enc, const char 
*data, size_t len)
);
 }
 
+int has_missing_utf_bom(const char *enc, const char *data, size_t len)
+{
+   return (
+  !strcmp(enc, "UTF-16") &&
+  !(has_bom_prefix(data, len, utf16_be_bom, sizeof(utf16_be_bom)) ||
+has_bom_prefix(data, len, utf16_le_bom, sizeof(utf16_le_bom)))
+   ) || (
+  !strcmp(enc, "UTF-32") &&
+  !(has_bom_prefix(data, len, utf32_be_bom, sizeof(utf32_be_bom)) ||
+has_bom_prefix(data, len, utf32_le_bom, sizeof(utf32_le_bom)))
+   );
+}
+
 /*
  * Returns first character length in bytes for multi-byte `text` according to
  * `encoding`.
diff --git a/utf8.h b/utf8.h
index 4711429af9..26b5e91852 100644
--- a/utf8.h
+++ b/utf8.h
@@ -79,4 +79,20 @@ void strbuf_utf8_align(struct strbuf *buf, align_type 
position, unsigned int wid
  */
 int has_prohibited_utf_bom(const char *enc, const char *data, size_t len);
 
+/*
+ * If the endianness is not defined in the encoding name, then we
+ * require a BOM. The function returns true if a required BOM is missing.
+ *
+ * The Unicode standard instructs to assume big-endian if there
+ * in no BOM for UTF-16/32 [1][2]. However, the W3C/WHATWG
+ * encoding standard used in HTML5 recommends to assume
+ * little-endian to "deal with deployed content" [3].
+ *
+ * [1] http://unicode.org/faq/utf_bom.html#gen6
+ * [2] http://www.unicode.org/versions/Unicode10.0.0/ch03.pdf
+ * Section 3.10, D98, page 132
+ * [3] https://encoding.spec.whatwg.org/#utf-16le
+ */
+int has_missing_utf_bom(const char *enc, const char *data, size_t len);
+
 #endif
-- 
2.15.1



[PATCH v2 0/5] convert: add support for different encodings

2017-12-29 Thread lars . schneider
From: Lars Schneider 

Hi,

notable changes since v1:

* In [1] Peff described a situation where you "couldn't checkout _away_
  from" problems because of "die() in convert_to_git()". I fixed this and
  tried to replicate the situation with the test "error if encoding
  garbage is already in Git". To achieve that I had to pass the
  'write_obj' variable through all the functions to let the encoding
  code know if Git actually writes content. If Git actually writes
  content then we die. Otherwise we just print an error. This ensures
  no garbage is added to Git.

* I renamed the attribute to "checkout-encoding" to avoid trouble with
  the existing "encoding" attribute. I also considered
  "working-tree-encoding". We also discussed "worktree-encoding" as
  attribute name but I am was worried that people could (wrongly)
  associated that with the 'git worktree' feature. I am happy to hear
  arguments for/against any of the different options.

* I read a bit on the Internet and as far as I understand it to/from
  UTF-8 reencodings are mostly round trip safe. Notable exceptions
  are some characters in the SHIFT-JIS character set:
  
https://support.microsoft.com/en-us/help/170559/prb-conversion-problem-between-shift-jis-and-unicode

  However, I was not able to replicate the issue. Consider this:
  X="\x87\x90"
  printf "$X" | od -h
  printf "$X" | iconv -f SHIFT-JIS -t UTF-8 | iconv -f UTF-8 -t SHIFT-JIS | 
od -h

  ... the reencoding from SHIFT-JIS to UTF-8 would always fail.
  Therefore, I was not able to generate a test case that produces a
  different result after the round trip. Anyone an idea how to do that?

* Patch 1, 2, 3, and 5 are helper functions. Patch 4 is the real change.

Thanks,
Lars

 RFC: 
ttps://public-inbox.org/git/bdb9b884-6d17-4be3-a83c-f67e2afa2...@gmail.comh
  v1: 
https://public-inbox.org/git/20171211155023.1405-1-lars.schnei...@autodesk.com/

Base Ref: master
Web-Diff: https://github.com/larsxschneider/git/commit/e11e375fdb
Checkout: git fetch https://github.com/larsxschneider/git encoding-v2 && git 
checkout e11e375fdb


### Interdiff (v1..v2):

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 84de2fa49c..0039bd38c3 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -146,33 +146,6 @@ Unspecified::
 Any other value causes Git to act as if `text` has been left
 unspecified.

-`encoding`
-^^
-
-By default Git assumes UTF-8 encoding for text files.  The `encoding`
-attribute sets the encoding to be used in the working directory.
-If the path is added to the index, then Git encodes the content to
-UTF-8.  On checkout the content is encoded back to the original
-encoding.  Consequently, you can use all built-in Git text processing
-tools (e.g. git diff, line ending conversions, etc.) with your
-non-UTF-8 encoded file.
-
-Please note that re-encoding content can cause errors and requires
-resources. Use the `encoding` attribute only if you cannot store
-a file in UTF-8 encoding and if you want Git to be able to process
-the text.
-
-
-*.txt  text encoding=UTF-16
-
-
-All `iconv` encodings with a stable round-trip conversion to and from
-UTF-8 are supported.  You can see a full list with the following command:
-
-
-iconv --list
-
-
 `eol`
 ^

@@ -299,6 +272,65 @@ few exceptions.  Even though...
   catch potential problems early, safety triggers.


+`checkout-encoding`
+^^^
+
+Git recognizes files encoded with ASCII or one of its supersets (e.g.
+UTF-8 or ISO-8859-1) as text files.  All other encodings are usually
+interpreted as binary and consequently built-in Git text processing
+tools (e.g. 'git diff') as well as most Git web front ends do not
+visualize the content.
+
+In these cases you can teach Git the encoding of a file in the working
+directory with the `checkout-encoding` attribute. If a file with this
+attributes is added to Git, then Git reencodes the content from the
+specified encoding to UTF-8 and stores the result in its internal data
+structure. On checkout the content is encoded back to the specified
+encoding.
+
+Please note that using the `checkout-encoding` attribute has a number
+of drawbacks:
+
+- Reencoding content to non-UTF encodings (e.g. SHIFT-JIS) can cause
+  errors as the conversion might not be round trip safe.
+
+- Reencoding content requires resources that might slow down certain
+  Git operations (e.g 'git checkout' or 'git add').
+
+- Git clients that do not support the `checkout-encoding` attribute or
+  the used encoding will checkout the respective files as UTF-8 encoded.
+  That means the content appears to be different which could cause
+  trouble. Affected clients are older Git versions and alternative Git
+  implementations such as JGit or libgit2 (as of January 2018).
+
+Use the `checkout-encoding` 

[PATCH] git-archive: accept --owner and --group like GNU tar

2017-12-29 Thread suzuki toshiya
The ownership of files created by git-archive is always
root:root. Add --owner and --group options which work
like the GNU tar equivalent to allow overriding these
defaults.

Signed-off-by: suzuki toshiya 
---
 Documentation/git-archive.txt |  13 +++
 archive-tar.c |   8 +-
 archive.c | 224 ++
 archive.h |   4 +
 t/t5005-archive-uid-gid.sh| 140 ++
 t/t5005/parse-tar-file.py |  60 +++
 tar.h |   2 +
 7 files changed, 447 insertions(+), 4 deletions(-)
 create mode 100755 t/t5005-archive-uid-gid.sh
 create mode 100755 t/t5005/parse-tar-file.py

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index cfa1e4ebe..0d156f6c1 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -11,6 +11,7 @@ SYNOPSIS
 [verse]
 'git archive' [--format=] [--list] [--prefix=/] []
  [-o  | --output=] [--worktree-attributes]
+ [--owner [username[:uid]] [--group [groupname[:gid]]
  [--remote= [--exec=]] 
  [...]
 
@@ -63,6 +64,18 @@ OPTIONS
This can be any options that the archiver backend understands.
See next section.
 
+--owner=[:]::
+   Force  as owner and  as uid for the files in the tar
+   archive.  If  is not supplied,  can be either a user
+   name or numeric UID.  In this case the missing part (UID or
+   name) will be inferred from the current host's user database.
+
+--group=[:]::
+   Force  as group and  as gid for the files in the tar
+   archive.  If  is not supplied,  can be either a group
+   name or numeric GID.  In this case the missing part (GID or
+   name) will be inferred from the current host's group database.
+
 --remote=::
Instead of making a tar archive from the local repository,
retrieve a tar archive from a remote repository. Note that the
diff --git a/archive-tar.c b/archive-tar.c
index c6ed96ee7..ca6471870 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -204,10 +204,10 @@ static void prepare_header(struct archiver_args *args,
xsnprintf(header->size, sizeof(header->size), "%011lo", S_ISREG(mode) ? 
size : 0);
xsnprintf(header->mtime, sizeof(header->mtime), "%011lo", (unsigned 
long) args->time);
 
-   xsnprintf(header->uid, sizeof(header->uid), "%07o", 0);
-   xsnprintf(header->gid, sizeof(header->gid), "%07o", 0);
-   strlcpy(header->uname, "root", sizeof(header->uname));
-   strlcpy(header->gname, "root", sizeof(header->gname));
+   xsnprintf(header->uid, sizeof(header->uid), "%07lo", args->uid);
+   xsnprintf(header->gid, sizeof(header->gid), "%07lo", args->gid);
+   strlcpy(header->uname, args->uname, sizeof(header->uname));
+   strlcpy(header->gname, args->gname, sizeof(header->gname));
xsnprintf(header->devmajor, sizeof(header->devmajor), "%07o", 0);
xsnprintf(header->devminor, sizeof(header->devminor), "%07o", 0);
 
diff --git a/archive.c b/archive.c
index 0b7b62af0..aa4b16b75 100644
--- a/archive.c
+++ b/archive.c
@@ -8,6 +8,7 @@
 #include "parse-options.h"
 #include "unpack-trees.h"
 #include "dir.h"
+#include "tar.h"
 
 static char const * const archive_usage[] = {
N_("git archive []  [...]"),
@@ -417,6 +418,223 @@ static void parse_treeish_arg(const char **argv,
{ OPTION_SET_INT, (s), NULL, (v), NULL, "", \
  PARSE_OPT_NOARG | PARSE_OPT_NONEG | PARSE_OPT_HIDDEN, NULL, (p) }
 
+/*
+ * GNU tar --owner, --group options reject hexdigit, signed int values.
+ * strtol(), atoi() are too permissive to simulate the behaviour.
+ */
+#define STR_IS_DIGIT_OK 0
+#define STR_IS_NOT_DIGIT -1
+#define STR_IS_DIGIT_TOO_LARGE -2
+
+static int try_as_simple_digit(const char *s, unsigned long *dst)
+{
+   unsigned long ul;
+   char *endptr;
+
+   if (strlen(s) != strspn(s, "0123456789"))
+   return STR_IS_NOT_DIGIT;
+
+   errno = 0;
+   ul = strtoul(s, , 10);
+
+   /* catch ERANGE */
+   if (errno) {
+   errno = 0;
+   return STR_IS_DIGIT_TOO_LARGE;
+   }
+
+#if ULONG_MAX > 0xUL
+   /*
+* --owner, --group rejects uid/gid greater than 32-bit
+* limits, even on 64-bit platforms.
+*/
+   if (ul > 0xUL)
+   return STR_IS_DIGIT_TOO_LARGE;
+#endif
+
+   if (dst)
+   *dst = ul;
+   return STR_IS_DIGIT_OK;
+}
+
+static const char *skip_leading_colon(const char *s)
+{
+   const char *col_pos;
+
+   col_pos = strchr(s, ':');
+   if (!col_pos)
+   return s;
+
+   return (col_pos + 1);
+}
+
+#define STR_IS_NAME_COLON_DIGIT 0
+#define STR_HAS_NO_COLON -1
+#define STR_HAS_DIGIT_BROKEN -2
+#define STR_HAS_DIGIT_TOO_LARGE -3
+
+static int try_as_name_colon_digit(const char *s, const char **dst_s,
+   unsigned 

Re: Rewrite cat-file.c : need help to find a bug

2017-12-29 Thread Оля Тележная
2017-12-29 16:22 GMT+03:00 Jeff King :
> On Fri, Dec 29, 2017 at 01:05:50PM +0300, Оля Тележная wrote:
>
>> Hi everyone,
>> I am trying to reuse formatting logic from ref-filter in cat-file
>> command. Now cat-file uses its own formatting code.
>> I am trying to achieve that step-by-step, now I want to invoke
>> populate_value function, and I have a bug somewhere.
>> My code is here.
>> https://github.com/telezhnaya/git/commits/catfile
>> All commits that starts from "cat-file: " are successfully passing all tests.
>> So for now my last commit fails, and I am tired of searching for the
>> error. Actually, I just copied some values to another variable and
>> move some code to other place. All "intelligent" work will go further,
>> but now I need to fix current situation.
>
> The problem is that "cat_file_info" is NULL when you get to
> populate_value(), so the moved code doesn't trigger.
>

 Thanks a lot!


Re: [PATCH] git-archive: accepts --owner --group aslike GNU tar.

2017-12-29 Thread suzuki toshiya

Dear Junio, Ævar

Thank you very much for your reviews, in spite of my many
overlooking of the requirements written in the documents.

To classify various cases, I modified my patch heavily.
It would be posted soon.

I found that some Python scripts are included in the git
repository, but nothing in t/ directories. A helper written
in Python is not welcomed? If so, I would rewrite the
helper in C (perl's standard library does not have a parse
of tarfile). So, please do not review parse-tar-file.py.

Regards,
mpsuzuki

Junio C Hamano wrote:

suzuki toshiya  writes:


Current tar output by git-archive has always root:root.


Thanks for a patch.  On top of Ævar's comments...


...
* t/t5005-archive-uid-gid.sh: a test script comparing
  uid, gid, uname, gname between the options and
  generated tar file.
---


Before the "---" line, we need to get the patch signed off by the
author (see Documentation/SubmittingPatches).


diff --git a/archive-tar.c b/archive-tar.c
index c6ed96ee7..8546a6229 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -204,10 +204,10 @@ static void prepare_header(struct archiver_args *args,
xsnprintf(header->size, sizeof(header->size), "%011lo", S_ISREG(mode) ? 
size : 0);
xsnprintf(header->mtime, sizeof(header->mtime), "%011lo", (unsigned long) 
args->time);
 
-	xsnprintf(header->uid, sizeof(header->uid), "%07o", 0);

-   xsnprintf(header->gid, sizeof(header->gid), "%07o", 0);
-   strlcpy(header->uname, "root", sizeof(header->uname));
-   strlcpy(header->gname, "root", sizeof(header->gname));
+   xsnprintf(header->uid, sizeof(header->uid), "%07o", args->uid);
+   xsnprintf(header->gid, sizeof(header->gid), "%07o", args->gid);
+   strlcpy(header->uname, args->uname ? args->uname : "root", 
sizeof(header->uname));
+   strlcpy(header->gname, args->gname ? args->gname : "root", 
sizeof(header->gname));


Would it be cleaner to make sure aregs->[gu]name is always set
(i.e. stuff "root" when it is not given)?


xsnprintf(header->devmajor, sizeof(header->devmajor), "%07o", 0);
xsnprintf(header->devminor, sizeof(header->devminor), "%07o", 0);
 
diff --git a/archive.c b/archive.c

index 0b7b62af0..db69041f1 100644
--- a/archive.c
+++ b/archive.c
@@ -8,6 +8,7 @@
 #include "parse-options.h"
 #include "unpack-trees.h"
 #include "dir.h"
+#include "git-compat-util.h"


The coding guideline says that "git-compat-util.h" (or one of the
well-known header that includes it) should be the first file to be
included, and we already include "cache.h" as the first thing, so
I do not think you want this addition here.


@@ -417,6 +418,57 @@ static void parse_treeish_arg(const char **argv,
{ OPTION_SET_INT, (s), NULL, (v), NULL, "", \
  PARSE_OPT_NOARG | PARSE_OPT_NONEG | PARSE_OPT_HIDDEN, NULL, (p) }
 
+static void set_args_uname_uid(struct archiver_args *args,

+   const char* tar_owner, int set_gid_too)


The asterisk sticks to the variable name, not the type name, i.e.

const char *tar_owner


+{
+   if (!args || !tar_owner)
+   return;
+
+   const char* col_pos = strchr(tar_owner, ':');
+   struct passwd* pw = NULL;


Decl after statement.


+   if (col_pos) {
+   args->uname = xstrndup(tar_owner, col_pos - tar_owner);
+   args->uid = atoi(col_pos + 1);
+   return;
+   }
+
+   args->uname = xstrndup(tar_owner, strlen(tar_owner));
+   pw = getpwnam(tar_owner);
+   if (!pw)
+   return;


This means that upon error, the caller gets a half-filled args
structure and without any indication.


diff --git a/t/parse-tar-file.py b/t/parse-tar-file.py


Hmph.  Do we still use Python around here?





Re: [PATCH v1] convert: add support for 'encoding' attribute

2017-12-29 Thread Lars Schneider

> On 29 Dec 2017, at 13:59, Torsten Bögershausen  wrote:
> 
> On 2017-12-28 17:14, Lars Schneider wrote:> 
>>> On 17 Dec 2017, at 18:14, Torsten Bögershausen  wrote:
>>> 
>>> On Mon, Dec 11, 2017 at 04:50:23PM +0100, lars.schnei...@autodesk.com wrote:
 From: Lars Schneider 
 
>> 
 +`encoding`
 +^^
 +
 +By default Git assumes UTF-8 encoding for text files.  The `encoding`
>>> 
>>> This is probably "ASCII" and it's supersets like ISO-8859-1 or UTF-8.
>> 
>> I am not sure I understand what you want to tell me here.
>> Do you think UTF-8 encoding is not correct in the sentence above?
> 
> Git itself was designed to handle source code in ASCII.
> Text files which are encoded in ISO-8859-1, UTF-8 or any
> super-set of ASCII are handled as well, and what exactly to do
> with bytes >0x80 is outside the responsibility of Git.
> E.g. the interpretation and rendering on the screen may be
> dependent on the locale or being guessed.
> All in all, saying that Git expects UTF-8 may be "overdriven".
> Any kind of file that uses an '\n' as an end of line
> and has no NUL bytes '\0' works as a text file in Git.
> (End of BlaBla ;-)
> 
> We can probably replace:
> "By default Git assumes UTF-8 encoding for text files"
> 
> with something like
> "Git handles UTF-8 encoded files as text files, but UTF-16 encoded
> files are handled as binary files"

Well, UTF-32 are handled as binary file, too :-)

How about this?

Git recognizes files encoded with ASCII or one of its supersets
(e.g. UTF-8 or ISO-8859-1) as text files.  All other...


> 
>>> 
 +attribute sets the encoding to be used in the working directory.
 +If the path is added to the index, then Git encodes the content to
 +UTF-8.  On checkout the content is encoded back to the original
 +encoding.  Consequently, you can use all built-in Git text processing
 +tools (e.g. git diff, line ending conversions, etc.) with your
 +non-UTF-8 encoded file.
 +
 +Please note that re-encoding content can cause errors and requires
 +resources. Use the `encoding` attribute only if you cannot store
 +a file in UTF-8 encoding and if you want Git to be able to process
 +the text.
 +
 +
 +*.txt text encoding=UTF-16
 +
>>> 
>>> I think that encoding (or worktree-encoding as said elsewhere) must imply 
>>> text.
>>> (That is not done in convert.c, but assuming binary or even auto
>>> does not make much sense to me)
>> 
>> "text" only means "LF". "-text" means CRLF which would be perfectly fine
>> for UTF-16. Therefore I don't think we need to imply text.
>> Does this make sense?
> Yes and now.
> 
> "text" means convert CRLF at "git add" (or commit) into LF,
> And potentially LF into CRLF at checkout,
> depending on the EOL attribute (if set), core.autocrlf and/or core.eol.
> 
> "-text" means don't touch CRLF or LF at all. Files with CRLF are commited
> with CRLF.
> Running a Unix like "diff" tool will often show "^M" to indicate that there
> is a CR before the LF, which may be distracting or confusing.
> 
> If someone ever wants to handle the UTF-16 files on a Linux/Mac/Unix system,
> it makes sense to convert the line endings into LF before storing them
> into the index (at least this is my experience).
> 
> (Not specifying "text" or "-text" at all will trigger the auto-handling,
> which is not needed at all).
> So if we want to be sure that line endings are CRLF in the worktree we
> should ask the user to specify things like this:
> 
> *.ext worktree-encoding=UTF-16 text eol=CRLF
> 
> It may be enough to give this example in the documentation.
> and I would rather be over-careful here, to avoid future problems.

Agreed. I'll add that example!


 +
 +All `iconv` encodings with a stable round-trip conversion to and from
 +UTF-8 are supported.  You can see a full list with the following command:
 +
 +
 +iconv --list
 +
 +
 `eol`
 ^
 
 diff --git a/convert.c b/convert.c
 index 20d7ab67bd..ee19c17104 100644
 --- a/convert.c
 +++ b/convert.c
 @@ -7,6 +7,7 @@
 #include "sigchain.h"
 #include "pkt-line.h"
 #include "sub-process.h"
 +#include "utf8.h"
 
 /*
 * convert.c - convert a file when checking it out and checking it in.
 @@ -256,6 +257,149 @@ static int will_convert_lf_to_crlf(size_t len, 
 struct text_stat *stats,
 
 }
>>> 
>>> I would avoid to use these #ifdefs here.
>>> All functions can be in strbuf.c (and may have #ifdefs there), but not here.
>> 
>> I'll try that. But wouldn't it make more sense to move the functions to 
>> utf.c?
> 
> May be.
> Originally utf8.c was about encoding and all kind of UTF-8 related stuff.
> Especially it didn't know anything about strbuf.
> I don't know why strbuf.h and other functions 

Re: [PATCH] Add shell completion for git remote rm

2017-12-29 Thread Todd Zullinger
Keith Smiley wrote:
> It looks like that was just about preferring remove in documentation
> and the like, I think it would still make sense to have both for
> completion since rm is still supported.

I read it as a first step in a long process to eventually
remove 'remote rm', but if that's never intended, then sure,
restoring completion for it seems reasonable.

It would be good to hear from those who know or recall the
intention.

I think we should only complete the preferred subcommand.
That encourages use of 'remote remove' even if 'remote rm'
will stay forever to avoid breaking existing scripts.

If it does make sense to restore completion, adding a link
back to e17dba8fe1 and explaining why the completion is
being restored would be good.  Reading the commit message
now makes it sound like 'remote rm' was never present and is
being added to correct an oversight.

Maybe something like:

completion: restore 'remote rm'

e17dba8fe1 ("remote: prefer subcommand name 'remove' to
'rm'", 2012-09-06) removed the 'rm' subcommand from
completion.  The 'remote rm' subcommand is still supported
and not planned to be removed.  Offer completions for it.

I also noticed that in your original commit that you say
"list of removes as remove does." That should be "remotes"
instead of "removes" there. -- I've made that typo myself
quite often. :)

-- 
Todd
~~
There are no stupid questions, but there are a LOT of inquisitive
idiots.
-- Demotivators (www.despair.com)



[PATCH/RFC 0/2] git diff --UTF-8

2017-12-29 Thread tboegi
From: Torsten Bögershausen 

RFC patch: convert files from e.g. UTF-16 into UTF-8 while running
"git diff".
The diff must be called with "git diff --UTF-8" and the "encoding"
attribute must be set for the file(s).

The commit messages may need some improvements, and a closer look
at diff.c, how command line options are forwared, is appreciated.

It may even be possible to integrate t4066 somewhere...

Torsten Bögershausen (2):
  convert_to_git(): checksafe becomes an integer
  git diff: Allow to reencode into UTF-8

 Documentation/diff-options.txt  |  4 ++
 Documentation/gitattributes.txt |  9 +
 apply.c |  4 +-
 convert.c   | 60 +++-
 convert.h   | 20 +-
 diff.c  | 40 +--
 diff.h  |  1 +
 diffcore.h  |  3 ++
 environment.c   |  2 +-
 sha1_file.c |  6 +--
 t/t4066-diff-encoding.sh| 86 +
 11 files changed, 205 insertions(+), 30 deletions(-)
 create mode 100755 t/t4066-diff-encoding.sh

-- 
2.15.1.271.g1a4e40aa5d



[PATCH/RFC 2/2] git diff: Allow to reencode into UTF-8

2017-12-29 Thread tboegi
From: Torsten Bögershausen 

When blobs are encoded in UTF-16, `git diff` will treat them as binary.
Make it possible to show a user readable diff encoded in UTF-8.
This allows to run git diff and feed the into a web sever.

Improve Git to look at the "encodig" attribute and to reencode the
content into UTF-8 before running the diff itself.

Signed-off-by: Torsten Bögershausen 
---
 Documentation/diff-options.txt  |  4 ++
 Documentation/gitattributes.txt |  9 +
 convert.c   | 40 +++
 convert.h   |  2 +
 diff.c  | 38 --
 diff.h  |  1 +
 diffcore.h  |  3 ++
 t/t4066-diff-encoding.sh| 86 +
 8 files changed, 180 insertions(+), 3 deletions(-)
 create mode 100755 t/t4066-diff-encoding.sh

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 9d1586b956..bf2f115f11 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -629,6 +629,10 @@ endif::git-format-patch[]
linkgit:git-log[1], but not for linkgit:git-format-patch[1] or
diff plumbing commands.
 
+--UTF-8::
+   Git converts the content into UTF-8 before running the diff when the
+   "encoding" attribute is defined. See linkgit:gitattributes[5]
+
 --ignore-submodules[=]::
Ignore changes to submodules in the diff generation.  can be
either "none", "untracked", "dirty" or "all", which is the default.
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 30687de81a..753a7c39b7 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -881,6 +881,15 @@ advantages to choosing this method:
 3. Caching. Textconv caching can speed up repeated diffs, such as those
you might trigger by running `git log -p`.
 
+Running diff on UTF-16 encoded files
+
+
+Git can convert UTF-16 encoded into UTF-8 before they are feed
+into the diff machinery: `diff --UTF-8 file.xxx`.
+
+
+file.xxx encoding=UTF-16
+
 
 Marking files as binary
 ^^^
diff --git a/convert.c b/convert.c
index 5efcc3b73b..45577ce504 100644
--- a/convert.c
+++ b/convert.c
@@ -7,6 +7,7 @@
 #include "sigchain.h"
 #include "pkt-line.h"
 #include "sub-process.h"
+#include "utf8.h"
 
 /*
  * convert.c - convert a file when checking it out and checking it in.
@@ -734,6 +735,34 @@ static struct convert_driver {
int required;
 } *user_convert, **user_convert_tail;
 
+const char *get_encoding_attr(const char *path)
+{
+   static struct attr_check *check;
+   if (!check)
+   check = attr_check_initl("encoding", NULL);
+   if (!git_check_attr(path, check)) {
+   struct attr_check_item *ccheck = check->items;
+   const char *value;
+   value = ccheck->value;
+   if (ATTR_UNSET(value))
+   return NULL;
+   return value;
+   }
+   return NULL;
+}
+
+static int reencode_into_strbuf(const char *path, const char *src, size_t len,
+   struct strbuf *dst, const char *encoding)
+{
+   int outsz = 0;
+   char *buf;
+   buf = reencode_string_len(src, (int)len, "UTF-8", encoding, );
+   if (!buf)
+   return 0;
+   strbuf_attach(dst, buf, outsz, outsz);
+   return SAFE_CRLF_REENCODE;
+}
+
 static int apply_filter(const char *path, const char *src, size_t len,
int fd, struct strbuf *dst, struct convert_driver *drv,
const unsigned int wanted_capability,
@@ -1136,6 +1165,17 @@ int convert_to_git(const struct index_state *istate,
 
convert_attrs(, path);
 
+   if (checksafe & SAFE_CRLF_REENCODE) {
+   const char *encoding = get_encoding_attr(path);
+   if (encoding) {
+   ret |= reencode_into_strbuf(path, src, len, dst,
+   encoding);
+   if (ret && dst) {
+   src = dst->buf;
+   len = dst->len;
+   }
+   }
+   }
ret |= apply_filter(path, src, len, -1, dst, ca.drv, CAP_CLEAN, NULL);
if (!ret && ca.drv && ca.drv->required)
die("%s: clean filter '%s' failed", path, ca.drv->name);
diff --git a/convert.h b/convert.h
index 532af00423..0b093715c9 100644
--- a/convert.h
+++ b/convert.h
@@ -13,6 +13,7 @@ struct index_state;
 #define SAFE_CRLF_WARN(1<<1)
 #define SAFE_CRLF_RENORMALIZE (1<<2)
 #define SAFE_CRLF_KEEP_CRLF   (1<<3)
+#define SAFE_CRLF_REENCODE(1<<4)
 
 extern int safe_crlf;
 
@@ -60,6 +61,7 @@ extern const char *get_cached_convert_stats_ascii(const 
struct index_state *ista
 

[PATCH/RFC 1/2] convert_to_git(): checksafe becomes an integer

2017-12-29 Thread tboegi
From: Torsten Bögershausen 

When calling convert_to_git(), the checksafe parameter has been used to
check if commit would give a non-roundtrip conversion of EOL.

When checksafe was introduced, 3 values had been in use:
SAFE_CRLF_FALSE: no warning
SAFE_CRLF_FAIL:  reject the commit if EOL do not roundtrip
SAFE_CRLF_WARN:  warn the user if EOL do not roundtrip

Today a small flaw is found in the code base:
An integer with the value 0 is passed as the parameter checksafe
instead of the correct enum value SAFE_CRLF_FALSE.

In the next commit there is a need to turn checksafe into a bitmap, which
allows to tell convert_to_git() to obey the encoding attribute or not.

Signed-off-by: Torsten Bögershausen 
---
 apply.c   |  4 ++--
 convert.c | 20 ++--
 convert.h | 18 --
 diff.c|  4 ++--
 environment.c |  2 +-
 sha1_file.c   |  6 +++---
 6 files changed, 26 insertions(+), 28 deletions(-)

diff --git a/apply.c b/apply.c
index 321a9fa68d..a422516062 100644
--- a/apply.c
+++ b/apply.c
@@ -2263,7 +2263,7 @@ static void show_stats(struct apply_state *state, struct 
patch *patch)
 static int read_old_data(struct stat *st, struct patch *patch,
 const char *path, struct strbuf *buf)
 {
-   enum safe_crlf safe_crlf = patch->crlf_in_old ?
+   int checksafe = patch->crlf_in_old ?
SAFE_CRLF_KEEP_CRLF : SAFE_CRLF_RENORMALIZE;
switch (st->st_mode & S_IFMT) {
case S_IFLNK:
@@ -2281,7 +2281,7 @@ static int read_old_data(struct stat *st, struct patch 
*patch,
 * should never look at the index when explicit crlf option
 * is given.
 */
-   convert_to_git(NULL, path, buf->buf, buf->len, buf, safe_crlf);
+   convert_to_git(NULL, path, buf->buf, buf->len, buf, checksafe);
return 0;
default:
return -1;
diff --git a/convert.c b/convert.c
index 1a41a48e15..5efcc3b73b 100644
--- a/convert.c
+++ b/convert.c
@@ -195,13 +195,13 @@ static enum eol output_eol(enum crlf_action crlf_action)
 
 static void check_safe_crlf(const char *path, enum crlf_action crlf_action,
struct text_stat *old_stats, struct text_stat 
*new_stats,
-   enum safe_crlf checksafe)
+   int checksafe)
 {
if (old_stats->crlf && !new_stats->crlf ) {
/*
 * CRLFs would not be restored by checkout
 */
-   if (checksafe == SAFE_CRLF_WARN)
+   if (checksafe & SAFE_CRLF_WARN)
warning(_("CRLF will be replaced by LF in %s.\n"
  "The file will have its original line"
  " endings in your working directory."), path);
@@ -211,7 +211,7 @@ static void check_safe_crlf(const char *path, enum 
crlf_action crlf_action,
/*
 * CRLFs would be added by checkout
 */
-   if (checksafe == SAFE_CRLF_WARN)
+   if (checksafe & SAFE_CRLF_WARN)
warning(_("LF will be replaced by CRLF in %s.\n"
  "The file will have its original line"
  " endings in your working directory."), path);
@@ -268,7 +268,7 @@ static int will_convert_lf_to_crlf(size_t len, struct 
text_stat *stats,
 static int crlf_to_git(const struct index_state *istate,
   const char *path, const char *src, size_t len,
   struct strbuf *buf,
-  enum crlf_action crlf_action, enum safe_crlf checksafe)
+  enum crlf_action crlf_action, int checksafe)
 {
struct text_stat stats;
char *dst;
@@ -298,12 +298,12 @@ static int crlf_to_git(const struct index_state *istate,
 * unless we want to renormalize in a merge or
 * cherry-pick.
 */
-   if ((checksafe != SAFE_CRLF_RENORMALIZE) &&
+   if ((!(checksafe & SAFE_CRLF_RENORMALIZE)) &&
has_crlf_in_index(istate, path))
convert_crlf_into_lf = 0;
}
-   if ((checksafe == SAFE_CRLF_WARN ||
-   (checksafe == SAFE_CRLF_FAIL)) && len) {
+   if (((checksafe & SAFE_CRLF_WARN) ||
+((checksafe & SAFE_CRLF_FAIL) && len))) {
struct text_stat new_stats;
memcpy(_stats, , sizeof(new_stats));
/* simulate "git add" */
@@ -1129,7 +1129,7 @@ const char *get_convert_attr_ascii(const char *path)
 
 int convert_to_git(const struct index_state *istate,
   const char *path, const char *src, size_t len,
-   struct strbuf *dst, enum safe_crlf checksafe)
+  struct strbuf *dst, int checksafe)
 {
int ret = 0;
struct conv_attrs ca;
@@ 

Re: Rewrite cat-file.c : need help to find a bug

2017-12-29 Thread Jeff King
On Fri, Dec 29, 2017 at 01:05:50PM +0300, Оля Тележная wrote:

> Hi everyone,
> I am trying to reuse formatting logic from ref-filter in cat-file
> command. Now cat-file uses its own formatting code.
> I am trying to achieve that step-by-step, now I want to invoke
> populate_value function, and I have a bug somewhere.
> My code is here.
> https://github.com/telezhnaya/git/commits/catfile
> All commits that starts from "cat-file: " are successfully passing all tests.
> So for now my last commit fails, and I am tired of searching for the
> error. Actually, I just copied some values to another variable and
> move some code to other place. All "intelligent" work will go further,
> but now I need to fix current situation.

The problem is that "cat_file_info" is NULL when you get to
populate_value(), so the moved code doesn't trigger.

That variable is usually pulled from format->cat_file_data during
verify_ref_format(). But it triggers only when there's an actual format
atom to parse, and in this particular test the format is empty! But
cat-file is somewhat special here, in that even without any atoms it
needs to make sure the object exists.

So the patch below makes the test pass, though I think in the long run
all of this cat_file_info stuff would just get folded into regular
atoms, and you'd want cat-file to pass a special flag to ask ref-filter
to make sure the object exists.

-Peff

---
diff --git a/ref-filter.c b/ref-filter.c
index 62ca83807f..3acea4d2ef 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -742,6 +742,10 @@ int verify_ref_format(struct ref_format *format)
 {
const char *cp, *sp;
 
+   /* do this unconditionally, even if we have no atoms! */
+   if (format->cat_file_data)
+   cat_file_info = format->cat_file_data;
+
format->need_color_reset_at_eol = 0;
for (cp = format->format; *cp && (sp = find_next(cp)); ) {
const char *color, *ep = strchr(sp, ')');
@@ -753,7 +757,6 @@ int verify_ref_format(struct ref_format *format)
 
if (format->cat_file_data)
{
-   cat_file_info = format->cat_file_data;
at = parse_ref_filter_atom(format, valid_cat_file_atoms,
   
ARRAY_SIZE(valid_cat_file_atoms), sp + 2, ep);
} else


Re: [PATCH v1] convert: add support for 'encoding' attribute

2017-12-29 Thread Torsten Bögershausen
On 2017-12-28 17:14, Lars Schneider wrote:> 
>> On 17 Dec 2017, at 18:14, Torsten Bögershausen  wrote:
>>
>> On Mon, Dec 11, 2017 at 04:50:23PM +0100, lars.schnei...@autodesk.com wrote:
>>> From: Lars Schneider 
>>>
> 
>>> +`encoding`
>>> +^^
>>> +
>>> +By default Git assumes UTF-8 encoding for text files.  The `encoding`
>>
>> This is probably "ASCII" and it's supersets like ISO-8859-1 or UTF-8.
> 
> I am not sure I understand what you want to tell me here.
> Do you think UTF-8 encoding is not correct in the sentence above?

Git itself was designed to handle source code in ASCII.
Text files which are encoded in ISO-8859-1, UTF-8 or any
super-set of ASCII are handled as well, and what exactly to do
with bytes >0x80 is outside the responsibility of Git.
E.g. the interpretation and rendering on the screen may be
dependent on the locale or being guessed.
All in all, saying that Git expects UTF-8 may be "overdriven".
Any kind of file that uses an '\n' as an end of line
and has no NUL bytes '\0' works as a text file in Git.
(End of BlaBla ;-)

We can probably replace:
"By default Git assumes UTF-8 encoding for text files"

with something like
"Git handles UTF-8 encoded files as text files, but UTF-16 encoded
 files are handled as binary files"

>>
>>> +attribute sets the encoding to be used in the working directory.
>>> +If the path is added to the index, then Git encodes the content to
>>> +UTF-8.  On checkout the content is encoded back to the original
>>> +encoding.  Consequently, you can use all built-in Git text processing
>>> +tools (e.g. git diff, line ending conversions, etc.) with your
>>> +non-UTF-8 encoded file.
>>> +
>>> +Please note that re-encoding content can cause errors and requires
>>> +resources. Use the `encoding` attribute only if you cannot store
>>> +a file in UTF-8 encoding and if you want Git to be able to process
>>> +the text.
>>> +
>>> +
>>> +*.txt  text encoding=UTF-16
>>> +
>>
>> I think that encoding (or worktree-encoding as said elsewhere) must imply 
>> text.
>> (That is not done in convert.c, but assuming binary or even auto
>> does not make much sense to me)
> 
> "text" only means "LF". "-text" means CRLF which would be perfectly fine
> for UTF-16. Therefore I don't think we need to imply text.
> Does this make sense?
Yes and now.

"text" means convert CRLF at "git add" (or commit) into LF,
And potentially LF into CRLF at checkout,
depending on the EOL attribute (if set), core.autocrlf and/or core.eol.

"-text" means don't touch CRLF or LF at all. Files with CRLF are commited
with CRLF.
Running a Unix like "diff" tool will often show "^M" to indicate that there
is a CR before the LF, which may be distracting or confusing.

If someone ever wants to handle the UTF-16 files on a Linux/Mac/Unix system,
it makes sense to convert the line endings into LF before storing them
into the index (at least this is my experience).

(Not specifying "text" or "-text" at all will trigger the auto-handling,
 which is not needed at all).
So if we want to be sure that line endings are CRLF in the worktree we
should ask the user to specify things like this:

*.ext worktree-encoding=UTF-16 text eol=CRLF

It may be enough to give this example in the documentation.
and I would rather be over-careful here, to avoid future problems.

> 
>>
>>
>>> +
>>> +All `iconv` encodings with a stable round-trip conversion to and from
>>> +UTF-8 are supported.  You can see a full list with the following command:
>>> +
>>> +
>>> +iconv --list
>>> +
>>> +
>>> `eol`
>>> ^
>>>
>>> diff --git a/convert.c b/convert.c
>>> index 20d7ab67bd..ee19c17104 100644
>>> --- a/convert.c
>>> +++ b/convert.c
>>> @@ -7,6 +7,7 @@
>>> #include "sigchain.h"
>>> #include "pkt-line.h"
>>> #include "sub-process.h"
>>> +#include "utf8.h"
>>>
>>> /*
>>>  * convert.c - convert a file when checking it out and checking it in.
>>> @@ -256,6 +257,149 @@ static int will_convert_lf_to_crlf(size_t len, struct 
>>> text_stat *stats,
>>>
>>> }
>>
>> I would avoid to use these #ifdefs here.
>> All functions can be in strbuf.c (and may have #ifdefs there), but not here.
> 
> I'll try that. But wouldn't it make more sense to move the functions to utf.c?

May be.
Originally utf8.c was about encoding and all kind of UTF-8 related stuff.
Especially it didn't know anything about strbuf.
I don't know why strbuf.h and other functions had been added here,

I once moved them into strbuf.c without any problems, but never send out
a patch, because of possible merge conflicts in ongoing patches.

In any case, if it is about strbuf, I would try to put it into strbuf.c

>>
>>>
>>> +#ifdef NO_ICONV
>>> +#ifndef _ICONV_T
>>> +/* The type is just a placeholder and not actually used. */
>>> +typedef void* iconv_t;
>>> +#endif
>>> +#endif
>>> +
>>> +static struct encoding {
>>> +   const char *name;
>>> +   iconv_t 

Rewrite cat-file.c : need help to find a bug

2017-12-29 Thread Оля Тележная
Hi everyone,
I am trying to reuse formatting logic from ref-filter in cat-file
command. Now cat-file uses its own formatting code.
I am trying to achieve that step-by-step, now I want to invoke
populate_value function, and I have a bug somewhere.
My code is here.
https://github.com/telezhnaya/git/commits/catfile
All commits that starts from "cat-file: " are successfully passing all tests.
So for now my last commit fails, and I am tired of searching for the
error. Actually, I just copied some values to another variable and
move some code to other place. All "intelligent" work will go further,
but now I need to fix current situation.

You could write here or leave comments in github if you have any ideas.

Thank you in advance for your help!
Olga


Segfault

2017-12-29 Thread Andrew Tsykhonya
git stash pop resulted in crash:
/usr/local/Cellar/git/2.10.2/libexec/git-core/git-stash: line 470:
14946 Segmentation fault: 11 git merge-recursive $b_tree -- $c_tree
$w_tree

although, the changes have been applied successfully.