Re: [PATCH] diff-tree: obey the color.ui configuration
On Fri, Dec 29, 2017 at 06:16:31PM -0500, Todd Zullinger wrote: > Ævar Arnfjörð Bjarmason wrote: > > No idea how to test this, in particular trying to pipe the output of > > color.ui=never v.s. color.ui=auto to a file as "auto" will disable > > coloring when it detects a pipe, but this fixes the issue. > > You might be able to use similar methods as those Jeff used > in the series merged from jk/ui-color-always-to-auto: > > https://github.com/gitster/git/tree/jk/ui-color-always-to-auto Yeah, test_terminal is the solution to testing. But... > He may also have some ideas about this issue in general. > (Or they could be tramatic memories, depending on how > painful it was to dig into the color code.) Yep. If we make diff-tree support color.ui, it's going to break a bunch of other stuff (like add--interactive) for people who set color.ui=always. I know this empirically, because we did that in v2.13, and a bunch of people complained. ;) The root of the problem is that the plumbing diff-tree defaults its internal color variable to "auto" in the first place. In theory the best way forward is fixing that, but it's likely to have a bunch of fallouts itself (scripts which use plumbing and where the user _does_ want color will stop showing it). This bug has been around since v1.8.4, I think, so it's hard to say how many people are depending on it at this point. A hackier option which would probably make most people happy would be to have plumbing respect "color.ui=never", but not any other values. I think the history of the back and forth is: - 4c7f1819b3 (make color.ui default to 'auto', 2013-06-10) introduced the problem of plumbing defaulting to "auto". This was in v1.8.4. - we did something similar to Ævar's patch in 136c8c8b8f (color: check color.ui in git_default_config(), 2017-07-13). That shipped in v2.14.2, and people with color.ui=always complained, because things like add--interactive broke for them. - we tried fixing it with 6be4595edb (color: make "always" the same as "auto" in config, 2017-10-03), but that broke people doing "git -c color.ui=always" as an equivalent of "--color". We talked about making the "-c" config behave differently from on-disk config, but got pretty disgusted at the weird hacks. And so... - we ended up with 33c643bb08 (Revert "color: check color.ui in git_default_config()", 2017-10-13), which just reverts the whole mess back to the pre-v2.14 state. This shipped in v2.15. So I don't think we want to go down that road again. If anything, we want to either fix the original sin from 4c7f1819b3, or we want to do the "respect only never" hack. -Peff
Re: [PATCH] Add shell completion for git remote rm
Ævar Arnfjörð Bjarmason wrote: >> I think adding 'rm' to completion definitely counts as advertisement. >> It doesn't have much practical use, after all: typing 'rm' with >> completion is actually one more keystroke than without (rm vs. rm). > > This is only one use of the completion interface, maybe you only use it > like that, but not everyone does. > > The completion interface has two uses, one is to actually save you > typing, the other is subcommand discovery, and maybe someone who has a > use neither you or I have thought of is about to point out a third. > > I'll type "git $whatever $subcommand" as *validation* that I've > found the right command, not to complete it for me. This is a thing > that's in my muscle memory for everything. Is that meant to be in favor of including rm in the completion results or against? :) > Since I've been typing "git remote rm" for a while (started before > this deprecation happened) I've actually been meaning to submit > completion for "rm" since it works, not knowing about Duy's patch until > now. > > Now, even if someone disagrees that we should have "rm" at all I think > that in general we should not conflate two different things, one is > whether: > > git remote > > shows both "rm" and "remove" in the list, and the other is whether: > > git remote rm > > Should yield: > > git remote rm > > Or, as now happens: > > git remote rm > > I can see why we'd, in general, we'd like to not advertise certain > options for completion (due to deprecation, or just to avoid verbosity), > but complete them once they're unambiguously typed out. > > I don't know whether the bash completion interface supports making that > distinction, but it would be useful. It can be done, though I think that it's probably better to subtly lead people to use 'git remote remove' going forward, to keep things consistent. I don't have a strong preference for or against the rm -> remove change, but since that's been done I think there's a benefit to keeping things consistent in the UI. And I think that should also apply to not offering completion for commands/subcommands/options which are only kept for backward compatibility. Here's one way to make 'git remote rm ' work without including it in the output of 'git remote ': diff --git i/contrib/completion/git-completion.bash w/contrib/completion/git-completion.bash index 3683c772c5..aa63f028ab 100644 --- i/contrib/completion/git-completion.bash +++ w/contrib/completion/git-completion.bash @@ -2668,7 +2668,9 @@ _git_remote () add rename remove set-head set-branches get-url set-url show prune update " - local subcommand="$(__git_find_on_cmdline "$subcommands")" + # Don't advertise rm by including it in subcommands, but complete + # remotes if it is used. + local subcommand="$(__git_find_on_cmdline "$subcommands rm")" if [ -z "$subcommand" ]; then case "$cur" in --*) Keeping 'git remote rm' working to avoid breaking scripts is one thing, but having it in the completion code makes it more likely that it will continue to be seen as a recommended subcommand. This leads to patches like this one, where it's presumed that the lack of completion is simply an oversight or a bug. Of course, the lack of completion hasn't caused everyone to forget that 'remote rm' was changed to 'remote remove', so that reasoning may be full of hot air (or worse). ;) The current result of 'git remote rm ' isn't so great. It's arguably worse to have it pretend that no subcommand was given than to list the remotes. $ git remote rm addremove set-head update get-urlrename set-url prune set-branches show I think completing nothing or completing the remotes (without offering rm in the subcommand list) would be better, after looking at it a bit. I don't know how to disable file completion, but I'm not intimately familiar with the git completion script (thanks to it working so damn well). I'm guessing there's a way to do that, if there's a strong desire to not complete the remotes at all. I don't think we should include rm in 'git remote ' completion, but I don't care much either way what 'git remote rm ' includes. But it should be better than including the other subcommands. -- Todd ~~ There, I've gone and soiled myself, are you happy now?! -- Stewie Griffin
possible completion bug with --set-upstream-to=
Whenever I type the last to complete origin/master, as in below: > git branch --set-upstream-to=orig what I get is: > git branch origin/master instead of the expected: > git branch --set-upstream-to=origin/master git version and OS: >git version 2.1.4 > >Distributor ID:Debian >Description: Debian GNU/Linux 8.7 (jessie) >Release: 8.7 >Codename: jessie > Thanks, Ernesto
Re: [PATCH] Add shell completion for git remote rm
On Fri, Dec 29 2017, SZEDER Gábor jotted: >> Keith Smiley wrote: >> > It looks like that was just about preferring remove in documentation >> > and the like, I think it would still make sense to have both for >> > completion since rm is still supported. >> >> I read it as a first step in a long process to eventually >> remove 'remote rm', but if that's never intended, then sure, >> restoring completion for it seems reasonable. >> >> It would be good to hear from those who know or recall the >> intention. >> >> I think we should only complete the preferred subcommand. >> That encourages use of 'remote remove' even if 'remote rm' >> will stay forever to avoid breaking existing scripts. > > Quoting from the commit message of e17dba8fe1 ("remote: prefer > subcommand name 'remove' to 'rm'", 2012-09-06): > > 'rm' is still supported and used in the test suite. It's just not > widely advertised. I think we made the wrong choice in standardizing on remove instead of rm, it should be rm for consistency with git-rm, and "remote rename" should be "remote mv" etc., just like we have git-mv. Maybe if we didn't have the Unix legacy we'd pick rename and remove to be consitent for both, but since that's not going to happen this bit of inconsistency is worse. Now with that showing of my biases out of the way... > I think adding 'rm' to completion definitely counts as advertisement. > It doesn't have much practical use, after all: typing 'rm' with > completion is actually one more keystroke than without (rm vs. rm). This is only one use of the completion interface, maybe you only use it like that, but not everyone does. The completion interface has two uses, one is to actually save you typing, the other is subcommand discovery, and maybe someone who has a use neither you or I have thought of is about to point out a third. I'll type "git $whatever $subcommand" as *validation* that I've found the right command, not to complete it for me. This is a thing that's in my muscle memory for everything. The post-hoc reason is because if you're a fast typist you don't actually save time on typing the command (usually I just use reverse shell search anyway), but rather on not screwing up the command itself via a typo. Sometimes commands take a while to fail, and even if it's immediate re-editing them takes longer than getting it right in the first place. Since I've been typing "git remote rm" for a while (started before this deprecation happened) I've actually been meaning to submit completion for "rm" since it works, not knowing about Duy's patch until now. Now, even if someone disagrees that we should have "rm" at all I think that in general we should not conflate two different things, one is whether: git remote shows both "rm" and "remove" in the list, and the other is whether: git remote rm Should yield: git remote rm Or, as now happens: git remote rm I can see why we'd, in general, we'd like to not advertise certain options for completion (due to deprecation, or just to avoid verbosity), but complete them once they're unambiguously typed out. I don't know whether the bash completion interface supports making that distinction, but it would be useful.
Re: [PATCH] Add shell completion for git remote rm
The goal of this fix isn't to complete rm itself (although that is a side effect), it's to complete the remote names after you type rm. Without this patch doing this: git remote rm Attempts to complete the options for `git remote` instead of the remote names. -- Keith Smiley On 12/29, SZEDER Gábor wrote: Keith Smiley wrote: > It looks like that was just about preferring remove in documentation > and the like, I think it would still make sense to have both for > completion since rm is still supported. I read it as a first step in a long process to eventually remove 'remote rm', but if that's never intended, then sure, restoring completion for it seems reasonable. It would be good to hear from those who know or recall the intention. I think we should only complete the preferred subcommand. That encourages use of 'remote remove' even if 'remote rm' will stay forever to avoid breaking existing scripts. Quoting from the commit message of e17dba8fe1 ("remote: prefer subcommand name 'remove' to 'rm'", 2012-09-06): 'rm' is still supported and used in the test suite. It's just not widely advertised. I think adding 'rm' to completion definitely counts as advertisement. It doesn't have much practical use, after all: typing 'rm' with completion is actually one more keystroke than without (rm vs. rm). Gábor
Re: [PATCH] diff-tree: obey the color.ui configuration
Ævar Arnfjörð Bjarmason wrote: > No idea how to test this, in particular trying to pipe the output of > color.ui=never v.s. color.ui=auto to a file as "auto" will disable > coloring when it detects a pipe, but this fixes the issue. You might be able to use similar methods as those Jeff used in the series merged from jk/ui-color-always-to-auto: https://github.com/gitster/git/tree/jk/ui-color-always-to-auto He may also have some ideas about this issue in general. (Or they could be tramatic memories, depending on how painful it was to dig into the color code.) -- Todd ~~ Subtlety is the art of saying what you think and getting out of the way before it is understood. -- Anonymous
[PATCH] perf: amend the grep tests to test grep.threads
Ever since 5b594f457a ("Threaded grep", 2010-01-25) the number of threads git-grep uses under PTHREADS has been hardcoded to 8, but there's no performance test to check whether this is an optimal setting. Amend the existing tests for the grep engines to support a mode where this can be tested, e.g.: GIT_PERF_GREP_THREADS='1 8 16' GIT_PERF_LARGE_REPO=~/g/linux ./run p782* Signed-off-by: Ævar Arnfjörð Bjarmason--- This looks less scary under diff -w. t/perf/p7820-grep-engines.sh | 52 --- t/perf/p7821-grep-engines-fixed.sh | 55 ++ 2 files changed, 86 insertions(+), 21 deletions(-) diff --git a/t/perf/p7820-grep-engines.sh b/t/perf/p7820-grep-engines.sh index 62aba19e76..8b09c5bf32 100755 --- a/t/perf/p7820-grep-engines.sh +++ b/t/perf/p7820-grep-engines.sh @@ -12,6 +12,9 @@ e.g. GIT_PERF_7820_GREP_OPTS=' -i'. Some options to try: -vi -vw -viw + +If GIT_PERF_GREP_THREADS is set to a list of threads (e.g. '1 4 8' +etc.) we will test the patterns under those numbers of threads. " . ./perf-lib.sh @@ -19,6 +22,11 @@ e.g. GIT_PERF_7820_GREP_OPTS=' -i'. Some options to try: test_perf_large_repo test_checkout_worktree +if test -n "$GIT_PERF_GREP_THREADS" +then + test_set_prereq PERF_GREP_ENGINES_THREADS +fi + for pattern in \ 'how.to' \ '^how to' \ @@ -39,18 +47,42 @@ do else prereq="" fi - test_perf $prereq "$engine grep$GIT_PERF_7820_GREP_OPTS '$pattern'" " - git -c grep.patternType=$engine grep$GIT_PERF_7820_GREP_OPTS -- '$pattern' >'out.$engine' || : - " - done - - test_expect_success "assert that all engines found the same for$GIT_PERF_7820_GREP_OPTS '$pattern'" ' - test_cmp out.basic out.extended && - if test_have_prereq PCRE + if ! test_have_prereq PERF_GREP_ENGINES_THREADS then - test_cmp out.basic out.perl + test_perf $prereq "$engine grep$GIT_PERF_7820_GREP_OPTS '$pattern'" " + git -c grep.patternType=$engine grep$GIT_PERF_7820_GREP_OPTS -- '$pattern' >'out.$engine' || : + " + else + for threads in $GIT_PERF_GREP_THREADS + do + test_perf PTHREADS,$prereq "$engine grep$GIT_PERF_7820_GREP_OPTS '$pattern' with $threads threads" " + git -c grep.patternType=$engine -c grep.threads=$threads grep$GIT_PERF_7820_GREP_OPTS -- '$pattern' >'out.$engine.$threads' || : + " + done fi - ' + done + + if ! test_have_prereq PERF_GREP_ENGINES_THREADS + then + test_expect_success "assert that all engines found the same for$GIT_PERF_7820_GREP_OPTS '$pattern'" ' + test_cmp out.basic out.extended && + if test_have_prereq PCRE + then + test_cmp out.basic out.perl + fi + ' + else + for threads in $GIT_PERF_GREP_THREADS + do + test_expect_success PTHREADS "assert that all engines found the same for$GIT_PERF_7820_GREP_OPTS '$pattern' under threading" " + test_cmp out.basic.$threads out.extended.$threads && + if test_have_prereq PCRE + then + test_cmp out.basic.$threads out.perl.$threads + fi + " + done + fi done test_done diff --git a/t/perf/p7821-grep-engines-fixed.sh b/t/perf/p7821-grep-engines-fixed.sh index c7ef1e198f..61e41b82cf 100755 --- a/t/perf/p7821-grep-engines-fixed.sh +++ b/t/perf/p7821-grep-engines-fixed.sh @@ -6,6 +6,9 @@ Set GIT_PERF_7821_GREP_OPTS in the environment to pass options to git-grep. Make sure to include a leading space, e.g. GIT_PERF_7821_GREP_OPTS=' -w'. See p7820-grep-engines.sh for more options to try. + +If GIT_PERF_7821_THREADS is set to a list of threads (e.g. '1 4 8' +etc.) we will test the patterns under those numbers of threads. " . ./perf-lib.sh @@ -13,6 +16,11 @@ options to try. test_perf_large_repo test_checkout_worktree +if test -n "$GIT_PERF_GREP_THREADS" +then + test_set_prereq PERF_GREP_ENGINES_THREADS +fi + for pattern in 'int' 'uncommon' 'æ' do for engine in fixed basic extended perl @@ -23,19 +31,44 @@ do else prereq="" fi - test_perf $prereq "$engine grep$GIT_PERF_7821_GREP_OPTS $pattern" " - git -c
[PATCH] diff-tree: obey the color.ui configuration
Before git-bisect exits it calls `diff-tree --pretty --stat $commit` on the bad commit. This would always print the "commit" line with coloring despite color.ui being set to "never". Teach diff-tree to look at the git_color_config() configuration. I initially tried to add this to git_diff_basic_config itself, but it makes other unrelated things fail, and this is a more isolated change that solves the issue. Signed-off-by: Ævar Arnfjörð Bjarmason--- No idea how to test this, in particular trying to pipe the output of color.ui=never v.s. color.ui=auto to a file as "auto" will disable coloring when it detects a pipe, but this fixes the issue. builtin/diff-tree.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/builtin/diff-tree.c b/builtin/diff-tree.c index b775a75647..0311c01a87 100644 --- a/builtin/diff-tree.c +++ b/builtin/diff-tree.c @@ -97,6 +97,15 @@ static void diff_tree_tweak_rev(struct rev_info *rev, struct setup_revision_opt } } + +static int diff_tree_config(const char *var, const char *value, void *cb) +{ + if (git_color_config(var, value, cb) < 0) + return -1; + + return git_diff_basic_config(var, value, cb); +} + int cmd_diff_tree(int argc, const char **argv, const char *prefix) { char line[1000]; @@ -108,7 +117,7 @@ int cmd_diff_tree(int argc, const char **argv, const char *prefix) if (argc == 2 && !strcmp(argv[1], "-h")) usage(diff_tree_usage); - git_config(git_diff_basic_config, NULL); /* no "diff" UI options */ + git_config(diff_tree_config, NULL); /* no "diff" UI options */ init_revisions(opt, prefix); if (read_cache() < 0) die(_("index file corrupt")); -- 2.15.1.424.g9478a66081
Re: [PATCH] Add shell completion for git remote rm
> Keith Smiley wrote: > > It looks like that was just about preferring remove in documentation > > and the like, I think it would still make sense to have both for > > completion since rm is still supported. > > I read it as a first step in a long process to eventually > remove 'remote rm', but if that's never intended, then sure, > restoring completion for it seems reasonable. > > It would be good to hear from those who know or recall the > intention. > > I think we should only complete the preferred subcommand. > That encourages use of 'remote remove' even if 'remote rm' > will stay forever to avoid breaking existing scripts. Quoting from the commit message of e17dba8fe1 ("remote: prefer subcommand name 'remove' to 'rm'", 2012-09-06): 'rm' is still supported and used in the test suite. It's just not widely advertised. I think adding 'rm' to completion definitely counts as advertisement. It doesn't have much practical use, after all: typing 'rm' with completion is actually one more keystroke than without (rm vs. rm). Gábor
Re: [BUG] git bisect colour output contrary to configuration
On Fri, Dec 29 2017, zef...@fysh.org jotted: > My ~/.gitconfig sets color.ui=never, which should prevent attempts > at colouring output from all git commands. I do not have any git > configuration enabling colour in any situation (such as for specific > commands). But when a git bisect completes, the output identifying > the first bad commit includes escape sequences to colour the "commit > 3e6..." line yellow. Excerpt of strace output (with many irrelevant > lines omitted): > > 23851 write(1, "3e6fc602e433dbd76941ac0ef7a438a77fbe9a05 is the first bad > commit\n", 65) = 65 > 23851 open("/home/zefram/.gitconfig", O_RDONLY) = 3 > 23851 read(3, "[user]\n\tname = Zefram\n\temail = > zef...@fysh.org\n\tsigningkey = 0x8E1E1EC1\n\n[color]\n\tui = never\n", 4096) > = 93 > 23851 write(1, "\33[33mcommit > 3e6fc602e433dbd76941ac0ef7a438a77fbe9a05\33[m\n", 56) = 56 > > Given the configuration, that line should be free of escape sequences. > > I'm mainly using git 2.1.4 via Debian, but I've also > reproduced this problem with the latest from git.git (commit > 1eaabe34fc6f486367a176207420378f587d3b48, tagged v2.16.0-rc0). This issue is a bug, but has nothing do do with bisect per-se, but is a bug in diff-tree, compare these two: git -c color.ui=never diff-tree --pretty --stat HEAD git -c color.ui=never show --pretty --stat HEAD diff-tree will incorrectly show colored output here despite ui.color=never.
Re: [PATCH 2/2] travis-ci: record and skip successfully built trees
On Fri, Dec 29, 2017 at 9:03 PM, SZEDER Gáborwrote: Or print it in a different color? Maybe red? See: https://travis-ci.org/szeder/git/jobs/322247836#L622-L625 >>> >>> I considered using color for the first line, but then didn't do it, >>> because I didn't want to decide the color :) >>> Anyway, red is the general error/failure color, but this is neither. It >>> could either be green, to match the color of the build job's result, or >>> something neutral like blue or yellow. >> >> You are right about red. I think I like yellow to express "warning". >> But this is just a nit. > > OK, yellow it will be then. Oh, hang on! Have a look: https://travis-ci.org/szeder/git/jobs/322841285#L629 That yellow is barely different from default text color. Bold stands out much better, but notice how Travis CI uses bold for all its messages. Therefore I think we should not use bold and leave it exclusive to Travis CI's messages. Green looks good: https://travis-ci.org/szeder/git/jobs/322372326#L629 Gábor
[BUG] git bisect colour output contrary to configuration
My ~/.gitconfig sets color.ui=never, which should prevent attempts at colouring output from all git commands. I do not have any git configuration enabling colour in any situation (such as for specific commands). But when a git bisect completes, the output identifying the first bad commit includes escape sequences to colour the "commit 3e6..." line yellow. Excerpt of strace output (with many irrelevant lines omitted): 23851 write(1, "3e6fc602e433dbd76941ac0ef7a438a77fbe9a05 is the first bad commit\n", 65) = 65 23851 open("/home/zefram/.gitconfig", O_RDONLY) = 3 23851 read(3, "[user]\n\tname = Zefram\n\temail = zef...@fysh.org\n\tsigningkey = 0x8E1E1EC1\n\n[color]\n\tui = never\n", 4096) = 93 23851 write(1, "\33[33mcommit 3e6fc602e433dbd76941ac0ef7a438a77fbe9a05\33[m\n", 56) = 56 Given the configuration, that line should be free of escape sequences. I'm mainly using git 2.1.4 via Debian, but I've also reproduced this problem with the latest from git.git (commit 1eaabe34fc6f486367a176207420378f587d3b48, tagged v2.16.0-rc0). -zefram
Re: [PATCH 2/2] travis-ci: record and skip successfully built trees
On Thu, Dec 28, 2017 at 12:16 PM, Lars Schneiderwrote: > >> On 28 Dec 2017, at 00:00, SZEDER Gábor wrote: >> >> On Wed, Dec 27, 2017 at 8:15 PM, Lars Schneider >> wrote: >>> On 27 Dec 2017, at 17:49, SZEDER Gábor wrote: +# Skip the build job if the same tree has already been built and tested +# successfully before (e.g. because the branch got rebased, changing only +# the commit messages). +skip_good_tree () { + if ! good_tree_info="$(grep "^$(git rev-parse $TRAVIS_COMMIT^{tree}) " "$good_trees_file")" + then + # haven't seen this tree yet; continue the build job + return + fi + + echo "$good_tree_info" | { + read tree prev_good_commit prev_good_job_number prev_good_job_id + + if test "$TRAVIS_JOB_ID" = "$prev_good_job_id" >>> >>> Under what circumstances would that be true? >> >> When the user hits 'Restart job' on the Travis CI web interface, >> $TRAVI_JOB_NUMBER and _ID remain the same in the restarted build job as >> they were in the original. >> So the condition is true when the user hits 'Restart job' on a build job >> that was the first to successfully build and test the current tree. > > I think I would prefer it if Travis would rerun all jobs if I hit the > "refresh" button. What is your intention here? I considered that and don't think it's worth the effort. First, I think it's a rather rare use case. I don't know what others are doing, but I only hit 'Restart job' to restart timeouted OSX build jobs (and to test this patch :), and that already works with this patch. I don't really see any reason to restart old successful build jobs, except perhaps when a new version of one of the dependencies becomes available (e.g. P4 and Git LFS versions are not hardcoded on OSX, we use whatever homebrew delivers), to see that an older version still works with the new dependencies. Has anyone ever done something like that? :) Second, we need to know when a build job is run after the user hit 'Restart job'. Unless I overlooked something, Travis CI doesn't indicate this. I'm not sure this is documented explicitly, but it seems that a restarted build job gets the same $TRAVIS_JOB_{NUMBER,ID} variables as the original. We could use this to identify restarted build jobs, but to do that we would have to save this information at the end of every successful build, too, and add additional checks to this function, of course. + then + cat <<-EOF + Skipping build job for commit $TRAVIS_COMMIT. + This commit has already been built and tested successfully by this build job. + To force a re-build delete the branch's cache and then hit 'Restart job'. + EOF + else + cat <<-EOF + Skipping build job for commit $TRAVIS_COMMIT. + This commit's tree has already been built and tested successfully in build job $prev_good_job_number for commit $prev_good_commit. + The log of that build job is available at https://travis-ci.org/$TRAVIS_REPO_SLUG/jobs/$prev_good_job_id + To force a re-build delete the branch's cache and then hit 'Restart job'. + EOF >>> >>> Maybe add a few newlines before and after EOF to make the text more stand >>> out? >>> Or print it in a different color? Maybe red? >>> >>> See: https://travis-ci.org/szeder/git/jobs/322247836#L622-L625 >> >> I considered using color for the first line, but then didn't do it, >> because I didn't want to decide the color :) >> Anyway, red is the general error/failure color, but this is neither. It >> could either be green, to match the color of the build job's result, or >> something neutral like blue or yellow. > > You are right about red. I think I like yellow to express "warning". > But this is just a nit. OK, yellow it will be then. > "skip_branch_tip_with_tag" could print its output yellow, too. > > - Lars
Re: [PATCH v2 4/5] convert: add support for 'checkout-encoding' attribute
I probably need to look at convert.c more closer, some other comments inline. On Fri, Dec 29, 2017 at 04:22:21PM +0100, lars.schnei...@autodesk.com wrote: > From: Lars Schneider> > Git and its tools (e.g. git diff) expect all text files in UTF-8 > encoding. Git will happily accept content in all other encodings, too, > but it might not be able to process the text (e.g. viewing diffs or > changing line endings). UTF-8 is too strict, the text from below is more correct: +Git recognizes files encoded with ASCII or one of its supersets (e.g. +UTF-8 or ISO-8859-1) as text files. All other encodings are usually +interpreted as binary and consequently built-in Git text processing +tools (e.g. 'git diff') as well as most Git web front ends do not +visualize the content. > > Add an attribute to tell Git what encoding the user has defined for a > given file. If the content is added to the index, then Git converts the > content to a canonical UTF-8 representation. On checkout Git will Minor question about "canonical": Would this mean the same ? ...then Git converts the content into UTF-8. > reverse the conversion. > > Signed-off-by: Lars Schneider > --- > Documentation/gitattributes.txt | 59 > apply.c | 2 +- > blame.c | 2 +- > combine-diff.c | 2 +- > convert.c | 196 ++- > convert.h | 8 +- > diff.c | 2 +- > sha1_file.c | 5 +- > t/t0028-checkout-encoding.sh| 197 > > 9 files changed, 460 insertions(+), 13 deletions(-) > create mode 100755 t/t0028-checkout-encoding.sh > > diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt > index 30687de81a..0039bd38c3 100644 > --- a/Documentation/gitattributes.txt > +++ b/Documentation/gitattributes.txt > @@ -272,6 +272,65 @@ few exceptions. Even though... >catch potential problems early, safety triggers. > > > +`checkout-encoding` > +^^^ > + > +Git recognizes files encoded with ASCII or one of its supersets (e.g. > +UTF-8 or ISO-8859-1) as text files. All other encodings are usually > +interpreted as binary and consequently built-in Git text processing > +tools (e.g. 'git diff') as well as most Git web front ends do not > +visualize the content. > + > +In these cases you can teach Git the encoding of a file in the working teach ? tell ? > +directory with the `checkout-encoding` attribute. If a file with this > +attributes is added to Git, then Git reencodes the content from the > +specified encoding to UTF-8 and stores the result in its internal data > +structure. Minor Q: > its internal data structure. Should we simply write "stores the result in the index" ? > On checkout the content is encoded back to the specified > +encoding. > + > +Please note that using the `checkout-encoding` attribute has a number > +of drawbacks: > + > +- Reencoding content to non-UTF encodings (e.g. SHIFT-JIS) can cause > + errors as the conversion might not be round trip safe. > + > +- Reencoding content requires resources that might slow down certain > + Git operations (e.g 'git checkout' or 'git add'). > + > +- Git clients that do not support the `checkout-encoding` attribute or > + the used encoding will checkout the respective files as UTF-8 encoded. > + That means the content appears to be different which could cause > + trouble. Affected clients are older Git versions and alternative Git > + implementations such as JGit or libgit2 (as of January 2018). > + > +Use the `checkout-encoding` attribute only if you cannot store a file in > +UTF-8 encoding and if you want Git to be able to process the content as > +text. > + I would maybe rephrase a little bit (first things first): Please note that using the `checkout-encoding` attribute may have a number of pitfalls: - Git clients that do not support the `checkout-encoding` attribute will checkout the respective files as UTF-8 encoded, which typically causes trouble. Escpecialy when older Git versions are used or alternative Git implementations such as JGit or libgit2 (as of January 2018). - Reencoding content to non-UTF encodings (e.g. SHIFT-JIS) can cause errors as the conversion might not be round trip safe. - Reencoding content requires resources that might slow down certain Git operations (e.g 'git checkout' or 'git add'). Use the `checkout-encoding` attribute only if you cannot store a file in UTF-8 encoding and if you want Git to be able to process the content as text. - Side question: What happens if "the used encoding" is not supported by the underlying iconv lib ? Will Git fail, delete the file from the working tree ? That may be worth to mention. (And I need to do the code-review) > +Use the
Re: [ANNOUNCE] Git v2.16.0-rc0
On Thu, 2017-12-28 at 20:30 -0800, Junio C Hamano wrote: > * The way "git worktree add" determines what branch to create from >where and checkout in the new worktree has been updated a bit. Does this include the enhancements published a few weeks ago to allow worktrees to be created directly from remote branches without first checking out the branch locally? I'm really looking forward to that change... Thanks!
Re: [PATCH v2 1/5] strbuf: add xstrdup_toupper()
> On 29 Dec 2017, at 16:56, Torsten Bögershausenwrote: > > On Fri, Dec 29, 2017 at 04:22:18PM +0100, lars.schnei...@autodesk.com wrote: >> From: Lars Schneider >> >> Create a copy of an existing string and make all characters upper case. >> Similar xstrdup_tolower(). >> >> This function is used in a subsequent commit. >> >> Signed-off-by: Lars Schneider >> --- >> strbuf.c | 13 + >> strbuf.h | 1 + >> 2 files changed, 14 insertions(+) >> >> diff --git a/strbuf.c b/strbuf.c >> index 323c49ceb3..54276e96e7 100644 >> --- a/strbuf.c >> +++ b/strbuf.c >> @@ -760,6 +760,19 @@ char *xstrdup_tolower(const char *string) >> return result; >> } >> >> +char *xstrdup_toupper(const char *string) >> +{ >> +char *result; >> +size_t len, i; >> + >> +len = strlen(string); >> +result = xmallocz(len); >> +for (i = 0; i < len; i++) >> +result[i] = toupper(string[i]); >> +result[i] = '\0'; > > Isn't this already done by xmallocz() I copied that code from xstrdup_tolower(). The original implementation [1] and its refactored version [2] used xmalloc(). Later on xmallocz [3] was introduced. [3] states "we can stop manually placing NUL at the end of the allocated buffer. But that's only safe if it's clear that the contents will always fill the buffer." As far as I understand it, the content should always fill the buffer in the upper/lower case conversion. Therefore, I agree with you that the assignment is not necessary. - Lars [1] d4770964d5 (config: "git config --get-urlmatch" parses section..key, 2013-07-31) [2] 88d5a6f6cd (daemon/config: factor out duplicate xstrdup_tolower, 2014-05-22) [3] 3733e69464 (use xmallocz to avoid size arithmetic, 2016-02-22)
Re: [PATCH] completion: restore 'remote rm'
Updated: e17dba8fe1 ("remote: prefer subcommand name 'remove' to 'rm'", 2012-09-06) removed the 'rm' subcommand from completion. The 'remote rm' subcommand is still supported and not planned to be removed. Offer completions for it. Signed-off-by: Keith Smiley--- contrib/completion/git-completion.bash | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash index 3683c772c5586..3e9044087e6ba 100644 --- a/contrib/completion/git-completion.bash +++ b/contrib/completion/git-completion.bash @@ -2665,7 +2665,7 @@ _git_config () _git_remote () { local subcommands=" - add rename remove set-head set-branches + add rename remove rm set-head set-branches get-url set-url show prune update " local subcommand="$(__git_find_on_cmdline "$subcommands")" -- Keith Smiley On 12/29, Todd Zullinger wrote: Keith Smiley wrote: It looks like that was just about preferring remove in documentation and the like, I think it would still make sense to have both for completion since rm is still supported. I read it as a first step in a long process to eventually remove 'remote rm', but if that's never intended, then sure, restoring completion for it seems reasonable. It would be good to hear from those who know or recall the intention. I think we should only complete the preferred subcommand. That encourages use of 'remote remove' even if 'remote rm' will stay forever to avoid breaking existing scripts. If it does make sense to restore completion, adding a link back to e17dba8fe1 and explaining why the completion is being restored would be good. Reading the commit message now makes it sound like 'remote rm' was never present and is being added to correct an oversight. Maybe something like: completion: restore 'remote rm' e17dba8fe1 ("remote: prefer subcommand name 'remove' to 'rm'", 2012-09-06) removed the 'rm' subcommand from completion. The 'remote rm' subcommand is still supported and not planned to be removed. Offer completions for it. I also noticed that in your original commit that you say "list of removes as remove does." That should be "remotes" instead of "removes" there. -- I've made that typo myself quite often. :) -- Todd ~~ There are no stupid questions, but there are a LOT of inquisitive idiots. -- Demotivators (www.despair.com)
Re: [PATCH v2 1/5] strbuf: add xstrdup_toupper()
On Fri, Dec 29, 2017 at 04:22:18PM +0100, lars.schnei...@autodesk.com wrote: > From: Lars Schneider> > Create a copy of an existing string and make all characters upper case. > Similar xstrdup_tolower(). > > This function is used in a subsequent commit. > > Signed-off-by: Lars Schneider > --- > strbuf.c | 13 + > strbuf.h | 1 + > 2 files changed, 14 insertions(+) > > diff --git a/strbuf.c b/strbuf.c > index 323c49ceb3..54276e96e7 100644 > --- a/strbuf.c > +++ b/strbuf.c > @@ -760,6 +760,19 @@ char *xstrdup_tolower(const char *string) > return result; > } > > +char *xstrdup_toupper(const char *string) > +{ > + char *result; > + size_t len, i; > + > + len = strlen(string); > + result = xmallocz(len); > + for (i = 0; i < len; i++) > + result[i] = toupper(string[i]); > + result[i] = '\0'; Isn't this already done by xmallocz() > + return result; > +} > + > char *xstrvfmt(const char *fmt, va_list ap) > { > struct strbuf buf = STRBUF_INIT; > diff --git a/strbuf.h b/strbuf.h > index 0a74acb236..2bc148526f 100644 > --- a/strbuf.h > +++ b/strbuf.h > @@ -616,6 +616,7 @@ __attribute__((format (printf,2,3))) > extern int fprintf_ln(FILE *fp, const char *fmt, ...); > > char *xstrdup_tolower(const char *); > +char *xstrdup_toupper(const char *); > > /** > * Create a newly allocated string using printf format. You can do this > easily > -- > 2.15.1 >
[PATCH v2 1/5] strbuf: add xstrdup_toupper()
From: Lars SchneiderCreate a copy of an existing string and make all characters upper case. Similar xstrdup_tolower(). This function is used in a subsequent commit. Signed-off-by: Lars Schneider --- strbuf.c | 13 + strbuf.h | 1 + 2 files changed, 14 insertions(+) diff --git a/strbuf.c b/strbuf.c index 323c49ceb3..54276e96e7 100644 --- a/strbuf.c +++ b/strbuf.c @@ -760,6 +760,19 @@ char *xstrdup_tolower(const char *string) return result; } +char *xstrdup_toupper(const char *string) +{ + char *result; + size_t len, i; + + len = strlen(string); + result = xmallocz(len); + for (i = 0; i < len; i++) + result[i] = toupper(string[i]); + result[i] = '\0'; + return result; +} + char *xstrvfmt(const char *fmt, va_list ap) { struct strbuf buf = STRBUF_INIT; diff --git a/strbuf.h b/strbuf.h index 0a74acb236..2bc148526f 100644 --- a/strbuf.h +++ b/strbuf.h @@ -616,6 +616,7 @@ __attribute__((format (printf,2,3))) extern int fprintf_ln(FILE *fp, const char *fmt, ...); char *xstrdup_tolower(const char *); +char *xstrdup_toupper(const char *); /** * Create a newly allocated string using printf format. You can do this easily -- 2.15.1
[PATCH v2 2/5] utf8: add function to detect prohibited UTF-16/32 BOM
From: Lars SchneiderWhenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE or UTF-32LE a BOM must not be used [1]. The function returns true if this is the case. This function is used in a subsequent commit. [1] http://unicode.org/faq/utf_bom.html#bom10 Signed-off-by: Lars Schneider --- utf8.c | 24 utf8.h | 9 + 2 files changed, 33 insertions(+) diff --git a/utf8.c b/utf8.c index 2c27ce0137..776660ee12 100644 --- a/utf8.c +++ b/utf8.c @@ -538,6 +538,30 @@ char *reencode_string_len(const char *in, int insz, } #endif +static int has_bom_prefix(const char *data, size_t len, + const char *bom, size_t bom_len) +{ + return (len >= bom_len) && !memcmp(data, bom, bom_len); +} + +const char utf16_be_bom[] = {0xFE, 0xFF}; +const char utf16_le_bom[] = {0xFF, 0xFE}; +const char utf32_be_bom[] = {0x00, 0x00, 0xFE, 0xFF}; +const char utf32_le_bom[] = {0xFF, 0xFE, 0x00, 0x00}; + +int has_prohibited_utf_bom(const char *enc, const char *data, size_t len) +{ + return ( + (!strcmp(enc, "UTF-16BE") || !strcmp(enc, "UTF-16LE")) && + (has_bom_prefix(data, len, utf16_be_bom, sizeof(utf16_be_bom)) || + has_bom_prefix(data, len, utf16_le_bom, sizeof(utf16_le_bom))) + ) || ( + (!strcmp(enc, "UTF-32BE") || !strcmp(enc, "UTF-32LE")) && + (has_bom_prefix(data, len, utf32_be_bom, sizeof(utf32_be_bom)) || + has_bom_prefix(data, len, utf32_le_bom, sizeof(utf32_le_bom))) + ); +} + /* * Returns first character length in bytes for multi-byte `text` according to * `encoding`. diff --git a/utf8.h b/utf8.h index 6bbcf31a83..4711429af9 100644 --- a/utf8.h +++ b/utf8.h @@ -70,4 +70,13 @@ typedef enum { void strbuf_utf8_align(struct strbuf *buf, align_type position, unsigned int width, const char *s); +/* + * Whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE + * or UTF-32LE a BOM must not be used [1]. The function returns true if + * this is the case. + * + * [1] http://unicode.org/faq/utf_bom.html#bom10 + */ +int has_prohibited_utf_bom(const char *enc, const char *data, size_t len); + #endif -- 2.15.1
[PATCH v2 5/5] convert: add tracing for checkout-encoding
From: Lars SchneiderAdd the GIT_TRACE_CHECKOUT_ENCODING environment variable to enable tracing for content that is reencoded with the checkout-encoding attribute. Signed-off-by: Lars Schneider --- convert.c| 28 t/t0028-checkout-encoding.sh | 2 ++ 2 files changed, 30 insertions(+) diff --git a/convert.c b/convert.c index fc8c96b670..ca7b2f3e5c 100644 --- a/convert.c +++ b/convert.c @@ -257,6 +257,29 @@ static int will_convert_lf_to_crlf(size_t len, struct text_stat *stats, } +static void trace_encoding(const char *context, const char *path, + const char *encoding, const char *buf, size_t len) +{ + static struct trace_key coe = TRACE_KEY_INIT(CHECKOUT_ENCODING); + struct strbuf trace = STRBUF_INIT; + int i; + + strbuf_addf(, "%s (%s, considered %s):\n", context, path, encoding); + for (i = 0; i < len && buf; ++i) { + strbuf_addf( + ,"| \e[2m%2i:\e[0m %2x \e[2m%c\e[0m%c", + i, + (unsigned char) buf[i], + (buf[i] > 32 && buf[i] < 127 ? buf[i] : ' '), + ((i+1) % 8 && (i+1) < len ? ' ' : '\n') + ); + } + strbuf_addchars(, '\n', 1); + + trace_strbuf(, ); + strbuf_release(); +} + static struct encoding { const char *name; struct encoding *next; @@ -316,6 +339,7 @@ static int encode_to_git(const char *path, const char *src, size_t src_len, error(error_msg, path, enc->name); } + trace_encoding("source", path, enc->name, src, src_len); dst = reencode_string_len(src, src_len, default_encoding, enc->name, _len); if (!dst) { @@ -331,6 +355,7 @@ static int encode_to_git(const char *path, const char *src, size_t src_len, else error(msg, path, enc->name, default_encoding); } + trace_encoding("destination", path, default_encoding, dst, dst_len); /* * UTF supports lossless round tripping [1]. UTF to other encoding are @@ -356,6 +381,9 @@ static int encode_to_git(const char *path, const char *src, size_t src_len, enc->name, default_encoding, _src_len); + trace_encoding("reencoded source", path, enc->name, + re_src, re_src_len); + if (!re_src || src_len != re_src_len || memcmp(src, re_src, src_len)) { const char* msg = _("encoding '%s' from %s to %s and " diff --git a/t/t0028-checkout-encoding.sh b/t/t0028-checkout-encoding.sh index 1a329ab933..df3cc91269 100755 --- a/t/t0028-checkout-encoding.sh +++ b/t/t0028-checkout-encoding.sh @@ -4,6 +4,8 @@ test_description='checkout-encoding conversion via gitattributes' . ./test-lib.sh +GIT_TRACE_CHECKOUT_ENCODING=1 && export GIT_TRACE_CHECKOUT_ENCODING + test_expect_success 'setup test repo' ' text="hallo there!\ncan you read me?" && -- 2.15.1
[PATCH v2 4/5] convert: add support for 'checkout-encoding' attribute
From: Lars SchneiderGit and its tools (e.g. git diff) expect all text files in UTF-8 encoding. Git will happily accept content in all other encodings, too, but it might not be able to process the text (e.g. viewing diffs or changing line endings). Add an attribute to tell Git what encoding the user has defined for a given file. If the content is added to the index, then Git converts the content to a canonical UTF-8 representation. On checkout Git will reverse the conversion. Signed-off-by: Lars Schneider --- Documentation/gitattributes.txt | 59 apply.c | 2 +- blame.c | 2 +- combine-diff.c | 2 +- convert.c | 196 ++- convert.h | 8 +- diff.c | 2 +- sha1_file.c | 5 +- t/t0028-checkout-encoding.sh| 197 9 files changed, 460 insertions(+), 13 deletions(-) create mode 100755 t/t0028-checkout-encoding.sh diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index 30687de81a..0039bd38c3 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -272,6 +272,65 @@ few exceptions. Even though... catch potential problems early, safety triggers. +`checkout-encoding` +^^^ + +Git recognizes files encoded with ASCII or one of its supersets (e.g. +UTF-8 or ISO-8859-1) as text files. All other encodings are usually +interpreted as binary and consequently built-in Git text processing +tools (e.g. 'git diff') as well as most Git web front ends do not +visualize the content. + +In these cases you can teach Git the encoding of a file in the working +directory with the `checkout-encoding` attribute. If a file with this +attributes is added to Git, then Git reencodes the content from the +specified encoding to UTF-8 and stores the result in its internal data +structure. On checkout the content is encoded back to the specified +encoding. + +Please note that using the `checkout-encoding` attribute has a number +of drawbacks: + +- Reencoding content to non-UTF encodings (e.g. SHIFT-JIS) can cause + errors as the conversion might not be round trip safe. + +- Reencoding content requires resources that might slow down certain + Git operations (e.g 'git checkout' or 'git add'). + +- Git clients that do not support the `checkout-encoding` attribute or + the used encoding will checkout the respective files as UTF-8 encoded. + That means the content appears to be different which could cause + trouble. Affected clients are older Git versions and alternative Git + implementations such as JGit or libgit2 (as of January 2018). + +Use the `checkout-encoding` attribute only if you cannot store a file in +UTF-8 encoding and if you want Git to be able to process the content as +text. + +Use the following attributes if your '*.txt' files are UTF-16 encoded +with byte order mark (BOM) and you want Git to perform automatic line +ending conversion based on your platform. + + +*.txt text checkout-encoding=UTF-16 + + +Use the following attributes if your '*.txt' files are UTF-16 little +endian encoded without BOM and you want Git to use Windows line endings +in the working directory. + + +*.txt checkout-encoding=UTF-16LE text eol=CRLF + + +You can get a list of all available encodings on your platform with the +following command: + + +iconv --list + + + `ident` ^^^ diff --git a/apply.c b/apply.c index 321a9fa68d..c4bd5cf1f2 100644 --- a/apply.c +++ b/apply.c @@ -2281,7 +2281,7 @@ static int read_old_data(struct stat *st, struct patch *patch, * should never look at the index when explicit crlf option * is given. */ - convert_to_git(NULL, path, buf->buf, buf->len, buf, safe_crlf); + convert_to_git(NULL, path, buf->buf, buf->len, buf, safe_crlf, 0); return 0; default: return -1; diff --git a/blame.c b/blame.c index 2893f3c103..388b66897b 100644 --- a/blame.c +++ b/blame.c @@ -229,7 +229,7 @@ static struct commit *fake_working_tree_commit(struct diff_options *opt, if (strbuf_read(, 0, 0) < 0) die_errno("failed to read from stdin"); } - convert_to_git(_index, path, buf.buf, buf.len, , 0); + convert_to_git(_index, path, buf.buf, buf.len, , 0, 0); origin->file.ptr = buf.buf; origin->file.size = buf.len; pretend_sha1_file(buf.buf, buf.len, OBJ_BLOB, origin->blob_oid.hash); diff --git a/combine-diff.c b/combine-diff.c index 2505de119a..4555e49b5f 100644 --- a/combine-diff.c +++
[PATCH v2 3/5] utf8: add function to detect a missing UTF-16/32 BOM
From: Lars SchneiderIf the endianness is not defined in the encoding name, then let's be strict and require a BOM to avoid any encoding confusion. The has_missing_utf_bom() function returns true if a required BOM is missing. The Unicode standard instructs to assume big-endian if there in no BOM for UTF-16/32 [1][2]. However, the W3C/WHATWG encoding standard used in HTML5 recommends to assume little-endian to "deal with deployed content" [3]. Strictly requiring a BOM seems to be the safest option for content in Git. This function is used in a subsequent commit. [1] http://unicode.org/faq/utf_bom.html#gen6 [2] http://www.unicode.org/versions/Unicode10.0.0/ch03.pdf Section 3.10, D98, page 132 [3] https://encoding.spec.whatwg.org/#utf-16le Signed-off-by: Lars Schneider --- utf8.c | 13 + utf8.h | 16 2 files changed, 29 insertions(+) diff --git a/utf8.c b/utf8.c index 776660ee12..1978d6c42a 100644 --- a/utf8.c +++ b/utf8.c @@ -562,6 +562,19 @@ int has_prohibited_utf_bom(const char *enc, const char *data, size_t len) ); } +int has_missing_utf_bom(const char *enc, const char *data, size_t len) +{ + return ( + !strcmp(enc, "UTF-16") && + !(has_bom_prefix(data, len, utf16_be_bom, sizeof(utf16_be_bom)) || +has_bom_prefix(data, len, utf16_le_bom, sizeof(utf16_le_bom))) + ) || ( + !strcmp(enc, "UTF-32") && + !(has_bom_prefix(data, len, utf32_be_bom, sizeof(utf32_be_bom)) || +has_bom_prefix(data, len, utf32_le_bom, sizeof(utf32_le_bom))) + ); +} + /* * Returns first character length in bytes for multi-byte `text` according to * `encoding`. diff --git a/utf8.h b/utf8.h index 4711429af9..26b5e91852 100644 --- a/utf8.h +++ b/utf8.h @@ -79,4 +79,20 @@ void strbuf_utf8_align(struct strbuf *buf, align_type position, unsigned int wid */ int has_prohibited_utf_bom(const char *enc, const char *data, size_t len); +/* + * If the endianness is not defined in the encoding name, then we + * require a BOM. The function returns true if a required BOM is missing. + * + * The Unicode standard instructs to assume big-endian if there + * in no BOM for UTF-16/32 [1][2]. However, the W3C/WHATWG + * encoding standard used in HTML5 recommends to assume + * little-endian to "deal with deployed content" [3]. + * + * [1] http://unicode.org/faq/utf_bom.html#gen6 + * [2] http://www.unicode.org/versions/Unicode10.0.0/ch03.pdf + * Section 3.10, D98, page 132 + * [3] https://encoding.spec.whatwg.org/#utf-16le + */ +int has_missing_utf_bom(const char *enc, const char *data, size_t len); + #endif -- 2.15.1
[PATCH v2 0/5] convert: add support for different encodings
From: Lars SchneiderHi, notable changes since v1: * In [1] Peff described a situation where you "couldn't checkout _away_ from" problems because of "die() in convert_to_git()". I fixed this and tried to replicate the situation with the test "error if encoding garbage is already in Git". To achieve that I had to pass the 'write_obj' variable through all the functions to let the encoding code know if Git actually writes content. If Git actually writes content then we die. Otherwise we just print an error. This ensures no garbage is added to Git. * I renamed the attribute to "checkout-encoding" to avoid trouble with the existing "encoding" attribute. I also considered "working-tree-encoding". We also discussed "worktree-encoding" as attribute name but I am was worried that people could (wrongly) associated that with the 'git worktree' feature. I am happy to hear arguments for/against any of the different options. * I read a bit on the Internet and as far as I understand it to/from UTF-8 reencodings are mostly round trip safe. Notable exceptions are some characters in the SHIFT-JIS character set: https://support.microsoft.com/en-us/help/170559/prb-conversion-problem-between-shift-jis-and-unicode However, I was not able to replicate the issue. Consider this: X="\x87\x90" printf "$X" | od -h printf "$X" | iconv -f SHIFT-JIS -t UTF-8 | iconv -f UTF-8 -t SHIFT-JIS | od -h ... the reencoding from SHIFT-JIS to UTF-8 would always fail. Therefore, I was not able to generate a test case that produces a different result after the round trip. Anyone an idea how to do that? * Patch 1, 2, 3, and 5 are helper functions. Patch 4 is the real change. Thanks, Lars RFC: ttps://public-inbox.org/git/bdb9b884-6d17-4be3-a83c-f67e2afa2...@gmail.comh v1: https://public-inbox.org/git/20171211155023.1405-1-lars.schnei...@autodesk.com/ Base Ref: master Web-Diff: https://github.com/larsxschneider/git/commit/e11e375fdb Checkout: git fetch https://github.com/larsxschneider/git encoding-v2 && git checkout e11e375fdb ### Interdiff (v1..v2): diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index 84de2fa49c..0039bd38c3 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -146,33 +146,6 @@ Unspecified:: Any other value causes Git to act as if `text` has been left unspecified. -`encoding` -^^ - -By default Git assumes UTF-8 encoding for text files. The `encoding` -attribute sets the encoding to be used in the working directory. -If the path is added to the index, then Git encodes the content to -UTF-8. On checkout the content is encoded back to the original -encoding. Consequently, you can use all built-in Git text processing -tools (e.g. git diff, line ending conversions, etc.) with your -non-UTF-8 encoded file. - -Please note that re-encoding content can cause errors and requires -resources. Use the `encoding` attribute only if you cannot store -a file in UTF-8 encoding and if you want Git to be able to process -the text. - - -*.txt text encoding=UTF-16 - - -All `iconv` encodings with a stable round-trip conversion to and from -UTF-8 are supported. You can see a full list with the following command: - - -iconv --list - - `eol` ^ @@ -299,6 +272,65 @@ few exceptions. Even though... catch potential problems early, safety triggers. +`checkout-encoding` +^^^ + +Git recognizes files encoded with ASCII or one of its supersets (e.g. +UTF-8 or ISO-8859-1) as text files. All other encodings are usually +interpreted as binary and consequently built-in Git text processing +tools (e.g. 'git diff') as well as most Git web front ends do not +visualize the content. + +In these cases you can teach Git the encoding of a file in the working +directory with the `checkout-encoding` attribute. If a file with this +attributes is added to Git, then Git reencodes the content from the +specified encoding to UTF-8 and stores the result in its internal data +structure. On checkout the content is encoded back to the specified +encoding. + +Please note that using the `checkout-encoding` attribute has a number +of drawbacks: + +- Reencoding content to non-UTF encodings (e.g. SHIFT-JIS) can cause + errors as the conversion might not be round trip safe. + +- Reencoding content requires resources that might slow down certain + Git operations (e.g 'git checkout' or 'git add'). + +- Git clients that do not support the `checkout-encoding` attribute or + the used encoding will checkout the respective files as UTF-8 encoded. + That means the content appears to be different which could cause + trouble. Affected clients are older Git versions and alternative Git + implementations such as JGit or libgit2 (as of January 2018). + +Use the `checkout-encoding`
[PATCH] git-archive: accept --owner and --group like GNU tar
The ownership of files created by git-archive is always root:root. Add --owner and --group options which work like the GNU tar equivalent to allow overriding these defaults. Signed-off-by: suzuki toshiya--- Documentation/git-archive.txt | 13 +++ archive-tar.c | 8 +- archive.c | 224 ++ archive.h | 4 + t/t5005-archive-uid-gid.sh| 140 ++ t/t5005/parse-tar-file.py | 60 +++ tar.h | 2 + 7 files changed, 447 insertions(+), 4 deletions(-) create mode 100755 t/t5005-archive-uid-gid.sh create mode 100755 t/t5005/parse-tar-file.py diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt index cfa1e4ebe..0d156f6c1 100644 --- a/Documentation/git-archive.txt +++ b/Documentation/git-archive.txt @@ -11,6 +11,7 @@ SYNOPSIS [verse] 'git archive' [--format=] [--list] [--prefix=/] [] [-o | --output=] [--worktree-attributes] + [--owner [username[:uid]] [--group [groupname[:gid]] [--remote= [--exec=]] [...] @@ -63,6 +64,18 @@ OPTIONS This can be any options that the archiver backend understands. See next section. +--owner=[:]:: + Force as owner and as uid for the files in the tar + archive. If is not supplied, can be either a user + name or numeric UID. In this case the missing part (UID or + name) will be inferred from the current host's user database. + +--group=[:]:: + Force as group and as gid for the files in the tar + archive. If is not supplied, can be either a group + name or numeric GID. In this case the missing part (GID or + name) will be inferred from the current host's group database. + --remote=:: Instead of making a tar archive from the local repository, retrieve a tar archive from a remote repository. Note that the diff --git a/archive-tar.c b/archive-tar.c index c6ed96ee7..ca6471870 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -204,10 +204,10 @@ static void prepare_header(struct archiver_args *args, xsnprintf(header->size, sizeof(header->size), "%011lo", S_ISREG(mode) ? size : 0); xsnprintf(header->mtime, sizeof(header->mtime), "%011lo", (unsigned long) args->time); - xsnprintf(header->uid, sizeof(header->uid), "%07o", 0); - xsnprintf(header->gid, sizeof(header->gid), "%07o", 0); - strlcpy(header->uname, "root", sizeof(header->uname)); - strlcpy(header->gname, "root", sizeof(header->gname)); + xsnprintf(header->uid, sizeof(header->uid), "%07lo", args->uid); + xsnprintf(header->gid, sizeof(header->gid), "%07lo", args->gid); + strlcpy(header->uname, args->uname, sizeof(header->uname)); + strlcpy(header->gname, args->gname, sizeof(header->gname)); xsnprintf(header->devmajor, sizeof(header->devmajor), "%07o", 0); xsnprintf(header->devminor, sizeof(header->devminor), "%07o", 0); diff --git a/archive.c b/archive.c index 0b7b62af0..aa4b16b75 100644 --- a/archive.c +++ b/archive.c @@ -8,6 +8,7 @@ #include "parse-options.h" #include "unpack-trees.h" #include "dir.h" +#include "tar.h" static char const * const archive_usage[] = { N_("git archive [] [...]"), @@ -417,6 +418,223 @@ static void parse_treeish_arg(const char **argv, { OPTION_SET_INT, (s), NULL, (v), NULL, "", \ PARSE_OPT_NOARG | PARSE_OPT_NONEG | PARSE_OPT_HIDDEN, NULL, (p) } +/* + * GNU tar --owner, --group options reject hexdigit, signed int values. + * strtol(), atoi() are too permissive to simulate the behaviour. + */ +#define STR_IS_DIGIT_OK 0 +#define STR_IS_NOT_DIGIT -1 +#define STR_IS_DIGIT_TOO_LARGE -2 + +static int try_as_simple_digit(const char *s, unsigned long *dst) +{ + unsigned long ul; + char *endptr; + + if (strlen(s) != strspn(s, "0123456789")) + return STR_IS_NOT_DIGIT; + + errno = 0; + ul = strtoul(s, , 10); + + /* catch ERANGE */ + if (errno) { + errno = 0; + return STR_IS_DIGIT_TOO_LARGE; + } + +#if ULONG_MAX > 0xUL + /* +* --owner, --group rejects uid/gid greater than 32-bit +* limits, even on 64-bit platforms. +*/ + if (ul > 0xUL) + return STR_IS_DIGIT_TOO_LARGE; +#endif + + if (dst) + *dst = ul; + return STR_IS_DIGIT_OK; +} + +static const char *skip_leading_colon(const char *s) +{ + const char *col_pos; + + col_pos = strchr(s, ':'); + if (!col_pos) + return s; + + return (col_pos + 1); +} + +#define STR_IS_NAME_COLON_DIGIT 0 +#define STR_HAS_NO_COLON -1 +#define STR_HAS_DIGIT_BROKEN -2 +#define STR_HAS_DIGIT_TOO_LARGE -3 + +static int try_as_name_colon_digit(const char *s, const char **dst_s, + unsigned
Re: Rewrite cat-file.c : need help to find a bug
2017-12-29 16:22 GMT+03:00 Jeff King: > On Fri, Dec 29, 2017 at 01:05:50PM +0300, Оля Тележная wrote: > >> Hi everyone, >> I am trying to reuse formatting logic from ref-filter in cat-file >> command. Now cat-file uses its own formatting code. >> I am trying to achieve that step-by-step, now I want to invoke >> populate_value function, and I have a bug somewhere. >> My code is here. >> https://github.com/telezhnaya/git/commits/catfile >> All commits that starts from "cat-file: " are successfully passing all tests. >> So for now my last commit fails, and I am tired of searching for the >> error. Actually, I just copied some values to another variable and >> move some code to other place. All "intelligent" work will go further, >> but now I need to fix current situation. > > The problem is that "cat_file_info" is NULL when you get to > populate_value(), so the moved code doesn't trigger. > Thanks a lot!
Re: [PATCH] git-archive: accepts --owner --group aslike GNU tar.
Dear Junio, Ævar Thank you very much for your reviews, in spite of my many overlooking of the requirements written in the documents. To classify various cases, I modified my patch heavily. It would be posted soon. I found that some Python scripts are included in the git repository, but nothing in t/ directories. A helper written in Python is not welcomed? If so, I would rewrite the helper in C (perl's standard library does not have a parse of tarfile). So, please do not review parse-tar-file.py. Regards, mpsuzuki Junio C Hamano wrote: suzuki toshiyawrites: Current tar output by git-archive has always root:root. Thanks for a patch. On top of Ævar's comments... ... * t/t5005-archive-uid-gid.sh: a test script comparing uid, gid, uname, gname between the options and generated tar file. --- Before the "---" line, we need to get the patch signed off by the author (see Documentation/SubmittingPatches). diff --git a/archive-tar.c b/archive-tar.c index c6ed96ee7..8546a6229 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -204,10 +204,10 @@ static void prepare_header(struct archiver_args *args, xsnprintf(header->size, sizeof(header->size), "%011lo", S_ISREG(mode) ? size : 0); xsnprintf(header->mtime, sizeof(header->mtime), "%011lo", (unsigned long) args->time); - xsnprintf(header->uid, sizeof(header->uid), "%07o", 0); - xsnprintf(header->gid, sizeof(header->gid), "%07o", 0); - strlcpy(header->uname, "root", sizeof(header->uname)); - strlcpy(header->gname, "root", sizeof(header->gname)); + xsnprintf(header->uid, sizeof(header->uid), "%07o", args->uid); + xsnprintf(header->gid, sizeof(header->gid), "%07o", args->gid); + strlcpy(header->uname, args->uname ? args->uname : "root", sizeof(header->uname)); + strlcpy(header->gname, args->gname ? args->gname : "root", sizeof(header->gname)); Would it be cleaner to make sure aregs->[gu]name is always set (i.e. stuff "root" when it is not given)? xsnprintf(header->devmajor, sizeof(header->devmajor), "%07o", 0); xsnprintf(header->devminor, sizeof(header->devminor), "%07o", 0); diff --git a/archive.c b/archive.c index 0b7b62af0..db69041f1 100644 --- a/archive.c +++ b/archive.c @@ -8,6 +8,7 @@ #include "parse-options.h" #include "unpack-trees.h" #include "dir.h" +#include "git-compat-util.h" The coding guideline says that "git-compat-util.h" (or one of the well-known header that includes it) should be the first file to be included, and we already include "cache.h" as the first thing, so I do not think you want this addition here. @@ -417,6 +418,57 @@ static void parse_treeish_arg(const char **argv, { OPTION_SET_INT, (s), NULL, (v), NULL, "", \ PARSE_OPT_NOARG | PARSE_OPT_NONEG | PARSE_OPT_HIDDEN, NULL, (p) } +static void set_args_uname_uid(struct archiver_args *args, + const char* tar_owner, int set_gid_too) The asterisk sticks to the variable name, not the type name, i.e. const char *tar_owner +{ + if (!args || !tar_owner) + return; + + const char* col_pos = strchr(tar_owner, ':'); + struct passwd* pw = NULL; Decl after statement. + if (col_pos) { + args->uname = xstrndup(tar_owner, col_pos - tar_owner); + args->uid = atoi(col_pos + 1); + return; + } + + args->uname = xstrndup(tar_owner, strlen(tar_owner)); + pw = getpwnam(tar_owner); + if (!pw) + return; This means that upon error, the caller gets a half-filled args structure and without any indication. diff --git a/t/parse-tar-file.py b/t/parse-tar-file.py Hmph. Do we still use Python around here?
Re: [PATCH v1] convert: add support for 'encoding' attribute
> On 29 Dec 2017, at 13:59, Torsten Bögershausenwrote: > > On 2017-12-28 17:14, Lars Schneider wrote:> >>> On 17 Dec 2017, at 18:14, Torsten Bögershausen wrote: >>> >>> On Mon, Dec 11, 2017 at 04:50:23PM +0100, lars.schnei...@autodesk.com wrote: From: Lars Schneider >> +`encoding` +^^ + +By default Git assumes UTF-8 encoding for text files. The `encoding` >>> >>> This is probably "ASCII" and it's supersets like ISO-8859-1 or UTF-8. >> >> I am not sure I understand what you want to tell me here. >> Do you think UTF-8 encoding is not correct in the sentence above? > > Git itself was designed to handle source code in ASCII. > Text files which are encoded in ISO-8859-1, UTF-8 or any > super-set of ASCII are handled as well, and what exactly to do > with bytes >0x80 is outside the responsibility of Git. > E.g. the interpretation and rendering on the screen may be > dependent on the locale or being guessed. > All in all, saying that Git expects UTF-8 may be "overdriven". > Any kind of file that uses an '\n' as an end of line > and has no NUL bytes '\0' works as a text file in Git. > (End of BlaBla ;-) > > We can probably replace: > "By default Git assumes UTF-8 encoding for text files" > > with something like > "Git handles UTF-8 encoded files as text files, but UTF-16 encoded > files are handled as binary files" Well, UTF-32 are handled as binary file, too :-) How about this? Git recognizes files encoded with ASCII or one of its supersets (e.g. UTF-8 or ISO-8859-1) as text files. All other... > >>> +attribute sets the encoding to be used in the working directory. +If the path is added to the index, then Git encodes the content to +UTF-8. On checkout the content is encoded back to the original +encoding. Consequently, you can use all built-in Git text processing +tools (e.g. git diff, line ending conversions, etc.) with your +non-UTF-8 encoded file. + +Please note that re-encoding content can cause errors and requires +resources. Use the `encoding` attribute only if you cannot store +a file in UTF-8 encoding and if you want Git to be able to process +the text. + + +*.txt text encoding=UTF-16 + >>> >>> I think that encoding (or worktree-encoding as said elsewhere) must imply >>> text. >>> (That is not done in convert.c, but assuming binary or even auto >>> does not make much sense to me) >> >> "text" only means "LF". "-text" means CRLF which would be perfectly fine >> for UTF-16. Therefore I don't think we need to imply text. >> Does this make sense? > Yes and now. > > "text" means convert CRLF at "git add" (or commit) into LF, > And potentially LF into CRLF at checkout, > depending on the EOL attribute (if set), core.autocrlf and/or core.eol. > > "-text" means don't touch CRLF or LF at all. Files with CRLF are commited > with CRLF. > Running a Unix like "diff" tool will often show "^M" to indicate that there > is a CR before the LF, which may be distracting or confusing. > > If someone ever wants to handle the UTF-16 files on a Linux/Mac/Unix system, > it makes sense to convert the line endings into LF before storing them > into the index (at least this is my experience). > > (Not specifying "text" or "-text" at all will trigger the auto-handling, > which is not needed at all). > So if we want to be sure that line endings are CRLF in the worktree we > should ask the user to specify things like this: > > *.ext worktree-encoding=UTF-16 text eol=CRLF > > It may be enough to give this example in the documentation. > and I would rather be over-careful here, to avoid future problems. Agreed. I'll add that example! + +All `iconv` encodings with a stable round-trip conversion to and from +UTF-8 are supported. You can see a full list with the following command: + + +iconv --list + + `eol` ^ diff --git a/convert.c b/convert.c index 20d7ab67bd..ee19c17104 100644 --- a/convert.c +++ b/convert.c @@ -7,6 +7,7 @@ #include "sigchain.h" #include "pkt-line.h" #include "sub-process.h" +#include "utf8.h" /* * convert.c - convert a file when checking it out and checking it in. @@ -256,6 +257,149 @@ static int will_convert_lf_to_crlf(size_t len, struct text_stat *stats, } >>> >>> I would avoid to use these #ifdefs here. >>> All functions can be in strbuf.c (and may have #ifdefs there), but not here. >> >> I'll try that. But wouldn't it make more sense to move the functions to >> utf.c? > > May be. > Originally utf8.c was about encoding and all kind of UTF-8 related stuff. > Especially it didn't know anything about strbuf. > I don't know why strbuf.h and other functions
Re: [PATCH] Add shell completion for git remote rm
Keith Smiley wrote: > It looks like that was just about preferring remove in documentation > and the like, I think it would still make sense to have both for > completion since rm is still supported. I read it as a first step in a long process to eventually remove 'remote rm', but if that's never intended, then sure, restoring completion for it seems reasonable. It would be good to hear from those who know or recall the intention. I think we should only complete the preferred subcommand. That encourages use of 'remote remove' even if 'remote rm' will stay forever to avoid breaking existing scripts. If it does make sense to restore completion, adding a link back to e17dba8fe1 and explaining why the completion is being restored would be good. Reading the commit message now makes it sound like 'remote rm' was never present and is being added to correct an oversight. Maybe something like: completion: restore 'remote rm' e17dba8fe1 ("remote: prefer subcommand name 'remove' to 'rm'", 2012-09-06) removed the 'rm' subcommand from completion. The 'remote rm' subcommand is still supported and not planned to be removed. Offer completions for it. I also noticed that in your original commit that you say "list of removes as remove does." That should be "remotes" instead of "removes" there. -- I've made that typo myself quite often. :) -- Todd ~~ There are no stupid questions, but there are a LOT of inquisitive idiots. -- Demotivators (www.despair.com)
[PATCH/RFC 0/2] git diff --UTF-8
From: Torsten BögershausenRFC patch: convert files from e.g. UTF-16 into UTF-8 while running "git diff". The diff must be called with "git diff --UTF-8" and the "encoding" attribute must be set for the file(s). The commit messages may need some improvements, and a closer look at diff.c, how command line options are forwared, is appreciated. It may even be possible to integrate t4066 somewhere... Torsten Bögershausen (2): convert_to_git(): checksafe becomes an integer git diff: Allow to reencode into UTF-8 Documentation/diff-options.txt | 4 ++ Documentation/gitattributes.txt | 9 + apply.c | 4 +- convert.c | 60 +++- convert.h | 20 +- diff.c | 40 +-- diff.h | 1 + diffcore.h | 3 ++ environment.c | 2 +- sha1_file.c | 6 +-- t/t4066-diff-encoding.sh| 86 + 11 files changed, 205 insertions(+), 30 deletions(-) create mode 100755 t/t4066-diff-encoding.sh -- 2.15.1.271.g1a4e40aa5d
[PATCH/RFC 2/2] git diff: Allow to reencode into UTF-8
From: Torsten BögershausenWhen blobs are encoded in UTF-16, `git diff` will treat them as binary. Make it possible to show a user readable diff encoded in UTF-8. This allows to run git diff and feed the into a web sever. Improve Git to look at the "encodig" attribute and to reencode the content into UTF-8 before running the diff itself. Signed-off-by: Torsten Bögershausen --- Documentation/diff-options.txt | 4 ++ Documentation/gitattributes.txt | 9 + convert.c | 40 +++ convert.h | 2 + diff.c | 38 -- diff.h | 1 + diffcore.h | 3 ++ t/t4066-diff-encoding.sh| 86 + 8 files changed, 180 insertions(+), 3 deletions(-) create mode 100755 t/t4066-diff-encoding.sh diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index 9d1586b956..bf2f115f11 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -629,6 +629,10 @@ endif::git-format-patch[] linkgit:git-log[1], but not for linkgit:git-format-patch[1] or diff plumbing commands. +--UTF-8:: + Git converts the content into UTF-8 before running the diff when the + "encoding" attribute is defined. See linkgit:gitattributes[5] + --ignore-submodules[=]:: Ignore changes to submodules in the diff generation. can be either "none", "untracked", "dirty" or "all", which is the default. diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index 30687de81a..753a7c39b7 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -881,6 +881,15 @@ advantages to choosing this method: 3. Caching. Textconv caching can speed up repeated diffs, such as those you might trigger by running `git log -p`. +Running diff on UTF-16 encoded files + + +Git can convert UTF-16 encoded into UTF-8 before they are feed +into the diff machinery: `diff --UTF-8 file.xxx`. + + +file.xxx encoding=UTF-16 + Marking files as binary ^^^ diff --git a/convert.c b/convert.c index 5efcc3b73b..45577ce504 100644 --- a/convert.c +++ b/convert.c @@ -7,6 +7,7 @@ #include "sigchain.h" #include "pkt-line.h" #include "sub-process.h" +#include "utf8.h" /* * convert.c - convert a file when checking it out and checking it in. @@ -734,6 +735,34 @@ static struct convert_driver { int required; } *user_convert, **user_convert_tail; +const char *get_encoding_attr(const char *path) +{ + static struct attr_check *check; + if (!check) + check = attr_check_initl("encoding", NULL); + if (!git_check_attr(path, check)) { + struct attr_check_item *ccheck = check->items; + const char *value; + value = ccheck->value; + if (ATTR_UNSET(value)) + return NULL; + return value; + } + return NULL; +} + +static int reencode_into_strbuf(const char *path, const char *src, size_t len, + struct strbuf *dst, const char *encoding) +{ + int outsz = 0; + char *buf; + buf = reencode_string_len(src, (int)len, "UTF-8", encoding, ); + if (!buf) + return 0; + strbuf_attach(dst, buf, outsz, outsz); + return SAFE_CRLF_REENCODE; +} + static int apply_filter(const char *path, const char *src, size_t len, int fd, struct strbuf *dst, struct convert_driver *drv, const unsigned int wanted_capability, @@ -1136,6 +1165,17 @@ int convert_to_git(const struct index_state *istate, convert_attrs(, path); + if (checksafe & SAFE_CRLF_REENCODE) { + const char *encoding = get_encoding_attr(path); + if (encoding) { + ret |= reencode_into_strbuf(path, src, len, dst, + encoding); + if (ret && dst) { + src = dst->buf; + len = dst->len; + } + } + } ret |= apply_filter(path, src, len, -1, dst, ca.drv, CAP_CLEAN, NULL); if (!ret && ca.drv && ca.drv->required) die("%s: clean filter '%s' failed", path, ca.drv->name); diff --git a/convert.h b/convert.h index 532af00423..0b093715c9 100644 --- a/convert.h +++ b/convert.h @@ -13,6 +13,7 @@ struct index_state; #define SAFE_CRLF_WARN(1<<1) #define SAFE_CRLF_RENORMALIZE (1<<2) #define SAFE_CRLF_KEEP_CRLF (1<<3) +#define SAFE_CRLF_REENCODE(1<<4) extern int safe_crlf; @@ -60,6 +61,7 @@ extern const char *get_cached_convert_stats_ascii(const struct index_state *ista
[PATCH/RFC 1/2] convert_to_git(): checksafe becomes an integer
From: Torsten BögershausenWhen calling convert_to_git(), the checksafe parameter has been used to check if commit would give a non-roundtrip conversion of EOL. When checksafe was introduced, 3 values had been in use: SAFE_CRLF_FALSE: no warning SAFE_CRLF_FAIL: reject the commit if EOL do not roundtrip SAFE_CRLF_WARN: warn the user if EOL do not roundtrip Today a small flaw is found in the code base: An integer with the value 0 is passed as the parameter checksafe instead of the correct enum value SAFE_CRLF_FALSE. In the next commit there is a need to turn checksafe into a bitmap, which allows to tell convert_to_git() to obey the encoding attribute or not. Signed-off-by: Torsten Bögershausen --- apply.c | 4 ++-- convert.c | 20 ++-- convert.h | 18 -- diff.c| 4 ++-- environment.c | 2 +- sha1_file.c | 6 +++--- 6 files changed, 26 insertions(+), 28 deletions(-) diff --git a/apply.c b/apply.c index 321a9fa68d..a422516062 100644 --- a/apply.c +++ b/apply.c @@ -2263,7 +2263,7 @@ static void show_stats(struct apply_state *state, struct patch *patch) static int read_old_data(struct stat *st, struct patch *patch, const char *path, struct strbuf *buf) { - enum safe_crlf safe_crlf = patch->crlf_in_old ? + int checksafe = patch->crlf_in_old ? SAFE_CRLF_KEEP_CRLF : SAFE_CRLF_RENORMALIZE; switch (st->st_mode & S_IFMT) { case S_IFLNK: @@ -2281,7 +2281,7 @@ static int read_old_data(struct stat *st, struct patch *patch, * should never look at the index when explicit crlf option * is given. */ - convert_to_git(NULL, path, buf->buf, buf->len, buf, safe_crlf); + convert_to_git(NULL, path, buf->buf, buf->len, buf, checksafe); return 0; default: return -1; diff --git a/convert.c b/convert.c index 1a41a48e15..5efcc3b73b 100644 --- a/convert.c +++ b/convert.c @@ -195,13 +195,13 @@ static enum eol output_eol(enum crlf_action crlf_action) static void check_safe_crlf(const char *path, enum crlf_action crlf_action, struct text_stat *old_stats, struct text_stat *new_stats, - enum safe_crlf checksafe) + int checksafe) { if (old_stats->crlf && !new_stats->crlf ) { /* * CRLFs would not be restored by checkout */ - if (checksafe == SAFE_CRLF_WARN) + if (checksafe & SAFE_CRLF_WARN) warning(_("CRLF will be replaced by LF in %s.\n" "The file will have its original line" " endings in your working directory."), path); @@ -211,7 +211,7 @@ static void check_safe_crlf(const char *path, enum crlf_action crlf_action, /* * CRLFs would be added by checkout */ - if (checksafe == SAFE_CRLF_WARN) + if (checksafe & SAFE_CRLF_WARN) warning(_("LF will be replaced by CRLF in %s.\n" "The file will have its original line" " endings in your working directory."), path); @@ -268,7 +268,7 @@ static int will_convert_lf_to_crlf(size_t len, struct text_stat *stats, static int crlf_to_git(const struct index_state *istate, const char *path, const char *src, size_t len, struct strbuf *buf, - enum crlf_action crlf_action, enum safe_crlf checksafe) + enum crlf_action crlf_action, int checksafe) { struct text_stat stats; char *dst; @@ -298,12 +298,12 @@ static int crlf_to_git(const struct index_state *istate, * unless we want to renormalize in a merge or * cherry-pick. */ - if ((checksafe != SAFE_CRLF_RENORMALIZE) && + if ((!(checksafe & SAFE_CRLF_RENORMALIZE)) && has_crlf_in_index(istate, path)) convert_crlf_into_lf = 0; } - if ((checksafe == SAFE_CRLF_WARN || - (checksafe == SAFE_CRLF_FAIL)) && len) { + if (((checksafe & SAFE_CRLF_WARN) || +((checksafe & SAFE_CRLF_FAIL) && len))) { struct text_stat new_stats; memcpy(_stats, , sizeof(new_stats)); /* simulate "git add" */ @@ -1129,7 +1129,7 @@ const char *get_convert_attr_ascii(const char *path) int convert_to_git(const struct index_state *istate, const char *path, const char *src, size_t len, - struct strbuf *dst, enum safe_crlf checksafe) + struct strbuf *dst, int checksafe) { int ret = 0; struct conv_attrs ca; @@
Re: Rewrite cat-file.c : need help to find a bug
On Fri, Dec 29, 2017 at 01:05:50PM +0300, Оля Тележная wrote: > Hi everyone, > I am trying to reuse formatting logic from ref-filter in cat-file > command. Now cat-file uses its own formatting code. > I am trying to achieve that step-by-step, now I want to invoke > populate_value function, and I have a bug somewhere. > My code is here. > https://github.com/telezhnaya/git/commits/catfile > All commits that starts from "cat-file: " are successfully passing all tests. > So for now my last commit fails, and I am tired of searching for the > error. Actually, I just copied some values to another variable and > move some code to other place. All "intelligent" work will go further, > but now I need to fix current situation. The problem is that "cat_file_info" is NULL when you get to populate_value(), so the moved code doesn't trigger. That variable is usually pulled from format->cat_file_data during verify_ref_format(). But it triggers only when there's an actual format atom to parse, and in this particular test the format is empty! But cat-file is somewhat special here, in that even without any atoms it needs to make sure the object exists. So the patch below makes the test pass, though I think in the long run all of this cat_file_info stuff would just get folded into regular atoms, and you'd want cat-file to pass a special flag to ask ref-filter to make sure the object exists. -Peff --- diff --git a/ref-filter.c b/ref-filter.c index 62ca83807f..3acea4d2ef 100644 --- a/ref-filter.c +++ b/ref-filter.c @@ -742,6 +742,10 @@ int verify_ref_format(struct ref_format *format) { const char *cp, *sp; + /* do this unconditionally, even if we have no atoms! */ + if (format->cat_file_data) + cat_file_info = format->cat_file_data; + format->need_color_reset_at_eol = 0; for (cp = format->format; *cp && (sp = find_next(cp)); ) { const char *color, *ep = strchr(sp, ')'); @@ -753,7 +757,6 @@ int verify_ref_format(struct ref_format *format) if (format->cat_file_data) { - cat_file_info = format->cat_file_data; at = parse_ref_filter_atom(format, valid_cat_file_atoms, ARRAY_SIZE(valid_cat_file_atoms), sp + 2, ep); } else
Re: [PATCH v1] convert: add support for 'encoding' attribute
On 2017-12-28 17:14, Lars Schneider wrote:> >> On 17 Dec 2017, at 18:14, Torsten Bögershausenwrote: >> >> On Mon, Dec 11, 2017 at 04:50:23PM +0100, lars.schnei...@autodesk.com wrote: >>> From: Lars Schneider >>> > >>> +`encoding` >>> +^^ >>> + >>> +By default Git assumes UTF-8 encoding for text files. The `encoding` >> >> This is probably "ASCII" and it's supersets like ISO-8859-1 or UTF-8. > > I am not sure I understand what you want to tell me here. > Do you think UTF-8 encoding is not correct in the sentence above? Git itself was designed to handle source code in ASCII. Text files which are encoded in ISO-8859-1, UTF-8 or any super-set of ASCII are handled as well, and what exactly to do with bytes >0x80 is outside the responsibility of Git. E.g. the interpretation and rendering on the screen may be dependent on the locale or being guessed. All in all, saying that Git expects UTF-8 may be "overdriven". Any kind of file that uses an '\n' as an end of line and has no NUL bytes '\0' works as a text file in Git. (End of BlaBla ;-) We can probably replace: "By default Git assumes UTF-8 encoding for text files" with something like "Git handles UTF-8 encoded files as text files, but UTF-16 encoded files are handled as binary files" >> >>> +attribute sets the encoding to be used in the working directory. >>> +If the path is added to the index, then Git encodes the content to >>> +UTF-8. On checkout the content is encoded back to the original >>> +encoding. Consequently, you can use all built-in Git text processing >>> +tools (e.g. git diff, line ending conversions, etc.) with your >>> +non-UTF-8 encoded file. >>> + >>> +Please note that re-encoding content can cause errors and requires >>> +resources. Use the `encoding` attribute only if you cannot store >>> +a file in UTF-8 encoding and if you want Git to be able to process >>> +the text. >>> + >>> + >>> +*.txt text encoding=UTF-16 >>> + >> >> I think that encoding (or worktree-encoding as said elsewhere) must imply >> text. >> (That is not done in convert.c, but assuming binary or even auto >> does not make much sense to me) > > "text" only means "LF". "-text" means CRLF which would be perfectly fine > for UTF-16. Therefore I don't think we need to imply text. > Does this make sense? Yes and now. "text" means convert CRLF at "git add" (or commit) into LF, And potentially LF into CRLF at checkout, depending on the EOL attribute (if set), core.autocrlf and/or core.eol. "-text" means don't touch CRLF or LF at all. Files with CRLF are commited with CRLF. Running a Unix like "diff" tool will often show "^M" to indicate that there is a CR before the LF, which may be distracting or confusing. If someone ever wants to handle the UTF-16 files on a Linux/Mac/Unix system, it makes sense to convert the line endings into LF before storing them into the index (at least this is my experience). (Not specifying "text" or "-text" at all will trigger the auto-handling, which is not needed at all). So if we want to be sure that line endings are CRLF in the worktree we should ask the user to specify things like this: *.ext worktree-encoding=UTF-16 text eol=CRLF It may be enough to give this example in the documentation. and I would rather be over-careful here, to avoid future problems. > >> >> >>> + >>> +All `iconv` encodings with a stable round-trip conversion to and from >>> +UTF-8 are supported. You can see a full list with the following command: >>> + >>> + >>> +iconv --list >>> + >>> + >>> `eol` >>> ^ >>> >>> diff --git a/convert.c b/convert.c >>> index 20d7ab67bd..ee19c17104 100644 >>> --- a/convert.c >>> +++ b/convert.c >>> @@ -7,6 +7,7 @@ >>> #include "sigchain.h" >>> #include "pkt-line.h" >>> #include "sub-process.h" >>> +#include "utf8.h" >>> >>> /* >>> * convert.c - convert a file when checking it out and checking it in. >>> @@ -256,6 +257,149 @@ static int will_convert_lf_to_crlf(size_t len, struct >>> text_stat *stats, >>> >>> } >> >> I would avoid to use these #ifdefs here. >> All functions can be in strbuf.c (and may have #ifdefs there), but not here. > > I'll try that. But wouldn't it make more sense to move the functions to utf.c? May be. Originally utf8.c was about encoding and all kind of UTF-8 related stuff. Especially it didn't know anything about strbuf. I don't know why strbuf.h and other functions had been added here, I once moved them into strbuf.c without any problems, but never send out a patch, because of possible merge conflicts in ongoing patches. In any case, if it is about strbuf, I would try to put it into strbuf.c >> >>> >>> +#ifdef NO_ICONV >>> +#ifndef _ICONV_T >>> +/* The type is just a placeholder and not actually used. */ >>> +typedef void* iconv_t; >>> +#endif >>> +#endif >>> + >>> +static struct encoding { >>> + const char *name; >>> + iconv_t
Rewrite cat-file.c : need help to find a bug
Hi everyone, I am trying to reuse formatting logic from ref-filter in cat-file command. Now cat-file uses its own formatting code. I am trying to achieve that step-by-step, now I want to invoke populate_value function, and I have a bug somewhere. My code is here. https://github.com/telezhnaya/git/commits/catfile All commits that starts from "cat-file: " are successfully passing all tests. So for now my last commit fails, and I am tired of searching for the error. Actually, I just copied some values to another variable and move some code to other place. All "intelligent" work will go further, but now I need to fix current situation. You could write here or leave comments in github if you have any ideas. Thank you in advance for your help! Olga
Segfault
git stash pop resulted in crash: /usr/local/Cellar/git/2.10.2/libexec/git-core/git-stash: line 470: 14946 Segmentation fault: 11 git merge-recursive $b_tree -- $c_tree $w_tree although, the changes have been applied successfully.