Re: What's cooking in git.git (Aug 2018, #01; Thu, 2)

2018-08-07 Thread Jakub Narębski
On Tue, 7 Aug 2018 at 16:36, Junio C Hamano  wrote:
> Jakub Narebski  writes:
>> Junio C Hamano  writes:
>>
>>> * ds/commit-graph-with-grafts (2018-07-19) 8 commits
>>>   (merged to 'next' on 2018-08-02 at 0ee624e329)
>>> ...
>>>  Will merge to 'master'.
>>
>> Derrick wrote that he will be sending v2 of this patch series in a few
>> weeks, among others to make it use commit-graph feature if replace
>> objects are present but not used (as some git hosting services do, see
>> core.useReplaceRefs below).
>
> Ah, thanks for stopping me (albeit a bit too late-ish, but reverting
> that merge is easy enough).  Do you have a handy reference to stop
> me from making the same mistake on this topic later?

https://public-inbox.org/git/a3640919-95cf-cca4-d552-4715a031d...@gmail.com/

DS> Since this series now has two dependencies (including Stefan's ref-store
DS> fix that I had included in my v1), I'll let those topics settle
DS> before I send a new v2.
DS>
DS> If there are more comments about how I'm handling these cases, then
DS> please jump in and tell me. I'll revisit this topic in a few weeks.

>> Also, the test for interaction of commit-graph with the grafts file
>> feature does not actually test grafts, but replace objects.

https://public-inbox.org/git/86bmap7l7a@gmail.com/

>>> * jk/core-use-replace-refs (2018-07-18) 3 commits
>>>   (merged to 'next' on 2018-08-02 at 90fb6b1056)
>>>  + add core.usereplacerefs config option
>>>  + check_replace_refs: rename to read_replace_refs
>>>  + check_replace_refs: fix outdated comment
>>>
>>>  A new configuration variable core.usereplacerefs has been added,
>>>  primarily to help server installations that want to ignore the
>>>  replace mechanism altogether.
>>>
>>>  Will merge to 'master'.
[...]
-- 
Jakub Narębski


Re: [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc'

2018-06-08 Thread Jakub Narębski
On Fri, 8 Jun 2018 at 15:56, Derrick Stolee  wrote:

> [..], the following
> diff occurs from the previous patch:
[...]
> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index b24e8b6689..9a0661983c 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -33,8 +33,8 @@ test_expect_success 'create commits and repack' '
>  '
>
>  graph_git_two_modes() {
> -   git -c core.commitGraph=true $1 >output
> -   git -c core.commitGraph=false $1 >expect
> +   git -c core.graph=true $1 >output
> +   git -c core.graph=false $1 >expect
> test_cmp output expect
>  }

It seems that you have accidentally removed the fix from previous version.
It needs to be core.commitGraph, not core.graph.

Best,
-- 
Jakub Narębski


Re: [PATCH v5 09/11] technical/shallow: stop referring to grafts

2018-04-25 Thread Jakub Narębski
On 25 April 2018 at 11:54, Johannes Schindelin
<johannes.schinde...@gmx.de> wrote:
> Now that grafts are deprecated, we should start to assume that readers
> have no idea what grafts are. So it makes more sense to make the
> description of the "shallow" feature stand on its own.
>
> Suggested-by: Eric Sunshine <sunsh...@sunshineco.com>
> Helped-by: Junio Hamano <gits...@pobox.com>
> Signed-off-by: Johannes Schindelin <johannes.schinde...@gmx.de>
> ---
>  Documentation/technical/shallow.txt | 13 -
>  1 file changed, 4 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/technical/shallow.txt 
> b/Documentation/technical/shallow.txt
> index 5183b154229..4ec721335d2 100644
> --- a/Documentation/technical/shallow.txt
> +++ b/Documentation/technical/shallow.txt
> @@ -8,15 +8,10 @@ repo, and therefore grafts are introduced pretending that
>  these commits have no parents.
>  *
>
> -The basic idea is to write the SHA-1s of shallow commits into
> -$GIT_DIR/shallow, and handle its contents like the contents
> -of $GIT_DIR/info/grafts (with the difference that shallow
> -cannot contain parent information).
> -
> -This information is stored in a new file instead of grafts, or
> -even the config, since the user should not touch that file
> -at all (even throughout development of the shallow clone, it
> -was never manually edited!).
> +$GIT_DIR/shallow lists commit object names and tells Git to
> +pretend as if they are root commits (e.g. "git log" traversal
> +stops after showing them; "git fsck" does not complain saying
> +the commits listed on their "parent" lines do not exist).
>
>  Each line contains exactly one SHA-1. When read, a commit_graft
>  will be constructed, which has nr_parent < 0 to make it easier

Is the removed information (repeated below) important or not?

  the user should not touch that file
  at all (even throughout development of the shallow clone, it
  was never manually edited!).

-- 
Jakub Narębski


Re: [PATCH] completion: reduce overhead of clearing cached --options

2018-04-16 Thread Jakub Narębski
On 16 April 2018 at 15:15, SZEDER Gábor <szeder@gmail.com> wrote:
> On Mon, Apr 16, 2018 at 7:10 AM, Junio C Hamano <gits...@pobox.com> wrote:
> > SZEDER Gábor <szeder@gmail.com> writes:
> >> On Fri, Apr 13, 2018 at 11:44 PM, Jakub Narebski <jna...@gmail.com> wrote:
> >>> SZEDER Gábor <szeder@gmail.com> writes:
> >>>>
> >>>> In Bash we can do better: run the 'compgen -v __gitcomp_builtin_'
> >>>> builtin command, which lists the same variables, but without a
> >>>> pipeline and 'sed' it can do so with lower overhead.
> >>>
> >>> What about ZSH?
> >>
> >> Nothing, ZSH is unaffected by this patch.
> >
> > OK, do we want to follow it up with [PATCH 2/1] to add the LC_ALL=C
> > thing to the ZSH side to help that "sed", then?
>
> No.  'sed' would only need need help when its input comes from a buggy
> 'set' builtin of a particular version of Bash from a particular vendor.
>
> As far as I can test this in Travis CI's OSX builds, ZSH doesn't seem to
> be affected, neither the version Apple ships by default nor the version
> installed via homebrew.

That's nice - this means that the patch fixes all of the issue.
The above information should be, in my opinion, included
in the commit message, though.

Best,
-- 
Jakub Narębski


Re: Tools that do an automatic fetch defeat "git push --force-with-lease"

2017-04-08 Thread Jakub Narębski
W dniu 08.04.2017 o 11:29, Jeff King pisze:
[...]

> Perhaps it would be enough to reset the markers whenever the ref is
> pushed. I haven't thought it through well enough to know whether that
> just hits more corner cases.

I wonder if using a separate remote for pushing (with separate remote-
-tracking branches) would be a good solution for this problem...

-- 
Jakub Narębski

 



Re: Terrible bad performance for it blame --date=iso -C

2017-04-03 Thread Jakub Narębski
W dniu 03.04.2017 o 12:56, SZEDER Gábor pisze:
> Ulrich Windl wrote:

>> In the other case (for the user bored of waiting seeking for some
>> entertainment ;-)) a "-v (verbose) option could be useful.  Or at the
>> very least: If git is expecting that some operation will take (or
>> already did take) a lot of time, give some message explaining why it
>> is taking a lot of time, and maybe how to avoid that.
> 
> It already does so by default since v2.8.0, see aba37f495 (blame: add
> support for --[no-]progress option, 2015-12-12).
> 
>   $ time git blame sha1_file.c |wc -l
>   4026
>   
>   real0m1.744s
>   user0m1.672s
>   sys 0m0.068s
>   $ time git blame -C -C sha1_file.c |wc -l
>   Blaming lines: 100% (4026/4026), done.
>   4026
>   
>   real0m3.832s
>   user0m3.716s
>   sys 0m0.112s
> 
> However, after a short peek at that commit, it only displays progress
> by default when stderr is a terminal, which might not be the case when
> invoked from emacs.

Emacs (magit?) should use `git blame --porcelain`, and do its own
progress report, just like 'git gui blame' and incremental blame mode
of gitweb.

Actually... there already is git-blamed - Minor mode for incremental
blame for Git, and mo-git-blame - An interactive, iterative 'git blame'
mode for Emacs, both available on ELPA (Emacs Lisp Package Archive).

HTH
-- 
Jakub Narębski



Re: [BUG?] iconv used as textconv, and spurious ^M on added lines on Windows

2017-04-02 Thread Jakub Narębski
W dniu 02.04.2017 o 09:45, Jeff King pisze:
> On Sat, Apr 01, 2017 at 08:31:27PM +0200, Jakub Narębski wrote:
> 
>> W dniu 01.04.2017 o 08:08, Jeff King pisze:
>>> On Fri, Mar 31, 2017 at 03:24:48PM +0200, Jakub Narębski wrote:
>>>
>>>>> I suspect in the normal case that git is doing line-ending conversion,
>>>>> but it's suppressed when textconv is in use.
>>>>
>>>> I would not consider this a bug if not for the fact that there is no ^M
>>>> without using iconv as textconv.
>>>
>>> I don't think it's a bug, though. You have told Git that you will
>>> convert the contents (whatever their format) into the canonical format,
>>> but your program to do so includes a CR.
>>
>> Well, I have not declared file binary with "binary = true" in diff driver
>> definition, isn't it?
> 
> I don't think binary has anything to do with it. A textconv filter takes
> input (binary or not) and delivers a normalized representation to feed
> to the diff algorithm. There's no further post-processing, and it's the
> responsibility of the filter to deliver the bytes it wants diffed.
> 
> Like I said, I could see an argument for treating the filter output as
> text to be pre-processed, but that's not how it works (and I don't think
> it is a good idea to change it now, unless by adding an option to the
> diff filter).

I think that actually there is something wrong.

If textconv really gets normalized representation of pre-image (the index
version) and post-image (the filesystem version), as it should I think,
both pre-image lines ('-') and post-image lines ('+') should use CRLF,
so there should be no warning, i.e. ^M

Or textconv filter gets normalized representation (it looks this way
when examining diff result saved to file with `git diff test.tex >test.diff`;
I were unable to use `tr '\r' 'Q', either I got "fatal: bad config line"
from Git, or "tr: extra operand" from tr), and somehow Git mistakes
what is happening and writes those ^M.

If I understand it correctly, if pre-image, post-image and context
all use the same eol, there should be no warning, isn't it?

> 
>> P.S. What do you think about Git supporting 'encoding' attribute (or
>> 'core.encoding' config) plus 'core.outputEncoding' in-core?
> 
> Supporting an "encoding" attribute to normalize file encodings in diffs
> seems reasonable to me. But it would have to be enabled only for
> human-readable diffs, as the result could not be applied (so the same as
> textconv).

I was thinking about human readable diffs, and 'git show ', same
as with textconv.

> 
> I don't think core.outputEncoding is necessarily a good idea. We are not
> really equipped anything that isn't an ascii superset, as we intermingle
> the bytes with ascii diff headers (though I think that is true of the
> commitEncoding stuff; I assume everything breaks horribly if you tried
> to set that to UTF-16, but I've never tried it).

Well, the understanding would be that the same limitation as for 
core.logOutputEncoding (documented if it isn't) that only encodings that
are ASCII compatibile are supported.
 
-- 
Jakub Narębski



Re: [PATCH] contrib/git-resurrect.sh: do not write \t for HT in sed scripts

2017-04-02 Thread Jakub Narębski
W dniu 01.04.2017 o 06:14, Junio C Hamano pisze:

> Just like we did in 0d1d6e50 ("t/t7003: replace \t with literal tab
> in sed expression", 2010-08-12), avoid writing "\t" for HT in sed
> scripts, which is not portable.

I guess it is not possible to use HT variable (similar to LF that we
have in t/test-lib.sh, and which is defined by some scripts) set to
literal tab

  HT="  "

wouldn't work there, is it?

Using this variable would make literal tab character more visible.
-- 
Jakub Narębski



Re: [BUG?] iconv used as textconv, and spurious ^M on added lines on Windows

2017-04-01 Thread Jakub Narębski
W dniu 01.04.2017 o 08:08, Jeff King pisze:
> On Fri, Mar 31, 2017 at 03:24:48PM +0200, Jakub Narębski wrote:
> 
>>> I suspect in the normal case that git is doing line-ending conversion,
>>> but it's suppressed when textconv is in use.
>>
>> I would not consider this a bug if not for the fact that there is no ^M
>> without using iconv as textconv.
> 
> I don't think it's a bug, though. You have told Git that you will
> convert the contents (whatever their format) into the canonical format,
> but your program to do so includes a CR.

Well, I have not declared file binary with "binary = true" in diff driver
definition, isn't it?

> 
> We _could_ further process with other canonicalizations, but I'm not
> sure that is a good idea (line-endings sound reasonably harmless, but
> almost certainly we should not be doing clean/smudge filtering). And I'm
> not sure if there would be any compatibility fallouts.

Yes, gitattributes(5) defines interaction between 'text'/'eol', 'ident'
and 'filter' attributes, but nothing about 'diff' and 'text'/'eol'.

> 
> So I think the behavior is perhaps not what you want, but it's not an
> unreasonable one. And the solution is to define your textconv such that
> it produces clean LF-only output. Perhaps:
> 
>   [diff.whatever]
>   textconv = "iconv ... | tr -d '\r'"

Well, either this (or equivalent using dos2unix), or using 'whitespace'
attribute (or 'core.whitespace') with cr-at-eol.


P.S. What do you think about Git supporting 'encoding' attribute (or
'core.encoding' config) plus 'core.outputEncoding' in-core?

Best,
-- 
Jakub Narębski


Re: [BUG?] iconv used as textconv, and spurious ^M on added lines on Windows

2017-03-31 Thread Jakub Narębski
W dniu 31.03.2017 o 14:38, Torsten Bögershausen pisze:
> On 30.03.17 21:35, Jakub Narębski wrote:
>> Hello,
>>
>> Recently I had to work on a project which uses legacy 8-bit encoding
>> (namely cp1250 encoding) instead of utf-8 for text files (LaTeX
>> documents).  My terminal, that is Git Bash from Git for Windows is set
>> up for utf-8.
>>
>> I wanted for "git diff" and friends to return something sane on said
>> utf-8 terminal, instead of mojibake.  There is 'encoding'
>> gitattribute... but it works only for GUI ('git gui', that is).
>>
>> Therefore I have (ab)used textconv facility to convert from cp1250 of
>> file encoding to utf-8 encoding of console.
>>
>> I have set the following in .gitattributes file:
>>
>>   ## LaTeX documents in cp1250 encoding
>>   *.tex text diff=mylatex
>>
>> The 'mylatex' driver is defined as:
>>
>>   [diff "mylatex"]
>> xfuncname = "^(((sub)*section|chapter|part)\\*{0,1}\\{.*)$"
>> wordRegex = "[a-zA-Z]+|[{}]|.|[^\\{}[:space:]]+"
>> textconv  = \"C:/Program Files/Git/usr/bin/iconv.exe\" -f cp1250 -t 
>> utf-8
>> cachetextconv = true
>>
>> And everything would be all right... if not the fact that Git appends
>> spurious ^M to added lines in the `git diff` output.  Files use CRLF
>> end-of-line convention (the native MS Windows one).
>>
>>   $ git diff test.tex
>>   diff --git a/test.tex b/test.tex
>>   index 029646e..250ab16 100644
>>   --- a/test.tex
>>   +++ b/test.tex
>>   @@ -1,4 +1,4 @@
>>   -\documentclass{article}
>>   +\documentclass{mwart}^M
>>   
>>\usepackage[cp1250]{inputenc}
>>\usepackage{polski}
>>
>> What gives?  Why there is this ^M tacked on the end of added lines,
>> while it is not present in deleted lines, nor in content lines?
>>
>> Puzzled.
>>
>> P.S. Git has `i18n.commitEncoding` and `i18n.logOutputEncoding`; pity
>> that it doesn't supports in core `encoding` attribute together with
>> having `i18n.outputEncoding`.
>
> Is there a chance to give us a receipt how to reproduce it?
> A complete test script or ?
> (I don't want to speculate, if the invocation of iconv is the problem,
>  where stdout is not in "binary mode", or however this is called under 
> Windows)

I'm sorry, I though I posted whole recipe, but I missed some details
in the above description of the case.

First, files are stored on filesystem using CRLF eol (DOS end-of-line
convention).  Due to `core.autocrlf` they are converted to LF in blobs,
that is in the index and in the repository.

Second, a textconv with filter preserving end-of-line needs to be
configured.  I have used `iconv`, but I suspect that the problem would
happen also for `cat`.

In the .gitattributes file, or .git/info/attributes add, for example:

  *.tex text diff=myconv

In the .git/config configure the textconv filter, for example:

  [diff "myconv"]
 textconv  = iconv.exe -f cp1250 -t utf-8

Create a file which filename matches the attribute line, and which
uses CRLF end of line convention, and add it to Git (adding it to
the index):

  $ printf "foo\r\n" >foo.tex
  $ git add foo.tex

Modify file (also with CRLF):

  $ printf "bar\r\n" >foo.tex

Check the difference

  $ git diff foo.tex

HTH
-- 
Jakub Narębski



Re: [BUG?] iconv used as textconv, and spurious ^M on added lines on Windows

2017-03-31 Thread Jakub Narębski
W dniu 30.03.2017 o 22:00, Jeff King pisze:
> On Thu, Mar 30, 2017 at 09:35:27PM +0200, Jakub Narębski wrote:
> 
>> And everything would be all right... if not the fact that Git appends
>> spurious ^M to added lines in the `git diff` output.  Files use CRLF
>> end-of-line convention (the native MS Windows one).
>>
>>   $ git diff test.tex
>>   diff --git a/test.tex b/test.tex
>>   index 029646e..250ab16 100644
>>   --- a/test.tex
>>   +++ b/test.tex
>>   @@ -1,4 +1,4 @@
>>   -\documentclass{article}
>>   +\documentclass{mwart}^M
>>   
>>\usepackage[cp1250]{inputenc}
>>\usepackage{polski}
>>
>> What gives?  Why there is this ^M tacked on the end of added lines,
>> while it is not present in deleted lines, nor in content lines?

Gah, I forgot that Git for Windows installed with default options uses
`core.autocrlf=true`, so file contents is stored in repository and
in the index using LF end-of-line convention -- that is why there is
no ^M in pre-image (in removed lines).
 
> Perhaps it's trailing whitespace highlighting for added lines? You can
> add "cr-at-eol" to core.whitespace to suppress it.

Thanks! That solves the problem (or rather workarounds it).

> 
> I suspect in the normal case that git is doing line-ending conversion,
> but it's suppressed when textconv is in use.

I would not consider this a bug if not for the fact that there is no ^M
without using iconv as textconv.

Compare (without textconv => no ^M, but mojibake):

  $ git diff test.txt
  diff --git a/test.txt b/test.txt
  index 029646e..38cd657 100644
  --- a/test.txt
  +++ b/test.txt
  @@ -1,9 +1,10 @@
  -\documentclass{article}
  +\documentclass{mwart}
  
   \usepackage[cp1250]{inputenc}
   \usepackage{polski}
  
   \begin{document}
  +Za g<9C>l ja<9F>!
  
   \end{document}

with the following (with textconv => no gibberish, but ^M):

  $ git diff test.tex
  diff --git a/test.tex b/test.tex
  index 029646e..38cd657 100644
  --- a/test.tex
  +++ b/test.tex
  @@ -1,9 +1,10 @@
  -\documentclass{article}
  +\documentclass{mwart}^M
  
   \usepackage[cp1250]{inputenc}
   \usepackage{polski}
  
   \begin{document}
  +Zażółć gęślą jaźń!^M
  
   \end{document}

-- 
Jakub Narębski



Re: Git Branching - Best Practices - Large project - long running branches

2017-03-31 Thread Jakub Narębski
W dniu 31.03.2017 o 13:55, Mayne, Joe pisze:

> Hello, I work on a team of 15+ developers. We are trying to
> determine best practices for branching because we have had code
> stepped on when a developer has a long running feature branch.
> 
> We have a Development branch. Developers are instructed to create a 
> branch when they begin working on a feature. Sometimes a feature may 
> take a week or two to complete. So a Developer1 creates a branch and 
> works for a week or two. In the meantime, other developers have 
> created feature branches from Development and merged them back into 
> Development.
> 
> At this point we are not certain if Developer1 should:
> 
> * Periodically merge the evolving Origin/Development into their
> Feature branch and when they are done work merge their feature branch
> into Origin/Development.

This is one possible solution.

Another variant of this would be to enable rerere feature (reuse
recorded resolution), and periodically do a trial merge from the
Origin/Development branch to prime the rerere mechanism (discarding
those merges after resolving them).

> 
> OR
> 
> * Stay on their pure feature branch and when they are done merge
> into Origin/Development.
> 
> We have had issues with developers stepping on code when they have 
> long running branches. We are looking for a best practices.

There is yet another solution:

OR

* Periodically _rebase_ pure feature branch on top of current version
of Origin/Development, or do such rebase (perhaps an interactive one
to clean-up feature development steps) before pull request, before
they are to merge feature branch into Origin/Development.

You can also try to use third-party git-imerge tool to help merging
long divergent branches.

HTH,
-- 
Jakub Narębski



[BUG?] iconv used as textconv, and spurious ^M on added lines on Windows

2017-03-30 Thread Jakub Narębski
Hello,

Recently I had to work on a project which uses legacy 8-bit encoding
(namely cp1250 encoding) instead of utf-8 for text files (LaTeX
documents).  My terminal, that is Git Bash from Git for Windows is set
up for utf-8.

I wanted for "git diff" and friends to return something sane on said
utf-8 terminal, instead of mojibake.  There is 'encoding'
gitattribute... but it works only for GUI ('git gui', that is).

Therefore I have (ab)used textconv facility to convert from cp1250 of
file encoding to utf-8 encoding of console.

I have set the following in .gitattributes file:

  ## LaTeX documents in cp1250 encoding
  *.tex text diff=mylatex

The 'mylatex' driver is defined as:

  [diff "mylatex"]
xfuncname = "^(((sub)*section|chapter|part)\\*{0,1}\\{.*)$"
wordRegex = "[a-zA-Z]+|[{}]|.|[^\\{}[:space:]]+"
textconv  = \"C:/Program Files/Git/usr/bin/iconv.exe\" -f cp1250 -t 
utf-8
cachetextconv = true

And everything would be all right... if not the fact that Git appends
spurious ^M to added lines in the `git diff` output.  Files use CRLF
end-of-line convention (the native MS Windows one).

  $ git diff test.tex
  diff --git a/test.tex b/test.tex
  index 029646e..250ab16 100644
  --- a/test.tex
  +++ b/test.tex
  @@ -1,4 +1,4 @@
  -\documentclass{article}
  +\documentclass{mwart}^M
  
   \usepackage[cp1250]{inputenc}
   \usepackage{polski}

What gives?  Why there is this ^M tacked on the end of added lines,
while it is not present in deleted lines, nor in content lines?

Puzzled.

P.S. Git has `i18n.commitEncoding` and `i18n.logOutputEncoding`; pity
that it doesn't supports in core `encoding` attribute together with
having `i18n.outputEncoding`.
--
Jakub Narębski




Re: [PATCH v2] builtin/describe: introduce --broken flag

2017-03-22 Thread Jakub Narębski
W dniu 21.03.2017 o 23:57, Stefan Beller pisze:

> --- a/Documentation/git-describe.txt
> +++ b/Documentation/git-describe.txt
> @@ -30,9 +30,14 @@ OPTIONS
>   Commit-ish object names to describe.  Defaults to HEAD if omitted.
>  
>  --dirty[=]::
> - Describe the working tree.
> - It means describe HEAD and appends  (`-dirty` by
> - default) if the working tree is dirty.
> +--broken[=]::
> + Describe the state of the working tree.  When the working
> + tree matches HEAD, the output is the same as "git describe
> + HEAD".  If the working tree has local modification "-dirty"
> + is appended to it.  If a repository is corrupt and Git
> + cannot determine if there is local modification, Git will
> + error out, unless `--broken' is given, which appends
> + the suffix "-broken" instead.

The common description reads better... but unfortunately it lost
information about the optional parameter, namely .  The
'-dirty' is just the default for , and '-broken' is
the default for .

Maybe /the suffix "-broken"/ suffix ('-broken' by default)/
and similarly for "-dirty"?

Best,
-- 
Jakub Narębski



Re: [PATCH 1/8] tag: Remove a TODO item from the test suite

2017-03-18 Thread Jakub Narębski
W dniu 18.03.2017 o 11:32, Ævar Arnfjörð Bjarmason pisze:

> @@ -136,7 +135,6 @@ test_expect_success \
>   'listing a tag using a matching pattern should output that tag' \
>   'test $(git tag -l mytag) = mytag'
>  
> -# todo: git tag -l now returns always zero, when fixed, change this test
>  test_expect_success \
>   'listing tags using a non-matching pattern should suceed' \
>   'git tag -l xxx'

Could you fix s/suceed/succeed/ in the test description,
while at it?

-- 
Jakub Narębski



Re: Shared repositories no longer securable against privilege escalation

2017-03-18 Thread Jakub Narębski
W dniu 17.03.2017 o 18:12, Joe Rayhawk pisze:
> Quoting Michael Haggerty (2017-03-17 05:07:36)

>>
>> Thanks for the report. This is indeed a problem for people who want to
>> set restrictive privileges on $GIT_DIR. I'd never thought of that use
>> case, but it makes sense. Is this practice recommended somewhere or
>> required by any Git hosting tools? (I'm curious how prevalent it is.)
> 
> I had to work out the practice for my own management engine; I have
> since deployed it to around eight different mixed-use multi-user
> operations, the most significant of which is Freedesktop.org.
> 
> Without this practice, core.sharedRepository is an enormous liability
> of a feature. I can't speak to whether anyone but me ever noticed, what
> with mixed-use multi-user POSIX environments becoming increasingly rare.

Is there a reason why you rely on file permissions and user groups
to enforce access control, instead of using public-key based solution
such as Gitolite?

-- 
Jakub Narębski



Re: Is there a way to have a local version of a header file?

2017-03-18 Thread Jakub Narębski
W dniu 18.03.2017 o 18:08, Ævar Arnfjörð Bjarmason pisze:

> There might be some way I haven't thought of, in particular maybe you
> can use gitattributes to define a custom diff/merge driver that always
> reports no changes, or some ways to (ab)use the index to make git
> ignore any changes to the file.

There is `git update-index --skip-worktree` (originally meant for
sparse checkout), which you can use to kind of ignore changes to
tracked file, in a safe way (though sometimes annoying, when it
prevents stashing changes).

There is also an existing solution of a hook that prevents commiting
files with passwords in them; I forgot the name...

HTH,
-- 
Jakub Narębski



Re: git-push branch confusion caused by user mistake

2017-03-11 Thread Jakub Narębski
W dniu 10.03.2017 o 22:44, Phil Hord pisze:
> This week a user accidentally did this:
> 
> $ git push origin origin/master
> Total 0 (delta 0), reused 0 (delta 0)
> To parent.git
>  * [new branch]  origin/master -> origin/master
> 
> He saw his mistake when the "new branch" message appeared, but he was
> confused about how to fix it and worried he broke something.

It is nowadays very easy to delete accidentally created remote branch
with

  $ git push origin --delete origin/master
 
> It seems reasonable that git expanded the original args into this one:
> 
> git push origin refs/remotes/origin/master
> 
> However, since the dest ref was not provided, it was assumed to be the
> same as the source ref, so it worked as if he typed this:
> 
> git push origin refs/remotes/origin/master:refs/remotes/origin/master

This rule depends on push.default setting, but it is a very simple
rule.  Simple is good.  DWIM is usually not worth it, unless program
can guess what you meant, and what you meant is always the same.

> I think git should be smarter about deducing the dest ref from the
> source ref if the source ref is in refs/remotes, but I'm not sure how
> far to take it.  It feels like we should translate refspecs something
> like this for push:
> 
> origin/master
> => refs/remotes/origin/master:refs/heads/master
[...]

Such push doesn't make sense (unless you have a quite unusual situation).

Note that 'origin/master', that is 'refs/remotes/origin/master' is a
remote-tracking branch, that is a ref that is meant to track position
of the 'master' branch ('refs/heads/master') in the 'origin' remote.
Thus it should always be the same as 'master' in 'origin', or be behind
if you didn't fetch.

> Does this seem reasonable?  I can try to work up a patch if so.

Thus I don't think such complication is reasonable.

-- 
Jakub Narębski
 



Re: [PATCH] branch: honor --abbrev/--no-abbrev in --list mode

2017-03-09 Thread Jakub Narębski
W dniu 08.03.2017 o 23:16, Junio C Hamano pisze:
 
> diff --git a/builtin/branch.c b/builtin/branch.c
> index cbaa6d03c0..537c47811a 100644
> --- a/builtin/branch.c
> +++ b/builtin/branch.c
> @@ -335,9 +335,18 @@ static char *build_format(struct ref_filter *filter, int 
> maxwidth, const char *r
>   branch_get_color(BRANCH_COLOR_CURRENT));
>  
>   if (filter->verbose) {
> + struct strbuf obname = STRBUF_INIT;
> +
> + if (filter->abbrev < 0)
> + strbuf_addf(, "%%(objectname:short)");
> + else if (!filter->abbrev)
> + strbuf_addf(, "%%(objectname)");
> + else
> + strbuf_addf(, " %%(objectname:short=%d) ", 
> filter->abbrev);
  ^   ^
I wonder why the last one has leading space --/ and trailing one -/
The rest (for default abbrev and for no abbrev do not).

-- 
Jakub Narębski



Re: [Request for Documentation] Differentiate signed (commits/tags/pushes)

2017-03-06 Thread Jakub Narębski
W dniu 06.03.2017 o 23:13, Junio C Hamano pisze:
> Stefan Beller <sbel...@google.com> writes:
> 
>> What is the difference between signed commits and tags?
>> (Not from a technical perspective, but for the end user)
[...]
>> Off list I was told gpg-signed commits are a "checkbox feature",
>> i.e. no real world workflow would actually use it. (That's a bold
>> statement, someone has to use it as there was enough interest
>> to implement it, no?)
> 
> I'd agree with that "checkbox" description, except that you need to
> remember that a project can enforce _any_ workflow to its developer,
> even if it does not make much sense, and at that point, the workflow
> would become a real-world workflow.  The word "real world workflow"
> does not make any assurance if that workflow is sensible.
> 
> Historically, "tag -s" came a lot earlier.  When a project for
> whatever reason wants signature for each and every commit so that
> they somehow can feel good, without "commit -s", it would have made
> us unnecessary work to scale tag namespace only because there will
> be tons of pointless tags.  "commit -s" was a remedy for that.

Also from what I remember signed commits came before mergetags, that
is the result of merging a signed tag (storing the signature of
one of parents of the merge commit to not pollute tag namespace).

And this workflow, from what I know, is quite useful.

-- 
Jakub Narębski
 



Re: Reference for quote "creating branch is not the issue, merging is", in context of Subversion/Git

2017-02-27 Thread Jakub Narębski
W dniu 26.02.2017 o 16:19, Igor Djordjevic pisze:
> Hello Michael,
> 
> On 26/02/2017 12:40, Michael Hüttermann wrote:
>> Linus Torvalds made a statement regarding merging/branching and stated
>> (as far as I know) that "creating branch is not the issue, merge is", in
>> context of Subversion/Git.
>> I do not find the origin source for that. Can you please help and point
>> me to a statement or article where Linus elaborated on this?
> 
> Could it be that you think of "Tech Talk: Linus Torvalds on Git"[1]
> (held on May 3, 2007)?
> 
> To give you some clue, here`s an excerpt from Linus' talk/presentation
> (taken from the transcript[2] containing the whole thing):
> 
>   "... Subversion for example, talks very loudly about how they do CVS
>   right by making branching really cheap. It's probably on their main
>   webpage where they probably say branching in subversion is O(1)
>   operation, you can do as many cheap branches as you want. Nevermind
>   that O(1) is actually with pretty large O I think, but even if it
>   takes a millionth of a second to do branching, who cares? It's the
>   wrong thing you are measuring. Nobody is interested in branching,
>   branches are completely useless unless you merge them, and CVS cannot
>   merge anything at all. You can merge things once, but because CVS
>   then forgets what you did, you can never ever merge anything again
>   without getting horrible horrible conflicts. Merging in subversion is
>   a complete disaster. The subversion people kind of acknowledge this
>   and they have a plan, and their plan sucks too. It is incredible how
>   stupid these people are. They've been looking at the wrong problem
>   all the time. Branching is not the issue, merging is..."
> 
> This specific branch/merge performance talk starts at 50:20[3], where
> the part quoted above comes at 51:34[4].

Note also that while "creating branch is not the issue, merge is"
remains true, modern Subversion (post 1.5) makes merging easy thanks
to svn:mergeinfo property.

Though it does it in completely different way than Git and other
"graph of commits" VCS-es, because of the "branch is directory"
philosophy, namely that it keeps information about what was merged
in, rather than finding common ancestor(s) and using this information
for resolving merge.

> 
> Please note that there`s more context before and after this excerpt
> that puts it all into the meant perspective, so you may really want
> to watch/listen/read the whole thing anyway.
> 
> Regards,
> Buga
> 
> [1] https://www.youtube.com/watch?v=4XpnKHJAok8
> [2] https://git.wiki.kernel.org/index.php/LinusTalk200705Transcript
> [3] https://youtu.be/4XpnKHJAok8?t=3020
> [4] https://youtu.be/4XpnKHJAok8?t=3094
> 



Re: [PATCH 2/2] commit: don't check for space twice when looking for header

2017-02-27 Thread Jakub Narębski
W dniu 25.02.2017 o 20:27, René Scharfe pisze:
> Both standard_header_field() and excluded_header_field() check if
> there's a space after the buffer that's handed to them.  We already
> check in the caller if that space is present.  Don't bother calling
> the functions if it's missing, as they are guaranteed to return 0 in
> that case, and remove the now redundant checks from them.
> 
> Signed-off-by: Rene Scharfe 
> ---
>  commit.c | 18 --
>  1 file changed, 8 insertions(+), 10 deletions(-)
> 
> diff --git a/commit.c b/commit.c
> index 173c6d3818..fab8269731 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -1308,11 +1308,11 @@ void for_each_mergetag(each_mergetag_fn fn, struct 
> commit *commit, void *data)
>  
>  static inline int standard_header_field(const char *field, size_t len)
>  {
> - return ((len == 4 && !memcmp(field, "tree ", 5)) ||
> - (len == 6 && !memcmp(field, "parent ", 7)) ||
> - (len == 6 && !memcmp(field, "author ", 7)) ||
> - (len == 9 && !memcmp(field, "committer ", 10)) ||
> - (len == 8 && !memcmp(field, "encoding ", 9)));
> + return ((len == 4 && !memcmp(field, "tree", 4)) ||
> + (len == 6 && !memcmp(field, "parent", 6)) ||
> + (len == 6 && !memcmp(field, "author", 6)) ||
> + (len == 9 && !memcmp(field, "committer", 9)) ||
> + (len == 8 && !memcmp(field, "encoding", 8)));

I agree (for what it is worth from me) with the rest of changes,
but I think current code is better self-documenting for this
function.

>  }
>  
>  static int excluded_header_field(const char *field, size_t len, const char 
> **exclude)
> @@ -1322,8 +1322,7 @@ static int excluded_header_field(const char *field, 
> size_t len, const char **exc
>  
>   while (*exclude) {
>   size_t xlen = strlen(*exclude);
> - if (len == xlen &&
> - !memcmp(field, *exclude, xlen) && field[xlen] == ' ')
> + if (len == xlen && !memcmp(field, *exclude, xlen))
>   return 1;
>   exclude++;
>   }
> @@ -1357,9 +1356,8 @@ static struct commit_extra_header 
> *read_commit_extra_header_lines(
>   eof = memchr(line, ' ', next - line);
>   if (!eof)
>   eof = next;
> -
> - if (standard_header_field(line, eof - line) ||
> - excluded_header_field(line, eof - line, exclude))
> + else if (standard_header_field(line, eof - line) ||
> +  excluded_header_field(line, eof - line, exclude))
>   continue;
>  
>   it = xcalloc(1, sizeof(*it));
> 



Re: [PATCH v2] convert: add "status=delayed" to filter process protocol

2017-02-27 Thread Jakub Narębski
t;command=wait" (or "await" ;-))
might be better.

>>
>>  [Some time may pass if nothing is ready. But eventually we get...]
>>  git< status=success

Or

git< status=resumed

If it would not be undue burden on the filter driver process, we might
require for it to say where to continue at (in bytes), e.g.

git< from=16426

That should, of course, go below index/pathname line.

>>  git< index=0

Or a filter driver could have used pathname as an index, that is

git< pathname=path/testfile.dat

>>  git< 
>>  git< CONTENT
>>  git< 
>>
>> From Git's side, the loop is something like:
>>
>>  while (delayed_items > 0) {
>>  /* issue a wait, and get back the status/index pair */
>>  status = send_wait();
>>  delayed_items--;

This looks like my 'event loop' proposal[1][2], see below.

>>
>>  /*
>>   * use "index" to find the right item in our list of files;
>>   * the format can be opaque to the filter, so we could index
>>   * it however we like. But probably numeric indices in an array
>>   * are the simplest.
>>   */
>>  assert(index > 0 && index < nr_items);
>>  item[index].status = status;
>>  if (status == SUCCESS)
>>  read_content([index]);
>>  }
>>
>> and the filter side just attaches the "index" string to whatever its
>> internal queue structure is, and feeds it back verbatim when processing
>> that item finishes.

I have wrote about this for v1 version of this patch series:

[1]: 
https://public-inbox.org/git/ec8078ef-8ff2-d26f-ef73-5ef612737...@gmail.com/
[2]: 
https://public-inbox.org/git/17fa31a5-8689-2766-952b-704f433a5...@gmail.com/

> 
> That could work! I had something like that in mind:
> 
> I teach Git a new command "list_completed" or similar. The filter
> blocks this call until at least one item is ready for Git. 
> Then the filter responds with a list of paths that identify the
> "ready items". Then Git asks for these ready items just with the
> path and not with any content. Could that work? Wouldn't the path
> be "unique" to identify a blob per filter run?

Why in the "drain" phase it is still Git that needs to ask filter for
contents, one file after another?  Wouldn't it be easier and simpler
for filter to finish sending contents, and send signal that it has
finished continue'ing?

To summarize my earlier emails, current proposal looks for me as if
it were a "busy loop" solution, that is[2]:

  while there are delayed paths:
  for each delayed path:
  request path from filter [a]
  fetch the thing (supporting "delayed") [b]
  if path done
  do the caller's thing to use buf
  remove it from delayed paths list


Footnotes:
--
a) We don't send the Git-side contents of blob again, isn't it?
   So we need some protocol extension / new understanding anyway.
   for example that we don't send contents if we request path again.
b) If path is not ready at all, filter protocol would send status=delayed
   with empty contents.  This means that we would immediately go to the
   next path, if there is one.

There are some cases where busy loop is preferable, but I don't think
it is the case here.


The "event loop" solution, which is I think what Peff proposed, would
be something like this (from the protocol point of view, rather than
from the Git code point of view)[1]:

while there are delayed paths:
get path that is ready from filter
fetch the thing to buf (supporting "delayed")
if path done
do the caller's thing to use buf 
(e.g. finish checkout path, eof convert, etc.)

We can either trust filter process to tell us when it finished sending
delayed paths, or keep list of paths that are being delayed in Git.


Also, one thing that we need to be solved, assuming that the proposed
extension allows to send partial data from filter to be delayed and
continued later, is that Git needs to keep this partial response in buf;
this is because of precedence of gitattributes applying:

  Interaction between checkin/checkout attributes
  ^^^
  In the check-in codepath, the worktree file is first converted with
  `filter` driver (if specified and corresponding driver defined), then
  the result is processed with `ident` (if specified), and then finally
  with `text` (again, if specified and applicable).

  In the check-out codepath, the blob content is first converted with
  `text`, and then `ident` and fed to `filter`.

Or maybe I misunderstood something; I am a bit confused by check-in
vs check-out there...


Note that with support of "command=wait" / "command=continue" Git
could interrupt sending contents, and drain unfinished files, if
it needs it... then continue sending rest of files.  Well, if the
protocol extension is interpreted to allow for this.

Best regards,
-- 
Jakub Narębski



Re: [PATCH] gitweb tests: Skip tests when we don't have Time::HiRes

2017-02-27 Thread Jakub Narębski
W dniu 27.02.2017 o 13:37, Ævar Arnfjörð Bjarmason pisze:
> Change the gitweb tests to skip when we can't load the Time::HiRes
> module.

Could you tell us in the commit message why this module is needed?
Is it because gitweb loads it unconditionally, or does that at least
in the default configuration, or is it used in tests, or...?

[I see it is somewhat addressed below]

> 
> This module has bee in perl core since v5.8, which is the oldest

s/bee/been/

> version we support, however CentOS (and perhaps some other
> distributions) carve it into its own non-core-perl package that's not
> installed along with /usr/bin/perl by default. Without this we'll hard
> fail the gitweb tests when trying to load the module.

I see that it because gitweb.perl as the following at line 20:

use Time::HiRes qw(gettimeofday tv_interval);

> 
> Signed-off-by: Ævar Arnfjörð Bjarmason 

Good catch (if a strange one...).

> ---
>  t/gitweb-lib.sh | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/t/gitweb-lib.sh b/t/gitweb-lib.sh
> index d5dab5a94f..116c3890e6 100644
> --- a/t/gitweb-lib.sh
> +++ b/t/gitweb-lib.sh
> @@ -114,4 +114,9 @@ perl -MCGI -MCGI::Util -MCGI::Carp -e 0 >/dev/null 2>&1 
> || {
>   test_done
>  }
>  
> +perl -mTime::HiRes -e 0  >/dev/null 2>&1 || {
> + skip_all='skipping gitweb tests, Time::HiRes module unusable'

Is "unusable" a good description, instead of "not found"?

> + test_done
> +}
> +
>  gitweb_init
> 



Re: SHA1 collisions found

2017-02-24 Thread Jakub Narębski
W dniu 25.02.2017 o 00:06, Jeff King pisze:
> On Fri, Feb 24, 2017 at 11:47:46PM +0100, Jakub Narębski wrote:
> 
>> I have just read on ArsTechnica[1] that while Git repository could be
>> corrupted (though this would require attackers to spend great amount
>> of resources creating their own collision, while as said elsewhere
>> in this thread allegedly easy to detect), putting two proof-of-concept
>> different PDFs with same size and SHA-1 actually *breaks* Subversion.
>> Repository can become corrupt, and stop accepting new commits.  
>>
>> From what I understand people tried this, and Git doesn't exhibit
>> such problem.  I wonder what assumptions SVN made that were broken...
> 
> To be clear, nobody has generated a sha1 collision in Git yet, and you
> cannot blindly use the shattered PDFs to do so. Git's notion of the
> SHA-1 of an object include the header, so somebody would have to do a
> shattered-level collision search for something that starts with the
> correct "blob 1234\0" header.

What I meant by "Git doesn't exhibit such problem" (but was not clear
enough) is that Git doesn't break by just adding SHAttered.io PDFs
(which somebody had checked), but need customized attack.

> 
> So we don't actually know how Git would behave in the face of a SHA-1
> collision. It would be pretty easy to simulate it with something like:

You are right that it would be good to know if such Git-geared customized
SHA-1 attack would break Git, or would it simply corrupt it (visibly
or not).

> 
> ---
> diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c
> index 22b125cf8..1be5b5ba3 100644
> --- a/block-sha1/sha1.c
> +++ b/block-sha1/sha1.c
> @@ -231,6 +231,16 @@ void blk_SHA1_Update(blk_SHA_CTX *ctx, const void *data, 
> unsigned long len)
>   memcpy(ctx->W, data, len);
>  }
>  
> +/* sha1 of blobs containing "foo\n" and "bar\n" */
> +static const unsigned char foo_sha1[] = {
> + 0x25, 0x7c, 0xc5, 0x64, 0x2c, 0xb1, 0xa0, 0x54, 0xf0, 0x8c,
> + 0xc8, 0x3f, 0x2d, 0x94, 0x3e, 0x56, 0xfd, 0x3e, 0xbe, 0x99
> +};
> +static const unsigned char bar_sha1[] = {
> + 0x57, 0x16, 0xca, 0x59, 0x87, 0xcb, 0xf9, 0x7d, 0x6b, 0xb5,
> + 0x49, 0x20, 0xbe, 0xa6, 0xad, 0xde, 0x24, 0x2d, 0x87, 0xe6
> +};
> +
>  void blk_SHA1_Final(unsigned char hashout[20], blk_SHA_CTX *ctx)
>  {
>   static const unsigned char pad[64] = { 0x80 };
> @@ -248,4 +258,8 @@ void blk_SHA1_Final(unsigned char hashout[20], 
> blk_SHA_CTX *ctx)
>   /* Output hash */
>   for (i = 0; i < 5; i++)
>   put_be32(hashout + i * 4, ctx->H[i]);
> +
> + /* pretend "foo" and "bar" collide */
> + if (!memcmp(hashout, bar_sha1, 20))
> + memcpy(hashout, foo_sha1, 20);
>  }
> 



Re: SHA1 collisions found

2017-02-24 Thread Jakub Narębski
W dniu 24.02.2017 o 23:53, Santiago Torres pisze:
> On Fri, Feb 24, 2017 at 11:47:46PM +0100, Jakub Narębski wrote:
>>
>> I have just read on ArsTechnica[1] that while Git repository could be
>> corrupted (though this would require attackers to spend great amount
>> of resources creating their own collision, while as said elsewhere
>> in this thread allegedly easy to detect), putting two proof-of-concept
>> different PDFs with same size and SHA-1 actually *breaks* Subversion.
>> Repository can become corrupt, and stop accepting new commits.  
> 
> From what I understood in the thread[1], it was the combination of svn +
> git-svn together. I think Arstechnica may be a little bit
> sensationalistic here.
 
> [1] https://bugs.webkit.org/show_bug.cgi?id=168774#c27

Thanks for the link.  It looks like the problem was with svn itself
(couldn't checkout, couldn't sync), but repository is recovered now,
though not protected against the problem occurring again.

Well, anyone with Subversion installed (so not me) can check it
for himself/herself... though better do this with separate svnroot.


Note that the breakage was an accident, trying to add test case
for SHA-1 collision in WebKit cache.
 
Best regards,
-- 
Jakub Narębski


Re: SHA1 collisions found

2017-02-24 Thread Jakub Narębski
I have just read on ArsTechnica[1] that while Git repository could be
corrupted (though this would require attackers to spend great amount
of resources creating their own collision, while as said elsewhere
in this thread allegedly easy to detect), putting two proof-of-concept
different PDFs with same size and SHA-1 actually *breaks* Subversion.
Repository can become corrupt, and stop accepting new commits.  

>From what I understand people tried this, and Git doesn't exhibit
such problem.  I wonder what assumptions SVN made that were broken...

The https://shattered.io/ page updated their Q section with this
information.

BTW. what's with that page use of "GIT" instead of "Git"??


[1]: 
https://arstechnica.com/security/2017/02/watershed-sha1-collision-just-broke-the-webkit-repository-others-may-follow/
 
 "Watershed SHA1 collision just broke the WebKit repository, others may 
follow"


Re: [PATCH v2 2/2] Documentation: Link descriptions of -z to core.quotePath

2017-02-24 Thread Jakub Narębski
W dniu 24.02.2017 o 21:37, Andreas Heiduk pisze:
> Linking the description for pathname quoting to the configuration
> variable "core.quotePath" removes inconstistent and incomplete
> sections while also giving two hints how to deal with it: Either with
> "-c core.quotePath=false" or with "-z".

This patch I am not sure about.  On one hand it improves consistency
(and makes information more complete), on the other hand it removes
information at hand and instead refers to other manpage.

Perhaps a better solution would be to craft a short description that
is both sufficiently complete, and refers to "core.quotePath" for
more details, and then transclude it with "include::quotepath.txt[]".

> 
> Signed-off-by: Andreas Heiduk 
> ---
>  Documentation/diff-format.txt |  7 ---
>  Documentation/diff-generate-patch.txt |  7 +++
>  Documentation/diff-options.txt|  7 +++
>  Documentation/git-apply.txt   |  7 +++
>  Documentation/git-commit.txt  |  9 ++---
>  Documentation/git-ls-files.txt| 10 ++
>  Documentation/git-ls-tree.txt | 10 +++---
>  Documentation/git-status.txt  |  7 +++
>  8 files changed, 35 insertions(+), 29 deletions(-)
> 
> diff --git a/Documentation/diff-format.txt b/Documentation/diff-format.txt
> index cf52626..706916c 100644
> --- a/Documentation/diff-format.txt
> +++ b/Documentation/diff-format.txt
> @@ -78,9 +78,10 @@ Example:
>  :100644 100644 5be4a4.. 00.. M file.c
>  
>  
> -When `-z` option is not used, TAB, LF, and backslash characters
> -in pathnames are represented as `\t`, `\n`, and `\\`,
> -respectively.
> +Without the `-z` option, pathnames with "unusual" characters are
> +quoted as explained for the configuration variable `core.quotePath`
> +(see linkgit:git-config[1]).  Using `-z` the filename is output
> +verbatim and the line is terminated by a NUL byte.
>  
>  diff format for merges
>  --
> diff --git a/Documentation/diff-generate-patch.txt 
> b/Documentation/diff-generate-patch.txt
> index d2a7ff5..231105c 100644
> --- a/Documentation/diff-generate-patch.txt
> +++ b/Documentation/diff-generate-patch.txt
> @@ -53,10 +53,9 @@ The index line includes the SHA-1 checksum before and 
> after the change.
>  The  is included if the file mode does not change; otherwise,
>  separate lines indicate the old and the new mode.
>  
> -3.  TAB, LF, double quote and backslash characters in pathnames
> -are represented as `\t`, `\n`, `\"` and `\\`, respectively.
> -If there is need for such substitution then the whole
> -pathname is put in double quotes.
> +3.  Pathnames with "unusual" characters are quoted as explained for
> +the configuration variable `core.quotePath` (see
> +linkgit:git-config[1]).
>  
>  4.  All the `file1` files in the output refer to files before the
>  commit, and all the `file2` files refer to files after the commit.
> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index e6215c3..7c28e73 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -192,10 +192,9 @@ ifndef::git-log[]
>   given, do not munge pathnames and use NULs as output field terminators.
>  endif::git-log[]
>  +
> -Without this option, each pathname output will have TAB, LF, double quotes,
> -and backslash characters replaced with `\t`, `\n`, `\"`, and `\\`,
> -respectively, and the pathname will be enclosed in double quotes if
> -any of those replacements occurred.
> +Without this option, pathnames with "unusual" characters are munged as
> +explained for the configuration variable `core.quotePath` (see
> +linkgit:git-config[1]).
>  
>  --name-only::
>   Show only names of changed files.
> diff --git a/Documentation/git-apply.txt b/Documentation/git-apply.txt
> index 8ddb207..a7a001b 100644
> --- a/Documentation/git-apply.txt
> +++ b/Documentation/git-apply.txt
> @@ -108,10 +108,9 @@ the information is read from the current index instead.
>   When `--numstat` has been given, do not munge pathnames,
>   but use a NUL-terminated machine-readable format.
>  +
> -Without this option, each pathname output will have TAB, LF, double quotes,
> -and backslash characters replaced with `\t`, `\n`, `\"`, and `\\`,
> -respectively, and the pathname will be enclosed in double quotes if
> -any of those replacements occurred.
> +Without this option, pathnames with "unusual" characters are munged as
> +explained for the configuration variable `core.quotePath` (see
> +linkgit:git-config[1]).
>  
>  -p::
>   Remove  leading slashes from traditional diff paths. The
> diff --git a/Documentation/git-commit.txt b/Documentation/git-commit.txt
> index 4f8f20a..25dcdcc 100644
> --- a/Documentation/git-commit.txt
> +++ b/Documentation/git-commit.txt
> @@ -117,9 +117,12 @@ OPTIONS
>  
>  -z::
>  --null::
> - When showing `short` or 

Re: [PATCH v2 1/2] Documentation: Improve description for core.quotePath

2017-02-24 Thread Jakub Narębski
W dniu 24.02.2017 o 21:37, Andreas Heiduk pisze:
> Signed-off-by: Andreas Heiduk 

Thanks.  This is good work.

> ---
>  Documentation/config.txt | 24 ++--
>  1 file changed, 14 insertions(+), 10 deletions(-)
> 
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 1fee83c..fa06c2a 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -347,16 +347,20 @@ core.checkStat::
>   all fields, including the sub-second part of mtime and ctime.
>  
>  core.quotePath::
> - The commands that output paths (e.g. 'ls-files',
> - 'diff'), when not given the `-z` option, will quote
> - "unusual" characters in the pathname by enclosing the
> - pathname in a double-quote pair and with backslashes the
> - same way strings in C source code are quoted.  If this
> - variable is set to false, the bytes higher than 0x80 are
> - not quoted but output as verbatim.  Note that double
> - quote, backslash and control characters are always
> - quoted without `-z` regardless of the setting of this
> - variable.
> +

This empty line should not be here, I think.

> + Commands that output paths (e.g. 'ls-files', 'diff'), will
> + quote "unusual" characters in the pathname by enclosing the
> + pathname in double-quotes and escaping those characters with
> + backslashes in the same way C escapes control characters (e.g.
> + `\t` for TAB, `\n` for LF, `\\` for backslash) or bytes with
> + values larger than 0x80 (e.g. octal `\302\265` for "micro" in

I wonder if we can put UTF-8 in AsciiDoc, that is write "μ"
instead of spelling it "micro" (or: Greek letter "mu").

Or "" / "", though I wonder how well it is supported
in manpage, info and PDF outputs...

> + UTF-8).  If this variable is set to false, bytes higher than
> + 0x80 are not considered "unusual" any more. Double-quotes,
> + backslash and control characters are always escaped regardless
> + of the setting of this variable.  A simple space character is
> + not considered "unusual".  Many commands can output pathnames
> + completely verbatim using the `-z` option. The default value
> + is true.
>  
>  core.eol::
>   Sets the line ending type to use in the working directory for
> 



Re: SHAttered (the first practical SHA1 attack)

2017-02-23 Thread Jakub Narębski
W dniu 23.02.2017 o 16:50, Santiago Torres pisze:
> Hello all,
> 
> I ran into this website presenting the "first practical attack on
> sha1"[1]. I don't recall seeing this on the ML, so I'm sharing this just
> in case. I know there are proposals to move out of sha1 already. I
> wonder if this affects the timeline for their adoption?
> 
> Thanks,
> -Santiago.
> 
> [1] https://shattered.io/

This is already being discussed in "SHA1 collisions found"[1]
thread.  Let's do the discussion there.

[1] https://public-inbox.org/git/xmqqk28g92h7@gitster.mtv.corp.google.com
-- 
Jakub Narębski
 



Re: SHA1 collisions found

2017-02-23 Thread Jakub Narębski
W dniu 23.02.2017 o 18:12, David Lang pisze:
> On Thu, 23 Feb 2017, Junio C Hamano wrote:
> 
>> On Thu, Feb 23, 2017 at 8:43 AM, Joey Hess  wrote:
>>> 
>>> Since we now have collisions in valid PDF files, collisions in
>>> valid git commit and tree objects are probably able to be
>>> constructed.
>> 
>> That may be true, but 
>> https://public-inbox.org/git/pine.lnx.4.58.0504291221250.18...@ppc970.osdl.org/
>>
>
> it doesn't help that the Google page on this explicitly says that
> this shows that it's possible to create two different git repos that
> have the same hash but different contents.
> 
> https://shattered.it/
> 
> How is GIT affected? GIT strongly relies on SHA-1 for the
> identification and integrity checking of all file objects and
> commits. It is essentially possible to create two GIT repositories
> with the same head commit hash and different contents, say a benign
> source code and a backdoored one. An attacker could potentially
> selectively serve either repository to targeted users. This will
> require attackers to compute their own collision.

The attack on SHA-1 presented there is "identical-prefix" collision,
which is less powerful than "chosen-prefix" collision.  It is the
latter that is required to defeat SHA-1 used in object identity.
Objects in Git _must_ begin with given prefix; the use of zlib
compression adds to the difficulty.  'Forged' Git object would
simply not validate...

https://arstechnica.com/security/2017/02/at-deaths-door-for-years-widely-used-sha1-function-is-now-dead/



Re: Git bisect does not find commit introducing the bug

2017-02-20 Thread Jakub Narębski
W dniu 20.02.2017 o 21:31, Alex Hoffman pisze:
> I see two different problems each with a different assumption (see the
> definition of "bisectable" in the email of Junio C Hamano):
> 
> 1. (Current) Assume the entire history graph is bisectable. DO: Search
> where in the entire graph the first 'trait'/transition occurs.
>
> 2. (New) Assume only the graph between one good commit and one bad
> commit is bisectable. DO: Search where the first transition occurs in
> the graph between these two commits.

If `git bisect` is/would be affected by `git log` history-related options
then this is what `--strict-ancestor` option gives/would give.

-- 
Jakub Narębski



Re: Git bisect does not find commit introducing the bug

2017-02-20 Thread Jakub Narębski
W dniu 20.02.2017 o 08:38, Oleg Taranenko pisze:

>>>> Then you must adjust your definition of "good": All commits that do not 
>>>> have
>>>> the feature, yet, are "good": since they do not have the feature in the
>>>> first place, they cannot have the breakage that you found in the feature.
>>>>
>>>> That is exactly the situation in your original example! But you constructed
>>>> the condition of goodness in such a simplistic way (depending on the
>>>> presence of a string), that it was impossible to distinguish between "does
>>>> not have the feature at all" and "has the feature, but it is broken".
>>>
>>> Johannes, thank you for correctly identifying the error in my logic.
>>> Indeed I was using the term 'bad' also for the commit without the
>>> feature. In order to find the commit introducing the bug in my example
>>> a new state is needed, which would make 'git bisect' a bit more
>>> complicated than the user 'most of the time' probably needs. Or do you
>>> think, it would make sense to ask the user for this state (if e.g 'git
>>> bisect' would be started with a new parameter)?
> 
>> If a commit doesn't have the feature, then it is by definition, not
>> containing a broken feature, and you can simply use the "good" state.
>> There is no need for a different state. If you can't test the commit
>> because it's broken in some other way, you can use "git bisect skip"
>> but that isn't what you want in this case.
> 
> Commits missing feature == 'good' commit is a very confusing one.

Nowadays you can change the names for 'old' and 'new' with
`git bisect terms`.  HTH.
 
> Looks like in real life it happens much often, then git developers can
> imagine. For multi-branch/multi-feature workflow it's pretty easy not
> to recognize whether it is missing or not developed yet, especially on
> retrospective view where cherry-picking/squashing/merging is being
> used. My experience shows most annoying bugs are generating after a
> heavy merge (evil merge) with conflicts resolutions, where developer
> is not involved in the knowing what happens on counterpart changes.
> Then feature can be disappeared after it was worked & tested in its
> own branches.

Good to know about this problem.

> @Alex, I'm pretty interesting in fixing this weird bisect behaviour as
> well, as far as I struggled on it last summer and continue struggling
> so far :) If you want we can join to your efforts on fixing.

Anyway, I don't think it is feasible to weaken the assumption that there
is only one transition from 'old' ('good') to 'new' ('bad'); this is
what allows O(log(N)) operations.  See bisection method of root finding,
that is finding zeros of a continuous function.

Best,
-- 
Jakub Narębski



Re: topological index field for commit objects

2017-02-17 Thread Jakub Narębski
On 17 February 2017 at 10:26, Jeff King <p...@peff.net> wrote:
> On Sat, Feb 04, 2017 at 02:43:01PM +0100, Jakub Narębski wrote:
>
>> >>>> Do Git use EWAH / EWOK bitmaps for reachability analysis, or is it still
>> >>>> limited to object counting?
>> >>>
>> >>> At GitHub we are using them for --contains analysis, along with mass
>> >>> ahead/behind (e.g., as in https://github.com/gitster/git/branches). My
>> >>> plan is to send patches upstream, but they need some cleanup first.
>> >>
>> >> Ping. have you got time to clean up those patches?
>> >
>> > No, I haven't. Don't hold your breath; it's something I hope to work on
>> > in the next 6 months, not the next 6 weeks.
>>
>> Ping, Was there any progress on this front? It is now almost 6 months
>> later...
>>
>> Could those patches be made available in a "dirty" form?
>
> I just pushed them up to the "jk/ahead-behind" branch of
> https://github.com/peff/git.
>
> They were originally done on top of v1.9.1, but I forward-ported them to
> the current "master" just now. The result compiles, but I haven't really
> tested it extensively. Caveat applier.

Thanks a lot! I'll check this out.

-- 
Jakub Narebski


[RFH] Request for Git Merge 2017 impressions for Git Rev News

2017-02-11 Thread Jakub Narębski
Hello,

Git Rev News #24 is planned to be released on February 15. It is meant
to cover what happened during the month of January 2017 (and earely
February 2017) and the Git Merge 2017 conference that happened on
February 2nd and 3rd 2017.

A draft of a new Git Rev News edition is available here:

  https://github.com/git/git.github.io/blob/master/rev_news/drafts/edition-24.md

I would like to ask everyone who attended the conference (and the
GitMerge 2017 Contributors’s Summit day before it), or watched it live
at http://git-merge.com/watch to write his or her impressions.

You can contribute either by replying to this email, or by editing the
above page on GitHub and sending a pull request, or by commenting on
the following GitHub issue about Git Rev News 24:

  https://github.com/git/git.github.io/issues/221

If you prefer to post on your own blog (or if you have did it
already), please send an URL.


P.S. I wonder if there should be not a separate section on
https://git.github.io/ about recollection from various Git-related
events, with Git Merge 2017 as the first one.  This way we can wait
for later response, and incorporate videos and slides from events, as
they begin to be available.

P.P.S. Please distribute this information more widely.

Thanks in advance,
-- 
Jakub Narębski



Re: [RFC] Add support for downloading blobs on demand

2017-02-07 Thread Jakub Narębski
I'd like to point to two (or rather one and a half) solutions that I got
aware of when watching streaming of "Git Merge 2017"[0].  There should
be here people who were there; and hopefully video of those presentations
and slides / notes would be soon available.

[0]: http://git-merge.com/

First tool that I'd like to point to is Git Virtual File System, or
GVFS in short (which unfortunately shares abbreviation with GNOME Virtual
File System).

The presentation was "Scaling Git at Microsoft" by Saeed Noursalehi, 
Microsoft.  You can read about this solution in ArsTechnica article[1],
and on Microsoft blog[2].  The code (or early version of thereof) is
also available[3] - I wonder why on GitHub and not Codeplex...

[1]: 
https://arstechnica.com/information-technology/2017/02/microsoft-hosts-the-windows-source-in-a-monstrous-300gb-git-repository/
[2]: 
https://blogs.msdn.microsoft.com/visualstudioalm/2017/02/03/announcing-gvfs-git-virtual-file-system/
[3]: https://github.com/Microsoft/GVFS


The second presentation that might be of some interest is "Scaling
Mercurial at Facebook: Insights from the Other Side" by Durham Goode,
Facebook.  The code is supposedly available as open-source; though
I don't know how useful their 'blob storage' solution would be of use
for your problem.


HTH
-- 
Jakub Narębski



Re: topological index field for commit objects

2017-02-04 Thread Jakub Narębski
On 20 July 2016 at 15:02, Jeff King <p...@peff.net> wrote:
> On Wed, Jul 20, 2016 at 02:07:38AM +0200, Jakub Narębski wrote:
>> W dniu 2016-06-30 o 00:00, Jeff King pisze:
>>> On Wed, Jun 29, 2016 at 11:49:35PM +0200, Jakub Narębski wrote:
>>
>>>> Do Git use EWAH / EWOK bitmaps for reachability analysis, or is it still
>>>> limited to object counting?
>>>
>>> At GitHub we are using them for --contains analysis, along with mass
>>> ahead/behind (e.g., as in https://github.com/gitster/git/branches). My
>>> plan is to send patches upstream, but they need some cleanup first.
>>
>> Ping. have you got time to clean up those patches?
>
> No, I haven't. Don't hold your breath; it's something I hope to work on
> in the next 6 months, not the next 6 weeks.

Ping, Was there any progress on this front? It is now almost 6 months
later...

Could those patches be made available in a "dirty" form?

TIA,
-- 
Jakub Narębski


-- 
Jakub Narebski


Re: [PATCH 1/3] Documentation/stash: remove mention of git reset --hard

2017-01-25 Thread Jakub Narębski
W dniu 24.01.2017 o 21:14, Jeff King pisze:
> On Sat, Jan 21, 2017 at 08:08:02PM +, Thomas Gummerer wrote:
> 
>> diff --git a/Documentation/git-stash.txt b/Documentation/git-stash.txt
>> index 2e9cef06e6..0ad5335a3e 100644
>> --- a/Documentation/git-stash.txt
>> +++ b/Documentation/git-stash.txt
>> @@ -47,8 +47,9 @@ OPTIONS
>>  
>>  save [-p|--patch] [-k|--[no-]keep-index] [-u|--include-untracked] 
>> [-a|--all] [-q|--quiet] []::
>>  
>> -Save your local modifications to a new 'stash', and run `git reset
>> ---hard` to revert them.  The  part is optional and gives
>> +Save your local modifications to a new 'stash', and revert the
>> +the changes in the working tree to match the index.
>> +The  part is optional and gives
> 
> Hrm. "git reset --hard" doesn't just make the working tree match the
> index. It also resets the index to HEAD.  So either the original or your
> new description is wrong.
> 
> I think it's the latter. We really do reset the index unless
> --keep-index is specified.

I wonder if it is worth mentioning that "saving local modifications"
in 'git stash' means storing both the state of the worktree (of tracked
files in it) _and_ the state of the index, before both get set to
state of HEAD.

Best,
-- 
Jakub Narębski


Re: [PATCH 1/3] Documentation/stash: remove mention of git reset --hard

2017-01-24 Thread Jakub Narębski
W dniu 21.01.2017 o 21:08, Thomas Gummerer pisze:
> Don't mention git reset --hard in the documentation for git stash save.
> It's an implementation detail that doesn't matter to the end user and
> thus shouldn't be exposed to them.
> 
> Signed-off-by: Thomas Gummerer <t.gumme...@gmail.com>
> ---
>  Documentation/git-stash.txt | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/git-stash.txt b/Documentation/git-stash.txt
> index 2e9cef06e6..0ad5335a3e 100644
> --- a/Documentation/git-stash.txt
> +++ b/Documentation/git-stash.txt
> @@ -47,8 +47,9 @@ OPTIONS
>  
>  save [-p|--patch] [-k|--[no-]keep-index] [-u|--include-untracked] [-a|--all] 
> [-q|--quiet] []::
>  
> - Save your local modifications to a new 'stash', and run `git reset
> - --hard` to revert them.  The  part is optional and gives
> + Save your local modifications to a new 'stash', and revert the
> + the changes in the working tree to match the index.

I think the following might be better:

..., and set the working tree to match the index.

Or not, as it ignores problem of untracked files.

Anyway, removing internal implementation detail looks like a good idea.
OTOH the reader should be familiar with what `git reset --hard` does,
and if not, he knows where to find the information.

> + The  part is optional and gives
>   the description along with the stashed state.  For quickly making
>   a snapshot, you can omit _both_ "save" and , but giving
>   only  does not trigger this action to prevent a misspelled
> 

-- 
Jakub Narębski


Re: [RFC 1/2] grep: only add delimiter if there isn't one already

2017-01-24 Thread Jakub Narębski
W dniu 23.01.2017 o 14:15, Stefan Hajnoczi pisze:
> On Fri, Jan 20, 2017 at 10:16:31AM -0800, Junio C Hamano wrote:

>> My only piece of advice to folks who feel that way is to learn Git
>> more and get comfortable.  You can do neat things like
>>
>>$ git grep -e pattern rev -- t ':!t/helper/'
>>
>> that you cannot do with "rev:t", for example ;-)
> 
> Neat, thanks for showing the path exclusion syntax.  I wasn't aware of
> it.

That reminds me of mu TODO item: moving extended pathspec information
from gitglossary(7) manpage (sic!) to to-be-created gitpathspec(7).

-- 
Jakub Narębski



Re: merge maintaining history

2017-01-20 Thread Jakub Narębski
W dniu 19.01.2017 o 22:42, Junio C Hamano pisze:
> "David J. Bakeman" <nak...@comcast.net> writes:
 
[...]
>> Thanks I think that's close but it's a little more complicated I think
>> :<(  I don't know if this diagram will work but lets try.
>>
>> original A->B->C->D->E->F
>>  \
>> first branch  b->c->d->e
>>
>> new repo e->f->g->h
>>
>> Now I need to merge h to F without loosing b through h hopefully.  Yes e
>> was never merged back to the original repo and it's essentially gone now
>> so I can't just merge to F or can I?
> 
> With the picture, I think you mean 'b' is forked from 'B' and the
> first branch built 3 more commits on top, leading to 'e'.
> 
> You say "new repo" has 'e' thru 'h', and I take it to mean you
> started developing on top of the history that leads to 'e' you built
> in the first branch, and "new repo" has the resulting history that
> leads to 'h'.
> 
> Unless you did something exotic and non-standard, commit 'e' in "new
> repo" would be exactly the same as 'e' sitting on the tip of the
> "first branch", so the picture would be more like:
> 
>> original A->B->C->D->E->F
>>  \
>> first branch  b->c->d->e
>> \
>> new repo f->g->h
> 
> no?

On the other hand Git has you covered even if you did something 
non-standard, like starting new repo from the _state_ of 'e', that
is you have just copied files and created new repository, having
'e' (or actually 'e*') as an initial commit.

   original A<-B<-C<-D<-E<-F
\
   first branch  b<-c<-d<-e

   new repo   e*<-f<-g<-h

Note that arrows are in reverse direction, as it is newer commit
pointing to its parents, not vice versa.

Assuming that you have everything in a single repository, by adding
both original and new repo as "remotes", you can use 'git replace'
command to replace 'e*' with 'e'.

   original A<-B<-C<-D<-E<-F
\
   first branch  b<-c<-d<-e
   \
   new repo \-f<-g<-h
   (with refs/replace)

> Then merging 'h' into 'F' will pull everything you did since
> you diverged from the history that leads to 'F', resulting in a
> history of this shape:
> 
>> original A->B->C->D->E->F--M
>>  \/
>> first branch  b->c->d->e /
>> \   /
>> new repo f->g->h

Then you would have the above history in repositories that fetched
refs/replace/*, and the one below if replacement info is absent:

   original A<-B<-C<-D<-E<-F<---M
\  /
   first branch  b<-c<-d<-e   /
 /
   new repo   e*<-f->g->h

But as Junio said it is highly unlikely that you are in this situation.

HTH
-- 
Jakub Narębski



Re: Git: new feature suggestion

2017-01-20 Thread Jakub Narębski
W dniu 20.01.2017 o 01:26, Linus Torvalds pisze:
> On Thu, Jan 19, 2017 at 1:48 PM, Jakub Narębski <jna...@gmail.com> wrote:
>> W dniu 19.01.2017 o 19:39, Linus Torvalds pisze:
>>>
>>> You can do it in tig, but I suspect a more graphical tool might be better.
>>
>> Well, we do have "git gui blame".
> 
> Does that actually work for people? Because it really doesn't for me.
> 
> And I'm not just talking about the aesthetics of the thing, but the
> whole experience, and the whole "dig into parent" which just gives me
> an error message.

Strange. I had been using "git gui blame" _because_ of its "dig to parent"
functionality, and it worked for me just fine.

The other thing that I like about "git gui blame" is that it shows both
the commit that moved the fragment of code (via "git blame"), and the
commit that created the fragment of code (via "git blame -C -C -w", I think).


Anyway, all of this (sub)discussion is about archeology, but what might
be more important is automatic rename handling when integrating changes,
be it git-am, git-merge, or something else...

-- 
Jakub Narębski



Re: Git: new feature suggestion

2017-01-19 Thread Jakub Narębski
W dniu 19.01.2017 o 19:39, Linus Torvalds pisze:
> On Wed, Jan 18, 2017 at 10:33 PM, Konstantin Khomoutov
> <kostix+...@007spb.ru> wrote:
>>
>> Still, I welcome you to read the sort-of "reference" post by Linus
>> Torvalds [1] in which he explains the reasoning behind this approach
>> implemented in Git.
> 
> It's worth noting that that discussion was from some _very_ early days
> in git (one week into the whole thing), when none of those
> visualization tools were actually implemented.
> 
> Even now, ten years after the fact, plain git doesn't actually do what
> I outlined. Yes, "git blame -Cw" works fairly well, and is in general
> better than the traditional per-file "annotate". And yes, "git log
> --follow" does another (small) part of the outlined thing, but is
> really not very powerful.

It is really a pity that "git log --follow" is so limited; it's
development stopped at early 'good enough' implementation.

For example "git log --follow gitweb/gitweb.perl" would not show
the whole history of a file (which was once independent project),
and "git log --follow" doesn't work for directories or multiple
files.

> 
> Some tools on top of git do more, but I think in general this is an
> area that could easily be improved upon. For example, the whole
> iterative and interactive drilling down in history of a particular
> file is very inconvenient to do with "git blame" (you find a commit
> that change the area in some way that you don't find interesting, so
> then you have to restart git blame with the parent of that
> unintersting commit).
> 
> You can do it in tig, but I suspect a more graphical tool might be better.

Well, we do have "git gui blame".

[...]
-- 
Jakub Narębski


Re: [PATCH v1] convert: add "status=delayed" to filter process protocol

2017-01-11 Thread Jakub Narębski
W dniu 11.01.2017 o 11:20, Lars Schneider pisze: 
> On 10 Jan 2017, at 23:11, Jakub Narębski <jna...@gmail.com> wrote:
>> W dniu 09.01.2017 o 00:42, Junio C Hamano pisze:
>>> larsxschnei...@gmail.com writes:
>>>> From: Lars Schneider <larsxschnei...@gmail.com>
>>>>
>>>> Some `clean` / `smudge` filters might require a significant amount of
>>>> time to process a single blob. During this process the Git checkout
>>>> operation is blocked and Git needs to wait until the filter is done to
>>>> continue with the checkout.
>>
>> Lars, what is expected use case for this feature; that is when do you
>> think this problem may happen?  Is it something that happened IRL?
> 
> Yes, this problem happens every day with filters that perform network
> requests (e.g. GitLFS). 

Do I understand it correctly that the expected performance improvement
thanks to this feature is possible only if there is some amount of
parallelism and concurrency in the filter?  That is, filter can be sending
one blob to Git while processing other one, or filter can be fetching blobs
in parallel.

This means that filter process works as a kind of (de)multiplexer for
fetching and/or processing blob contents, I think.

> [...] In GitLFS we even implemented Git wrapper
> commands to address the problem: https://github.com/git-lfs/git-lfs/pull/988
> The ultimate goal of this patch is to be able to get rid of the wrapper 
> commands.

I'm sorry, I don't see it how the wrapper helps here.

> 
>>>> Teach the filter process protocol (introduced in edcc858) to accept the
>>>> status "delayed" as response to a filter request. Upon this response Git
>>>> continues with the checkout operation and asks the filter to process the
>>>> blob again after all other blobs have been processed.
>>>
>>> Hmm, I would have expected that the basic flow would become
>>>
>>> for each paths to be processed:
>>> convert-to-worktree to buf
>>> if not delayed:
>>> do the caller's thing to use buf
>>> else:
>>> remember path
>>>
>>> for each delayed paths:
>>> ensure filter process finished processing for path
>>> fetch the thing to buf from the process
>>> do the caller's thing to use buf
>>
>> I would expect here to have a kind of event loop, namely
>>
>>while there are delayed paths:
>>get path that is ready from filter
>>fetch the thing to buf (supporting "delayed")
>>if path done
>>do the caller's thing to use buf 
>>(e.g. finish checkout path, eof convert, etc.)
>>
>> We can either trust filter process to tell us when it finished sending
>> delayed paths, or keep list of paths that are being delayed in Git.
> 
> I could implement "get path that is ready from filter" but wouldn't
> that complicate the filter protocol? I think we can use the protocol pretty 
> much as if with the strategy outlined here:
> http://public-inbox.org/git/f533857d-9b51-44c1-8889-aa0542ad8...@gmail.com/

You are talking about the "busy-loop" solution, isn't it?  In the
same notation, it would look like this:

  while there are delayed paths:
  for each delayed path:
  request path from filter [1]
  fetch the thing (supporting "delayed") [2]
  if path done
  do the caller's thing to use buf
  remove it from delayed paths list


Footnotes:
--
1) We don't send the Git-side contents of blob again, isn't it?
   So we need some protocol extension / new understanding anyway.
   for example that we don't send contents if we request path again.
2) If path is not ready at all, filter protocol would send status=delayed
   with empty contents.  This means that we would immediately go to the
   next path, if there is one.

There are some cases where busy loop is preferable, but I don't think
it is the case here.


The event loop solution would require additional protocol extension,
but I don't think those complicate protocol too much:

A. Git would need to signal filter process that it has sent all paths,
and that it should be sending delayed paths when they are ready.  This
could be done for example with "command=continue".

packet:  git> command=continue


B. Filter driver, in the event-loop phase, when (de)multiplexing fetching
or processing of data

Re: [PATCH v1] convert: add "status=delayed" to filter process protocol

2017-01-10 Thread Jakub Narębski
W dniu 09.01.2017 o 00:42, Junio C Hamano pisze:
> larsxschnei...@gmail.com writes:
>> From: Lars Schneider 
>>
>> Some `clean` / `smudge` filters might require a significant amount of
>> time to process a single blob. During this process the Git checkout
>> operation is blocked and Git needs to wait until the filter is done to
>> continue with the checkout.

Lars, what is expected use case for this feature; that is when do you
think this problem may happen?  Is it something that happened IRL?

>>
>> Teach the filter process protocol (introduced in edcc858) to accept the
>> status "delayed" as response to a filter request. Upon this response Git
>> continues with the checkout operation and asks the filter to process the
>> blob again after all other blobs have been processed.
> 
> Hmm, I would have expected that the basic flow would become
> 
>   for each paths to be processed:
>   convert-to-worktree to buf
>   if not delayed:
>   do the caller's thing to use buf
>   else:
>   remember path
> 
>   for each delayed paths:
>   ensure filter process finished processing for path
>   fetch the thing to buf from the process
>   do the caller's thing to use buf

I would expect here to have a kind of event loop, namely

while there are delayed paths:
get path that is ready from filter
fetch the thing to buf (supporting "delayed")
if path done
do the caller's thing to use buf 
(e.g. finish checkout path, eof convert, etc.)

We can either trust filter process to tell us when it finished sending
delayed paths, or keep list of paths that are being delayed in Git.

> 
> and that would make quite a lot of sense.  However, what is actually
> implemented is a bit disappointing from that point of view.  While
> its first part is the same as above, the latter part instead does:
> 
>   for each delayed paths:
>   checkout the path
> 
> Presumably, checkout_entry() does the "ensure that the process is
> done converting" (otherwise the result is simply buggy), but what
> disappoints me is that this does not allow callers that call
> "convert-to-working-tree", whose interface is obtain the bytestream 
> in-core in the working tree representation, given an object in the
> object-db representation in an in-core buffer, to _use_ the result
> of the conversion.  The caller does not have a chance to even see
> the result as it is written straight to the filesystem, once it
> calls checkout_delayed_entries().
> 



Re: [PATCH v3 1/2] difftool: add a skeleton for the upcoming builtin

2016-11-30 Thread Jakub Narębski
Hello,

W dniu 28.11.2016 o 18:34, Johannes Schindelin pisze:

> My original "create a file in libexec/git-core/" was simple, did the job
> reliably, and worked also for testing.
> 
> It is a pity that you two gentlemen shot it down for being inelegant. And
> ever since, we try to find a solution that is as simple, works as
> reliably, also for testing, *and* appeases your tastes.

I just would like to note that existence of file is used for both
git-daemon and gitweb (the latter following the git-daemon example).

So there is a precedent for the use of this mechanism.

Best,
-- 
Jakub Narębski



Re: [PATCH v3 2/2] difftool: implement the functionality in the builtin

2016-11-27 Thread Jakub Narębski
Hello Johannes,

On 27 November 2016 at 12:10, Johannes Schindelin
<johannes.schinde...@gmx.de> wrote:
> On Fri, 25 Nov 2016, Jakub Narębski wrote:
> > W dniu 24.11.2016 o 21:55, Johannes Schindelin pisze:

[...]
> > > +static int difftool_config(const char *var, const char *value, void *cb)
> > > +{
> > > +   if (!strcmp(var, "diff.guitool")) {
> >
> > Shouldn't you also read other configuration variables, like "diff.tool",
> > and it's mergetool fallbacks ("merge.guitool", "merge.tool")?
>
> No, as those configuration variables are not used by the builtin difftool
> directly but read by subsequently spawned commands separately. There would
> be no use reading them here, for now.

Ah, all right then.

Though NEEDSWORK comment would be nice to have here (for when
we don't spawn commands).

[...]
> I read the rest of your review, but it appears that it is more about
> style than about substance, while I am only willing to address the latter
> issues at the moment. You see, I want to focus on getting difftool correct
> first before attempting to make it pretty.
>
> Ciao,
> Dscho

Well, excet for the submodule-relates stuff, which I have skipped,
it looks good to me.
-- 
Jakub Narębski


Re: [PATCH v3 2/2] difftool: implement the functionality in the builtin

2016-11-25 Thread Jakub Narębski
, , , ,
> +  ))

After rename it would read as:

  + if (parse_raw_diff(info.buf, , , , ,
  +))

Though now I see that you parse here index information of raw diff
(I think)... so disregard my musings.

> + break;
> + if (strbuf_getline_nul(, fp))
> + break;
> + src_path = lpath.buf;
> + src_path_len = lpath.len;
> +
> + i++;
> + if (status != 'C' && status != 'R') {
> + dst_path = src_path;
> + dst_path_len = src_path_len;
> + } else {
> + if (strbuf_getline_nul(, fp))
> + break;
> + dst_path = rpath.buf;
> + dst_path_len = rpath.len;
> + }


[...]
> + /*
> +  * Changes to submodules require special treatment.This loop writes a

Here and in few other places you are missing space after full stop.

  +  * Changes to submodules require special treatment. This loop writes a


> +  * temporary file to both the left and right directories to show the
> +  * change in the recorded SHA1 for the submodule.
> +  */
> + hashmap_iter_init(, );
> + while ((entry = hashmap_iter_next())) {
> + if (*entry->left) {
> + add_path(, ldir_len, entry->path);
> + ensure_leading_directories(ldir.buf);
> + write_file(ldir.buf, "%s", entry->left);
> + }
> + if (*entry->right) {
> + add_path(, rdir_len, entry->path);
> + ensure_leading_directories(rdir.buf);
> + write_file(rdir.buf, "%s", entry->right);
> + }
> + }
> +
> + /*
> +  * Symbolic links require special treatment.The standard "git diff"

Same here.

> +  * shows only the link itself, not the contents of the link target.
> +  * This loop replicates that behavior.
> +  */

Best,
-- 
Jakub Narębski



Re: [char-misc-next] mei: request async autosuspend at the end of enumeration

2016-11-25 Thread Jakub Narębski
W dniu 25.11.2016 o 04:14, Jeff King pisze:
> On Thu, Nov 24, 2016 at 10:37:14PM +, Winkler, Tomas wrote:
> 
>>>>> Cc: <sta...@vger.kernel.org> # 4.4+
>>>>
>>>> Looks like git send-email is not able to parse this address correctly
>>>> though this is suggested format by Documentation/stable_kernel_rules.txt.
>>>> Create wrong address If git parsers is used : 'sta...@vger.kernel.org#4.4+'
[...]

> The patch just brings parity to the Mail::Address behavior and git's
> fallback parser, so that you don't end up with the broken
> sta...@vger.kernel.org#4.4+ address. Instead, that content goes into the
> name part of the address.
> 
> It sounds like you want the "# 4.4+" to be dropped entirely in the
> rfc822 header. It looks like send-email used to do that, but stopped in
> b1c8a11c8 (send-email: allow multiple emails using --cc, --to and --bcc,
> 2015-06-30).
> 
> So perhaps there are further fixes required, but it's hard to know. The
> input isn't a valid rfc822 header, so it's not entirely clear what the
> output is supposed to be. I can buy either "drop it completely" or
> "stick it in the name field of the cc header" as reasonable.

Well, we could always convert it to email address comment, converting
for example the following trailer:

  Cc: John Doe <j...@example.com> # comment

to the following address:

  John Doe <j...@example.com> (comment)

Just FYI.  Though I'm not sure how well this would work...

Best,
-- 
Jakub Narębski


Re: What's cooking in git.git (Nov 2016, #05; Wed, 23)

2016-11-25 Thread Jakub Narębski
W dniu 24.11.2016 o 00:21, Junio C Hamano pisze:

> * nd/rebase-forget (2016-10-28) 1 commit
>  - rebase: add --forget to cleanup rebase, leave HEAD untouched
> 
>  "git rebase" learned "--forget" option, which allows a user to
>  remove the metadata left by an earlier "git rebase" that was
>  manually aborted without using "git rebase --abort".
> 
>  Waiting for a reroll.

It's always a good thing to stop requiring messing with .git insides.

> * jc/reset-unmerge (2016-10-24) 1 commit
>  - reset: --unmerge
> 
>  After "git add" is run prematurely during a conflict resolution,
>  "git diff" can no longer be used as a way to sanity check by
>  looking at the combined diff.  "git reset" learned a new
>  "--unmerge" option to recover from this situation.
> 
>  Will discard.
>  This may not be needed, given that update-index has a similar
>  option.

OTOH update-index is considered plumbing, so having "git reset --unmerge"
might be good thing (note that we can re-checkout file merge).
 
 > * jc/merge-drop-old-syntax (2015-04-29) 1 commit
>   (merged to 'next' on 2016-10-11 at 8928c8b9b3)
>  + merge: drop 'git merge  HEAD ' syntax
> 
>  Stop supporting "git merge  HEAD " syntax that has
>  been deprecated since October 2007, and issues a deprecation
>  warning message since v2.5.0.
> 
>  Will cook in 'next'.
> 



Re: [PATCH v7 13/17] ref-filter: add `:dir` and `:base` options for ref printing atoms

2016-11-20 Thread Jakub Narębski
W dniu 20.11.2016 o 18:32, Junio C Hamano pisze:
> Karthik Nayak <karthik@gmail.com> writes:
> 
>> We could have lstrip and rstrip as you suggested and perhaps make
>> it work together too. But I see this going off the scope of this
>> series. Maybe I'll follow up with another series introducing these
>> features. Since we can currently make do with 'strip=2' I'll drop
>> this patch from v8 of this series and pursue this idea after this.
> 
> My primary point was that if we know we want to add "rstrip" later
> and still decide not to add it right now, it is OK, but we will
> regret it if we named the one we are going to add right now "strip".
> That will mean that future users, when "rstrip" is introduced, will
> end up having to choose between "strip" and "rstrip" (as opposed to
> "lstrip" and "rstrip"), wondering why left-variant is more important
> and named without left/right prefix.

Another solution would be to implement 'splice=[,]',
where if length is omitted it means to the end; perhaps with special
casing (as in Perl) of negative  and/or negative .

Or implement POSIX shell expansion:

  %(parameter%word)  - Remove Smallest Suffix Glob Pattern.
  %(parameter%%word) - Remove Largest Suffix Glob Pattern.
  %(parameter#word)  - Remove Smallest Prefix Pattern.
  %(parameter##word) - Remove Largest Prefix Pattern.

Though this one looks like serious overkill...

-- 
Jakub Narębski



Re: [PATCH v7 14/17] ref-filter: allow porcelain to translate messages in the output

2016-11-18 Thread Jakub Narębski
W dniu 08.11.2016 o 21:12, Karthik Nayak pisze:
> From: Karthik Nayak <karthik@gmail.com>
> 
> Introduce setup_ref_filter_porcelain_msg() so that the messages used in
> the atom %(upstream:track) can be translated if needed. This is needed
> as we port branch.c to use ref-filter's printing API's.
> 
> Written-by: Matthieu Moy <matthieu@grenoble-inp.fr>
> Mentored-by: Christian Couder <christian.cou...@gmail.com>
> Mentored-by: Matthieu Moy <matthieu@grenoble-inp.fr>
> Signed-off-by: Karthik Nayak <karthik@gmail.com>
> ---
>  ref-filter.c | 28 
>  ref-filter.h |  2 ++
>  2 files changed, 26 insertions(+), 4 deletions(-)
> 
> diff --git a/ref-filter.c b/ref-filter.c
> index b47b900..944671a 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -15,6 +15,26 @@
>  #include "version.h"
>  #include "wt-status.h"
>  
> +static struct ref_msg {
> + const char *gone;
> + const char *ahead;
> + const char *behind;
> + const char *ahead_behind;
> +} msgs = {
> + "gone",
> + "ahead %d",
> + "behind %d",
> + "ahead %d, behind %d"
> +};
> +
> +void setup_ref_filter_porcelain_msg(void)
> +{
> + msgs.gone = _("gone");
> + msgs.ahead = _("ahead %d");
> + msgs.behind = _("behind %d");
> + msgs.ahead_behind = _("ahead %d, behind %d");
> +}

Do I understand it correctly that this mechanism is here to avoid
repeated calls into gettext, as those messages would get repeated
over and over; otherwise one would use foo = N_("...") and _(foo),
isn't it?

I wonder if there is some way to avoid duplication here, but I don't
see anything easy and safe (e.g. against running setup_*() twice).

Best,
-- 
Jakub Narębski



Re: [PATCH v7 13/17] ref-filter: add `:dir` and `:base` options for ref printing atoms

2016-11-18 Thread Jakub Narębski
W dniu 15.11.2016 o 18:42, Junio C Hamano pisze:
> Jacob Keller <jacob.kel...@gmail.com> writes:
> 
>> dirname makes sense. What about implementing a reverse variant of
>> strip, which you could perform stripping of right-most components and
>> instead of stripping by a number, strip "to" a number, ie: keep the
>> left N most components, and then you could use something like
>> ...
>> I think that would be more general purpose than basename, and less confusing?
> 
> I think you are going in the right direction.  I had a similar
> thought but built around a different axis.  I.e. if strip=1 strips
> one from the left, perhaps we want to have rstrip=1 that strips one
> from the right, and also strip=-1 to mean strip everything except
> one from the left and so on?.  I think this and your keep (and
> perhaps you'll have rkeep for completeness) have the same expressive
> power.  I do not offhand have a preference one over the other.
> 
> Somehow it sounds a bit strange to me to treat 'remotes' as the same
> class of token as 'heads' and 'tags' (I'd expect 'heads' and
> 'remotes/origin' would be at the same level in end-user's mind), but
> that is probably an unrelated tangent.  The reason this series wants
> to introduce :base must be to emulate an existing feature, so that
> existing feature is a concrete counter-example that argues against
> my "it sounds a bit strange" reaction.

If it is to implement the feature where we select if to display only
local branches (refs/heads/**), only remote-tracking branches
(refs/remotes/**/**), or only tags (refs/tags/**), then perhaps
':category' or ':type' would make sense?

As in '%(refname:category)', e.g.

  %(if:equals=heads)%(refname:category)%(then)...%(end)

-- 
Jakub Narębski



Re: [PATCH v7 10/17] ref-filter: introduce refname_atom_parser_internal()

2016-11-18 Thread Jakub Narębski
W dniu 08.11.2016 o 21:12, Karthik Nayak pisze:
> From: Karthik Nayak 
> 
> Since there are multiple atoms which print refs ('%(refname)',
> '%(symref)', '%(push)', '%upstream'), it makes sense to have a common

Minor typo; it should be: "%(upstream)"

> ground for parsing them. This would allow us to share implementations of
> the atom modifiers between these atoms.
> 
> Introduce refname_atom_parser_internal() to act as a common parsing
> function for ref printing atoms. This would eventually be used to
> introduce refname_atom_parser() and symref_atom_parser() and also be
> internally used in remote_ref_atom_parser().
> 
> Helped-by: Jeff King 
> Signed-off-by: Karthik Nayak 
> ---
[...]

> +static void refname_atom_parser_internal(struct refname_atom *atom,
> +  const char *arg, const char *name)
> +{
> + if (!arg)
> + atom->option = R_NORMAL;
> + else if (!strcmp(arg, "short"))
> + atom->option = R_SHORT;
> + else if (skip_prefix(arg, "strip=", )) {
> + atom->option = R_STRIP;
> + if (strtoul_ui(arg, 10, >strip) || atom->strip <= 0)
> + die(_("positive value expected refname:strip=%s"), arg);
> + }   else
  ^^

It looks like you have spurious tab here.

> + die(_("unrecognized %%(%s) argument: %s"), name, arg);
> +}
> +
>  static void remote_ref_atom_parser(struct used_atom *atom, const char *arg)
>  {
>   struct string_list params = STRING_LIST_INIT_DUP;
> 



Re: [PATCH v7 09/17] ref-filter: make "%(symref)" atom work with the ':short' modifier

2016-11-18 Thread Jakub Narębski
W dniu 08.11.2016 o 21:12, Karthik Nayak pisze:
[...]
 
> Add tests for %(symref) and %(symref:short) while we're here.

That's nice.

> 
> Helped-by: Junio C Hamano <gits...@pobox.com>
> Signed-off-by: Karthik Nayak <karthik@gmail.com>
> ---
[...]

> +test_expect_success 'Add symbolic ref for the following tests' '
> + git symbolic-ref refs/heads/sym refs/heads/master
> +'
> +
> +cat >expected < +refs/heads/master
> +EOF

This should be inside the relevant test, not outside.  In other
patches in this series you are putting setup together with the
rest of test, by using "cat >expected <<-\EOF".

> +
> +test_expect_success 'Verify usage of %(symref) atom' '
> + git for-each-ref --format="%(symref)" refs/heads/sym > actual &&

This should be spelled " >actual", rather than " > actual"; there
should be no space between redirection and file name.

> + test_cmp expected actual
> +'
> +
> +cat >expected < +heads/master
> +EOF
> +
> +test_expect_success 'Verify usage of %(symref:short) atom' '
> +     git for-each-ref --format="%(symref:short)" refs/heads/sym > actual &&
> + test_cmp expected actual
> +'

Same here.

> +
>  test_done
> 

-- 
Jakub Narębski



Re: [PATCH v7 03/17] ref-filter: implement %(if:equals=) and %(if:notequals=)

2016-11-18 Thread Jakub Narębski
W dniu 08.11.2016 o 21:11, Karthik Nayak pisze:
> From: Karthik Nayak <karthik@gmail.com>
> 
> Implement %(if:equals=) wherein the if condition is only
> satisfied if the value obtained between the %(if:...) and %(then) atom
> is the same as the given ''.
> 
> Similarly, implement (if:notequals=) wherein the if condition
> is only satisfied if the value obtained between the %(if:...) and
> %(then) atom is differnt from the given ''.
  

s/differnt/different/   <-- typo

> 
> This is done by introducing 'if_atom_parser()' which parses the given
> %(if) atom and then stores the data in used_atom which is later passed
> on to the used_atom of the %(then) atom, so that it can do the required
> comparisons.

Nb. the syntax reminds me a bit of RPM SPEC language.

> 
> Add tests and Documentation for the same.
> 
> Mentored-by: Christian Couder <christian.cou...@gmail.com>
> Mentored-by: Matthieu Moy <matthieu@grenoble-inp.fr>
> Signed-off-by: Karthik Nayak <karthik@gmail.com>
> ---
>  Documentation/git-for-each-ref.txt |  3 +++
>  ref-filter.c   | 43 
> +-
>  t/t6302-for-each-ref-filter.sh | 18 
>  3 files changed, 59 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/git-for-each-ref.txt 
> b/Documentation/git-for-each-ref.txt
> index fed8126..b7b8560 100644
> --- a/Documentation/git-for-each-ref.txt
> +++ b/Documentation/git-for-each-ref.txt
> @@ -155,6 +155,9 @@ if::
>   evaluating the string before %(then), this is useful when we
>   use the %(HEAD) atom which prints either "*" or " " and we
>   want to apply the 'if' condition only on the 'HEAD' ref.

So %(if) is actually %(if:notempty) ?  Just kidding.

> + Append ":equals=" or ":notequals=" to compare
> + the value between the %(if:...) and %(then) atoms with the
> + given string.
>  
>  In addition to the above, for commit and tag objects, the header
>  field names (`tree`, `parent`, `object`, `type`, and `tag`) can
> diff --git a/ref-filter.c b/ref-filter.c
> index 8392303..44481c3 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -22,6 +22,8 @@ struct align {
>  };
>  
>  struct if_then_else {
> + const char *if_equals,
> + *not_equals;

I guess using anonymous structs from C11 here...

>   unsigned int then_atom_seen : 1,
>   else_atom_seen : 1,
>   condition_satisfied : 1;
> @@ -49,6 +51,10 @@ static struct used_atom {
>   enum { C_BARE, C_BODY, C_BODY_DEP, C_LINES, C_SIG, 
> C_SUB } option;
>   unsigned int nlines;
>   } contents;
> + struct {
> + const char *if_equals,
> + *not_equals;
> + } if_then_else;

...to avoid code duplication there is rather out of question?

>   enum { O_FULL, O_SHORT } objectname;
>   } u;
>  } *used_atom;
> @@ -169,6 +175,19 @@ static void align_atom_parser(struct used_atom *atom, 
> const char *arg)
>   string_list_clear(, 0);
>  }
>  
> +static void if_atom_parser(struct used_atom *atom, const char *arg)
> +{
> + if (!arg)
> + return;
> + else if (skip_prefix(arg, "equals=", >u.if_then_else.if_equals))
> +  ;
> + else if (skip_prefix(arg, "notequals=", 
> >u.if_then_else.not_equals))
> + ;

Those ';' should be perfectly aligned, isn't it?
 
[...]
> +test_expect_success 'check %(if:equals=)' '
> + git for-each-ref 
> --format="%(if:equals=master)%(refname:short)%(then)Found master%(else)Not 
> master%(end)" refs/heads/ >actual &&
> + cat >expect <<-\EOF &&
> + Found master
> + Not master
> + EOF
> + test_cmp expect actual
> +'
> +
> +test_expect_success 'check %(if:notequals=)' '
> + git for-each-ref 
> --format="%(if:notequals=master)%(refname:short)%(then)Not master%(else)Found 
> master%(end)" refs/heads/ >actual &&
> + cat >expect <<-\EOF &&
> + Found master
> + Not master
> + EOF
> + test_cmp expect actual
> +'

Nice!

-- 
Jakub Narębski



Re: RFC: Enable delayed responses to Git clean/smudge filter requests

2016-11-16 Thread Jakub Narębski
W dniu 16.11.2016 o 10:53, Lars Schneider pisze:
> On 15 Nov 2016, at 19:03, Junio C Hamano <gits...@pobox.com> wrote:
>> Lars Schneider <larsxschnei...@gmail.com> writes:
>>
>>>> The filter itself would need to be aware of parallelism
>>>> if it lives for multiple objects, right?
>>>
>>> Correct. This way Git doesn't need to deal with threading...
[...]

>> * You'd need to rein in the maximum parallelism somehow, as you do
>>   not want to see hundreds of competing filter processes starting
>>   only to tell the main loop over an index with hundreds of entries
>>   that they are delayed checkouts.
> 
> I intend to implement this feature only for the new long running filter
> process protocol. OK with you?

If I remember and understand it correctly, current version of long
running process protocol processes files sequentially, one by one:
git sends file to filter wholly, and receives response wholly.

In the single-file filter case, git calls filter process as async
task, in a separate thread, so that one thread feeds the filter,
and main thread (I think?) reads from it, to avoid deadlocks.

Couldn't something like this be done for long running filter process,
via protocol extension?  Namely, Git would send in an async task
content to be filtered, and filter process would stream the response
back, in any order.  If it would be not enough, we could borrow
idea of channels, and be sending few files back concurrently in
parallel, as separate channels... though that would probably
require quite a bit of change in caller.

Best,
-- 
Jakub Narębski



Re: [PATCH] sequencer: shut up clang warning

2016-11-09 Thread Jakub Narębski
W dniu 09.11.2016 o 14:56, Johannes Schindelin pisze:

> When comparing a value of type `enum todo_command` with a value that is
> outside the defined enum constants, clang greets the developer with this
> warning:
> 
>   comparison of constant 2 with expression of type
>   'const enum todo_command' is always true
> 
> While this is arguably true *iff* the value was never cast from a
> free-form int, we should keep the cautious code in place.
> 
> To shut up clang, we simply introduce an otherwise pointless enum constant
> and compare against that.
> 
> Noticed by Torsten Bögershausen.
> 
> Signed-off-by: Johannes Schindelin <johannes.schinde...@gmx.de>
> ---

>  sequencer.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/sequencer.c b/sequencer.c
> index 5fd75f3..f80e9c0 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -619,7 +619,8 @@ static int allow_empty(struct replay_opts *opts, struct 
> commit *commit)
>  
>  enum todo_command {
>   TODO_PICK = 0,
> - TODO_REVERT
> + TODO_REVERT,
> + TODO_INVALID
>  };

Why not name it TODO_N, or N_TODO, or something like that?

-- 
Jakub Narębski



Re: [PATCH v5 01/16] Git.pm: add subroutines for commenting lines

2016-11-09 Thread Jakub Narębski
W dniu 09.11.2016 o 18:02, Vasco Almeida pisze:
> A Ter, 08-11-2016 às 17:06 -0800, Junio C Hamano escreveu:
>> Vasco Almeida <vascomalme...@sapo.pt> writes:

>>> +sub comment_lines {
>>> +   my $comment_line_char = config("core.commentchar") || '#';
>>> +   return prefix_lines("$comment_line_char ", @_);
>>> +}
>>> +
>>
>> This makes it appear as if comment_lines can take arbitrary number
>> of strings as its arguments (because the outer caller just passes @_
>> thru), but in fact because prefix_lines ignores anything other than
>> $_[0] and $_[1], only the first parameter given to comment_lineS sub
>> is inspected for lines in it and the prefix-char prefixed at the
>> beginning of each of them.
>>
>> Which is not a great interface, as it is quite misleading.
>>
>> Perhaps
>>
>>  prefix_lines("#", join("\n", @_));
>>
>> or something like that may make it less confusing.
> 
> I prefer to have like this instead
> 
> sub prefix_lines {
> my $prefix = shift;
> my $string = join("\n", @_);
> $string =~ s/^/$prefix/mg;
> return $string;
> }
> 
> So both subroutines can take several strings as arguments.

I like the interface, but the implementation looks a bit inefficient.
Why not simply:

  sub prefix_lines {
  my $prefix = shift;
  return "$prefix" . join("\n$prefix", @_) . "\n";
  }

That is, if we can assume that those lines are not terminated by
newlines themselves.

If they can be (but cannot have embedded newlines), then

  sub prefix_lines {
  my $prefix = shift;
  return "$prefix" . join("\n$prefix", map(chomp, @_)) . "\n";
  }

If those strings can contain embedded newlines (so that they can be
called as in Junio example), then your solution is a must-be

  sub prefix_lines {
  my $prefix = shift;
  my $string = join("\n", @_);
  $string =~ s/^/$prefix/mg;
  return $string;
  }

Well, nevermind then
-- 
Jakub Narębski



Re: Git issue - ignoring changes to tracked file with assume-unchanged

2016-11-07 Thread Jakub Narębski
W dniu 01.11.2016 o 19:11, Junio C Hamano pisze:
> Jeff King <p...@peff.net> writes:
>> On Tue, Nov 01, 2016 at 10:28:57AM +, Halde, Faiz wrote:
>>
>>> I frequently use the following command to ignore changes done in a file
>>>
>>> git update-index --assume-unchanged somefile
>>>
>>> Now when I do a pull from my remote branch and say the file 'somefile'
>>> was changed locally and in remote, git will abort the merge saying I
>>> need to commit my changes of 'somefile'.
>>>
>>> But isn't the whole point of the above command to ignore the changes
>>> within the file?
>>
>> No. The purpose of --assume-unchanged is to promise git that you will
>> not change the file, so that it may skip checking the file contents in
>> some cases as an optimization.
> 
> That's correct.  
> 
> The next anticipated question is "then how would I tell Git to
> ignore changes done to a file locally by me?", whose short answer is
> "You don't", of course.

Well, you can always use --skip-worktree.  It is a better fit than using
--assume-unchanged, because at least you wouldn't loose your precious
local changes (which happened to me).

OTOH it doesn't solve your issue of --skip-worktree / --assume-unchanged
blocking operation (pull in your case, stash is what I noticed problem
with when using --skip-worktree).

But --skip-worktree is still workaround...

-- 
Jakub Narębski



Re: Reporting Bug in Git Version Control System

2016-10-31 Thread Jakub Narębski
W dniu 25.10.2016 o 10:51, Pranit Bauva pisze:
> Hey Yash,
> 
> Junio has explained the problem very well. Since you seem to be a
> beginner (guessing purely by your email) I will tell you how to fix
> it.
> 
> Remember when you would have first installed git, you would have done
> something like `git config --global user.name ` and
> `git config --global user.email `, it gets
> permanently stored in the git configuration file (~/.gitconfig). Now
> all the commits in git are made with this name and email. If you want
> to change this, again run the above commands with your new name and
> email. After that commits will be done with a different name and
> email. Hope this helps! :)

First, per user Git configuration doesn't necessarily go into ~/.gitconfig;
it could be in ~/.config/git/config

Second, you can configure user.email and user.name with `git config` or
`git config --local`.  The information would go into .git/config for
specific repository.  This might be a better solution if you want
different settings for different repositories.

-- 
Jakub Narębski



Re: RFC Failover url for fetches?

2016-10-23 Thread Jakub Narębski
W dniu 21.10.2016 o 21:03, Junio C Hamano pisze:
> Stefan Beller <sbel...@google.com> writes:
> 
>> So when pushing it is possible to have multiple urls
>> (remote..url) configured.
>>
>> When fetching only the first configured url is considered.
>> Would it make sense to allow multiple urls and
>> try them one by one until one works?
> 
> I do not think the two are related.  Pushing to multiple is not "I
> want to update at least one of them" in the first place.

Push is/should be 'update all', fetch is/should be 'fetch any'.
I thought that multiple remote..url values provide this
fallback for fetch, but it looks like it isn't so...

> 
> As to fetching from two or more places as "fallback", I am
> moderately negative to add it as a dumb feature that does nothing
> more than "My fetch from A failed, so let's blindly try it from B".
> I'd prefer to keep the "My fetch from A is failing" knowledge near
> the surface of end user's consciousness as a mechanism to pressure A
> to fix it--that way everybody who is fetching from A benefits.
> After all, doing "git remote add B" once (you'd need to tell the URL
> for B anyway to Git) and issuing "git fetch B" after seeing your
> regular "git fetch" fails once in a blue moon is not all that
> cumbersome, I would think.

One would need to configure fallback B remote to use the same
remote-branch namespace as remote A, if it is to be used as fallback,
I would think.

> 
> Some people _may_ have objection based on A and B going out of sync,
> especially B may fall behind even yourself and cause non-ff errors,
> but I personally am not worried about that, because when somebody
> configures B as a fallback for A, there is an expectation that B is
> kept reasonably up to date.  It would be a problem if some refs are
> expected to be constantly rewound at A (e.g. 'pu' in my tree) and
> configured to always force-fetch, though.  A stale B would silently
> set such a branch in your repository back without failing.

Nb. there is also http-alternates mechanism... which nowadays doesn't
matter anyway, I would think.

-- 
Jakub Narębski



Re: [PATCH v10 13/14] convert: add filter..process option

2016-10-15 Thread Jakub Narębski
W dniu 15.10.2016 o 16:45, Lars Schneider pisze:
>> On 12 Oct 2016, at 03:54, Jakub Narębski <jna...@gmail.com> wrote:
>> W dniu 12.10.2016 o 00:26, Lars Schneider pisze: 
>>>> On 09 Oct 2016, at 01:06, Jakub Narębski <jna...@gmail.com> wrote:
>>>>>
>>>
>>>>> After the filter started
>>>>> Git sends a welcome message ("git-filter-client"), a list of
>>>>> supported protocol version numbers, and a flush packet. Git expects
>>>>> +to read a welcome response message ("git-filter-server") and exactly
>>>>> +one protocol version number from the previously sent list. All further
>>>>> +communication will be based on the selected version. The remaining
>>>>> +protocol description below documents "version=2". Please note that
>>>>> +"version=42" in the example below does not exist and is only there
>>>>> +to illustrate how the protocol would look like with more than one
>>>>> +version.
>>>>> +
>>>>> +After the version negotiation Git sends a list of all capabilities that
>>>>> +it supports and a flush packet. Git expects to read a list of desired
>>>>> +capabilities, which must be a subset of the supported capabilities list,
>>>>> +and a flush packet as response:
>>>>> +
>>>>> +packet:  git> git-filter-client
>>>>> +packet:  git> version=2
>>>>> +packet:  git> version=42
>>>>> +packet:  git> 
>>>>> +packet:  git< git-filter-server
>>>>> +packet:  git< version=2
>>>>> +packet:  git> clean=true
>>>>> +packet:  git> smudge=true
>>>>> +packet:  git> not-yet-invented=true
>>>>> +packet:  git> 
>>>>> +packet:  git< clean=true
>>>>> +packet:  git< smudge=true
>>>>> +packet:  git< 
>>>>
>>>> WARNING: This example is different from description!!!
>>>
>>> Can you try to explain the difference more clearly? I read it multiple
>>> times and I think this is sound.
>>
>> I'm sorry it was *my mistake*.  I have read the example exchange wrong.
>>
>> On the other hand that means that I have other comment, which I though
>> was addressed already in v10, namely that not all exchanges ends with
>> flush packet (inconsistency, and I think a bit of lack of extendability).
> 
> Well, this part of the protocol is not supposed to be extensible because
> it is supposed to deal *only* with the version number. It needs to keep 
> the same structure to ensure forward and backward compatibility.
> 
> However, for consistency sake I will add a flush packet.

Thanks.  That is one thing I feel quite strongly about.

I can agree that extendability does not matter much here: we can always
change the version number.  But there might be some additional information
that filter process wants to send to Git in first exchange, and using
flush-terminated list here means that we don't need to change version
number, assuming that this additional information is advisory.

The consistency means in my opinion that it should be easier to implement
filter scripts.

>>>> In description above the example you have 4-part handshake, not 3-part;
>>>> the filter is described to send list of supported capabilities last
>>>> (a subset of what Git command supports).
>>>
>>> Part 1: Git sends a welcome message...
>>> Part 2: Git expects to read a welcome response message...
>>> Part 3: After the version negotiation Git sends a list of all 
>>> capabilities...
>>> Part 4: Git expects to read a list of desired capabilities...
>>>
>>> I think example and text match, no?
>>
>> Yes, it does; as I have said already, I have misread the example. 
>>
>> Anyway, in some cases 4-way handshake, where Git sends list of
>> supported capabilities first, is better.  If the protocol has
>> to prepare something for each of capabilities, and perhaps check
>> those preparation status, it can do it after Git sends what it
>> could need, and before it sends what it does support.
>>
>> Though it looks a bit strange that client (as Git is client here)
>> sends its capabilities first...
> 
> Git tells the filter what it can do. Then the filter decides what
> features it supports. I would prefer to keep i

Re: [PATCH v2 3/3] gitweb: Link to "git describe"'d commits in log messages

2016-10-14 Thread Jakub Narębski
W dniu 06.10.2016 o 11:11, Ævar Arnfjörð Bjarmason pisze:
> Change the log formatting function to know about "git describe" output
> such as "v2.8.0-4-g867ad08", in addition to just plain "867ad08".
> 
> There are still many valid refnames that we don't link to
> e.g. v2.10.0-rc1~2^2~1 is also a valid way to refer to
> v2.8.0-4-g867ad08, but I'm not supporting that with this commit,
> similarly it's trivially possible to create some refnames like
> "æ/var-gf6727b0" or which won't be picked up by this regex.

It's important to cover most common cases occurring naturally in
people's repositories, while trying to avoid false positives (the latter
is important more now, where gitweb doesn't check for rev name validity).

I guess that most common is to use non-hierarchical tags with ASCII-only
names for describe output, so while "æ/var-gf6727b0" won't be picked,
it is IMVHO also much less likely to occur.

> 
> There's surely room for improvement here, but I just wanted to address
> the very common case of sticking "git describe" output into commit
> messages without trying to link to all possible refnames, that's going
> to be a rather futile exercise given that this is free text, and it
> would be prohibitively expensive to look up whether the references in
> question exist in our repository.
> 
> There was on-list discussion about how we could do better than this
> patch. Junio suggested to update parse_commits() to call a new
> "gitweb--helper" command which would pass each of the revision
> candidates through "rev-parse --verify --quiet". That would cut down
> on our false positives (e.g. we'll link to "deadbeef"), and also allow
> us to be more aggressive in selecting candidate revisions.
> 
> That may be too expensive to work in practice, or it may
> not. Investigating that would be a good follow-up to this patch.

All right.  That's good and enough for one patch.

> 
> Signed-off-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>

Acked-by: Jakub Narębski <jna...@gmail.com>

BTW. to add to whole "git describe" output or not discussion: having link
span whole revision marker makes for easier to use UI: larger are to hit.

> ---
>  gitweb/gitweb.perl | 18 --
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 92b5e91..7cf68f0 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -2036,10 +2036,24 @@ sub format_log_line_html {
>   my $line = shift;
>  
>   $line = esc_html($line, -nbsp=>1);
> - $line =~ s{\b([0-9a-fA-F]{7,40})\b}{
> + $line =~ s{
> +\b
> +(
> +# The output of "git describe", e.g. v2.10.0-297-gf6727b0
> +# or hadoop-20160921-113441-20-g094fb7d
> +(? +[A-Za-z0-9.-]+
> +(?!\.) # refs can't end with ".", see check_refname_format()
> +-g[0-9a-fA-F]{7,40}
> +|
> +# Just a normal looking Git SHA1
> +[0-9a-fA-F]{7,40}
> +)
> +\b

Nice using of eXplained regexp.  It is much easier to understand with
comments.

> +}{
>   $cgi->a({-href => href(action=>"object", hash=>$1),
>   -class => "text"}, $1);
> - }eg;
> + }egx;
>  
>   return $line;
>  }
> 



Re: [PATCH v2 2/6] trailer: use list.h for doubly-linked list

2016-10-14 Thread Jakub Narębski
W dniu 13.10.2016 o 01:40, Jonathan Tan pisze:
> Replace the existing handwritten implementation of a doubly-linked list
> in trailer.c with the functions and macros from list.h. This
> significantly simplifies the code.
> ---
>  trailer.c | 258 
> ++
>  1 file changed, 91 insertions(+), 167 deletions(-)

Very nice gains!

BTW. could you sign (Signed-off-by) your patches? TIA.

Best,
-- 
Jakub Narębski



Re: [PATCH v2 2/3] gitweb: Link to 7-char+ SHA1s, not only 8-char+

2016-10-14 Thread Jakub Narębski
W dniu 06.10.2016 o 11:11, Ævar Arnfjörð Bjarmason pisze:

> Change the minimum length of an abbreviated object identifier in the
> commit message gitweb tries to turn into link from 8 hexchars to 7.
> 
> This arbitrary minimum length of 8 was introduced in bfe2191 ("gitweb:
> SHA-1 in commit log message links to "object" view", 2006-12-10), but
> the default abbreviation length is 7, and has been for a long time.
> 
> It's still possible to reference SHA1s down to 4 characters in length,
> see v1.7.4-1-gdce9648's MINIMUM_ABBREV, but I can't see how to make
> git actually produce that, so I doubt anyone is putting that into log
> messages in practice, but people definitely do put 7 character SHA1s
> into log messages.

That's nice explanation why we want to support 7 character SHA-1
(it is the default, and was seen in the wild), but don't need to
support shorter lengths.

Another reason (just for completeness; you don't need to put it in
the commit message) to not go below 7 characters is that with 
decreasing length there is more and more chance for false positives
(like 'beef' or 'caffee' for 4-char or 5-char hexstring), as I wrote
previously.

s/SHA1/SHA-1/g in above paragraph (for correctness and consistency).

> 
> I think it's fairly dubious to link to things matching [0-9a-fA-F]
> here as opposed to just [0-9a-f], that dates back to the initial
> version of gitweb from 161332a ("first working version",
> 2005-08-07). Git will accept all-caps SHA1s, but didn't ever produce
> them as far as I can tell.

All right.  If we decide to be more strict in what we accept, we can
do it in a separate commit.

> 
> Signed-off-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>

Acked-by: Jakub Narębski <jna...@gmail.com>

> ---
>  gitweb/gitweb.perl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index cba7405..92b5e91 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -2036,7 +2036,7 @@ sub format_log_line_html {
>   my $line = shift;
>  
>   $line = esc_html($line, -nbsp=>1);
> - $line =~ s{\b([0-9a-fA-F]{8,40})\b}{
> + $line =~ s{\b([0-9a-fA-F]{7,40})\b}{

By the way, it is quite long commit message for one character change.
Not that it is a bad thing...

>   $cgi->a({-href => href(action=>"object", hash=>$1),
>   -class => "text"}, $1);
>   }eg;
> 



Re: [PATCH v2 1/3] gitweb: Fix a typo in a comment

2016-10-14 Thread Jakub Narębski
W dniu 06.10.2016 o 11:11, Ævar Arnfjörð Bjarmason napisał:

> Change a typo'd MIME type in a comment. The Content-Type is
> application/xhtml+xml, not application/xhtm+xml.
> 
> Fixes up code originally added in 53c4031 ("gitweb: Strip
> non-printable characters from syntax highlighter output", 2011-09-16).
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>

Good.  Thanks for taking care of this.
For what is worth for such a trivial patch:

Acked-by: Jakub Narębski <jna...@gmail.com>

> ---
>  gitweb/gitweb.perl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 44094f4..cba7405 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -1616,7 +1616,7 @@ sub esc_path {
>   return $str;
>  }
>  
> -# Sanitize for use in XHTML + application/xml+xhtm (valid XML 1.0)
> +# Sanitize for use in XHTML + application/xml+xhtml (valid XML 1.0)
>  sub sanitize {
>   my $str = shift;
>  
> 



Re: Huge performance bottleneck reading packs

2016-10-14 Thread Jakub Narębski
W dniu 13.10.2016 o 11:04, Vegard Nossum pisze:
> On 10/13/2016 01:47 AM, Jeff King wrote:
>> On Wed, Oct 12, 2016 at 07:18:07PM -0400, Jeff King wrote:
>>
>>> Also, is it possible to make the repository in question available? I
>>> might be able to reproduce based on your description, but it would save
>>> time if I could directly run gdb on your example.
> 
> I won't be able to make the repository available, sorry.

We have 'git fast-export --anonymize ...', but it would obviously
not preserve the pack structure (though it can help in other cases
when one cannot make repository available).

-- 
Jakub Narębski



Re: [PATCH v10 13/14] convert: add filter..process option

2016-10-12 Thread Jakub Narębski
W dniu 12.10.2016 o 00:26, Lars Schneider pisze: 
>> On 09 Oct 2016, at 01:06, Jakub Narębski <jna...@gmail.com> wrote:
>>
>> Part 1 of review, starting with the protocol v2 itself.
>>
>> W dniu 08.10.2016 o 13:25, larsxschnei...@gmail.com pisze:
>>> From: Lars Schneider <larsxschnei...@gmail.com>
>>>
>>> +upon checkin. By default these commands process only a single
>>> +blob and terminate.  If a long running `process` filter is used
>>> +in place of `clean` and/or `smudge` filters, then Git can process
>>> +all blobs with a single filter command invocation for the entire
>>> +life of a single Git command, for example `git add --all`.  See
>>> +section below for the description of the protocol used to
>>> +communicate with a `process` filter.
>>
>> I don't remember how this part looked like in previous versions
>> of this patch series, but "... is used in place of `clean` ..."
>> does not tell explicitly about the precedence of those 
>> configuration variables.  I think it should be stated explicitly
>> that `process` takes precedence over any `clean` and/or `smudge`
>> settings for the same `filter.` (regardless of whether
>> the long running `process` filter support "clean" and/or "smudge"
>> operations or not).
> 
> This is stated explicitly later on. I moved it up here:
> 
> "If a long running `process` filter is used
> in place of `clean` and/or `smudge` filters, then Git can process
> all blobs with a single filter command invocation for the entire
> life of a single Git command, for example `git add --all`. If a 
> long running `process` filter is configured then it always takes 
> precedence over a configured single blob filter. "
> 
> OK?

Looks good to me.

I think this information about precedence between one-shot `clean`
and `smudge` filter driver configuration, and multi-file `process`
filter driver should be here for two reasons.

First, if one is interested in running filter, but do not want to
write one (he or she uses existing tool, for example one of
existing LFS solutions), one can skip the "Long Running Filter
Process" section.  But one still needs to know if to remove or
comment out old `clean` and `smudge` config, or how to provide
fallback for older Git (if one uses the same configuration with
pre-process Git and Git including support for this feature).

Second, the configuration belongs, in my opinion, here.  It is
not a part of long running filter protocol.

>>> +If the filter command (a string value) is defined via
>>> +`filter..process` then Git can process all blobs with a
>>> +single filter invocation for the entire life of a single Git
>>> +command. This is achieved by using a packet format (pkt-line,
>>> +see technical/protocol-common.txt) based protocol over standard
>>> +input and standard output as follows. All packets, except for the
>>> +"*CONTENT" packets and the "" flush packet, are considered
>>> +text and therefore are terminated by a LF.
>>
>> Maybe s/standard input and output/\& of filter process,/ (that is,
>> add "... of filter process," to the third sentence in the above
>> paragraph).
> 
> You mean "This is achieved by using a packet format (pkt-line,
> see technical/protocol-common.txt) based protocol over standard
> input and standard output of filter process as follows." ?

Yes.

> I think I like the original version better.

Well, I think it is better to err out on the side of being more
explicit.

> 
>>> After the filter started
>>> Git sends a welcome message ("git-filter-client"), a list of
>>> supported protocol version numbers, and a flush packet. Git expects
>>> +to read a welcome response message ("git-filter-server") and exactly
>>> +one protocol version number from the previously sent list. All further
>>> +communication will be based on the selected version. The remaining
>>> +protocol description below documents "version=2". Please note that
>>> +"version=42" in the example below does not exist and is only there
>>> +to illustrate how the protocol would look like with more than one
>>> +version.
>>> +
>>> +After the version negotiation Git sends a list of all capabilities that
>>> +it supports and a flush packet. Git expects to read a list of desired
>>> +capabilities, which must be a subset of the supported capabilities list,
>>> +and a flush packet as response:
>>> +
>>> +packet:  git> git-filter-clien

Re: [PATCH 2/2] reset: support the --stdin option

2016-10-11 Thread Jakub Narębski
W dniu 11.10.2016 o 18:09, Johannes Schindelin pisze:

>  SYNOPSIS
>  
>  [verse]
> -'git reset' [-q] [] [--] ...
> +'git reset' [-q] [--stdin [-z]] [] [--] ...

I think you meant here

  +'git reset' [-q] [--stdin [-z]] []

Because you say "*Instead*" below.

> +--stdin::
> + Instead of taking list of paths from the command line,
> + read list of paths from the standard input.  Paths are
> + separated by LF (i.e. one path per line) by default.

And die if  were supplied:

> + if (pathspec.nr)
> + die(_("--stdin is incompatible with path arguments"));

Of course you need to fix it in built-in synopsis as well:

> + N_("git reset [-q] [--stdin [-z]] [] [--] ..."),
>   N_("git reset --patch [] [--] [...]"),

-- 
Jakub Narębski



Re: git merge deletes my changes

2016-10-11 Thread Jakub Narębski
W dniu 10.10.2016 o 19:52, Paul Smith pisze:
> On Mon, 2016-10-10 at 10:19 +, Eduard Egorov wrote:
>> # ~/gitbuild/git-2.10.1/git merge -s subtree --squash ceph_ansible
>>
>> Can somebody confirm this please? Doesn't "merge -s subtree" really
>> merges branches?
> 
> I think possibly you're not fully understanding what the --squash flag
> does... that's what's causing your issue here, not the "-s" option.
> 
> A squash merge takes the commits that would be merged from the origin
> branch and squashes them into a single patch and applies them to the
> current branch as a new commit... but this new commit is not a merge
> commit (that is, when you look at it with "git show" etc. the commit
> will have only one parent, not two--or more--parents like a normal merge
> commit).
> 
> Basically, it's syntactic sugar for a diff plus patch operation plus
> some Git goodness wrapped around it to make it easier to use.

Actually this is full merge + commit surgery (as if you did merge with
--no-commit, then deleted MERGE_HEAD); the state of worktree is as if
it were after a merge.

> 
> But ultimately once you're done, Git has no idea that this new commit
> has any relationship whatsoever to the origin branch.  So the next time
> you merge, Git doesn't know that there was a previous merge and it will
> try to merge everything from scratch rather than starting at the
> previous common merge point.
> 
> So either you'll have to use a normal, non-squash merge, or else you'll
> have to tell Git by hand what the previous common merge point was (as
> Jeff King's excellent email suggests).  Or else, you'll have to live
> with this behavior.

The `git subtree` command (from contrib) allows yet another way: it
squashes *history* of merged subproject (as if with interactive rebase
'squash'), then merges this squash commit.

Now I know why this feature is here...
-- 
Jakub Narębski



Re: [PATCH/RFC] git.c: support "!!" aliases that do not move cwd

2016-10-11 Thread Jakub Narębski
W dniu 11.10.2016 o 13:51, SZEDER Gábor pisze:
> Quoting Duy Nguyen <pclo...@gmail.com>:
>> On Fri, Oct 7, 2016 at 10:55 PM, Jakub Narębski <jna...@gmail.com> wrote:
>>> W dniu 07.10.2016 o 16:19, Johannes Schindelin pisze:
>>>> On Fri, 7 Oct 2016, Jakub Narębski wrote:
>>>
>>>>> Note that we would have to teach git completion about new syntax;
>>>>> or new configuration variable if we go that route.
>>>>
>>>> Why would we? Git's completion does not expand aliases, it only completes
>>>> the aliases' names, not their values.
>>>
>>> Because Git completion finds out which _options_ and _arguments_
>>> to complete for an alias based on its expansion.
>>>
>>> Yes, this is nice bit of trickery...
>>
>> It's c63437c (bash: improve aliased command recognition - 2010-02-23)
>> isn't it? This may favor j6t's approach [1] because retrieving alias
>> attributes is much easier.
>>
>> [1] 
>> https://public-inbox.org/git/20161006190720.4ecf3ptl6mszt...@sigill.intra.peff.net/T/#mb1d7b8f31d595b05105b8ea2137756761e13dbf4
>> -- 
>> Duy
> 
> The completion script is concerned in three ways:
> 
>   1. it has to get the names of all aliases, to offer them along with
>  git commands for 'git ' or 'git help ',
> 
>   2. it has to get the command executed in place of the alias, and
> 
>   3. strip everything that can't be a git command, so it can offer the
>  right options for the aliased command.

There is also a possibility to tell the completion script which git
command to use for option completion via some trickery, even if the
first git command of many used in script is not right (e.g. when
"$@" is passed somewhere in the middle).

I don't remember exact details, but let's look at source:

  # If you use complex aliases of form '!f() { ... }; f', you can use the null
  # command ':' as the first command in the function body to declare the desired
  # completion style.  For example '!f() { : git commit ; ... }; f' will
  # tell the completion to use commit completion.  This also works with aliases
  # of form "!sh -c '...'".  For example, "!sh -c ': git commit ; ... '".

Very nice.

> 
> The '!!' syntax is the easiest, I think it will just work even with
> the current completion code, no changes necessary.
> 
> The '(nocd)' form is still easy, we only have to add this '(nocd)' to
> that list of stripped words for (3), but no further changes necessary
> for (1) and (2).

Shouldn't the '!' vs '!!' / '(nocd)!' affect pathname completion?
Or do tab completion in subdir offer wrong completion of filenames
for aliases?

Best,
-- 
Jakub Narębski



Re: How to watch a mailing list & repo for patches which affect a certain area of code?

2016-10-10 Thread Jakub Narębski
W dniu 09.10.2016 o 21:03, Ian Kelling pisze:

> I've got patches in various projects, and I don't have time to keep up
> with the mailing list, but I'd like to help out with maintenance of that
> code, or the functions/files it touches. People don't cc me. I figure I
> could filter the list, test patches submitted, commits made, mentions of
> files/functions, build filters based on the code I have in the repo even
> if it's been moved or changed subsequently. I'm wondering what other
> people have implemented already for automation around this, or general
> thoughts. Web search is not showing me much.

First, the practice on this mailing list is to Cc all, and from what
I have seen people tend to do that (well, at least the regular participants).

Second, you can read this mailing list (and send emails / respond) with
a news reader via the NNTP interface (gmane or public-inbox).  News readers
usually have good search capability.

Hope that helps
-- 
Jakub Narębski



Re: [PATCH v10 13/14] convert: add filter..process option

2016-10-08 Thread Jakub Narębski
t:  git< status=success
> +packet:  git< 
> +packet:  git< HALF_WRITTEN_ERRONEOUS_CONTENT
> +packet:  git< 
> +packet:  git< status=error
> +packet:  git< 
> +
> +
> +If the filter dies during the communication or does not adhere to
> +the protocol then Git will stop the filter process and restart it
> +with the next file that needs to be processed. Depending on the
> +`filter..required` flag Git will interpret that as error.

Uhh... until now the order was explanation, then example.  From the
duplicated description above, it is now first example, then
description.  Consistency would be good.

> +
> +The error handling for all cases above mimic the behavior of
> +the `filter..clean` / `filter..smudge` error
> +handling.

You have "error handling" repeated here.

> +
> +In case the filter cannot or does not want to process the content
> +as well as any future content for the lifetime of the Git process,
> +it is expected to respond with an "abort" status at any point in
> +the protocol. Depending on the `filter..required` flag Git
> +will interpret that as error for the content as well as any future
> +content for the lifetime of the Git process but it will not stop or
> +restart the filter process.

Here the order, first about `required` then without, looks all right
to me: the result wrt process is the same, only the error or not
changes.

And here we have description first, example second.

> +
> +packet:  git< status=abort
> +packet:  git< 
> +
> +
> +After the filter has processed a blob it is expected to wait for
> +the next "key=value" list containing a command. Git will close
> +the command pipe on exit. The filter is expected to detect EOF
> +and exit gracefully on its own.

Any "kill filter" solutions should probably be put here.  I guess
that filter exiting means EOF on its standard output when read
by Git command, isn't it?

> +
> +If you develop your own long running filter
> +process then the `GIT_TRACE_PACKET` environment variables can be
> +very helpful for debugging (see linkgit:git[1]).

s/environment variables/environment variable/  - there is only
one GIT_TRACE_PACKET.  Unless you wanted to write about GIT_TRACE?

> +
> +If a `filter..process` command is configured then it
> +always takes precedence over a configured `filter..clean`
> +or `filter..smudge` command.

Ah, it is here! I think it would be better to put it upfront; you
don't need information about the protocol to *use* the existing
filter, but you need this info.

Or maybe we can repeat this information.

> +
> +Please note that you cannot use an existing `filter..clean`
> +or `filter..smudge` command with `filter..process`
> +because the former two use a different inter process communication
> +protocol than the latter one.

I'm not sure where this should be (but it is needed)

> +
> +
>  Interaction between checkin/checkout attributes
>  ^^^

-- 
Jakub Narębski



Re: "Purposes, Concepts,Misfits, and a Redesign of Git" (a research paper)

2016-10-07 Thread Jakub Narębski
W dniu 07.10.2016 o 19:52, Konstantin Khomoutov pisze:
> On Fri, 30 Sep 2016 19:14:13 +0300
> Konstantin Khomoutov <kostix+...@007spb.ru> wrote:
> 
>> The "It Will Never Work in Theory" blog has just posted a summary of a
>> study which tried to identify shortcomings in the design of Git.
>>
>> In the hope it might be interesting, I post this summary here.
>> URL: http://neverworkintheory.org/2016/09/30/rethinking-git.html
> 
> I think it would be cool if someone among subscribers could post a link
> to our discussion thread [2] back onto that page.  Unfortunatrly that
> requires a Disqus login which I don't have, so may be someone in
> possession of such login could do that? ;-)
> 
> 2. 
> https://public-inbox.org/git/ce42f934-4a94-fa29-cff0-5ebb0f004...@gmail.com/T/#e95875b7940512b432ab2e29b3dd50ca448df9720

Posted.  Thanks for the idea.

-- 
Jakub Narębski




Re: "Purposes, Concepts,Misfits, and a Redesign of Git" (a research paper)

2016-10-07 Thread Jakub Narębski
On 7 October 2016 at 18:55, Santiago Perez De Rosso
<spere...@csail.mit.edu> wrote:
> On Wed, Oct 5, 2016 at 6:15 AM Jakub Narębski <jna...@gmail.com> wrote:
>> On 5 October 2016 at 04:55, Santiago Perez De Rosso
>> <spere...@csail.mit.edu> wrote:
>>> On Fri, Sep 30, 2016 at 6:25 PM Jakub Narębski <jna...@gmail.com> wrote:
>>>> W dniu 30.09.2016 o 18:14, Konstantin Khomoutov pisze:
>>>>
>>>>> The "It Will Never Work in Theory" blog has just posted a summary of a
>>>>> study which tried to identify shortcomings in the design of Git.
>>>>>
>>>>> In the hope it might be interesting, I post this summary here.
>>>>> URL: http://neverworkintheory.org/2016/09/30/rethinking-git.html
>>>>
>>>> I will comment on the article itself, not just on the summary.
>>>>
>>>> | 2.2 Git
>>>> [...]
>>>> | But tracked files cannot be ignored; to ignore a tracked file
>>>> | one has to mark it as “assume unchanged.” This “assume
>>>> | unchanged” file will not be recognized by add; to make it
>>>> | tracked again this marking has to be removed.
[...]
>> Yes, this is true that users may want to be able to ignore changes to
>> tracked files (commit with dirty tree), but using `assume-unchanged` is
>> wrong and dangerous solution.  Unfortunately the advice to use it is
>> surprisingly pervasive.  I would thank you to not further this error.
>
> Ok, I added a footnote in the paper when we first mention assume unchanged
> that says:
>
> Assume unchanged was intended to be used as a performance optimization but
> has since been appropriated by users as a way to ignore tracked files. The
> current advice is to use the “skip worktree” marking instead
>
> This should prompt readers to look into skip worktree next time they want to
> ignore tracked files. I don't think people reading the paper are doing so to
> learn Git but at least it should contribute to not furthering the error.

Thank you very much.

The problem with "assume-unchanged" is that by using it to ignore
changes to tracked files you are lying to Git (telling it 'assume this
is unchanged' while changing it), and can lead to DATA LOSS, that
is to losing those changes.

[...]
>>>> [...]
>>>> | The problem
>>>> | is the lack of connection between this purpose and the highlevel
>>>> | purposes for version control, which suggests that the
>>>> | introduction of stashing might be to patch flaws in the design
>>>> | of Git and not to satisfy a requirement of version control.
>>>>
>>>> Or the problem might be that you are missing some (maybe minor)
>>>> requirement of version control system. Just saying...
>>>
>>> What would that purpose be? and why would you say that's a
>>> high-level purpose for version control and not one that's
>>> git-specific?
>>
>> The stash (or rather its equivalent) is not something Git specific.
>> It is present also in other version control systems, among others:
>>
>> * Mercurial: as 'shelve' extension (in core since 1.8)
>> * Bazaar: as 'bzr shelve' command
>> * Fossil: as 'fossil stash' command (with subcommands)
>> * Subversion: Shelve planned for 1.10 (2017?)
>
> Do these other VCSs have the same "Switching branches" misfit? Do you
> usually need to stash if you want to switch with uncommitted changes? I know
> Mercurial has the same problem since ``bookmarks'' are like Git branches, so
> it makes sense for them to have added something like stashing (if otherwise
> switching with uncommitted changes would be very difficult).

I suspect that all those are inspired by each other, and that
they all use 'uncommitted changes are not tied to a branch'
paradigm, which allows for creating a branch for changes
after a fact (when it turns out that it would take more than
one commit to implement the feature) quite easy.

>> I would say that 'stash' could be considered about isolating work on
>> different features, different sub-branch sized parallel work.

Note that 'isolating work' is missing from your list of purposes
of a version control system; though it is fairly obvious.

>
> That sounds a lot like having independent lines of development, which is
> what branches are supposed to be for

Those are sub-commit changes. Branches are composed of
commits. But I agree that this may be a bit of a stretch in
trying to find a high-level purpose for stash (rather than it being
a convenience feature). As I said below...

>> But it might be that stash doesn't have connection with highlevel
&g

Re: [PATCH v4 0/6] Pull out require_clean_work_tree() functionality from builtin/pull.c

2016-10-07 Thread Jakub Narębski
W dniu 07.10.2016 o 18:08, Johannes Schindelin pisze:

> - marked the hint "please commit or stash them" (reintroduced from the
>   original git-pull.sh script) as translatable.

I wonder if we can make automatic check if everything introduced is
translatable, for example with something akin to "English (XT)"
autogenerated pseudo-localization that Android uses?

Just food for thought.
-- 
Jakub Narębski



Re: [PATCH/RFC] git.c: support "!!" aliases that do not move cwd

2016-10-07 Thread Jakub Narębski
W dniu 07.10.2016 o 16:19, Johannes Schindelin pisze: 
> On Fri, 7 Oct 2016, Jakub Narębski wrote:

>> Note that we would have to teach git completion about new syntax;
>> or new configuration variable if we go that route.
> 
> Why would we? Git's completion does not expand aliases, it only completes
> the aliases' names, not their values.

Because Git completion finds out which _options_ and _arguments_
to complete for an alias based on its expansion.

Yes, this is nice bit of trickery...
-- 
Jakub Narębski



Re: [PATCH/RFC] git.c: support "!!" aliases that do not move cwd

2016-10-07 Thread Jakub Narębski
Hello, Johannes

W dniu 07.10.2016 o 13:20, Johannes Schindelin pisze:
> On Thu, 6 Oct 2016, Junio C Hamano wrote:
>> Nguyễn Thái Ngọc Duy  <pclo...@gmail.com> writes:
>>
>>> Throwing something at the mailing list to see if anybody is
>>> interested.
>>>
>>> Current '!' aliases move cwd to $GIT_WORK_TREE first, which could make
>>> handling path arguments hard because they are relative to the original
>>> cwd. We set GIT_PREFIX to work around it, but I still think it's more
>>> natural to keep cwd where it is.
>>>
>>> We have a way to do that now after 441981b (git: simplify environment
>>> save/restore logic - 2016-01-26). It's just a matter of choosing the
>>> right syntax. I'm going with '!!'. I'm not very happy with it. But I
>>> do like this type of alias.
>>
>> I do not know why you are not happy with the syntax, but I
>> personally think it brilliant, both the idea and the preliminary
>> clean-up that made this possible with a simple patch like this.
> 
> I guess he is not happy with it because "!!" is quite unintuitive a
> construct. I know that *I* would have been puzzled by it, asking "What the
> heck does this do?".

Well, "!" as a prefix is not intuitive either.

Perhaps "!.", because "." is current directory, and the "." command
(that is, alias to "source") doesn't make sense in git aliases.

Note that we would have to teach git completion about new syntax;
or new configuration variable if we go that route.
-- 
Jakub Narębski



Re: Regression: git no longer works with musl libc's regex impl

2016-10-07 Thread Jakub Narębski
W dniu 07.10.2016 o 00:42, Ramsay Jones pisze: 
> On 06/10/16 20:18, Ævar Arnfjörð Bjarmason wrote:
[...]
>> But just to clarify, does anyone have any objection to making our
>> configure.ac compile a C program to check for this sort of thing?
>> Because that seems like the easiest solution to this class of problem.
> 
> Err, you do know that we already do that, right?
> 
> [see commit a1e3b669 ("autoconf: don't use platform regex if it lacks 
> REG_STARTEND", 17-08-2010)]
> 
> In fact, if you run the auto tools on cygwin, you get a different setting
> for NO_REGEX than via config.mak.uname. Which is why I don't run configure
> on cygwin. :-D
> 
> [The issue is exposed by t7008-grep-binary.sh, where the cygwin native
> regex library matches '.' in a pattern with the NUL character. ie the
> test_expect_failure test passes.]

Huh.  So we have NO_REGEX support in ./configure, and people using
Git on untypical architectures and systems *can* make use of it.

It was just described wrongly, so in turn to have the more neutral
description, the same as in Makefile, let's do this:

 >8 -- >8 - >8 -- >8 --
Subject: [PATCH] configure.ac: Improve description of NO_REGEX test

The commit 2f8952250a changed description of NO_REGEX build config
variable to be more neutral, and actually say that it is about
support for REG_STARTEND.  Change description in configure.ac to
be the same.

Change also the test message and variable name to match.  The test
just checks that REG_STARTEND is #defined.

Issue-found-by: Ramsay Jones <ram...@ramsayjones.plus.com>
Signed-off-by: Jakub Narębski <jna...@gmail.com>
---
 configure.ac | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/configure.ac b/configure.ac
index aa9c91d..7f39fd0 100644
--- a/configure.ac
+++ b/configure.ac
@@ -835,9 +835,10 @@ AC_CHECK_TYPE([struct addrinfo],[
 ])
 GIT_CONF_SUBST([NO_IPV6])
 #
-# Define NO_REGEX if you have no or inferior regex support in your C library.
-AC_CACHE_CHECK([whether the platform regex can handle null bytes],
- [ac_cv_c_excellent_regex], [
+# Define NO_REGEX if your C library lacks regex support with REG_STARTEND
+# feature.
+AC_CACHE_CHECK([whether the platform regex supports REG_STARTEND],
+ [ac_cv_c_regex_with_reg_startend], [
 AC_EGREP_CPP(yippeeyeswehaveit,
AC_LANG_PROGRAM([AC_INCLUDES_DEFAULT
 #include 
@@ -846,10 +847,10 @@ AC_EGREP_CPP(yippeeyeswehaveit,
 yippeeyeswehaveit
 #endif
 ]),
-   [ac_cv_c_excellent_regex=yes],
-   [ac_cv_c_excellent_regex=no])
+   [ac_cv_c_regex_with_reg_startend=yes],
+   [ac_cv_c_regex_with_reg_startend=no])
 ])
-if test $ac_cv_c_excellent_regex = yes; then
+if test $ac_cv_c_regex_with_reg_startend = yes; then
NO_REGEX=
 else
NO_REGEX=YesPlease
-- 
2.10.0





Re: [PATCH v2 05/25] sequencer: allow the sequencer to take custody of malloc()ed data

2016-10-06 Thread Jakub Narębski
W dniu 06.10.2016 o 21:23, Junio C Hamano pisze:
> Johannes Schindelin  writes:
> 
>> If you prefer to accept such sloppy work, I will change it of course,
>> feeling dirty that it has my name on it.
> 
> I do want neither leaks nor sloppyness, and I thought that was
> understood by everybody, hence I thought the last round made it
> clear that _entrust() must not exist.

Also, the *_entrust() mechanism in v1 was quite sequencer-specific,
rather than be a generic mechanism.

> 
> Let's try again.
> 
> We manage lifetime of a field in a structure in one of three ways in
> our codebase [*1*].
> 
>  * A field can always point at a borrowed piece of memory that the
>destruction of the containing structure or re-assignment to the
>field do not have to free the current value of the field.  This
>is a minority, simply because it is rare for a field to be always
>able to borrow from others.  The destruction of the containing
>structure or re-assignment to the field needs to do nothing
>special.
> 
>  * A field can own the piece of memory that it points at.  The
>destruction of the containing structure or re-assignment to the
>field needs to free the current value of the field [*2*].  A
>field that can be assigned a value from the configuration (which
>is typically xstrdup()'ed) is an example of such a field, and if
>a command line option also can assign to the field, we'd need to
>xstrdup() it, even though we know we could borrow from argv[],
>because cleaning-up needs to be able to free(3) it.
> 
>  * A field can sometimes own and sometimes borrow the memory, and it
>is accompanied by another field to tell which case it is, so that
>cleaning-up can tell when it needs to be free(3)d.  This is a
>minority case, and we generally avoid it especially in modern
>code for small allocation, as it makes the lifetime rule more
>complex than it is worth.

Great writeup!

> The _entrust() thing may have been an excellent fourth option if it
> were cost-free.  xstrdup() that the second approach necessitates for
> literal strings (like part of argv[]) would become unnecessary and
> we can treat as if all the fields in a structure were all borrowing
> the memory from elsewhere (in fact, _entrust() creates the missing
> owners of pieces of memory that need to be freed later); essentially
> it turns a "mixed ownership" case into the first approach.
> 
> But it is not cost-free.  The mechanism needs to allocate memory to
> become the missing owner and keep track of which pieces of memory
> need to be freed later, and the resulting code does not become any
> easier to follow.  The programmer still needs to know which ones to
> _entrust() just like the programmer in the model #2 needs to make
> sure xstrdup() is done for appropriate pieces of memory--the only
> difference is that an _entrust() programmer needs to mark the pieces
> of memory to be freed, while a #2 programmer needs to xstrdup() the
> pieces of memory that are being "borrowed".
> 
> So let's not add the fourth way to our codebase.  "mixed ownership"
> case should be turned into "field owns the memory", i.e. approach #2.

On the other hand the _entrust() mechanism might be a good solution
if the amount of memory was large, for example order of magnitude more
than what would be needed to keep ownership info *and* borrowing would
not be possible for some reason.

But the _entrust() mechanism reminds me on one hand side of memory
debuggers like dmalloc, Electric Fence (eFence), or valgrind; combined
a bit with garbage collection.  Namely, when we are in a library call,
record all allocations (perhaps by redirecting alloc functions), then
free those resources at the exit of library call.
 
> 
> [Footnotes]
> 
> *1* It is normal for different fields in the same structure follow
> different lifetime rules.
> 
> *2* Unless leaks are allowed, that is.  I think we have instances
> where "git cmd --opt=A --opt=B" allocates and stores a new piece
> of memory that is computed based on A and store it to a field,
> and then overwrites the field with another allocation of a value
> based on B without freeing old value, saying "the caller can
> avoid passing the same thing twice, and such a leak is miniscule
> anyway".  That is not nice, and we've been reducing them over
> time.
> 



Re: [PATCH v9 00/14] Git filter protocol

2016-10-05 Thread Jakub Narębski
W dniu 04.10.2016 o 14:59, larsxschnei...@gmail.com pisze:
> From: Lars Schneider <larsxschnei...@gmail.com>
> 
> The goal of this series is to avoid launching a new clean/smudge filter
> process for each file that is filtered.
> 
> A short summary about v1 to v5 can be found here:
> https://git.github.io/rev_news/2016/08/17/edition-18/
> 
> This series is also published on web:
> https://github.com/larsxschneider/git/pull/13
> 
> Patches 1 and 2 are cleanups and not strictly necessary for the series.
> Patches 3 to 12 are required preparation. Patch 13 is the main patch.
> Patch 14 adds an example how to use the Git filter protocol in contrib.
> 
> Thanks a lot to
>   Ramsay, Jakub, Junio, Johannes, Torsten, and Peff
> for very helpful reviews,
> Lars

I'll try to review it before the end of the week.

-- 
Jakub Narębski



Re: [PATCH v3 00/14] Mark strings in Perl scripts for translation

2016-10-05 Thread Jakub Narębski
W dniu 05.10.2016 o 19:20, Vasco Almeida pisze:
> Mark messages in some perl scripts for translation.
> 
> Thanks for the reviews of Junio Hamano and Jakub Narębski.  Although I think
> Jakub Narębski's suggestions are overall good, I am not willing to go that 
> path
> because I cannot see huge benefits from them given what we already have and
> also I lack Perl skills.

All right.  While I think that Locale::TextDomain-like interpolation
in translated strings (__x, __xn / __nx, etc.), which is labelled as
'perl-brace-format' by gettext, is more Perl-ish and better for unbiased
translators, the printf based interpolation, labelled as ‘perl-format’
(and identical to 'c-format', I think), may be preferred in this case.

The 'perl-brace-format' doesn't need TRANSLATOR comments to explain
what placeholders are, and placeholders are easier to reorder.  On
the other hand translator needs to know to not translate contents
of placeholders.

  "This is the {color} {thing}.\n"

With 'perl-format' / 'c-format' the translator might need to know
how to change order of placeholders, but he or she should know
how to do it translating strings from C code.

  "This is the %s %s.\n"

Also, if Perl code shares translation strings with C code, as in
most cases here, then printf format is needed to do translation
only once.

tldr; I am reversing my opinion, and agree with your solution.

> 
> Interdiff bellow.

One thing I have noticed in the interdiff is using *translated*
strings in hash content (translation takes time, and might not
be necessary), that is:

   $hashref = {
KEY => __('value'),
   }

   ... $hashref->{KEY} ...

instead of marking value for translation, and doing translation
only on print, when it is necessary

   $hashref = {
KEY => N__('value'),
   }

   ... __($hashref->{KEY}) ...

> 
> Vasco Almeida (14):
[...]

I'll try to review those later.

Thank you for your work,
-- 
Jakub Narębski



Re: [PATCH v3 6/6] wt-status: begin error messages with lower-case

2016-10-05 Thread Jakub Narębski
W dniu 04.10.2016 o 15:06, Johannes Schindelin pisze:

> The previous code still followed the old git-pull.sh code which did not
> adhere to our new convention.

Good to know why it used its own convention.
 
> Signed-off-by: Johannes Schindelin <johannes.schinde...@gmx.de>
> ---
>  builtin/pull.c | 2 +-
>  wt-status.c| 6 +++---
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/builtin/pull.c b/builtin/pull.c
> index c639167..0bf9802 100644
> --- a/builtin/pull.c
> +++ b/builtin/pull.c
> @@ -810,7 +810,7 @@ int cmd_pull(int argc, const char **argv, const char 
> *prefix)
>  
>   if (!autostash)
>   require_clean_work_tree(N_("pull with rebase"),
> - "Please commit or stash them.", 1, 0);
> + "please commit or stash them.", 1, 0);
>  

Shouldn't those also be marked for translation with N_() or _()?

Best,
-- 
Jakub Narębski



Re: [PATCH v3 4/6] Export also the has_un{staged,committed}_changed() functions

2016-10-05 Thread Jakub Narębski
W dniu 04.10.2016 o 15:05, Johannes Schindelin pisze:

> Subject: Export also the has_un{staged,committed}_changed() functions

s/changed/changes/   that is   d -> s

Those are has_unstaged_changes() and has_uncommitted_changes().

Though I wonder if "other has_un*_changes() functions" would be
more readable (while shorter), if less specific...

>
> They will be used in the upcoming rebase helper.
> 
> Signed-off-by: Johannes Schindelin <johannes.schinde...@gmx.de>
> ---
[...]
> -/* The following function expect that the caller took care of reading the 
> index. */
> +/* The following functions expect that the caller took care of reading the 
> index. */
> +int has_unstaged_changes(void);
> +int has_uncommitted_changes(void);
>  int require_clean_work_tree(const char *action, const char *hint, int 
> gently);

Nice to see the fix in comment too.  Good work!

-- 
Jakub Narębski



Re: [PATCH v2 1/5] check_connected: accept an env argument

2016-10-05 Thread Jakub Narębski
W dniu 03.10.2016 o 22:49, Jeff King pisze:

> diff --git a/connected.h b/connected.h
> index afa48cc..4ca325f 100644
> --- a/connected.h
> +++ b/connected.h
> @@ -33,6 +33,11 @@ struct check_connected_options {
>  
>   /* If non-zero, show progress as we traverse the objects. */
>   int progress;
> +
> + /*
> +  * Insert these variables into the environment of the child process.
> +  */
> + const char **env;
>  };

Just a nitpick, but I wonder why one comment is in single-line form,
and the other uses block-form with a single line.

-- 
Jakub Narębski, bikeshedding



Re: [PATCH 5/6] Documentation/git-merge.txt: improve short description in DESCRIPTION

2016-10-05 Thread Jakub Narębski
W dniu 05.10.2016 o 16:46, sorga...@gmail.com pisze:
> From: Sergey Organov <sorga...@gmail.com>
> 
> Old description had a few problems:
> 
> - sounded as if commits have changes
> 
> - stated that changes are taken since some "divergence point"
>   that was not defined.
> 
> New description rather uses "common ancestor" and "merge base",
> definitions of which are easily discoverable in the rest of GIT
> documentation.

This is a step in a good direction, but it has a few issues.

> 
> Signed-off-by: Sergey Organov <sorga...@gmail.com>
> ---
>  Documentation/git-merge.txt | 25 +++--
>  1 file changed, 15 insertions(+), 10 deletions(-)
> 
> diff --git a/Documentation/git-merge.txt b/Documentation/git-merge.txt
> index cc0329d..351b8fc 100644
> --- a/Documentation/git-merge.txt
> +++ b/Documentation/git-merge.txt
> @@ -16,11 +16,16 @@ SYNOPSIS
>  
>  DESCRIPTION
>  ---
> -Incorporates changes from the named commits (since the time their
> -histories diverged from the current branch) into the current
> -branch.  This command is used by 'git pull' to incorporate changes
> -from another repository and can be used by hand to merge changes
> -from one branch into another.
> +
> +Incorporates changes that lead to the named commits into the current
> +branch, and joins corresponding histories. The best common ancestor of
> +named commits and the current branch, called "merge base", is
> +calculated, and then net changes taken from the merge base to
> +the named commits are applied.

The first sentence is all right; it reads better than the original
without the introduced part in parentheses.  The only minor issue
is with "joins corresponding histories" - it is a good description,
but may imply that the branch we are merging vanishes: it doesn't.
But all in all, it is a good change.

Second sentence has some problems.  First, while it is a good idea
to use well defined term "merge base", I think writing "since the
time their histories diverged" or "(which is the point where histories
diverged)" would be a good plain language description; it was removed
entirely in the proposal.

Second, while "common ancestor" and "least common ancestor" are well
defined in mathematics of graphs, "best common ancestor" isn't...
but this is what git-merge-base(1) documentation uses.

Also, the "best common ancestor" doesn't need to be only one.  There
might be many such ancestors... though Git would generate then a
virtual best common ancestor thanks to recursive merge strategy.
And usually there is only one "best common ancestor", that is a single
merge base.  So this may need clarification, but it is not much of
a problem.

Third, and most important, is that "net changes taken from the merge
base to the named commits are applied" is simply not true.  The
`git merge` command does not reapply changes - that is what rebase
and cherry-pick do.  The merge operation uses 3-way merge strategy
(diff3) between merge-base, current branch, and merged commit.
That is, it finds differences between differences, and "applies"
that.

See "A Formal Investigation of Diff3" paper by Sanjeev Khanna,
Keshav Kunal, and Benjamin C. Pierce:
  http://www.cis.upenn.edu/~bcpierce/papers/diff3-short.pdf

I'm not sure how to explain it succintly.  Perhaps

  net changes between merge base to the current (merged into)
  branch and named commits are integrated

There is description of trivial 3-way merge somewhere in Git docs,
though in very unobvious place; we can link it.

> +
> +This command is used by 'git pull' to incorporate changes from another
> +repository, and can be used by hand to merge changes from one branch
> +into another.

Rather "can be used by 'git pull'", or "is used by 'git pull' (unless
configured otherwise)"...

Separating this information makes a very good sense.  Thanks.

>  
>  Assume the following history exists and the current branch is
>  "`master`":
> @@ -31,11 +36,11 @@ Assume the following history exists and the current 
> branch is
>  D---E---F---G master
>  
>  
> -Then "`git merge topic`" will replay the changes made on the
> -`topic` branch since it diverged from `master` (i.e., `E`) until
> -its current commit (`C`) on top of `master`, and record the result
> -in a new commit along with the names of the two parent commits and
> -a log message from the user describing the changes.
> +Then "`git merge topic`" will replay the changes made on the `topic`
> +branch since it diverged from `master` (i.e., `E`) until its current
> +commit (`C`) on top of `master`, and record the result in a new commit
> +along with references to the two parent commits and a log message from
> +the user describing the changes.

What the happened here!?!  Please do not rewrap documentation, especially
not without changes!

>  
>  
> A---B---C topic
> 
-- 
Jakub Narębski



Re: [PATCH 16/18] count-objects: report alternates via verbose mode

2016-10-05 Thread Jakub Narębski
W dniu 03.10.2016 o 22:36, Jeff King pisze:

> +test_expect_success 'count-objects shows the alternates' '
> + cat >expect <<-EOF &&
> + alternate: $(pwd)/B/.git/objects
> + alternate: $(pwd)/A/.git/objects
> + EOF
> + git -C C count-objects -v >actual &&
> + grep ^alternate: actual >actual.alternates &&
> + test_cmp expect actual.alternates
> +'

This was bit hard to grok for me without quotes around
regular expression in grep (should it be sane_grep, BTW?):

  + grep "^alternate:" actual >actual.alternates &&

But it might be just me... It's certainly not necessary.

-- 
Jakub Narębski



Re: [PATCH 06/18] t5613: clarify "too deep" recursion tests

2016-10-05 Thread Jakub Narębski
W dniu 04.10.2016 o 22:58, Stefan Beller pisze:
> On Tue, Oct 4, 2016 at 1:55 PM, Jeff King <p...@peff.net> wrote:
>> On Tue, Oct 04, 2016 at 01:52:19PM -0700, Jacob Keller wrote:
>>
>>>>>>>> +# Note: These tests depend on the hard-coded value of 5 as "too 
>>>>>>>> deep". We start
>>>>>>>> +# the depth at 0 and count links, not repositories, so in a chain 
>>>>>>>> like:
>>>>>>>> +#
>>>>>>>> +#   A -> B -> C -> D -> E -> F -> G -> H
>>>>>>>> +#  0123456
>>>>>>>> +#

> 
> Input from a self-claimed design expert for ASCII art. ;)
> What about this?
> 
> #   A  -0->  B  -1->  C  -2->  ...

I would prefer the following:

#   A --> B --> C --> D --> E --> F --> G --> H
#  0     1 2 3 4 5 6

that is, the number below the middle of the arrow
(which could have been even longer)

#   A ---> B ---> C ---> D ---> E ---> F ---> G ---> H
#  0  1  2  3  4  5  6


Let's paint this bikeshed _plaid_ ;-
-- 
Jakub Narębski




Re: Regression: git no longer works with musl libc's regex impl

2016-10-05 Thread Jakub Narębski
W dniu 05.10.2016 o 00:33, Rich Felker pisze:
> On Wed, Oct 05, 2016 at 09:06:25AM +1100, James B wrote:
>> On Tue, 4 Oct 2016 18:08:33 +0200 (CEST)
>> Johannes Schindelin <johannes.schinde...@gmx.de> wrote:
>>>
>>> No, it is not. You quote POSIX, but the matter of the fact is that we use
>>> a subset of POSIX in order to be able to keep things running on Windows.
>>>
>>> And quite honestly, there are lots of reasons to keep things running on
>>> Windows, and even to favor Windows support over musl support. Over four
>>> million reasons: the Git for Windows users.
>>
>> Wow, I don't know that Windows is a git's first-tier platform now,
>> and Linux/POSIX second. Are we talking about the same git that was
>> originally written in Linus Torvalds, and is used to manage Linux
>> kernel? Are you by any chance employed by Redmond, directly or
>> indirectly?
>>
>> Sorry - can't help it.

Windows is one of the major platforms, yes.  I think there much, much
more people using Git on Windows, than using Git with musl.  More
users = more important.

Also, working with some inconvenience (requiring compilation with
NO_REGEX=1) is better than not working at all.

In CodingGuidelines we say:

 - Most importantly, we never say "It's in POSIX; we'll happily
   ignore your needs should your system not conform to it."
   We live in the real world.

 - However, we often say "Let's stay away from that construct,
   it's not even in POSIX".

 - In spite of the above two rules, we sometimes say "Although
   this is not in POSIX, it (is so convenient | makes the code
   much more readable | has other good characteristics) and
   practically all the platforms we care about support it, so
   let's use it".

The REG_STARTEND is 3rd point, mmap shenningans looks like 1st...

...on the other hand midipix <writeo...@midipix.org> wrote in
http://public-inbox.org/git/20161004200057.dc30d64f61e5ec441c34ffd4f788e58e.efa66ead67@email15.godaddy.com/
that the proposed fix should work on all Windows version we are
interested in (I think).  Test program included / attached.

The above-mentioned email also explains that the problem was
caught on MS Windows; it triggers if file end falls on the mmapped
page boundary, which is more likely to happen with 4096 mod size
on Windows rather than 65536 mod size on Linux.


On the other hand, while the proposed solution of "add padding as
to not end at page boundary, if necessary" doesn't have the
performance impact of "memcpy into NUL-terminated buffer" that
was originally proposed in patch series, it is still extra code
to maintain.

> 
> I don't think the hostility and sarcasm are really needed here. But
> what this does speak to is that users don't like feeling like their
> platform is being treated as a second-class target, which is what it
> feels like when you have to manually flip a switch to make git build.

You are welcome to send a patch adding to configure.ac detection
of REG_STARTEND support in standard library - setting NO_REGEX if
needed, and/or adding to Makefile uname-based defaults setting
NO_REGEX for compiling with musl.

> This is especially unfriendly when the semantics of the switch come
> across, at least to some users, as "your system regex is incomplete"
> rather than "git can't use it because git depends on nonstandard
> extensions".

Nonstandard but common extension.  As 2f8952250a commit message says
https://github.com/git/git/commit/2f8952250a84313b74f96abb7b035874854cf202

  Happily, there is an extension to regexec() introduced by the NetBSD
  project and present in all major regex implementation including
  Linux', MacOSX' and the one Git includes in compat/regex/: [...]

  [...]

  Since support for REG_STARTEND is so widespread by now, let's just
  introduce a helper function that always uses it, and tell people
  on a platform whose regex library does not support it to use the
  one from our compat/regex/ directory.

Also, as Junio said, the description of NO_REGEX option in the
Makefile now explicitly says:

# Define NO_REGEX if your C library lacks regex support with REG_STARTEND
# feature.

Best,
-- 
Jakub Narębski



Re: "Purposes, Concepts,Misfits, and a Redesign of Git" (a research paper)

2016-10-05 Thread Jakub Narębski
[git@vger.kernel.org does not accept HTML emails]

I just hope that this email don't get mangled too much...

On 5 October 2016 at 04:55, Santiago Perez De Rosso
<spere...@csail.mit.edu> wrote:
> On Fri, Sep 30, 2016 at 6:25 PM Jakub Narębski <jna...@gmail.com> wrote:
>> W dniu 30.09.2016 o 18:14, Konstantin Khomoutov pisze:
>>
>>> The "It Will Never Work in Theory" blog has just posted a summary of a
>>> study which tried to identify shortcomings in the design of Git.
>>>
>>> In the hope it might be interesting, I post this summary here.
>>> URL: http://neverworkintheory.org/2016/09/30/rethinking-git.html
>>
>> I will comment on the article itself, not just on the summary.
>>
>> | 2.2 Git
>> [...]
>> | But tracked files cannot be ignored; to ignore a tracked file
>> | one has to mark it as “assume unchanged.” This “assume
>> | unchanged” file will not be recognized by add; to make it
>> | tracked again this marking has to be removed.
>>
>> WRONG!  Git has tracked files, untracked unignored files, and
>> untracked ignored files (mostly considered unimportant).
>>
>> The "assume unchanged" bit is _performance_ optimization. It is not,
>> and cannot be a 'ignore tracked files' bit - here lies lost work!!!
>> You can use (imperfectly) "prefer worktree" bit hack instead.
>>
>> You can say, if 'ignoring change to tracked files' is motivation,
>> or purpose, it lacks direct concept.
>
>
> I don't see what's wrong with the paragraph you mention. I am aware of the
> fact that assumed unchanged is intended to be used as a performance
> optimization but that doesn't seem to be the way it is used in practice.
> Users have appropriated the optimization and effectively turned into a
> concept that serves the purpose of preventing the commit of a file. For
> example:
>
> from 
> http://gitready.com/intermediate/2009/02/18/temporarily-ignoring-files.html
>
>  So, to temporarily ignore changes in a certain file, run:
>  git update-index --assume-unchanged 
>  ...
>
> from 
> http://stackoverflow.com/questions/17195861/undo-git-update-index-assume-unchanged-file
>  The way you git ignore watching/tracking a particular dir/file.
>  you just run this:
>  git update-index --assume-unchanged 
> ...
>
>
> btw, this appropriation suggests that users want to be able to ignore
> tracked files and they do what they can with what they are given (which
> in this case means abusing the assumed unchanged bit).

Yes, this is true that users may want to be able to ignore changes to
tracked files (commit with dirty tree), but using `assume-unchanged` is
wrong and dangerous solution.  Unfortunately the advice to use it is
surprisingly pervasive.  I would thank you to not further this error.
(Well, `skip-worktree` is newer, that's why it is lesser known, perhaps)

To ignore tracked files you need to use `skip-worktree` bit.

You can find the difference between `assume-unchanged` and
`skip-worktree`, and when use which in:
http://stackoverflow.com/questions/13630849/git-difference-between-assume-unchanged-and-skip-worktree
http://fallengamer.livejournal.com/93321.html
http://blog.stephan-partzsch.de/how-to-ignore-changes-in-tracked-files-with-git/

The difference is that skip-worktree will not overwrite a file that is
different from the version in the index, but assume-unchanged can.  This
means that the latter can OVERWRITE YOUR PRECIOUS CHANGES!

Some people started to recommend it
http://stackoverflow.com/questions/32251037/ignore-changes-to-a-tracked-file
http://www.virtuouscode.com/2011/05/20/keep-local-modifications-in-git-tracked-files/


>> | The notion of a “remote branch” must not be confused
>> | with that of an “upstream branch.” An upstream branch is
>> | just a convenience for users: after the user assigns it to some
>> | branch, commands like pull and push default to use that
>> | branch for fetching and pushing changes if no branch is given
>> | as input.
>>
>> Actually "upstream branch" (and related "upstream repository")
>> are a concept, not only a convenience. They denote a branch
>> (usually in remote repository) which is intended to ultimately
>> include changes in given branch. Note that "upstream branch"
>> can be set separately for any given local branch.
>
>
> We never say upstream branch is not a concept. It even appears in the
> table.

Hmmm... I got onfused by words "is just a convenience", which for me
implies that it is not a concept.

>>
>>
>> One thing that can enormously help recovering from errors, and
>> is not covered in the list of concepts is REFLOG.
>
>
&

Re: [PATCH v8 11/11] convert: add filter..process option

2016-10-04 Thread Jakub Narębski
[Some of answers and comments may got invalidated by v9]

W dniu 01.10.2016 o 17:34, Lars Schneider pisze:
>> On 29 Sep 2016, at 01:14, Jakub Narębski <jna...@gmail.com> wrote:
>>
>> Part third (and last) of the review of v8 11/11.

[...]
>>> +
>>> +test_expect_success PERL 'required process filter should filter data' '
>>> +   test_config_global filter.protocol.process 
>>> "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" &&
>>> +   test_config_global filter.protocol.required true &&
>>> +   rm -rf repo &&
>>> +   mkdir repo &&
>>> +   (
>>> +   cd repo &&
>>> +   git init &&
>>
>> Don't you think that creating a fresh test repository for each
>> separate test is a bit too much?  I guess that you want for
>> each and every test to be completely independent, but this setup
>> and teardown is a bit excessive.
>>
>> Other tests in the same file (should we reuse the test, or use
>> new test file) do not use this method.
> 
> I see your point. However, I am always annoyed if Git tests are
> entangled because it makes working with them way way harder.
> This test test runs in 4.5s on a slow Travis CI machine. I think
> that is OK considering that we have tests running 3.5min (t3404).

All right, this is good argument... though inconsistency (assuming
that we don't switch to separate test for multi-file filter) could
be argument against.

Though I wonder if test preparation could be extracted into a
common function...

[...]
>>> +   cp ../test.o test.r &&
>>> +   cp ../test2.o test2.r &&
>>
>> What does this test2.o / test2.r file tests, that test.o / test.r
>> doesn't?  The name doesn't tell us.
> 
> This just tests multiple files with different content.

All right, so it is here to test multiple files (and that filter
actually process multiple files).

>> Why it is test.r, but test2.r?  Why it isn't test1.r?
> 
> test.r already existed (created in setup test).

With each test in separate repository we could copy test.r prepared
in 'setup' into test1.r in each of multi-file tests.

[...]
>>> +   >test4-empty.r &&
>>
>> You test ordinary file, file in subdirectory, file with filename
>> containing spaces, and an empty file.
>>
>> Other tests of single file `clean`/`smudge` filters use filename
>> that requires mangling; maybe we should use similar file?
>>
>>special="name  with '\''sq'\'' and \$x" &&
>>echo some test text >"$special" &&
> 
> OK.

This test was required because of %f to pass filename as parameter
coupled with the fact that we use `clean` and `smudge` as shell
script fragment (so e.g. pipes and redirection would work in
one-shot filter definition).

This is not the case with multi-file filter, where filenames are
passed internally, and we don't need to worry about shell quoting
at all.
 
>> In case of `process` filter, a special filename could look like
>> this:
>>
>>process_special="name=with equals and\nembedded newlines\n" &&
>>echo some test text >"$process_special" &&
> 
> I think this test would create trouble on Windows. I'll stick to
> the special characters used in the single shot filter.

This would test... example parser.  Well, all right, better not
give problems for Windows.

But... you can create such files in Git Bash:

  $ touch "$(echo -n -e "fo=o\nbar\n")"

and though they look strange

  $ ls -1 fo*
  'fo=o'$'\n''bar'

but work all right

  $ echo "strangest" >>"$(echo -n -e "fo=o\nbar\n")"
  $ name="$(echo -n -e "fo=o\nbar\n")"
  $ cat "$name"
  strangest

>> Third, why the filter even writes output size? It is no longer
>> part of `process` filter driver protocol, and it makes test more
>> fragile.
> 
> I would prefer to leave that in. I think it is good for the test to
> check that we are transmitting the amount of content that what we 
> think we transmit.

Right, we test that we processed full file this way, in the multi
packet test. 

>>> +   <<-\EOF &&
>>> +   START
>>> +   wrote filter header
>>> +   STOP
>>> +   EOF
>>
>> Why is even filter process invoked?  If this is not expected, perhaps
>> simply ignore what checking out almost empty branch (one w

Re: [PATCH v8 11/11] convert: add filter..process option

2016-10-04 Thread Jakub Narębski
[Some of answers and comments may got invalidated by v9]

W dniu 30.09.2016 o 21:38, Lars Schneider pisze:
>> On 27 Sep 2016, at 17:37, Jakub Narębski <jna...@gmail.com> wrote:
>>
>> Part second of the review of 11/11.
[...]
>>> +
>>> +   if (start_command(process)) {
>>> +   error("cannot fork to run external filter '%s'", cmd);
>>> +   kill_multi_file_filter(hashmap, entry);
>>> +   return NULL;
>>> +   }
>>
>> I guess there is a reason why we init hashmap entry, try to start
>> external process, then kill entry of unable to start, instead of
>> trying to start external process, and adding hashmap entry when
>> we succeed?
> 
> Yes. This way I can reuse the kill_multi_file_filter() function.

I don't quite understand.  If you didn't fill the entry before
using start_command(process), you would not need kill_multi_file_filter(),
which in that case IIUC just removes the just created entry from hashmap.
Couldn't you add entry to hashmap in the 'else' part?  Or would it
be racy?

[...]
>>> +static void read_multi_file_filter_values(int fd, struct strbuf *status) {
>>
>> This is more
>>
>>  +static void read_multi_file_filter_status(int fd, struct strbuf *status) {
>>
>> It doesn't read arbitrary values, it examines 'metadata' from
>> filter for "status=" lines.
> 
> True!
>
>>> +   if (pair[0] && pair[0]->len && pair[1]) {
>>> +   if (!strcmp(pair[0]->buf, "status=")) {
>>> +   strbuf_reset(status);
>>> +   strbuf_addbuf(status, pair[1]);
>>> +   }
>>
>> So it is last status= line wins behavior?
> 
> Correct.

Perhaps this should be described in code comment.
 
>>>
>>> +   fflush(NULL);
>>
>> Why this fflush(NULL) is needed here?
> 
> This flushes all open output streams. The single filter does the same.

I know what it does, but I don't know why.  But "single filter does it"
is good enough for me.  Still would want to know why, though ;-)
 
>>>
>>> +   if (fd >= 0 && !src) {
>>> +   if (fstat(fd, _stat) == -1)
>>> +   return 0;
>>> +   len = xsize_t(file_stat.st_size);
>>> +   }
>>
>> Errr... is it necessary?  The protocol no longer provides size=
>> hint, and neither uses such hint if provided.
> 
> We require the size in write_packetized_from_buf() later.

Don't we use write_packetized_from_fd() in the case of fd >= 0?

[...]

Best,
-- 
Jakub Narębski



Re: [PATCH v8 11/11] convert: add filter..process option

2016-10-04 Thread Jakub Narębski
[Some of answers may get invalidated by v9]

W dniu 30.09.2016 o 20:56, Lars Schneider pisze:
>> On 27 Sep 2016, at 00:41, Jakub Narębski <jna...@gmail.com> wrote:
>>
>> Part first of the review of 11/11.

[...]
>>> +to read a welcome response message ("git-filter-server") and exactly
>>> +one protocol version number from the previously sent list. All further
>>
>> I guess that is to provide forward-compatibility, isn't it?  Also,
>> "Git expects..." probably means filter process MUST send, in the
>> RFC2119 (https://tools.ietf.org/html/rfc2119) meaning.
> 
> True. I feel "expects" reads better but I am happy to change it if
> you feel strong about it.

I don't have strong opinion about this.  I guess following the example
of pkt-line format description may be a good idea.  We are not writing
an RFC here... but having be explicit is better than be good read :-P

>>> +
>>> +After the version negotiation Git sends a list of supported capabilities
>>> +and a flush packet.
>>
>> Is it that Git SHOULD send list of ALL supported capabilities, or is
>> it that Git SHOULD NOT send capabilities it does not support, and that
>> it MAY send only those capabilities it needs (so for example if command
>> uses only `smudge`, it may not send `clean`, so that filter driver doesn't
>> need to initialize data it would not need).
> 
> "After the version negotiation Git sends a list of all capabilities that
> it supports and a flush packet."
> 
> Better?

Better.

I wonder if it is a matter of current implementation, namely that at
the point of code where Git is sending capabilities it doesn't know
which of them it will be using; at least in some of cases.

Because if it would be possible for Git to not send capabilities which
it supports, but is sure that it would not be using for a given operation,
then it would be good to do that.  It might be that filter driver must
do some prep work for each of capabilities, so skipping some of prep
would make git faster.  Though all this is for now theoretical musings...
it might be an argument for such description of protocol so it does
not prevent Git from sending only supported capabilities it needs.

>> I wonder why it is "=true", and not "capability=".
>> Is there a case where we would want to send "=false".  Or
>> is it to allow configurable / value based capabilities?  Isn't it going
>> a bit too far: is there even a hind of an idea for parametrize-able
>> capability? YAGNI is a thing...
> 
> Peff suggested that format and I think it is OK:
> http://public-inbox.org/git/20160803224619.bwtbvmslhuicx...@sigill.intra.peff.net/

It wouldn't kill you to summarize the argument, would it?

>From what I understand the argument is that "=true" allows
for simplist parsing into a [intermediate] hash, while the other
proposal of using"capability=" would require something more
sophisticated.  And that it is better to be explicit rather than
use synonyms / shortcuts for "".

Though one can argue that "" is synonym / shortcut for "=true";
it would not complicate parsing much.

Nb. we don't use trick of 'parse metadata to hash' in neither example,
nor filter driver used in test...

[...]
>>> +
>>> +After the filter has processed a blob it is expected to wait for
>>> +the next "key=value" list containing a command. Git will close
>>> +the command pipe on exit. The filter is expected to detect EOF
>>> +and exit gracefully on its own.

Is this still true?

>>
>> Good to have it documented.  
>>
>> Anyway, as it is Git command that spawns the filter driver process,
>> assuming that the filter process doesn't daemonize itself, wouldn't
>> the operating system reap it after its parent process, that is the
>> git command it invoked, dies? So detecting EOF is good, but not
>> strictly necessary for simple filter that do not need to free
>> its resources, or can leave freeing resources to the operating
>> system? But I may be wrong here.
> 
> The filter process runs independent of Git.

Ah.  So without some way to tell long-lived filter process that
it can shut down, because no further data will be incoming, or
killing it by Git, it would hang indefinitely?

[...]
>>> +}
>>> +elsif ( $pkt_size > 4 ) {
>>
>> Isn't a packet of $pkt_size == 4 a valid packet, a keep-alive
>> one?  Or is it forbidden?
> 
> "Implementations SHOULD NOT send an empty pkt-line ("0004")."
> Source: Documentation/technical/protocol-common.txt

All right.  Not that we need keep-alive packet for communication
between two processe

Re: [PATCH v8 00/11] Git filter protocol

2016-10-04 Thread Jakub Narębski
W dniu 03.10.2016 o 19:13, Lars Schneider pisze: 
>> On 01 Oct 2016, at 22:48, Jakub Narębski <jna...@gmail.com> wrote:
>> W dniu 01.10.2016 o 20:59, Lars Schneider pisze: 
>>> On 29 Sep 2016, at 23:27, Junio C Hamano <gits...@pobox.com> wrote:
>>>> Lars Schneider <larsxschnei...@gmail.com> writes:
>>>>
>>>> If the filter process refuses to die forever when Git told it to
>>>> shutdown (by closing the pipe to it, for example), that filter
>>>> process is simply buggy.  I think we want users to become aware of
>>>> that, instead of Git leaving it behind, which essentially is to
>>>> sweep the problem under the rug.
>>
>> Well, it would be good to tell users _why_ Git is hanging, see below.
> 
> Agreed. Do you think it is OK to write the message to stderr?

On the other hand, this is why GIT_TRACE (and GIT_TRACE_PERFORMANCE)
was invented for.  We do not signal troubles with single-shot filters,
so I guess doing it for multi-file filters is not needed.
 
>>>> I agree with what Peff said elsewhere in the thread; if a filter
>>>> process wants to take time to clean things up while letting Git
>>>> proceed, it can do its own process management, but I think it is
>>>> sensible for Git to wait the filter process it directly spawned.
>>>
>>> To realize the approach above I prototyped the run-command patch below:
>>>
>>> I added an "exit_timeout" variable to the "child_process" struct.
>>> On exit, Git will close the pipe to the process and wait "exit_timeout" 
>>> seconds until it kills the child process. If "exit_timeout" is negative
>>> then Git will wait until the process is done.
>>
>> That might be good approach.  Probably the default would be to wait.
> 
> I think I would prefer a 2sec timeout or something as default. This way
> we can ensure Git would not wait indefinitely for a buggy filter by default.

Actually this waiting for multi-file filter is only about waiting for
the shutdown process of the filter.  The filter could still hang during
processing a file, and git would hang too, if I understand it correctly.

[...]
>> Also, how would one set default value of timeout for all process
>> based filters?
> 
> I think we don't need that because a timeout is always specific
> to a filter (if the 2sec default is not sufficient).

All right (assuming that timeouts are good idea). 

>>>
>>> +   while ((waitpid(p->pid, , 0)) < 0 && errno == 
>>> EINTR)
>>> +   ;   /* nothing */
>>
>> Ah, this loop is here because waiting on waitpid() can be interrupted
>> by the delivery of a signal to the calling process; though the result
>> is -1, not just any < 0.
> 
> "< 0" is also used in wait_or_whine()

O.K. (though it doesn't necessary mean that it is correct, there
is another point for using "< 0").
 
[...]
>> There is also another complication: there can be more than one
>> long-running filter driver used.  With this implementation we
>> wait for each of one in sequence, e.g. 10s + 10s + 10s.
> 
> Good idea, I fixed that in the version below!
> 
[...]
> [...] this function is also used with the async struct... 

Hmmm... now I wonder if it is a good idea (similar treatment for
single-file async-invoked filter, and multi-file pkt-line filters).

For single-file one-shot filter (correct me if I am wrong):

 - git sends contents to filter, signals end with EOF
   (after process is started)
 - in an async process:
   - process is started
   - git reads contents from filter, until EOF
   - if process did not end, it is killed


For multi-process pkt-line based filter (simplified):

 - process is started
 - handshake
 - for each file
   - file is send to filter process over pkt-line,
 end signalled with flush packet
   - git reads from filter from pkt-line, until flush
 - ...


See how single-shot filter is sent EOF, though in different part
of code.  We need to signal multi-file filter that no more files
will be coming.  Simplest solution is to send EOF (we could send
"command=shutdown" for example...) to filter, and wait for EOF
from filter (or for "status=finished" and EOF).

We could kill multi-file filter after sending last file and
receiving full response... but I think single-shot filter gets
killed only because it allows for very simple filters, and reusing
existing commands as filters.

[...]
> diff --git a/run-command.c b/run-command.c
> index 3269362..ca0feef 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -21,6 +21,9 @@ void child_process_clear(struct c

Re: "Purposes, Concepts,Misfits, and a Redesign of Git" (a research paper)

2016-10-01 Thread Jakub Narębski
W dniu 01.10.2016 o 22:13, Kevin Daudt pisze:
> On Sat, Oct 01, 2016 at 12:24:57AM +0200, Jakub Narębski wrote:
>>
>> | 7.2.1 Discussion
>> [...]
>> | There could be other use cases for the
>> | staging area that Gitless doesn’t handle well but we expect
>> | these to be fairly infrequent.

I'd like to point to the word "other" in the above sentence...

>>
>> Like handling merge conflict...??? Infrequent doesn't mean
>> unimportant.
> 
> For me the most important thing is that the lack of staging area leads
> on commits that have no clear goal, people just commit everything they
> have changed at some point, as a sort of checkpoint.
> 
> Although lots of people still do this with git currently, they don't
> even have the possibility[1] to improve on this.
> 
> This makes history and features like git bisect less useful.
> 
> 
> [1] At most they can specify the files they want to commit, but this is
> still a very crude way to group together changes.

... If you had read the original discussed research paper 
http://people.csail.mit.edu/sperezde/pre-print-oopsla16.pdf
you would notice that this use case is covered by Gitless,
though imperfectly.

  Common use cases for the staging area in Git are to select
  files to commit, split up a large change into multiple commits,
  and review the changes selected to be committed. We address
  the first by providing a more flexible `commit` command that
  lets the user easily customize the set of files to commit (with
  `only`, `include` and `exclude` flags). For the second use case
  we have a `partial` flag in `commit` that allows the user to
  interactively select segments of files to commit (like Git’s
  `commit --patch`).

First imperfection is that only equivalent of `git commit --patch`,
that is additive selection of changes during commit.  There are,
from what I understand, no equivalents of `git add --interactive`
(crafting additively and subtractively part by part, not all at
once), `git reset --patch` (crafting additively from any commit,
defaults to HEAD), `git checkout --patch` (crafting subtractively).
It is a good thing to have those possibilities when disentangling
working area, or splitting commit during interactive rebase.

Though all of those could be added (if they are not present
already) to the interactive interface of `partial` flag in
`gl commit`.


Second imperfection is that you cannot test the crafted state
without creating a commit.  In Git you have `git stash --keep-index`
for this; go to the state crafted in the staging area (the index),
and you can test it.

Though you can always create a [temporary] commit, run tests, and
then if necessary amend this commit.


On the other hand the index / the staging area is important and
useful thing in helping to resolve merge conflicts, for example
the one where one side changed file and other deleted it, or
where one side renamed file and other changed it, etc.

-- 
Jakub Narębski

mailto:jna...@gmail.com
mailto:jna...@mat.umk.pl



Re: [PATCH v8 00/11] Git filter protocol

2016-10-01 Thread Jakub Narębski
W dniu 01.10.2016 o 20:59, Lars Schneider pisze: 
> On 29 Sep 2016, at 23:27, Junio C Hamano  wrote:
>> Lars Schneider  writes:
>>
>>> We discussed that issue in v4 and v6:
>>> http://public-inbox.org/git/20160803225313.pk3tfe5ovz4y3...@sigill.intra.peff.net/
>>> http://public-inbox.org/git/xmqqbn0a3wy3@gitster.mtv.corp.google.com/
>>>
>>> My impression was that you don't want Git to wait for the filter process.
>>> If Git waits for the filter process - how long should Git wait?
>>
>> I am not sure where you got that impression.  I did say that I do
>> not want Git to _KILL_ my filter process.  That does not mean I want
>> Git to go away without waiting for me.
>>
>> If the filter process refuses to die forever when Git told it to
>> shutdown (by closing the pipe to it, for example), that filter
>> process is simply buggy.  I think we want users to become aware of
>> that, instead of Git leaving it behind, which essentially is to
>> sweep the problem under the rug.

Well, it would be good to tell users _why_ Git is hanging, see below.

>>
>> I agree with what Peff said elsewhere in the thread; if a filter
>> process wants to take time to clean things up while letting Git
>> proceed, it can do its own process management, but I think it is
>> sensible for Git to wait the filter process it directly spawned.
> 
> To realize the approach above I prototyped the run-command patch below:
> 
> I added an "exit_timeout" variable to the "child_process" struct.
> On exit, Git will close the pipe to the process and wait "exit_timeout" 
> seconds until it kills the child process. If "exit_timeout" is negative
> then Git will wait until the process is done.

That might be good approach.  Probably the default would be to wait.

> 
> If we use that in the long running filter process, then we could make
> the timeout even configurable. E.g. with "filter..process-timeout".

Sidenote: we prefer camelCase rather than kebab-case for config
variables, that is, "filter..processTimeout".

Also, how would one set default value of timeout for all process
based filters?

> 
> What do you think about this solution?

I think this addition be done after, assuming that we come up
with good default behavior (e.g. wait for filter processes
to finish).

Also, we would probably want to add some progress information
in a similar way to progress info for checkout, that is display
it after a few seconds waiting.

This could be, for example:

  Waiting for filter '' to finish... done

With timeout it could look like this (where underlined part
is interactive, that is changes every second):

  Waiting 10s for '' filter process to finish.
  ^^^

And then either

  Filter '' killed

or

  Filter '' finished


> diff --git a/run-command.c b/run-command.c
> index 3269362..a933066 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -21,6 +21,8 @@ void child_process_clear(struct child_process *child)
>  
>  struct child_to_clean {
>   pid_t pid;
> + int stdin;
> + int timeout;
>   struct child_to_clean *next;
>  };
>  static struct child_to_clean *children_to_clean;
> @@ -28,9 +30,30 @@ static int installed_child_cleanup_handler;
>  
>  static void cleanup_children(int sig, int in_signal)
>  {
> + int status;
> + struct timeval tv;
> + time_t secs;
> +
>   while (children_to_clean) {
>   struct child_to_clean *p = children_to_clean;
>   children_to_clean = p->next;
> +
> + if (p->timeout != 0 && p->stdin > 0)
> + close(p->stdin);

Why are you not closing stdin of filter driver process if timeout
is zero?  Is it maybe some kind of special value?  If it is, this
is not documented.

> +
> + if (p->timeout < 0) {
> + // Wait until the process finishes
> + while ((waitpid(p->pid, , 0)) < 0 && errno == 
> EINTR)
> + ;   /* nothing */

Ah, this loop is here because waiting on waitpid() can be interrupted
by the delivery of a signal to the calling process; though the result
is -1, not just any < 0.

Looks good (but we would want some progress information, probably).

> + } else if (p->timeout != 0) {
> + // Wait until the process finishes or timeout
> + gettimeofday(, NULL);
> + secs = tv.tv_sec;
> + while (getpgid(p->pid) >= 0 && tv.tv_sec - secs < 
> p->timeout) {
> + gettimeofday(, NULL);
> + }

WTF?  Busy wait loop???

Also, if we want to wait for child without blocking, then instead
of cryptic getpgid(p->pid) maybe use waitpid(p->pid, , WNOHANG);
it is more explicit.

There is also another complication: there can be more than one
long-running filter driver used.  With this implementation we
wait for each of one in sequence, e.g. 10s + 10s + 10s.

> + }
> +
>   

Re: [PATCH v2 11/11] i18n: difftool: mark warnings for translation

2016-10-01 Thread Jakub Narębski
W dniu 26.09.2016 o 01:21, Junio C Hamano pisze:
> Vasco Almeida <vascomalme...@sapo.pt> writes:
> 
>> -warn << 'EOF';
>> +warn __ <<'EOF';
>>  Combined diff formats ('-c' and '--cc') are not supported in
>>  directory diff mode ('-d' and '--dir-diff').
>>  EOF
> 
> Wow, didn't imagine gettext would pick this up.  Nice.

>From gettext documentation:
https://www.gnu.org/software/gettext/manual/gettext.html

 15 Other Programming Languages
 [...]
 15.5 Individual Programming Languages
 [...]
 15.5.18 Perl
 [...]
 15.5.18.4 What are Strings And Quote-like Expressions?

 Perl offers a plethora of different string constructs. Those that
 can be used either as arguments to functions or inside braces for
 hash lookups are generally supported by xgettext.

 [...]

 * here documents 

   print gettext <<'EOF';
   program not found in $PATH
   EOF

   print ngettext <<EOF, <<"EOF";
   one file deleted
   EOF
   several files deleted
   EOF

 Here-documents are recognized. If the delimiter is enclosed in single
 quotes, the string is not interpolated. If it is enclosed in double
 quotes or has no quotes at all, the string is interpolated.

 Delimiters that start with a digit are not supported!

-- 
Jakub Narębski



Re: [PATCH v2 09/11] i18n: send-email: mark warnings and errors for translation

2016-10-01 Thread Jakub Narębski
W dniu 31.08.2016 o 14:31, Vasco Almeida pisze:

> Mark warnings, errors and other messages for translation.
>

Which discovered a few places with questionable code...

(though there is one place with bad handling of translation)
 
> Signed-off-by: Vasco Almeida <vascomalme...@sapo.pt>
> ---
>  git-send-email.perl | 36 ++--
>  1 file changed, 18 insertions(+), 18 deletions(-)
> 
> diff --git a/git-send-email.perl b/git-send-email.perl
> index 2521832..e7f712e 100755
> --- a/git-send-email.perl
> +++ b/git-send-email.perl
> @@ -118,20 +118,20 @@ sub format_2822_time {
>   my $localmin = $localtm[1] + $localtm[2] * 60;
>   my $gmtmin = $gmttm[1] + $gmttm[2] * 60;
>   if ($localtm[0] != $gmttm[0]) {
> - die "local zone differs from GMT by a non-minute interval\n";
> + die __("local zone differs from GMT by a non-minute 
> interval\n");
>   }
>   if ((($gmttm[6] + 1) % 7) == $localtm[6]) {
>   $localmin += 1440;
>   } elsif ((($gmttm[6] - 1) % 7) == $localtm[6]) {
>   $localmin -= 1440;
>   } elsif ($gmttm[6] != $localtm[6]) {
> - die "local time offset greater than or equal to 24 hours\n";
> + die __("local time offset greater than or equal to 24 hours\n");

As one can see that this is the same message as below, so I wondered
why the same test is repeated... But it is even worse.  This test
is first, wrongly described: it checks that the day of week is the
same in local timezone and in UTC / GMT.  But second, it is WRONG!

For example if UTC time is just before midnight, then any positive
timezone has different day of week (next day); if UTC is just before
midnight, then any negative timezone has different day of week (previous
day).

  $ LC_ALL=C TZ=US/Hawaii date --date="2016-10-01 01:00 UTC"
  Fri Sep 30 15:00:00 HST 2016
  $ LC_ALL=C TZ=US/Hawaii date --date="2016-10-01 01:00 UTC" --utc
  Sat Oct  1 01:00:00 UTC 2016


>   }
>   my $offset = $localmin - $gmtmin;
>   my $offhour = $offset / 60;
>   my $offmin = abs($offset % 60);
>   if (abs($offhour) >= 24) {
> - die ("local time offset greater than or equal to 24 hours\n");
> + die __("local time offset greater than or equal to 24 hours\n");

This is good test.

>   }
>  
>   return sprintf("%s, %2d %s %d %02d:%02d:%02d %s%02d%02d",
> @@ -199,13 +199,13 @@ sub do_edit {
>   map {
>   system('sh', '-c', $editor.' "$@"', $editor, $_);
>   if (($? & 127) || ($? >> 8)) {
> - die("the editor exited uncleanly, aborting 
> everything");
> + die(__("the editor exited uncleanly, aborting 
> everything"));
>   }
>   } @_;
>   } else {
>   system('sh', '-c', $editor.' "$@"', $editor, @_);
>   if (($? & 127) || ($? >> 8)) {
> - die("the editor exited uncleanly, aborting everything");
> + die(__("the editor exited uncleanly, aborting 
> everything"));
>   }
>   }

Here we see some unnecessary code duplication, and thus
message duplication.


> @@ -1410,7 +1410,7 @@ Message-Id: $message_id
>   $smtp->code =~ /250|200/ or die "Failed to send 
> $subject\n".$smtp->message;
>   }
>   if ($quiet) {
> - printf (($dry_run ? "Dry-" : "")."Sent %s\n", $subject);
> + printf (($dry_run ? "Dry-" : ""). __("Sent %s\n"), $subject);
>   } else {
>   print (($dry_run ? "Dry-" : ""). __("OK. Log says:\n"));
>   if (!file_name_is_absolute($smtp_server)) {

You would want to translate Dry-Sent and Dry-OK.

-- 
Jakub Narębski



Re: [PATCH v2 08/11] i18n: send-email: mark strings for translation

2016-10-01 Thread Jakub Narębski
W dniu 26.09.2016 o 01:18, Junio C Hamano pisze:
> Vasco Almeida <vascomalme...@sapo.pt> writes:
> 
>> @@ -1403,7 +1412,7 @@ Message-Id: $message_id
>>  if ($quiet) {
>>  printf (($dry_run ? "Dry-" : "")."Sent %s\n", $subject);
>>  } else {
>> -print (($dry_run ? "Dry-" : "")."OK. Log says:\n");
>> +print (($dry_run ? "Dry-" : ""). __("OK. Log says:\n"));
> 
> I am not sure about this change.  We either say Dry-Sent/Dry-OK
> (under --dry-run) or "Sent/OK"; don't you want "dry-" part also
> translated?

Actually, we would probably want to unfold it, that is use 
__("Dry-OK. Log says:\n") and __("OK. Log says:\n"), etc.

-- 
Jakub Narębski



  1   2   3   4   5   >