from:"Torsten Bögershausen"

Re: What's cooking in git.git (Dec 2017, #01; Mon, 4)

2017-12-09 Thread Torsten Bögershausen

On Thu, Dec 07, 2017 at 04:33:12PM -0500, Todd Zullinger wrote:
> Jeff Hostetler wrote:
> >I'm looking at t5616 now on my mac.
> >Looks like the MAC doesn't like my line counting in the tests.
> >I'll fix in my next version.
> 
[]
>   | sort >expect_2.oids &&
> - test "$(wc -l  + test_line_count = 8 expect_2.oids &&
>   git -C src blame master -- file.1.txt >expect.blame
> '


The problem seems to be the '"' around wc, this would work:
test $(wc -l ) {
/^\s*declare\s+/ and err 'arrays/declare not portable';
/^\s*[^#]\s*which\s/ and err 'which is not portable (please use type)';
/\btest\s+[^=]*==/ and err '"test a == b" is not portable (please use 
=)';
+   /\bwc -l.*"\s*=/ and err '`"$(wc -l)"` is not portable, please use 
`$(wc -l)`';
/\bexport\s+[A-Za-z0-9_]*=/ and err '"export FOO=bar" is not portable 
(please use FOO=bar && expo

Re: [PATCH v1 1/2] t0027: Don't use git commit

2017-12-08 Thread Torsten Bögershausen

On Fri, Dec 08, 2017 at 10:21:19AM -0800, Junio C Hamano wrote:
> Junio C Hamano <gits...@pobox.com> writes:
> 
> > tbo...@web.de writes:
> >
> >> From: Torsten Bögershausen <tbo...@web.de>
> >>
> >> Replace `git commit -m "comment" ""` with `git commit -m "comment"` to
> >> remove the empty path spec.
> >>
> >> Signed-off-by: Torsten Bögershausen <tbo...@web.de>
> >> ---
> >>  t/t0027-auto-crlf.sh | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > This looks a bit strange.  The intent seems to commit all changes
> > made in the working tree, so I'd understand it if it replaced the
> > empty string with a single dot.
> >
> > I also thought that we deprecated use of an empty string as "match
> > all" pathspec recently, and expected that this to break.
> >
> > ... goes and looks ...
> >
> > Indeed, 229a95aa ("t0027: do not use an empty string as a pathspec
> > element", 2017-06-23) does exactly that.
> 
> OK, I think I can safely omit this patch, because
> 
>  (1) when 2/2 is queued on top of tb/check-crlf-for-safe-crlf,
>  because ex/deprecate-empty-pathspec-as-match-all is not yet in
>  the topic, an empty pathspec still means "match all" and
>  nothing breaks; and
> 
>  (2) when tb/check-crlf-for-safe-crlf plus 2/2 is merged to any
>  branch with ex/deprecate-empty-pathspec-as-match-all, the topic
>  already fixes what this 1/2 tries to
> 
> Thanks for being thorough, though.
> 

Sure, the credit goes 100% to you:

commit 229a95aafa77b583b46a3156b4fad469c264ddfd
Author: Junio C Hamano <gits...@pobox.com>
Date:   Fri Jun 23 11:04:14 2017 -0700

t0027: do not use an empty string as a pathspec element

My brain did just assume that this had mad it to master, sorry for the noise

Re: What's cooking in git.git (Dec 2017, #02; Thu, 7)

2017-12-07 Thread Torsten Bögershausen

> * tb/check-crlf-for-safe-crlf (2017-11-27) 1 commit
>   (merged to 'next' on 2017-12-05 at 7adaa1fe01)
>  + convert: tighten the safe autocrlf handling
> 
>  The "safe crlf" check incorrectly triggered for contents that does
>  not use CRLF as line endings, which has been corrected.
> 
>  Broken on Windows???
>  cf. 

Yes, broken on Windows. A fix is coming the next days.

Re: What's cooking in git.git (Dec 2017, #01; Mon, 4)

2017-12-06 Thread Torsten Bögershausen

On 2017-12-06 16:14, Lars Schneider wrote:
> 
>> On 04 Dec 2017, at 22:46, Junio C Hamano  wrote:
>>
>>
>> * tb/check-crlf-for-safe-crlf (2017-11-27) 1 commit
>> - convert: tighten the safe autocrlf handling
>>
>> The "safe crlf" check incorrectly triggered for contents that does
>> not use CRLF as line endings, which has been corrected.
>>
>> Will merge to 'next'.
>>
> 
> Looks like t0027-auto-crlf.sh is failing on Windows:
> https://travis-ci.org/git/git/jobs/312135514
> 
> Could that patch be related?
> 
> - Lars
> 
Chances are high, I will have a look.
Thanks for noticing.

Re: [PATCH 1/1] convert: tighten the safe autocrlf handling

2017-11-24 Thread Torsten Bögershausen

On Fri, Nov 24, 2017 at 12:24:48PM -0500, Eric Sunshine wrote:
> On Fri, Nov 24, 2017 at 11:14 AM,  <tbo...@web.de> wrote:
> > When a text file had been commited with CRLF and the file is commited
> > again, the CRLF are kept if .gitattributs has "text=auto".
> > This is done by analyzing the content of the blob stored in the index:
> > If a '\r' is found, Git assumes that the blob was commited with CRLF.
> >
> > The simple search for a '\r' does not always work as expected:
> > A file is encoded in UTF-16 with CRLF and commited. Git treats it as binary.
> > Now the content is converted into UTF-8. At the next commit Git treats the
> > file as text, the CRLF should be converted into LF, but isn't.
> >
> > Solution:
> > Replace has_cr_in_index() with has_crlf_in_index(). When no '\r' is found,
> > 0 is returned directly, this is the most common case.
> > If a '\r' is found, the content is analyzed more deeply.
> >
> > Signed-off-by: Torsten Bögershausen <tbo...@web.de>
> > ---
> > diff --git a/convert.c b/convert.c
> > @@ -220,18 +220,27 @@ static void check_safe_crlf(const char *path, enum 
> > crlf_action crlf_action,
> > -static int has_cr_in_index(const struct index_state *istate, const char 
> > *path)
> > +static int has_crlf_in_index(const struct index_state *istate, const char 
> > *path)
> >  {
> > unsigned long sz;
> > void *data;
> > -   int has_cr;
> > +   const char *crp;
> > +   int has_crlf = 0;
> >
> > data = read_blob_data_from_index(istate, path, );
> > if (!data)
> > return 0;
> > -   has_cr = memchr(data, '\r', sz) != NULL;
> > +
> > +   crp = memchr(data, '\r', sz);
> > +   if (crp && (crp[1] == '\n')) {
> 
> If I understand correctly, this isn't a NUL-terminated string and it
> might be a binary blob, so if the lone CR in a file resides at the end
> of the file, won't this try looking for LF out-of-bounds? I would have
> expected the conditional to be:
> 
> if (crp && crp - data + 1 < sz && crp[1] == '\n') {
> 
> or any equivalent variation.
> 

The read_blob_data_from_index() function should always append a '\0',
regardless if the blob is binary or not.

Re: Changing encoding of a file : What should happen to CRLF in file ?

2017-11-23 Thread Torsten Bögershausen

On Thu, Nov 23, 2017 at 10:01:40PM +0530, Ashish Negi wrote:
> Thanks for confirming.
> Is it possible to track this via a bug number ?
> It will help me to try out the fix when its available.
> 

No worry, the fix is nearly complete and will come out in a couple of days.

Re: RFC: Native clean/smudge filter for UTF-16 files

2017-11-23 Thread Torsten Bögershausen

On Thu, Nov 23, 2017 at 04:18:59PM +0100, Lars Schneider wrote:
> Hi,
> 
> I am working with a team that owns a repository with lots of UTF-16 files.
> Converting these files to UTF-8 is no good option as downstream applications
> require the UTF-16 encoding. Keeping the files in UTF-16 is no good option
> either as Git and Git related tools (e.g. GitHub) consider the files binary
> and consequently do not render diffs.
> 
> The obvious solution is to setup a clean/smudge filter like this [1]:
> [filter "winutf16"]
> clean = iconv -f utf-16 -t utf-8
> smudge = iconv -f utf-8 -t utf-16
> 
> In general this works well but the "per-file" clean/smudge process invocation 
> can be slow for many files. I could apply the same trick that we used for Git
> LFS and write a iconv that processes all files with a single invocation (see
> "edcc85814c convert: add filter..process option" [2]). 
> 
> Alternatively, I could add a native attribute to Git that translates UTF-16 
> to UTF-8 and back. A conversion function is already available in "mingw.h" [3]
> on Windows. Limiting this feature to Windows wouldn't be a problem from my
> point of view as UTF-16 is only relevant on Windows anyways. The attribute 
> could look like this:
> 
> *.txttext encoding=utf-16
> 

I would probably vote for UTF-16LE.
But we can specify whatever iconv allows, see below.

[]
> Do you think a patch that converts UTF-16 files to UTF-8 via an attribute
> "encoding=utf-16" on Windows would have a chance to get accepted?

Yes.
The thing is that we have convert.c and utf8.c (reencode_string_iconv()
and possible strbuf.c, which feels that they are in charge here.

Having a path using mingw.h would mean that this is Windows only,
and that is not a good solution.
(I sometimes host files on a Linux box, export them via SAMBA to
 Mac and Windows. Having that kind of setup allows to commit
 directly on the Linux fileserver)

But even if you don't have that setup, cross platform is a must, I would say.
There may be possible optimizations using xutftowcsn() and friends under
Windows, but they may come later into the picture (or are not needed)
What file sizes are we talking about ?
100K ?
100M ?

It is even possible to hook into the streaming interface.

> 
> Thanks,
> Lars
>

Re: core.safecrlf warning is confusing[improvement suggestion?]

2017-11-22 Thread Torsten Bögershausen

On Wed, Nov 22, 2017 at 11:01:22AM +0900, Junio C Hamano wrote:
> Torsten Bögershausen <tbo...@web.de> writes:
> 
> >> I want to have LF line endings in the repository and CRLF endings in
> >> the working copy. (Because I use windows-exclusive tools to develop.)
> >
> > Side note: If you ever want to push your repository somewhere,
> > it would be good practice to have a .gitattributes file:
> > ...
> 
> Now we got your attention ;-)
> 
> What would be the BCP we would give if somebody has just a tarball
> without .git that has LF endings?
> 
> $ git init a-project
> $ cd a-project
> $ tar xf ../a-project.tar
> $ git add .
> $ git commit -m 'Initial import'

There is room for small improvements:
 $ cd /tmp
 $ git init a-project
 $ cd a-project
 $ tar xf ../a-project.tar
 $ git -c core.autocrlf=false add .
 $ git commit -m 'Initial import'
 # Make up your mind: is it truly cross-platform ?
   $ echo "* text=auto" >.gitattributes
   # E.g. if you have shell scripts:
   $ echo "*.sh text eol=lf" >>.gitattributes
   # E.g. if you are a git developer:
   $ echo "/GIT-VERSION-GEN eol=lf" >>.gitattributes   
  # Or, is it e.g. a project where a tool needs some line endings
  # visual studio is one example, there are many others:
   $ echo "* -text" >.gitattributes
  # in any case, we need to commit: 
  $ git add .gitattributes && git commit -m "Add .gitattributes"

# Now we have the repo. I we don't want the hammer, simply clone it:
 $ cd $HOME
 $ git clone /tmp/a-project

That should work for project small enough not to fill the disk.
And other adjustments may be needed to the .gitattributes file.
A final check with
$ git ls-files --eol
may give inspiration.

> 
> would achieve one half of the original wish (i.e. "I want to end up
> with repository data in LF eol"); disabling the "safe crlf" before
> running that "git add ." step may also not be a bad idea, because it
> reduces the number of things that can get in the way by one.
> 
> But the above also leaves the "working tree" files in LF eol
> (i.e. it goes against "I want to work with CRLF in my working
> tree").  What would be our recommendation?
> 
> One big-hammer way I can think of is
> 
> $ git rm -f .
> $ git reset --hard
> 
> and that actually may be a good enough solution, given that you'd be
> doing this just once at the beginning of "your" project that starts
> from an inherited code drop.

Re: core.safecrlf warning is confusing[improvement suggestion?]

2017-11-21 Thread Torsten Bögershausen

On Tue, Nov 21, 2017 at 01:18:30PM +0800, Vladimir Nikishkin wrote:
> Hello, everyone.
> 
> I have the following question.
> 
> So I have a fresh git repository after git init, on Windows.
> 
> core.autocrlf is true explicitly, and core.safecrlf is true implicitly.
> 
> I want to have LF line endings in the repository and CRLF endings in
> the working copy. (Because I use windows-exclusive tools to develop.)

Side note: If you ever want to push your repository somewhere,
it would be good practice to have a .gitattributes file:

echo "* text=auto" >>.gitattributes
git add .gitattributes
git commit -m "add .gitattributes"

This can be useful if the repo is pulled to a person who has
core.autocrlf=false, if case [s]he is creating files with CRLF in the
worktree.

> 
> But for start I have my code with LF endings, because I got it from a
> fellow developer, who develops on UNIX, with LF line endings.
> 
> What I expect git to do:
> Commit files as is (LF), keep my files in the working directory with
> LF, but after issuing 'git checkout master' have them converted to
> CRLF (since this is a checkout procedure).
> 
> So I put the source in the working directory and tell git to make
> 
> git diff --stat

As Junio pointed out, it is probably not git diff saying this ?

Typically "git add" gives the warning.

>
> and I see the (ambiguous) warnings:
> 
> 'warning: LF will be replaced by CRLF in filename.m.
> The file will have its original line endings in you working directory.'
> 
> How I read them: "even though you have core.autocrlf=true, LF will be
> replaced by CRLF and the repository will store CRLF files. However,
> after you checkout them again, the CRLFs will be converted back to
> LF(because the files will have original line endings in the working
> directory.)"
> 
>  I feel like it's the opposite of what is actually happening.
> 
> So, would it make sense to change the warning message to? :
> 
> 'warning: When you next checkout this commit, your code will have CRLF
> line endings. However, right now your files will not be altered.'

I could agree.
Do you want to send a patch for this ?
(The code to be changd is in convert.c, plus some test cases)

> 
> -- 
> Yours sincerely, Vladimir Nikishkin

Re: Changing encoding of a file : What should happen to CRLF in file ?

2017-11-16 Thread Torsten Bögershausen

On Thu, Nov 16, 2017 at 12:35:33AM +0530, Ashish Negi wrote:
> On windows :
> > git --version
> git version 2.14.2.windows.2
> 
> On linux :
> > git --version
> git version 2.7.4
> 
> I would like to understand the solution :
> If i understood it correctly : it removes file_name.txt from index, so
> git forgets about it.
> we then add the file again after changing encoding. This time, git
> takes it as utf-8 file and converts crlf to lf when storing it
> internally.
> Right ?

Yes, exactly.
(In a coming release of Git there will be a "git add --renormalize " )

> 
> Thank you for the support.
> 

Thanks for a clean bug report.
Actually it is a bug, I put it on my to do list

Re: Changing encoding of a file : What should happen to CRLF in file ?

2017-11-15 Thread Torsten Bögershausen

On Wed, Nov 15, 2017 at 01:41:42PM +0530, Ashish Negi wrote:
> > If you commit the file, it will be stored with LF in the index,
> This is what i believe is not happening.
> 
> Lets do this with a public repository and steps which are reproducible.
> I have created a repo : https://github.com/ashishnegi/git_encoding
> 
> If you clone this repo in linux and run `git status`, you will find
> that file is modified.

This is what what I get:

 git ls-files --eol
i/lfw/lfattr/text=auto  .gitattributes
i/crlf  w/crlf  attr/text=auto  file_name.txt

(And you get the same, I think)
Newer versions of Git (>=2.10) will -not- treat file_name.txt as changed,
older versions do.
What does "git --version" say on your Linux machine ?

However, if you want to fix it, you want to either end up with

A)
git ls-files --eol
i/lfw/lfattr/text=auto  .gitattributes
i/lfw/crlf  attr/text=auto  file_name.txt

or
B)
git ls-files --eol
i/lfw/lfattr/text=auto  .gitattributes
i/crlf  w/crlf  attr/-text  file_name.txt

(The "attr/-text" means "don't change the line endings")

Both A) or B) will work, typically A) is preferred.

You should be able to achive A) :
 git rm --cached file_name.txt  && git add file_name.txt 
rm 'file_name.txt'
warning: CRLF will be replaced by LF in file_name.txt.
The file will have its original line endings in your working directory.

git ls-files --eol
i/lfw/lfattr/text=auto  .gitattributes
i/lfw/crlf  attr/text=auto  file_name.txt

[snip the rest]

Re: Changing encoding of a file : What should happen to CRLF in file ?

2017-11-14 Thread Torsten Bögershausen

(Back to the beginning)

You have a file ApplicationManifest.xml
It is encoded in UTF-16 (and has CRLF)

You convert it into UTF-8
The file has still CRLF (in the worktree)

Now you add it and make a commit.
Under both Linux and Windows you have "text=auto".

I assume that you have efficiently core.eol=lf under Linux
and core.eol=crlf on Windows.

(That is the default, when you don't change anything)

Now, what happens to the CRLF?
If you commit the file, it will be stored with LF in the index,
on both systems.
On checkout, Windows will convert them into CRLF, but Linux will not.

That why you see
>On linux, during committing i get warning : warning: CRLF will be
>replaced by LF in …file_name..

All in all there is nothing wrong, at least as I see it.

The question remains:
Do you need CRLF in Linux ?
Probably not, but if yes, plase add a line

*.xml text eol=crlf

to your
.gitattributes

Otherwise your .gitconfig looks good to  me.

Re: Changing encoding of a file : What should happen to CRLF in file ?

2017-11-14 Thread Torsten Bögershausen

On 2017-11-14 17:13, Ashish Negi wrote:
> Running the command gives me :
> 
>   git ls-files --eol file_name
>   i/-text w/-text attr/text=auto  file_name
> 

That is strange to me:
According to that, Git would treat the file as text=auto.
And the content is "not next", so there is no need to convert.
Do you have configured any filters ?

Is this a public repo ?
if not: Is there any chance that you provide a public example,
so that we can have a look ?

Re: Changing encoding of a file : What should happen to CRLF in file ?

2017-11-14 Thread Torsten Bögershausen

On 2017-11-14 13:31, Ashish Negi wrote:
> Hello
> 
> I have a cross platform project. I have a utf-16 file in it.
> I changed its encoding to urf-8 and committed. When i pulled the file
> in Linux, it shows that file is modified. This means that the commit
> which changed the encoding does not convert crlf to lf, when new
> format is text based (utf-8).
> 
> Steps to reproduce:
> 
>In windows :
> 
> Change encoding of file from utf-16 to utf-8.
> Commit the change.
> 
>In linux:
> 
> Pull your branch.
> You will see the issue of file being shown as modified even though
> you have not used it.
> 
> 
> If i change the file encoding in linux and commit. Then if i do git
> pull in windows, i don't see the issue.
> In linux, during committing i get warning : warning: CRLF will be
> replaced by LF in …file_name..
> 
> Here are my configuration :
> 
> 
>> git config --global --get core.autocrlf
> 
> false
> 
> 
>> git config  --get core.autocrlf
> 
> false
> 
> 
> 
>> cat E:\work\WindowsFabric\.gitattributes
> 
> 
> # Set the default behavior, in case people don't have core.autocrlf set.
> 
> * text=auto
> *.vcxproj eol=crlf
> *.sh  eol=lf
> 
> # Denote all files that are truly binary and should not be modified.
> *.exe binary
> *.dll binary
> *.pdb binary
> *.ico binary
> *.png binary
> *.jpg binary
> 
> 
>> git --version
> git version 2.14.2.windows.2
> 
> 
> I played around with core.autocrlf values like true and native, but
> that didn't help.
> 
> The behavior is inconsistent across platforms, and windows version is
> giving me problem.
> 
> Can someone suggest right settings for this ? Or is this a bug in Git.
> 

I don't think it is a bug :-)
What does
git ls-files --eol …file_name
give you under Windows ?

Re: [PATCH v2] gpg-interface: strip CR chars for Windows gpg2

2017-11-12 Thread Torsten Bögershausen

On Sun, Nov 12, 2017 at 01:07:10PM +, Jerzy Kozera wrote:
> This fixes the issue with newlines being \r\n and not being displayed
> correctly when using gpg2 for Windows, as reported at
> https://github.com/git-for-windows/git/issues/1249
> 
> Issues with non-ASCII characters remain for further investigation.
> 
> Signed-off-by: Jerzy Kozera 
> ---
> 
> Addressed comments by Junio C Hamano (check for following \n, and
> updated the commit description).
> 
>  gpg-interface.c | 27 ++-
>  1 file changed, 18 insertions(+), 9 deletions(-)
> 
> diff --git a/gpg-interface.c b/gpg-interface.c
> index 4feacf16e5..ab592af7f2 100644
> --- a/gpg-interface.c
> +++ b/gpg-interface.c
> @@ -145,6 +145,20 @@ const char *get_signing_key(void)
>   return git_committer_info(IDENT_STRICT|IDENT_NO_DATE);
>  }
>  
> +/* Strip CR from the CRLF line endings, in case we are on Windows. */
> +static void strip_cr(struct strbuf *buffer, size_t bottom) {
> + size_t i, j;

It is not wrong to say "Strip CR from the CRLF".
In Git we often talk about "convert CRLF into LF",

The comment somewhat different to the function name.
The function namd and the name of the parameters can be more in
in line with existing strbuf functions:
(And the opening '{' should go into it's own line:

static void convert_crlf_to_lf(struct strbuf *sb, size_t len)
{
size_t i, j;

An even more generic approach (could be done in a seperate commit)
would be to move the whole function into strbuf.c/strbuf.h,
and it may be called like this.

void strbuf_crlf_to_lf(struct strbuf *sb, size_t len)
{
  /* I would even avoid "i" and "j", and use src and dst or so) */
  size_t src_pos, dst_idx;
}

Thanks for working on this.

[]

Re: [PATCH v2 1/1] Introduce git add --renormalize .

2017-11-12 Thread Torsten Bögershausen

On Fri, Nov 10, 2017 at 09:22:10AM +0900, Junio C Hamano wrote:

> 
> Taking these altogether, perhaps
> 
> Apply the "clean" process freshly to all tracked files to
> forcibly add them again to the index.  This is useful after
> changing `core.autocrlf` configuration or the `text` attribute
> in order to correct files added with wrong CRLF/LF line endings.
> This option implies `-u`.
> 
> Thanks.

OK with the text - I can send a new version the next days.

Re: [PATCH v2 1/1] Introduce git add --renormalize .

2017-11-09 Thread Torsten Bögershausen

[]
> 
> If we had such a term in Documentation/glossary-contents.txt, we
> could even say
> 
>   Add contents of all paths to the index by freshly applying
>   the "clean" process, even to the ones Git may think are
>   unmodified in the working tree since they were added the
>   last time (based on the file timestamps etc.).  This is
>   often useful after updating settings like `core.autocrlf` in
>   the `.git/config` file and the `text` attributes in the
>   `.gitattributes` file to correct the index entries that
>   records lines with CRLF to use LF instead, or changing what
>   the `clean` filter does.  This option implies `-u`.
> 
> The point is to express that the CRLF/LF is a consequence (even
> though it may be the most prominent one from end-users' point of
> view) of a larger processing.

Here is a somwhat shorter description:

Apply the "clean" process freshly to all tracked files.
This is useful after changing `core.autocrlf` or the `text`
attributes in the `.gitattributes` file because
Git may not consider these files as changed.
Correct the files that had been commited with CRLF,
they will from now on have LF instead.
Re-run what the `clean` filter does.
This option implies `-u`.


> 
> > [snip the TC. Adding line endings is good)
> 
> What is TC in this context?

Sorry for confusion: TC means test case.

Re: [PATCH v2 1/1] Introduce git add --renormalize .

2017-11-07 Thread Torsten Bögershausen

[snip]
> 
> > @@ -175,6 +175,12 @@ for "git add --no-all ...", i.e. ignored 
> > removed files.
> > warning (e.g., if you are manually performing operations on
> > submodules).
> >  
> > +--renormalize::
> > +   Normalizes the line endings from CRLF to LF of tracked files.
> > +   This applies to files which are either "text" or "text=auto"
> > +   in .gitattributes (or core.autocrlf is true or input)
> > +   --renormalize implies -u
> > +
> 
> OK.

I think the fact, that clean filters are re-run, and re-evaluated
in case they are changed, should be made more clear here.
I don't know how to explain it better that CRLF conversion and/or filters are
re-applied, this is an attempt:


--renormalize::
Normalizes the line endings from CRLF to LF of tracked files,
if the .gitattributes or core.autocrlf say so.
Additionally the clean and ident filters, if any, are re-run.
--renormalize implies -u





> 
> > +static int renormalize_tracked_files(const struct pathspec *pathspec, int 
> > flags)
> > +{
> > +   int i, retval = 0;
> > +
> > +   for (i = 0; i < active_nr; i++) {
> > +   struct cache_entry *ce = active_cache[i];
> > +
> > +   if (ce_stage(ce))
> > +   continue; /* do not touch unmerged paths */
> > +   if (!S_ISREG(ce->ce_mode) && !S_ISLNK(ce->ce_mode))
> > +   continue; /* do not touch non blobs */
> > +   if (pathspec && !ce_path_match(ce, pathspec, NULL))
> > +   continue;
> > +   retval |= add_file_to_cache(ce->name, flags | HASH_RENORMALIZE);
> 
> We are removing the entry and then adding the same entry under the
> same name, and iteration over the active_cache[] from 0 through
> active_nr should be safe, I guess.
> 
> > ...
> > -   add_new_files = !take_worktree_changes && !refresh_only;
> > +   add_new_files = !take_worktree_changes && !refresh_only && 
> > !add_renormalize;
> 
> If renormalize is given, we will *not* take new files, good.
> 
> > @@ -500,7 +521,10 @@ int cmd_add(int argc, const char **argv, const char 
> > *prefix)
> >  
> > plug_bulk_checkin();
> >  
> > -   exit_status |= add_files_to_cache(prefix, , flags);
> > +   if (add_renormalize)
> > +   exit_status |= renormalize_tracked_files(, flags);
> > +   else
> > +   exit_status |= add_files_to_cache(prefix, , flags);
> >  
> > if (add_new_files)
> > exit_status |= add_files(, flags);
> 
> OK.
> 
> > ...
> > int add_option = (ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE|
> >   (intent_only ? ADD_CACHE_NEW_ONLY : 0));
> > +   int newflags = HASH_WRITE_OBJECT;
> > +
> > +   if (flags & HASH_RENORMALIZE)
> > +   newflags |= HASH_RENORMALIZE;
> > ...
> > @@ -678,19 +682,23 @@ int add_to_index(struct index_state *istate, const 
> > char *path, struct stat *st,
> > if (ignore_case) {
> > adjust_dirname_case(istate, ce->name);
> > }
> > +   if (!(flags & HASH_RENORMALIZE)) {
> > +   alias = index_file_exists(istate, ce->name,
> > + ce_namelen(ce), ignore_case);
> > +   if (alias &&
> > +   !ce_stage(alias) &&
> > +   !ie_match_stat(istate, alias, st, ce_option)) {
> > +   /* Nothing changed, really */
> > +   if (!S_ISGITLINK(alias->ce_mode))
> > +   ce_mark_uptodate(alias);
> > +   alias->ce_flags |= CE_ADDED;
> 
> OK, so RENORMALIZE option forces the code to skip the "does the path
> exist already?  maybe we can do without adding it?" check.
> 
> > if (!intent_only) {
> > -   if (index_path(>oid, path, st, HASH_WRITE_OBJECT)) {
> > +   if (index_path(>oid, path, st, newflags)) {
> 
> And then we do hash it.  We later do add_index_entry() on this thing
> and we have OK_TO_REPLACE bit in the add_option, so we are good to go.
> 
> > diff --git a/sha1_file.c b/sha1_file.c
> > index 10c3a0083d..15abb184c2 100644
> > --- a/sha1_file.c
> > +++ b/sha1_file.c
> > @@ -74,6 +74,18 @@ static struct cached_object *find_cached_object(const 
> > unsigned char *sha1)
> > return NULL;
> >  }
> >  
> > +
> > +static enum safe_crlf get_safe_crlf(unsigned flags)
> > +{
> > +   if (flags & HASH_RENORMALIZE)
> > +   return SAFE_CRLF_RENORMALIZE;
> > +   else if (flags & HASH_WRITE_OBJECT)
> > +   return safe_crlf;
> > +   else
> > +   return SAFE_CRLF_FALSE;
> > +}
> > +
> > +
> >  int mkdir_in_gitdir(const char *path)
> >  {
> > if (mkdir(path, 0777)) {
> > @@ -1680,7 +1692,7 @@ static int index_mem(unsigned char *sha1, void *buf, 
> > size_t size,
> > if ((type == OBJ_BLOB) && path) {
> > struct strbuf nbuf = STRBUF_INIT;
> > if (convert_to_git(_index, path, buf, size, ,
> > -  write_object ? safe_crlf : SAFE_CRLF_FALSE)) 
> > {
> > +  get_safe_crlf(flags))) {
> >

Re: [Bug Report] [includeIf] does not parse case insensitive -> case sensitive symlink gitdir

2017-10-27 Thread Torsten Bögershausen

On Fri, Oct 27, 2017 at 09:55:58AM -0400, Adrian Kappel wrote:
> Hello all, not sure if the issue I've come across is a known bug or
> addressable, but wanted to report it just in-case.

Thanks for the detailed description - my question is inline

> 
> 
> ** Summary
> --
> Using the [includeIf] configuration statement with a symlink'd gitdir
> will not work if the symlink is on a case insensitive volume and the
> location it references is on a case sensitive volume.
> 
> ** Steps to reproduce
> --
> 1. Create symlink (case insensitive -> case sensitive):
> /Users/usera/dev -> /Volumes/CaseSensitive/dev
> 2. Create two files: .gitignore and .gitignore-work, both stored in
> /Users/usera/
> 
> .gitconfig
> -
> [user]
>   name = First Last
> 
>   [includeIf "gitdir:~/dev/work"]
> path = .gitconfig-work
> 
> .gitconfig-work
> 
> [user]
>   email = em...@address.com
> 
> 3. cd into a subfolder of ~/dev/work that has been git initialized.
> Let's say ~/dev/work/repo
> 4. Run git config --includes user.email
> 5. See that nothing is output from the command
> 6. Update the [includeIf] statement in .gitconfig to be the real
> location i.e. "gitdir:/Volumes/CaseSensitive/dev/work/repo"

Didn't you set it up pointing do the real location ?
That is what is written above:
> 1. Create symlink (case insensitive -> case sensitive):
> /Users/usera/dev -> /Volumes/CaseSensitive/dev

(I suspect that people use something like this:
 /Users/usera/dev -> /Volumes/casesensitive/dev
 And in this case it would be the file system which says
 casesensitive != CaseSensitive and Git can't do much about it)

> 7. Rerun the command from [4]
> 8. See that em...@address.com is output from the command
> 
> ** Other variations that were not tested
> --
> - symlink on case sensitive volume referencing a location on a case
> insensitive volume
> 
> ** Environment Information
> --
> git --version: 2.14.1
> OS: macOS Sierra 10.12.6
> 
> 
> If a fix is not feasible or likely to be implemented, I would
> recommend that we update the git site's documentation to reflect this
> gotcha. After verification of course.
> 
> Best,
> Adrian Kappel
> akappel

Re: Consequences of CRLF in index?

2017-10-26 Thread Torsten Bögershausen

On Thu, Oct 26, 2017 at 01:01:25PM +0200, Lars Schneider wrote:
> 
> > On 26 Oct 2017, at 09:09, Johannes Sixt  wrote:
> > 
> > Am 25.10.2017 um 14:19 schrieb Johannes Schindelin:
> >> I envy you for the blessing of such a clean C++ source that you do not
> >> have any, say, Unix shell script in it. Try this, and weep:
> >>$ printf 'echo \\\r\n\t123\r\n' >a1
> >>$ sh a1
> >>a1: 2: a1: 123: not found
> > 
> > I was bitten by that, too. For this reason, I ensure that shell scripts and 
> > Makefiles begin their life on Linux. Fortunately, modern editors on 
> > Windows, includ^Wand vi, do not force CRLF line breaks, and such files can 
> > be edited on Windows, too.
> 
> Wouldn't this kind of .gitattributes setup solve the problem?
> 
> * -text
> *.sh   text eol=lf
> 

Yes, exactly. and for the snake-lovers:

*.py   text eol=lf

Re: Consequences of CRLF in index?

2017-10-26 Thread Torsten Bögershausen

On Wed, Oct 25, 2017 at 10:13:57AM -0700, Jonathan Nieder wrote:
> Hi again,
> 
> Lars Schneider wrote:
> >> On 24 Oct 2017, at 20:14, Jonathan Nieder  wrote:
> 
> >> In any event, you also probably want to declare what you're doing
> >> using .gitattributes.  By checking in the files as CRLF, you are
> >> declaring that you do *not* want Git to treat them as text files
> >> (i.e., you do not want Git to change the line endings), so something
> >> as simple as
> >>
> >>* -text
> >
> > That's sounds good. Does "-text" have any other implications?
> > For whatever reason I always thought this is the way to tell
> > Git that a particular file is binary with the implication that
> > Git should not attempt to diff it.
> 
> No other implications.  You're thinking of "-diff".  There is also a
> shortcut "binary" which simply means "-text -diff".

Not 100% the same, as far as I know.
"binary" means: Don't convert line endings, and there is now way to
do a readable diff.
The only thing to tell the user is: The binary blobs are different.

Then we have "text". The "old" version of "text" was "crlf", which
for some people was more intuitive, and less intuitive for others.
"* crlf" is the same as "* text" and means please convert line endings.
And yes, the file is still line oriented.
"* -crlf" means don't touch the line endings, the file is
line-orinted and diff and  merge will work.
"* -text" is the same as "* -crlf"

> 
> Ideas for wording improvements to gitattributes(5) on this subject?

None from me at the moment.

> 
> Thanks,
> Jonathan

Re: Consequences of CRLF in index?

2017-10-24 Thread Torsten Bögershausen

On Tue, Oct 24, 2017 at 11:14:15AM -0700, Jonathan Nieder wrote:
> Hi,
> 
> Lars Schneider wrote:
> 
> > I've migrated a large repo (110k+ files) with a lot of history (177k 
> > commits)
> > and a lot of users (200+) to Git. Unfortunately, all text files in the index
> > of the repo have CRLF line endings. In general this seems not to be a 
> > problem
> > as the project is developed exclusively on Windows.
> 
> Sounds good.
> 
> > However, I wonder if there are any "hidden consequences" of this setup?
> > If there are consequences, then I see two options. Either I rebase the repo
> > and fix the line endings for all commits or I add a single commit that fixes
> > the line endings for all files. Both actions require coordination with the
> > users to avoid repo trouble/merge conflicts. The "single fixup commit" 
> > options
> > would also make diffs into the past look bad. Would a single large commit 
> > have
> > any impact on the performance of standard Git operations?
> 
> There are no hidden consequences that I'm aware of.  If you later
> decide that you want to become a cross-platform project, then you may
> want to switch to LF endings, in which case I suggest the "single
> fixup commit" strategy.
> 
> In any event, you also probably want to declare what you're doing
> using .gitattributes.  By checking in the files as CRLF, you are
> declaring that you do *not* want Git to treat them as text files
> (i.e., you do not want Git to change the line endings), so something
> as simple as
> 
>   * -text
> 
> should do the trick.  See gitattributes(5) for details.

Exactly.
The "hidden consequence" may be the other way around:
If you don't specify .gitattributes, then all people who have core.autocrlf=true
will suffer from a runtime penalty.
core.autocrlf=true means "auto".
At each checkout Git needs to figure out that the file has CRLF in the repo,
so that there is no conversion done.
The same happens on "git add" or "git commit" (and sometimes on "git status").

The penalty may be low, but Dscho once reported that it is measurable & painful
on a "big repo".

The other advantage of "* -text" in .gitattributes is that new files are handled
consistantly accross developers.

Those who have "core.autocrlf=false" would produce commits with CRLF for new
files, and those developpers who have core.autocrlf=true would produce files
with LF in the index and CRLF in the worktree.
This may (most probably will) cause confusion later, when things are pushed and
pulled.

> 
> I'd be interested to hear what happens when diff-ing across a line
> ending fixup commit.  Is this an area where Git needs some
> improvement?  "git merge" knows an -Xrenormalize option to deal with a
> related problem --- it's possible that "git diff" needs to learn a
> similar trick.

That is a tricky thing.
Sometimes you want to see the CLRF - LF as a diff, (represented as "^M"),
and sometimes not.

All in all "* -text" is a good choice for Windows-only projects.

> 
> Thanks and hope that helps,
> Jonathan

Re: Line ending normalization doesn't work as expected

2017-10-06 Thread Torsten Bögershausen

On Fri, Oct 06, 2017 at 09:33:31AM +0900, Junio C Hamano wrote:
> Torsten Bögershausen <tbo...@web.de> writes:
> 
> > Before we put this into stone:
> > Does it make sense to say "renormalize" instead of "rehash" ?
> > (That term does exist already for merge.
> >  And rehash is more a technical term,  rather then a user-point-of-view 
> > explanation)
> 
> I do not mind "renormalize" at all.
> 
> As to the toy patch, I think it needs to (at least by default) turn
> off the add_new_files codepath, and be allowed to work without any
> pathspec (in which case all tracked paths should be renormalized).
> 

OK, then I will pick up your patch in a couple of days/weeks, and push it 
further then
(Documentation, test cases, other ?)

Re: Line ending normalization doesn't work as expected

2017-10-05 Thread Torsten Bögershausen

 
>  builtin/add.c | 42 +-
>  1 file changed, 41 insertions(+), 1 deletion(-)
> 
> diff --git a/builtin/add.c b/builtin/add.c
> index 5d5773d5cd..264f84dbe7 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -26,6 +26,7 @@ static const char * const builtin_add_usage[] = {
>  };
>  static int patch_interactive, add_interactive, edit_interactive;
>  static int take_worktree_changes;
> +static int rehash;
>  
>  struct update_callback_data {
>   int flags;
> @@ -121,6 +122,41 @@ int add_files_to_cache(const char *prefix,
>   return !!data.add_errors;
>  }
>  
> +static int rehash_tracked_files(const char *prefix, const struct pathspec 
> *pathspec,
> + int flags)
> +{
> + struct string_list paths = STRING_LIST_INIT_DUP;
> + struct string_list_item *path;
> + int i, retval = 0;
> +
> + for (i = 0; i < active_nr; i++) {
> + struct cache_entry *ce = active_cache[i];
> +
> + if (ce_stage(ce))
> + continue; /* do not touch unmerged paths */
> + if (!S_ISREG(ce->ce_mode) && !S_ISLNK(ce->ce_mode))
> + continue; /* do not touch non blobs */
> + if (pathspec && !ce_path_match(ce, pathspec, NULL))
> + continue;
> + string_list_append(, ce->name);
> + }
> +
> + for_each_string_list_item(path, ) {
> + /*
> +  * Having a blob contaminated with CR will trigger the
> +  * safe-crlf kludge, avoidance of which is the primary
> +  * thing this helper function exists.  Remove it and
> +  * then re-add it.  Note that this may lose executable
> +  * bit on a filesystem that lacks it.
> +  */
> + remove_file_from_cache(path->string);
> + add_file_to_cache(path->string, flags);
> + }
> +
> + string_list_clear(, 0);
> + return retval;
> +}
> +
>  static char *prune_directory(struct dir_struct *dir, struct pathspec 
> *pathspec, int prefix)
>  {
>   char *seen;
> @@ -274,6 +310,7 @@ static struct option builtin_add_options[] = {
>   OPT_BOOL('e', "edit", _interactive, N_("edit current diff and 
> apply")),
>   OPT__FORCE(_too, N_("allow adding otherwise ignored files")),
>   OPT_BOOL('u', "update", _worktree_changes, N_("update tracked 
> files")),
> + OPT_BOOL(0, "rehash", , N_("really update tracked files")),
>   OPT_BOOL('N', "intent-to-add", _to_add, N_("record only the fact 
> that the path will be added later")),
>   OPT_BOOL('A', "all", _explicit, N_("add changes from all 
> tracked and untracked files")),
>   { OPTION_CALLBACK, 0, "ignore-removal", _explicit,
> @@ -498,7 +535,10 @@ int cmd_add(int argc, const char **argv, const char 
> *prefix)
>  
>   plug_bulk_checkin();
>  
> - exit_status |= add_files_to_cache(prefix, , flags);
> + if (rehash)
> + exit_status |= rehash_tracked_files(prefix, , flags);
> + else
> + exit_status |= add_files_to_cache(prefix, , flags);
>  
>   if (add_new_files)
>   exit_status |= add_files(, flags);

That looks like a nice one.
Before we put this into stone:
Does it make sense to say "renormalize" instead of "rehash" ?
(That term does exist already for merge.
 And rehash is more a technical term,  rather then a user-point-of-view 
explanation)

Re: Line ending normalization doesn't work as expected

2017-10-04 Thread Torsten Bögershausen

On Wed, Oct 04, 2017 at 11:26:55AM -0500, Robert Dailey wrote:
> On Tue, Oct 3, 2017 at 9:00 PM, Junio C Hamano <gits...@pobox.com> wrote:
> > Torsten Bögershausen <tbo...@web.de> writes:
> >
> >>> $ git rm -r --cached . && git add .
> >>
> >> (Both should work)
> >>
> >> To be honest, from the documentation, I can't figure out the difference 
> >> between
> >> $ git read-tree --empty
> >> and
> >> $ git rm -r --cached .
> >>
> >> Does anybody remember the discussion, why we ended up with read-tree ?
> >
> > We used to use neither, and considered it fine to "rm .git/index" if
> > you wanted to empty the on-disk index file in the old world.  In the
> > modern world, folks want you to avoid touching filesystem directly
> > and instead want you to use Git tools, and the above are two obvious
> > ways to do so.
> >
> > "git read-tree" (without any parameter, i.e. "read these 0 trees and
> > populate the index with it") and its modern and preferred synonym
> > "git read-tree --empty" (i.e. "I am giving 0 trees and I know the
> > sole effect of this command is to empty the index.") are more direct
> > ways to express "I want the index emptied" between the two.
> >
> > The other one, "git rm -r --cached .", in the end gives you the same
> > state because it tells Git to "iterate over all the entries in the
> > index, find the ones that match pathspec '.', and remove them from
> > the index.".  It is not wrong per-se, but conceptually it is a bit
> > roundabout way to say that "I want the index emptied", I would
> > think.
> >
> > I wouldn't be surprised if the "rm -r --cached ." were a lot slower,
> > due to the overhead of having to do the pathspec filtering that ends
> > up to be a no-op, but there shouldn't be a difference in the end
> > result.
> 
> You guys are obviously worlds ahead of me on the internals of things,
> but from my perspective I like to avoid the "plumbing" commands as
> much as I can. Even if I used them, if I have to tell the rest of my
> team "this is the way to do it", they're going to give me dirty looks
> if I ask them to run things like this that make no sense to them.
> That's the argument I have to deal with when it comes to Git's
> usability within the team I manage. So based on this, I'd favor the
> `git rm -r --cached` approach because this is the more common result
> you see in Google, and also makes a little more sense from a high
> level of thought perspective. However, this is just my personal
> opinion. `read-tree --empty` is far less self explanatory IMHO.
> 
> Also let's not forget the second part of the command chain that
> results in the different behavior. In one case, I use `git add` which
> results in proper line ending normalization. In the other case, I do
> `git reset --hard` which does *NOT* result in the line endings
> normalized (`git status` shows no results). In both cases, I'm still
> doing `git rm -r --cached`, so I am doubtful that is the root cause
> for the line ending normalization piece. I'm still trying to
> understand why both give different results (root cause) and also get
> an understanding of what the correct (modern) solution is for line
> ending normalization (not necessarily which is the right way to
> clear/delete the index, which is really AFAIK just a means to this end
> and an implementation detail of sorts for this specific task).

Hopefully I am able to give a useful answer.

"git reset --hard" works like a hammer
and may destroy work that has been done,
in our case the cleaning of the index,
which is needed for normalization since Git 2.10 (or so)

Back to the question:
One solution, which you can tell your team, is this one:
$ git rm -r --cached . && git add .

And as Junio pointed out, this may be slower than needed.
And we don't want "slow" solutions in the official documentation ;-)

Whatever you find on search engines may get stale after a while,
so that we appreciate direct questions here.

(And I will open an issue on Github the next days)

The background is that the CRLF handling in Git changed over the years,
and one effect is that "git reset" is not "allowed" any more.

For the interested reader:
https://github.com/git-for-windows/git/issues/954

Re: Line ending normalization doesn't work as expected

2017-10-03 Thread Torsten Bögershausen

On 2017-10-03 19:23, Robert Dailey wrote:
> On Tue, Oct 3, 2017 at 11:26 AM, Torsten Bögershausen <tbo...@web.de> wrote:
>> The short version is, that the instructions on Github are outdated.
>> This is the official procedure (since 2016, Git v2.12 or so)
>> But it should work even with older version of Git.
>>
>> $ echo "* text=auto" >.gitattributes
>> $ git read-tree --empty   # Clean index, force re-scan of working directory
>> $ git add .
>> $ git status# Show files that will be normalized
>> $ git commit -m "Introduce end-of-line normalization"
> 
> Is the way I did it that worked also a valid solution? Or did it only
> work accidentally? Again the command I ran that worked is:
> 
> $ git rm -r --cached . && git add .

(Both should work)

To be honest, from the documentation, I can't figure out the difference between
$ git read-tree --empty
and
$ git rm -r --cached .

Does anybody remember the discussion, why we ended up with read-tree ?

Re: Line ending normalization doesn't work as expected

2017-10-03 Thread Torsten Bögershausen

On 2017-10-03 17:00, Robert Dailey wrote:
> I'm on Windows using Git for Windows v2.13.1. Following github's
> recommended process for fixing line endings after adding a
> `.gitattributes` file[1], I run the following:
> 
> $ rm .git/index && git reset
> 
> Once I run `git status`, I see that no files have changed. Note that I
> know for a fact in my repository, files were committed using CRLF line
> endings (the files in question are C# code files, and no
> .gitattributes was present at the time).
> 
> I also tried this:
> 
> $ git rm -r --cached . && git reset --hard
> 
> However, again `git status` shows no working copy modifications. The
> one thing that *did* work (and I tried this on accident actually) is:
> 
> $ git rm -r --cached . && git add .
> 
> This properly showed all files in my index with line ending
> modifications (I ran `git diff --cached -R` to be sure; the output
> shows `^M` at the end of each line in the diff in this case). Note
> that my global git config has `core.autocrlf` set to `false`, but I
> also tried the top 2 commands above with it set to `true` but it made
> no difference.
> 
> So my question is: Why do the top 2 commands not work, but the third
> one does? To me this all feels like magic / nondeterministic, so I'm
> hoping someone here knows what is going on and can explain the logic
> of it. Also if this is a git config issue, I'm not sure what it could
> be. Note my `.gitattributes` just has this in it:

The short version is, that the instructions on Github are outdated.
This is the official procedure (since 2016, Git v2.12 or so)
But it should work even with older version of Git.

$ echo "* text=auto" >.gitattributes
$ git read-tree --empty   # Clean index, force re-scan of working directory
$ git add .
$ git status# Show files that will be normalized
$ git commit -m "Introduce end-of-line normalization"


Could you open an issue on Github ?
(Or can someone @github fix this ?)

> 
> * text=auto
> 
> Thanks in advance.
> 
> 
> [1]: https://help.github.com/articles/dealing-with-line-endings/
>

Re: is there a "symlink" option for cloning a repo in a separate filesystem?

2017-09-23 Thread Torsten Bögershausen

On 2017-09-23 10:22, Robert P. J. Day wrote:
> 
>   reading "man git-clone", and i understand the mechanics of the local
> protocol, so that if i run:
> 
>   $ git clone /path/to/repo
> 
> then "files under .git/objects/ directory are hardlinked to save space
> when possible."
> 
>   but if the repo is in a separate filesystem, or on an NFS mount,
> hardlinks clearly won't work, so what happens then? does it default
> all the way back to regular copies? is there no intermediate symlink
> feature that would still work? (i suspect i am far from the first
> person to wonder this.)
> 
> rday
> 

Isn't that what "--reference" is good for ?

https://git-scm.com/docs/git-clone

or

man git-clone

PS:
Robert P. J. Day 
Does not work from web.de

Re: [PATCH 1/4] git-compat-util.h: xsize_t() - avoid -Wsign-compare warnings

2017-09-21 Thread Torsten Bögershausen

On 2017-09-21 18:46, Ramsay Jones wrote:
> 
> Signed-off-by: Ramsay Jones 
> ---
>  git-compat-util.h | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/git-compat-util.h b/git-compat-util.h
> index 9bc15b036..cedad4d58 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -898,9 +898,11 @@ static inline char *xstrdup_or_null(const char *str)
>  
>  static inline size_t xsize_t(off_t len)
>  {
> - if (len > (size_t) len)
> + size_t size = (size_t) len;
> +
> + if (len != (off_t) size)
>   die("Cannot handle files this big");
> - return (size_t)len;
> + return size;

Hm, can someone help me out ?
Why is the cast not needed ?

>  }
>  
>  __attribute__((format (printf, 3, 4)))
>

Re: [PATCH v1 1/1] test-lint: echo -e (or -E) is not portable

2017-09-20 Thread Torsten Bögershausen

On Tue, Sep 19, 2017 at 01:37:14PM -0700, Jonathan Nieder wrote:
> Torsten Bögershausen <tbo...@web.de> wrote:
> 
> > Some implementations of `echo` support the '-e' option to enable
> > backslash interpretation of the following string.
> > As an addition, they support '-E' to turn it off.
> 
> nit: please wrap the commit message to a consistent line width.
> 
> > However, none of these are portable, POSIX doesn't even mention them,
> > and many implementations don't support them.
> >
> > A check for '-n' is already done in check-non-portable-shell.pl,
> > extend it to cover '-n', '-e' or '-E-'
> >
> > Signed-off-by: Torsten Bögershausen <tbo...@web.de>
> > ---
> >  t/check-non-portable-shell.pl | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> An excellent change.  Thanks for noticing and fixing this.
> 
> Reviewed-by: Jonathan Nieder <jrnie...@gmail.com>

Thanks for the review.
Junio, if you wouldn't mind to squash that in, 
another fix is needed as well(trailing '-' after '-E') :

s/'-n', '-e' or '-E-'/'-n', '-e' or '-E'
   ^

Re: [PATCH v6 08/12] fsmonitor: add a test tool to dump the index extension

2017-09-19 Thread Torsten Bögershausen

> 
> Should I just make the variable type itself uintmax_t and then just skip
> the cast altogether? I went with uint64_t because that is what
> getnanotime returned.
> 

That is a bit of taste question (or answer)

Typically you declare the variables in the type you need,
and this is uint64_t.

Let's step back a bit:
To print e.g a variable of type uint32_t, you use  PRIu32 in the format
string, like this:

fprintf(stderr, "Total %"PRIu32" (delta %"PRIu32"),",

In theory (it is in the later specs, and it exists on many platforms),
there is a PRIu64 as well.

We don't seem to use it in Git, probably because uintmax_t is (more)
portable and understood by all platforms which support Git.
(And beside that, on most platforms uintmax_t is 64 bit).

So my suggestion would be to keep uint64_t and cast the variable into uintmax_t
whenever it is printed.

Re: [PATCH v6 08/12] fsmonitor: add a test tool to dump the index extension

2017-09-18 Thread Torsten Bögershausen

On 2017-09-18 15:38, Ben Peart wrote:
> 
> 
> On 9/17/2017 4:02 AM, Junio C Hamano wrote:
>> Ben Peart  writes:
>>
>>> diff --git a/t/helper/test-dump-fsmonitor.c b/t/helper/test-dump-fsmonitor.c
>>> new file mode 100644
>>> index 00..482d749bb9
>>> --- /dev/null
>>> +++ b/t/helper/test-dump-fsmonitor.c
>>> @@ -0,0 +1,21 @@
>>> +#include "cache.h"
>>> +
>>> +int cmd_main(int ac, const char **av)
>>> +{
>>> +    struct index_state *istate = _index;
>>> +    int i;
>>> +
>>> +    setup_git_directory();
>>> +    if (do_read_index(istate, get_index_file(), 0) < 0)
>>> +    die("unable to read index file");
>>> +    if (!istate->fsmonitor_last_update) {
>>> +    printf("no fsmonitor\n");
>>> +    return 0;
>>> +    }
>>> +    printf("fsmonitor last update %"PRIuMAX"\n",
>>> istate->fsmonitor_last_update);
>>
>> After pushing this out and had Travis complain, I queued a squash on
>> top of this to cast the argument to (uintmax_t), like you did in an
>> earlier step (I think it was [PATCH 04/12]).
>>
> 
> Thanks. I'll update this to cast it as (uint64_t) as that is what get/put_be64
> use.  As far as I can tell they both map to the same thing (unsigned long 
> long)
> so there isn't functional difference.
(Just to double-check): This is the way to print "PRIuMAX" correctly
 (on all platforms):

printf("fsmonitor last update %"PRIuMAX"\n",
 (uintmax_t)istate->fsmonitor_last_update);

Re: [PATCH v6 10/12] fsmonitor: add test cases for fsmonitor extension

2017-09-16 Thread Torsten Bögershausen

On 2017-09-15 21:20, Ben Peart wrote:
> +if [ "$1" != 1 ]
> +then
> + echo -e "Unsupported core.fsmonitor hook version.\n" >&2
> + exit 1
> +fi

The echo -e not portable
(It was detected by a tighter version of the lint script,
 which I have here, but not yet send to the list :-(

This will do:
echo  "Unsupported core.fsmonitor hook version." >&2

Re: [PATCH 1/1] reset: fix reset when using the sparse-checkout feature.

2017-09-08 Thread Torsten Bögershausen

On Fri, Sep 08, 2017 at 12:00:50PM -0600, Kevin Willford wrote:
> When using the sparse checkout feature the git reset command will add
> entries to the index that will have the skip-worktree bit off but will
> leave the working directory empty.  File data is lost because the index
> version of the files have been changed but there is nothing that is in the
> working directory.  This will cause the next status call to show either
> deleted for files modified or deleting or nothing for files added.  The
> added files should be shown as untracked and modified files should be
> shown as modified.
> 
> To fix this when the reset is running if there is not a file in the working
> directory and if it will be missing with the new index entry or was not
> missing in the previous version, we create the previous index version of
> the file in the working directory so that status will report correctly
> and the files will be availble for the user to deal with.
> 
> Signed-off-by: Kevin Willford 
[]
> diff --git a/t/t7114-reset-sparse-checkout.sh 
> b/t/t7114-reset-sparse-checkout.sh
> new file mode 100755
> index 00..f2a5426847
> --- /dev/null
> +++ b/t/t7114-reset-sparse-checkout.sh
> @@ -0,0 +1,60 @@
> +#!/bin/sh
> +
> +test_description='reset when using a sparse-checkout'
> +
> +. ./test-lib.sh
> +
> +# reset using a sparse-checkout file
> +
> +test_expect_success 'setup' '
> + test_tick &&

Do we need a test_tick here ?

> + echo "checkout file" >c &&
> + echo "modify file" >m &&
> + echo "delete file" >d &&
> + git add . &&
> + git commit -m "initial commit" &&
> + echo "added file" >a &&
> + echo "modification of a file" >m &&
> + git rm d &&
> + git add . &&
> + git commit -m "second commit" &&
> + git checkout -b endCommit
> +'
> +
> +test_expect_success 'reset when there is a sparse-checkout' '
> + echo "/c" >.git/info/sparse-checkout &&
> + test_config core.sparsecheckout true &&
> + git checkout -b resetBranch &&
> + test_path_is_missing m &&
> + test_path_is_missing a &&
> + test_path_is_missing d &&
> + git reset HEAD~1 &&
> + test "checkout file" = "$(cat c)" &&
> + test "modification of a file" = "$(cat m)" &&
> + test "added file" = "$(cat a)" &&
> + test_path_is_missing d
> +'
> +

Re: Empty directories in Git: Current guidance for historical commits?

2017-09-05 Thread Torsten Bögershausen

On 2017-09-05 14:47, wafflec...@openmail.cc wrote:
> 
> Hello,
> 
> Per the FAQ it's clear that Git (by current design) does not track empty
> directories: 
> https://git.wiki.kernel.org/index.php/GitFaq#Can_I_add_empty_directories.3F
> 
> That's fine and I can fix that for my code going forward via the build system,
> as needed.
> 
> However I'm looking to import a fairly deep (10k+ commits) svn history in to 
> Git
> and I'm wondering what the current best practices are for dealing with empty
> folders?   Meaning to say, I'd like to be able to check out an older revision 
> of
> the code and have it build correctly using the older build system which 
> expects
> certain folders to exist.
> 
> There are a few different answers floating around the net, so I figured I'd
> confirm via the mailing list.
> 
> Is just dropping a ".gitignore" or "README" file in the empty directories 
> during
> conversion still the most reasonable approach? 

A .gitignore will, but may cost 0.001% CPU time when running "git status".
I have seen systems that create a file called ".empty" in every directory.

> If so, is there a way to do this
> automatically during the conversion using "git svn" or the like?

Not what I am aware of.
And even if, the .empty files need to be removed, once the directory is
removed.
And we don't have logic for that, as far as I know.

That would be by suggestion:
convert the repo "as is", with the tools we have.

If you need to continue to work on an old branch (or the master branch),
add the .empty file, where needed, and continue to work in Git, on
a branch as usual.

> 
> 
> Thank you
> David
>

Re: [PATCH v3] Documentation: mention that `eol` can change the dirty status of paths

2017-08-31 Thread Torsten Bögershausen

On 2017-08-31 15:19, Ben Boeckel wrote:

>  This attribute sets a specific line-ending style to be used in the
>  working directory.  It enables end-of-line conversion without any
> -content checks, effectively setting the `text` attribute.
> +content checks, effectively setting the `text` attribute.  Note that
> +setting this attribute on paths which are in the index with CRLF line
> +endings may make the paths to be considered dirty.  Adding the path to

OK - and this leads to another question:

Would it make sense if we allow files with "text eol=CRLF" to continue
to be stored with CRLF in the index?

Skip the normalization for existing files, since the user asked
for CRLF in the working tree (and the internal storage is less interesting)

Should I consider a patch ?

Re: [PATCH] Documentation: mention that `eol` can change the dirty status of paths

2017-08-23 Thread Torsten Bögershausen

On Wed, Aug 23, 2017 at 05:17:41PM -0400, Ben Boeckel wrote:
> When setting the `eol` attribute, paths can change their dirty status
> without any change in the working directory. This can cause confusion
> and should at least be mentioned with a remedy.
> 
> Signed-off-by: Ben Boeckel 
> ---
>  Documentation/gitattributes.txt | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index c4f2be2..3044b71 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -151,7 +151,11 @@ unspecified.
>  
>  This attribute sets a specific line-ending style to be used in the
>  working directory.  It enables end-of-line conversion without any
> -content checks, effectively setting the `text` attribute.
> +content checks, effectively setting the `text` attribute.  Note that
> +setting this attribute on paths which are in the index with different
> +line endings than the attribute indicates the paths to be considered
> +dirty.  Adding the path to the index again will normalize the line
> +endings in the index.
>  

There is one minor comment:
The problem is when files had been commited with CRLF (regardless what your
eol= attribute says)

How about something like this :

  This attribute sets a specific line-ending style to be used in the
  working directory.  It enables end-of-line conversion without any
 -content checks, effectively setting the `text` attribute.
 +content checks, effectively setting the `text` attribute.  Note that
 +setting this attribute on paths which are in the index with CRLF
 +line endings makes the paths to be considered dirty.
 +Adding the path to the index again will normalize the line
 +endings in the index.

Re: Cannot checkout after setting the eol attribute

2017-08-23 Thread Torsten Bögershausen

On Tue, Aug 22, 2017 at 03:44:41PM -0400, Ben Boeckel wrote:
> On Tue, Aug 22, 2017 at 21:13:18 +0200, Torsten Bögershausen wrote:
> > When you set the text attribute (in your case "eol=crlf" implies text)
> > then the file(s) -must- be nomalized and commited so that they have LF
> > in the repo (technically speaking the index)
> 
> This seems like a special case that Git could detect and message about
> somehow.
> 
> > This is what is written about the "eol=crlf" attribute:
> > This setting forces Git to normalize line endings for this
> > file on checkin and convert them to CRLF when the file is
> > checked out.
> > And this is what is implemented in Git.
> 
> Yeah, I read the docs, but the oddities of reset not doing its job
> wasn't clear from this sentence :) .

git reset does it's job - please see below.

The problem is that we need a "git commit" here.
After applying .gitattributes, it may be neccessary to "normalize" the
files. If there is something in the documentation, that can be
improved, please let us know.

> 
> > Long story short:
> > 
> > The following would solve your problem:
> >git init
> >echo $'dos\r' > dos
> >git add dos
> >git commit -m "dos newlines"
> >echo "dos -crlf" > .gitattributes
> >git add .gitattributes
> >git commit -m "add attributes"
> >echo "dos eol=crlf" > .gitattributes
> >git read-tree --empty   # Clean index, force re-scan of working directory
> 
> The fact that plumbing is necessary to dig yourself out of a hole of the
> `eol` attribute changes points to something needing to be changed, even
> if it's only documentation. Could Git detect this and message about it
> somehow when `git reset` cannot fix the working tree?

The thing is, that the working tree is "in a good state":
We want "dos" with CRLF, and that is what we have.
There is nothing that can be improved in the working tree.
What needs to be fixed, is the index. And that needs to be done with
"git add" "git commit."
As Junio pointed out, the read-tree is not ideal
(to fix a single file in a possible dirty working tree)

In your case it looks like this:

echo "dos eol=crlf" > .gitattributes
git add .gitattributes &&
git rm --cached dos && git add dos &&
git commit

> Or maybe it could at least exit with failure instead of success?

I don't know.
It -may- be possible to add a warning in "git reset".
I can have a look at that...

Re: Cannot checkout after setting the eol attribute

2017-08-22 Thread Torsten Bögershausen

On Tue, Aug 22, 2017 at 01:49:18PM -0400, Ben Boeckel wrote:
> Hi,
> 
> I specified the `eol` attribute on some files recently and the behavior
> of Git is very strange.
> 
> Here is the set of commands to set up the repository used for the
> discussion:
> 
> git init
> echo $'dos\r' > dos
> git add dos
> git commit -m "dos newlines"
> echo "dos -crlf" > .gitattributes
> git add .gitattributes
> git commit -m "add attributes"
> echo "dos eol=crlf" > .gitattributes
> git add .gitattributes
> git commit -m "set eol attribute instead"
> 
> The following behaviors are observed:
> 
>   - `git reset --hard` does not make the working directory clean; and
>   - `git rebase` gets *very* confused about the diffs in the working
> tree because `git stash` can't reset the working tree;
> 


> There are probably other oddities lingering about as well. If I commit
> what Git thinks is the difference, the diff (with invisibles made
> visible) is:
> 
> % git diff | cat -A
> diff --git a/dos b/dos$
> index fde2310..4723a1b 100644$
> --- a/dos$
> +++ b/dos$
> @@ -1 +1 @@$
> -dos^M$
> +dos$

Thanks for the goos example!
This is by design.

When you set the text attribute (in your case "eol=crlf" implies text)
then the file(s) -must- be nomalized and commited so that they have LF
in the repo (technically speaking the index)


This is what is written about the "eol=crlf" attribute:
This setting forces Git to normalize line endings for this
file on checkin and convert them to CRLF when the file is
checked out.
And this is what is implemented in Git.

Long story short:

The following would solve your problem:
   git init
   echo $'dos\r' > dos
   git add dos
   git commit -m "dos newlines"
   echo "dos -crlf" > .gitattributes
   git add .gitattributes
   git commit -m "add attributes"
   echo "dos eol=crlf" > .gitattributes
   git read-tree --empty   # Clean index, force re-scan of working directory
   git add dos
   git add .gitattributes
   git commit -m "set eol attribute instead"
   git ls-files --eol


> 
> Seen in 2.9.5 and 2.14.0.rc1.
> 
> --Ben

Re: t5551 hangs ?

2017-08-22 Thread Torsten Bögershausen

On Tue, Aug 22, 2017 at 01:26:25AM -0400, Santiago Torres wrote:
> Hello, Torsten. 
> 
> On Tue, Aug 22, 2017 at 07:10:59AM +0200, Torsten Bögershausen wrote:
> > Hej,
> > I found 2 threads about hanging t5551, one in 2016, and one question
> > from Junio somewhen in 2017.
> > I have it reproducable here:
> > - Debian 8, 64 bit
> > - SSD
> > - Half-modern processor ;-)
> 
> I don't think the SSD/regular disk have anything to do with it. 
> 
> - Are you running tests concurrently? (e.g., with -j x)
> - What happens if you run the test individually (i.e., cd t and
>   ./t5551...)

No concurrency.
That's what I do: ./t5551-http-fetch-smart.sh

> - You probably want to see the version of apache this is running/etc.

The one that comes with Debian -
I am not an expert here, what is it that is interesting ?

> - What happens if you kill the apache processes? 

I am left with these processes:
/home/blabla/pu/git -C too-many-refs fetch -q --tags
/home/blabla/pu/git-remote-http origin http://127.0.0.1:5551/smart/repo.git

> 
> I can't reproduce on my side, but let me see if I can dig a little into
> it.

Thanks, more tips or things I can do are welcome.
t5551 seems to be flaky - from time to time.
It seems that I have it reproducable unstable, so if someone has more
ideas, please.

> 
> Cheers!
> -Santiago.

t5551 hangs ?

2017-08-21 Thread Torsten Bögershausen

Hej,
I found 2 threads about hanging t5551, one in 2016, and one question
from Junio somewhen in 2017.
I have it reproducable here:
- Debian 8, 64 bit
- SSD
- Half-modern processor ;-)

The last thing I can see is:
ok 29 - test allowanysha1inwant with unreachable
---
A simplified ps -ef shows:
/usr/sbin/apache2 -d /home/blalbla/t/trash 
directory.t5551-http-fetch-smart/httpd -f /home/blalbla/t/lib-httpd/apache.conf 
-c Listen 127.0.0.1:5551 -k start
/usr/sbin/apache2 -d /home/blalbla/t/trash 
directory.t5551-http-fetch-smart/httpd -f /home/blalbla/t/lib-httpd/apache.conf 
-c Listen 127.0.0.1:5551 -k start
/usr/sbin/apache2 -d /home/blalbla/t/trash 
directory.t5551-http-fetch-smart/httpd -f /home/blalbla/t/lib-httpd/apache.conf 
-c Listen 127.0.0.1:5551 -k start
/usr/sbin/apache2 -d /home/blalbla/t/trash 
directory.t5551-http-fetch-smart/httpd -f /home/blalbla/t/lib-httpd/apache.conf 
-c Listen 127.0.0.1:5551 -k start
/usr/sbin/apache2 -d /home/blalbla/t/trash 
directory.t5551-http-fetch-smart/httpd -f /home/blalbla/t/lib-httpd/apache.conf 
-c Listen 127.0.0.1:5551 -k start
/usr/sbin/apache2 -d /home/blalbla/t/trash 
directory.t5551-http-fetch-smart/httpd -f /home/blalbla/t/lib-httpd/apache.conf 
-c Listen 127.0.0.1:5551 -k start
/usr/sbin/apache2 -d /home/blalbla/t/trash 
directory.t5551-http-fetch-smart/httpd -f /home/blalbla/t/lib-httpd/apache.conf 
-c Listen 127.0.0.1:5551 -k start
/home/blalbla/git -C too-many-refs fetch -q --tags
/home/blalbla/git-remote-http origin http://127.0.0.1:5551/smart/repo.git
-
The tests passes on another box (real old laptop, spinning disk)

Anything that can be done to debug it ?

Re: What's cooking in git.git (Aug 2017, #04; Fri, 18)

2017-08-19 Thread Torsten Bögershausen


> * tb/apply-with-crlf (2017-08-17) 3 commits
>  - SQUASH???
>  - apply: file commited with CRLF should roundtrip diff and apply
>  - convert: add SAFE_CRLF_KEEP_CRLF
>  (this branch is tangled with jc/apply-with-crlf.)
> 
>  "git apply" that is used as a better "patch -p1" failed to apply a
>  taken from a file with CRLF line endings to a file with CRLF line
>  endings.  The root cause was because it misused convert_to_git()
>  that tried to do "safe-crlf" processing by looking at the index
>  entry at the same path, which is a nonsense---in that mode, "apply"
>  is not working on the data in (or derived from) the index at all.
>  This has been fixed.
> 
>  Will merge to 'next' after squashing the fix in.
> 
> 
> * rs/t1002-do-not-use-sum (2017-08-15) 1 commit
>  - t1002: stop using sum(1)
> 
>  Test simplification.
> 
>  Will merge to 'next'.
> 
> 
> * sb/sha1-file-cleanup (2017-08-15) 1 commit
>  - sha1_file: make read_info_alternates static
> 
>  Code clean-up.
> 
>  Will merge to 'next'.
> 
> 
> * as/grep-quiet-no-match-exit-code-fix (2017-08-17) 1 commit
>  - git-grep: correct exit code with --quiet and -L
> 
>  "git grep -L" and "git grep --quiet -L" reported different exit
>  codes; this has been corrected.
> 
>  Will merge to 'next'.
> 
> 
> * hv/t5526-andand-chain-fix (2017-08-17) 1 commit
>  - t5526: fix some broken && chains
> 
>  Test fix.
> 
>  Will merge to 'next'.
> 
> 
> * jc/diff-sane-truncate-no-more (2017-08-17) 1 commit
>  - diff: retire sane_truncate_fn
> 
>  Code clean-up.
> 
>  Will merge to 'next'.
> 
> 
> * ks/branch-set-upstream (2017-08-17) 3 commits
>  - branch: quote branch/ref names to improve readability
>  - builtin/branch: stop supporting the "--set-upstream" option
>  - t3200: cleanup cruft of a test
> 
>  "branch --set-upstream" that has been deprecated in Git 1.8 has
>  finally been retired.
> 
>  Will merge to 'next'.
> 
> 
> * mg/format-ref-doc-fix (2017-08-18) 2 commits
>  - Documentation/git-for-each-ref: clarify peeling of tags for --format
>  - Documentation: use proper wording for ref format strings
> 
>  Doc fix.
> 
>  Will merge to 'next'.
> 
> 
> * po/read-graft-line (2017-08-18) 4 commits
>  - commit: rewrite read_graft_line
>  - commit: allocate array using object_id size
>  - commit: replace the raw buffer with strbuf in read_graft_line
>  - sha1_file: fix definition of null_sha1
> 
>  Conversion from uchar[20] to struct object_id continues; this is to
>  ensure that we do not assume sizeof(struct object_id) is the same
>  as the length of SHA-1 hash (or length of longest hash we support).
> 
>  Will merge to 'next'.
> 
> 
> * sb/submodule-parallel-update (2017-08-17) 1 commit
>  - submodule.sh: remove unused variable
> 
>  Code clean-up.
> 
>  Will merge to 'next'.
> 
> 
> * jc/apply-with-crlf (2017-08-16) 6 commits
>  . apply: clarify read_old_data() is about no-index case
>  . apply: localize the CRLF business to read_old_data()
>  . apply: only pay attention to CRLF in the preimage
>  . apply: remove unused field apply_state.flags
>  . apply: file commited with CRLF should roundtrip diff and apply
>  - convert: add SAFE_CRLF_KEEP_CRLF
>  (this branch is tangled with tb/apply-with-crlf.)
> 
>  Will discard as it now is part of the tb/apply-with-crlf topic.
> 
> --
> [Stalled]
> 
> * mg/status-in-progress-info (2017-05-10) 2 commits
>  - status --short --inprogress: spell it as --in-progress
>  - status: show in-progress info for short status
> 
>  "git status" learns an option to report various operations
>  (e.g. "merging") that the user is in the middle of.
> 
>  cf. 
> 
> 
> * nd/worktree-move (2017-04-20) 6 commits
>  - worktree remove: new command
>  - worktree move: refuse to move worktrees with submodules
>  - worktree move: accept destination as directory
>  - worktree move: new command
>  - worktree.c: add update_worktree_location()
>  - worktree.c: add validate_worktree()
> 
>  "git worktree" learned move and remove subcommands.
> 
>  Expecting a reroll.
>  cf. <20170420101024.7593-1-pclo...@gmail.com>
>  cf. <20170421145916.mknekgqzhxffu...@sigill.intra.peff.net>
>  cf. 
> 
> 
> * sg/clone-refspec-from-command-line-config (2017-06-16) 2 commits
>  - Documentation/clone: document ignored configuration variables
>  - clone: respect additional configured fetch refspecs during initial fetch
>  (this branch is used by sg/remote-no-string-refspecs.)
> 
>  "git clone -c var=val" is a way to set configuration variables in
>  the resulting repository, but it is more useful to also make these
>  variables take effect while the initial clone is happening,
>  e.g. these configuration variables could be fetch refspecs.
> 
>  Waiting for a response.
>  cf. <20170617112228.vugswym4o4owf...@sigill.intra.peff.net>
>  cf. 
> 
> 
> * js/rebase-i-final (2017-07-27)

Re: [Patch size_t V3 00/19] use size_t

2017-08-17 Thread Torsten Bögershausen

On Wed, Aug 16, 2017 at 10:16:12PM +0200, Martin Koegler wrote:
> From: Martin Koegler 
> 
> This patchset is for next [24db08a6e8fed761d3bace7f2d5997806e20b9f7].

I applied it succesfully, and run the test suite on a 32 bit system.
The Laptop reported one brekage in t0040:
t0040-parse-options.sh   not ok 19 - OPT_MAGNITUDE() 3giga

Beside some t5561-http-backend.sh (which are most probably not caused
by this patch.

The raspi had 2 deadlocks, like "git push --signed dst noop ff +noff"
or
trash directory.t5541-http-push-smart/gpghome --sign -u commit...@example.com
Even this most probably not caused by the patch. (and the test is still running)

Well done, and on which platforms did you test ?

Re: [Patch size_t V3 01/19] delta: fix enconding size larger than an "uint" can hold

2017-08-17 Thread Torsten Bögershausen

On Wed, Aug 16, 2017 at 10:16:13PM +0200, Martin Koegler wrote:
> From: Martin Koegler 
> 
> The current delta code produces incorrect pack objects for files > 4GB.

This may need a little bit more info (E.g. it does not fix anything on
a 32 bit Linux)

How about this:
The current delta code produces incorrect pack objects for files > 4GB
when "unsigned long" has 32 bits and size_t 64 bits.

Re: [PATCH 1/9] Convert pack-objects to size_t

2017-08-17 Thread Torsten Bögershausen

On Wed, Aug 16, 2017 at 10:22:39PM +0200, Martin Koegler wrote:
> On Mon, Aug 14, 2017 at 10:08:05AM -0700, Junio C Hamano wrote:
> >It may help reducing the maintenance if we introduced obj_size_t
> >that is defined to be size_t for now, so that we can later swap
> >it to ofs_t or some larger type when we know we do need to
> >support objects whose size cannot be expressed in size_t, but I
> >do not offhand know what the pros-and-cons with such an approach
> >would look like.
> 
> Where should the use of obj_size_t end and the use of size_t start? 
> 
> We often determine a object size and then pass it to malloc. 
> We would start with a larger datatyp and then truncate for memory allocation, 
> which use size_t.

The truncation is done with xsize_t:
The "old" sha1_file.c has something like this:
idx_size = xsize_t(st.st_size);

I personally don't think that obj_size_t gives us so much.
There are "objects" which are "on disk", and they may have a length off_t,
And there are "objects" loaded into memory, and they have a length size_t.
And everybody can check that we use the right type.
Additionally I don't like it very much, when object size in memory is counted
in a 64 bit value (and this will happen if  obj_size_ = off_t == 64bit)

Anyhow, to answer your question:
All calles xmalloc() must be prepared like this:

p = xmalloc(xsize_t(size_of_object));

That should do the trick.

> 
> Regards,
> Martin

Re: [PATCH/FIXUP 6/2] apply: clarify read_old_data() is about no-index case

2017-08-17 Thread Torsten Bögershausen

On Thu, Aug 17, 2017 at 12:12:36AM -0700, Junio C Hamano wrote:
> Torsten Bögershausen <tbo...@web.de> writes:
> 
> > Unless we re-define the meaning of "NULL" into "don't do CRLF conversions,
> > like SAFE_CRLF_KEEP_CRLF does.
> 
> My preference is not to use NULL as any hint.  Instead, the "flag"
> parameter we already pass to convert_to_git(), just like the updated
> read_old_data() uses SAFE_CRLF_KEEP_CRLF to tell it that it should
> not disturb existing CRLF without looking at the istate, should be
> used to tell convert_to_git() to do the opposite, but do so without
> looking at the istate.
> 
> Perhaps SAFE_CRLF_FALSE should be such a flag.  Or perhaps we need
> to invent another flag.  I dunno.

OK, message taken, in short:
I will come up with a new series in a couple of days - 
thanks for the input.

Re: [PATCH/FIXUP 6/2] apply: clarify read_old_data() is about no-index case

2017-08-17 Thread Torsten Bögershausen

On Wed, Aug 16, 2017 at 11:34:45AM -0700, Junio C Hamano wrote:
> With the previous fixes to CRLF handling in place, read_old_data()
> knows what it wants convert_to_git() to do with respect to CRLF.  In
> fact, this codepath is about applying a patch to a file in the
> filesystem, which may not exist in the index, or may exist but may
> not match what is recorded in the index, or in the extreme case, we
> may not even be in a Git repository.  If convert_to_git() peeked at
> the index while doing its work, it *would* be a bug.
> 
> Pass NULL instead of _index to the function to make sure we
> catch future bugs to clarify this.

Thanks for the work, and now our emails crossed.

Calling convert_to_git(NULL,...) makes Git crash today, see t4124, my
latest version, "LF in repo, CRLF in working tree)
Unless we re-define the meaning of "NULL" into "don't do CRLF conversions,
like SAFE_CRLF_KEEP_CRLF does.
The combination of convert_to_git(NULL,...,SAFE_CRLF_KEEP_CRLF) is OK,
but all others must supply an 

At a very first glance the end result may look like this:
- Take my changes in convert.[ch]
- Take your changes/commit in apply.c (except the NULL, see above)
- Take my changes in t4124.

I don't have time to look at this today or tomorrow,
please give a hint if you are working further.

Re: Bug with ignorecase on Git and Cygwin

2017-08-16 Thread Torsten Bögershausen

On Wed, Aug 16, 2017 at 11:50:47AM +, CHEVALLIER Yves wrote:
> Hi, 
> 
> On Cygwin, the config value `ignorecase` is set to `true` during `git init` 
> and it is not possible to change the default value using templates. 
> 
> The issue was discovered while I was tracking a bunch of source files of a 
> SDK. To track the changes I simply rm all the working directory, unzip the 
> new SDK, then git add . and git commit -am "new sdk". In this process an 
> issue with an assembly file with preprocessing was incorrectly named .s 
> instead of .S. In the next version the correction was made, but Git was 
> unable to detect it. 
> 
> That said I've tried to manually change ignorecase from true to false and I 
> still have the issue. Git on Cygwin cannot detect files renamed with a case 
> change. 
> 
> Is it a bug?
> 

The thing is that the underlying file system (under cygwin, Windows in general, 
or MacOs)  treats a.s the same as a.S
So Git can not do better than the file system.
What you can do:

git mv a.S a.s
git commit

(This can be done by a script)

> # It works on Ubuntu
[]

Re: [PATCH 1/5] convert: initialize attr_action in convert_attrs

2017-08-15 Thread Torsten Bögershausen


> > diff --git a/convert.c b/convert.c
> > index 1012462e3..943d957b4 100644
> > --- a/convert.c
> > +++ b/convert.c
> > @@ -1040,7 +1040,6 @@ static void convert_attrs(struct conv_attrs *ca, 
> > const char *path)
> > ca->crlf_action = git_path_check_crlf(ccheck + 4);
> > if (ca->crlf_action == CRLF_UNDEFINED)
> > ca->crlf_action = git_path_check_crlf(ccheck + 0);
> > -   ca->attr_action = ca->crlf_action;
> 
> I don't think the removal of that line is correct.

Sorry, that went wrong, should have been:
 I think that the line can be removed here.

Re: [PATCH 1/5] convert: initialize attr_action in convert_attrs

2017-08-15 Thread Torsten Bögershausen

On Tue, Aug 15, 2017 at 02:53:01PM +0200, Martin Ågren wrote:
> convert_attrs populates a struct conv_attrs. The field attr_action is
> not set in all code paths, but still one caller unconditionally reads
> it. Since git_check_attr always returns the same value, we'll always end
> up in the same code path and there is no problem right now. But
> convert_attrs is obviously trying not to rely on such an
> implementation-detail of another component.
> 
> Initialize attr_action to CRLF_UNDEFINED in the dead code path.
> 
> Actually, in the code path that /is/ taken, the variable is assigned to
> twice and the first assignment has no effect. That's not wrong, but
> let's remove that first assignment while we're here.
> 
> Signed-off-by: Martin Ågren 
> ---
> I hit a warning about attr_action possibly being uninitialized when
> building with SANITIZE=thread. I guess it's some random interaction
> between code added by tsan, the optimizer (-O3) and the warning
> machinery. (This was with gcc 5.4.0.)
> 
>  convert.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/convert.c b/convert.c
> index 1012462e3..943d957b4 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -1040,7 +1040,6 @@ static void convert_attrs(struct conv_attrs *ca, const 
> char *path)
>   ca->crlf_action = git_path_check_crlf(ccheck + 4);
>   if (ca->crlf_action == CRLF_UNDEFINED)
>   ca->crlf_action = git_path_check_crlf(ccheck + 0);
> - ca->attr_action = ca->crlf_action;

I don't think the removal of that line is correct.

>   ca->ident = git_path_check_ident(ccheck + 1);
>   ca->drv = git_path_check_convert(ccheck + 2);
>   if (ca->crlf_action != CRLF_BINARY) {
> @@ -1058,6 +1057,7 @@ static void convert_attrs(struct conv_attrs *ca, const 
> char *path)
>   } else {
>   ca->drv = NULL;
>   ca->crlf_action = CRLF_UNDEFINED;
> + ca->attr_action = CRLF_UNDEFINED;

But this one can be avoided, when the line
ca->attr_action = ca->crlf_action;
would move completely out of the "if/else" block.

>   ca->ident = 0;
>   }
>   if (ca->crlf_action == CRLF_TEXT)
> -- 
> 2.14.1.151.gdfeca7a7e
> 

Thanks for spotting my mess.
What do you think about the following:


diff --git a/convert.c b/convert.c
index 1012462e3c..fd91b91ada 100644
--- a/convert.c
+++ b/convert.c
@@ -1040,7 +1040,6 @@ static void convert_attrs(struct conv_attrs *ca, const 
char *path)
ca->crlf_action = git_path_check_crlf(ccheck + 4);
if (ca->crlf_action == CRLF_UNDEFINED)
ca->crlf_action = git_path_check_crlf(ccheck + 0);
-   ca->attr_action = ca->crlf_action;
ca->ident = git_path_check_ident(ccheck + 1);
ca->drv = git_path_check_convert(ccheck + 2);
if (ca->crlf_action != CRLF_BINARY) {
@@ -1060,6 +1059,8 @@ static void convert_attrs(struct conv_attrs *ca, const 
char *path)
ca->crlf_action = CRLF_UNDEFINED;
ca->ident = 0;
}
+   /* Save attr and make a decision for action */
+   ca->attr_action = ca->crlf_action;
if (ca->crlf_action == CRLF_TEXT)
ca->crlf_action = text_eol_is_crlf() ? CRLF_TEXT_CRLF : 
CRLF_TEXT_INPUT;
if (ca->crlf_action == CRLF_UNDEFINED && auto_crlf == AUTO_CRLF_FALSE)

Re: [PATCH 1/9] Convert pack-objects to size_t

2017-08-14 Thread Torsten Bögershausen

On Mon, Aug 14, 2017 at 08:31:50PM +0100, Ramsay Jones wrote:
> 
> 
> On 14/08/17 18:08, Junio C Hamano wrote:
> > Junio C Hamano  writes:
> > 
> >> One interesting question is which of these two types we should use
> >> for the size of objects Git uses.  
> >>
> >> Most of the "interesting" operations done by Git require that the
> >> thing is in core as a whole before we can do anything (e.g. compare
> >> two such things to produce delta, have one in core and apply patch),
> >> so it is tempting that we deal with size_t, but at the lowest level
> >> to serve as a SCM, i.e. recording the state of a file at each
> >> version, we actually should be able to exceed the in-core
> >> limit---both "git add" of a huge file whose contents would not fit
> >> in-core and "git checkout" of a huge blob whose inflated contents
> >> would not fit in-core should (in theory, modulo bugs) be able to
> >> exercise the streaming interface to handle such case without holding
> >> everything in-core at once.  So from that point of view, even size_t
> >> may not be the "correct" type to use.
> > 
> > A few additions to the above observations.
> > 
> >  - We have varint that encodes how far the location from a delta
> >representation of an object to its base object in the packfile.
> >Both encoding and decoding sides in the current code use off_t to
> >represent this offset, so we can already reference an object that
> >is far in the same packfile as a base.
> > 
> >  - I think it is OK in practice to limit the size of individual
> >objects to size_t (i.e. on 32-bit arch, you cannot interact with
> >a repository with an object whose size exceeds 4GB).  Using off_t
> >would allow occasional ultra-huge objects that can only be added
> >and checked in via the streaming API on such a platform, but I
> >suspect that it may become too much of a hassle to maintain.
> 
> In a previous comment, I said that (on 32-bit Linux) it was likely
> that an object of > 4GB could not be handled correctly anyway. (more
> likely > 2GB). This was based on the code from (quite some) years ago.
> In particular, before you added the "streaming API". So, maybe a 32-bit
> arch _should_ be able to handle objects as large as the LFS API allows.
> (Ignoring, for the moment, that I think anybody who puts files of that
> size into an SCM probably gets what they deserve. :-P ).

In general, yes.
I had a patch that allowed a 32 bit linux to commit a file >4GB.
There is however a restriction: The file must not yet be known to the
index. Otherwise the "diff" machinery kicks in, and tries to
compare the "old" and the "new" versions, which means that -both-
must fit into "memory" at the same time. Memory means here the available
address space rather then real memory.
So there may be room for improvements for 32 bit systems, but that is another
story, which can be developped independent of the ulong->size_t conversion.

> 
> The two patches I commented on, however, changed the type of some
> variables from off_t to size_t. In general, the patches did not
> seem to make anything worse, but these type changes could potentially
> do harm. Hence my comment. (I still haven't tried the patches on my
> 32-bit Linux system. I only boot it up about once a week, and I would
> rather wait until the patches are in the 'pu' branch before testing).
> 
> ATB,
> Ramsay Jones
> 

And here (in agreement with the answer from Junio):
We should not change any off_t into size_t, since that means a possible
downgrade.
Whenever an off_t is converted into size_t, a function in git-compat-util.h is
used:

static inline size_t xsize_t(off_t len)
{
if (len > (size_t) len)
die("Cannot handle files this big");
return (size_t)len;
}

Having written this, it may be a good idea to define a similar function
xulong() which is used when we do calls into zlib.

Thanks to Martin for working on this.
If there is a branch on a public repo, I can e.g. run the test suite on
different systems.

Re: [BUG] git am sometimes unable to apply git format-patch output file

2017-08-12 Thread Torsten Bögershausen

On Sat, Aug 12, 2017 at 06:20:23PM +0200, Torsten Bögershausen wrote:
> On Sat, Aug 12, 2017 at 07:02:59PM +0300, Soul Trace wrote:
> > Hello.
> > 
> > Using git i have found that git am command may sometimes fail to apply patch
> > file which was created by the git am command.
> > 
> > 
> > Steps to reproduce:
> 
> Excellent, thanks so much for the detailed bug report.
> This kind of information is really appreciated.
> 
> Why did I say excellent ?
> Because I am working on a patch, which -should- fix exactly this kind of 
> issues.
> I send out a patch earlier this day, but it doesn't fix your issue, even if 
> it should.
> I hope to have a fix soonish.

I need to correct mysef, a litte bit, this doesn't seem to be a bug, but a 
feature.
There are 2 things to be noticed:
- The xml file has been commited with CRLF
- git am strips the CR (because they -may- have been added by a mail program)

There are 2 different solutions:
a) Use git am --keep-cr
b) "Normalize" the xml file(s), and commit them to have LF in the repo,
they can still have CRLF in the work tree, if needed.
This is done by
echo "*.xml text" >>.gitattributes
touch *.xml
git add *.xml .gitattributes
git commit -m "Normalize xml files"


If you really need the xml files with CRLF, use this line instead:
echo "*.xml eol=CRLF" >>.gitattributes

HTH
/Torsten

Re: [BUG] git am sometimes unable to apply git format-patch output file

2017-08-12 Thread Torsten Bögershausen

On Sat, Aug 12, 2017 at 07:02:59PM +0300, Soul Trace wrote:
> Hello.
> 
> Using git i have found that git am command may sometimes fail to apply patch
> file which was created by the git am command.
> 
> 
> Steps to reproduce:

Excellent, thanks so much for the detailed bug report.
This kind of information is really appreciated.

Why did I say excellent ?
Because I am working on a patch, which -should- fix exactly this kind of issues.
I send out a patch earlier this day, but it doesn't fix your issue, even if it 
should.
I hope to have a fix soonish.

Re: [PATCH 1/9] Convert pack-objects to size_t

2017-08-12 Thread Torsten Bögershausen

Thanks for working on this - unfortunatly the patch does not apply on
git.git/master.

Which baseline did you use ?

Re: core.autocrlf=true causes `git apply` to fail on patch generated with `git diff-index HEAD --patch`

2017-08-11 Thread Torsten Bögershausen


>I left it unsaid by mistake, but I think the right thing to use as
>the "previous" to take hint from in the context of "git apply" is
>what is in the working tree, i.e. the result of applying patch in
>step (4) to create a file F in the sample scenario.

>While applying patch in step (5), convert_to_git() should "imagine"
>adding the file F currently in the working tree (i.e. the result of
>step (4)) to the index---if the resulting object in the index would
>have CR, then the safe CRLF logic should refrain from doing CRLF->LF
>conversion.  And it should do so without actually adding neither the
>preimage or the postimage to the index, of course.

(Sorry for the test mail)

What I wanted to say is that this long explanation convinced me to write
a patch and send it out the next days,

>When we are doing "git apply --index", then we _require_ that the
>indexed contents and what is in the working tree matches before
>applying the patch, so it is perfectly fine to let convert_to_git()
>to look at the current index---that is the "previous" one we want to
>take hint from while using the safe CRLF logic.

Re: core.autocrlf=true causes `git apply` to fail on patch generated with `git diff-index HEAD --patch`

2017-08-11 Thread Torsten Bögershausen

Test from mutt

Re: [PATCH v1 1/1] correct apply for files commited with CRLF

2017-08-04 Thread Torsten Bögershausen




On 08/02/2017 11:13 PM, Junio C Hamano wrote:

tbo...@web.de writes:


From: Torsten Bögershausen <tbo...@web.de>

git apply does not find the source lines when files have CRLF in the index
and core.autocrlf is true:
These files should not get the CRLF converted to LF. Because cmd_apply()
does not load the index, this does not work, CRLF are converted into LF
and apply fails.

Fix this in the spirit of commit a08feb8ef0b6,
"correct blame for files commited with CRLF" by loading the index.

As an optimization, skip read_cache() when no conversion is specified
for this path.

What happens when the input to apply is a patch to create a new file
and the patch uses CRLF line endings?  In such a case, shouldn't
core.autocrlf=true cause the patch to be converted to LF and then
succeed in applying?  An extra read_cache() would not help as there
must be no existing index entry for the path in such a case.

If you did "rm .git/index" after you did the "git checkout -- ." in
your test patch, "git apply" ought to succeed, as it is working as
a substitute for "patch -p1" and should not be consulting the index
at all.

I cannot shake this feeling that it is convert_to_git() that needs
to be fixed so that it does not need to refer to the current
contents in the index.  Why does it even need to look at the index?
If it is for the "safe CRLF" thing (which I would think is misguided),
shouldn't it be checking the original file in the working tree, not
the index?  After all, that is what we are patching, not the contents
stored in the index.

Thanks for the review.
My understanding is that there are different things possible with `git 
apply`:

working on files in the index ("cached") files and "worktree" files.

If we work on files in the index, the line endings must match for
the patch to apply, the patch/apply machinery is not forgiving
mismatching line endings. At least not by default.
There is one exception: Use "blank-at-eol" to declare the CR of
the CRLF as a whitespace, and then tell git apply to ignore white space:
`git apply --ignore-whitespace`
My feeling is that this is not self-explaining, but this is a different 
story.


Back to the fix, the read_old_data() from below works on the working tree,
yes, but after convert_to_git().
And that is why we need the index, to fix this very case.

appy.c:
static int load_patch_target(struct apply_state *state,
 struct strbuf *buf,
 const struct cache_entry *ce,
 struct stat *st,
 const char *name,
 unsigned expected_mode)
{
if (state->cached || state->check_index) {
if (read_file_or_gitlink(ce, buf))
return error(_("failed to read %s"), name);
} else if (name) {
if (S_ISGITLINK(expected_mode)) {
if (ce)
return read_file_or_gitlink(ce, buf);
else
return SUBMODULE_PATCH_WITHOUT_INDEX;
} else if (has_symlink_leading_path(name, strlen(name))) {
return error(_("reading from '%s' beyond a symbolic link"), 
name);

} else {
if (read_old_data(st, name, buf))
return error(_("failed to read %s"), name);
}
}
return 0;
}

Re: core.autocrlf=true causes `git apply` to fail on patch generated with `git diff-index HEAD --patch`

2017-08-02 Thread Torsten Bögershausen




On 08/01/2017 10:58 PM, Anthony Sottile wrote:

Here's where I'm hitting the problem described:
https://github.com/pre-commit/pre-commit/issues/570

Note that `git -c core.autocrlf=false` apply patch fixes this
situation, but breaks others.


[]
I wasn't thinking of that - and thanks for the detailed report.
I seems as there are 3 things to be done:
- Make a workaround in your scripts/tools. It seems as if that is 
already done.

- Fix Git.
  My very first investigation shows that a patch like this could fix 
the problem:


diff --git a/apply.c b/apply.c
index f2d599141d..66b8387360 100644
--- a/apply.c
+++ b/apply.c
@@ -2278,6 +2278,8 @@ static int read_old_data(struct stat *st, const 
char *path, struct strbuf *buf)

case S_IFREG:
if (strbuf_read_file(buf, path, st->st_size) != 
st->st_size)

return error(_("unable to open or read %s"), path);
+   if (would_convert_to_git(_index, path))
+   read_cache();
convert_to_git(_index, path, buf->buf, buf->len, 
buf, 0);

return 0;
default:

  I will probably do some more investigations if this is the core 
problem, so it will take some days or weeks.


- Convince people to normalize their repos and convert all CRLF in the 
repo into LF.

   This may take even longer.

Re: core.autocrlf=true causes `git apply` to fail on patch generated with `git diff-index HEAD --patch`

2017-08-01 Thread Torsten Bögershausen




On 08/01/2017 08:24 PM, Anthony Sottile wrote:

Here's my minimal reproduction -- it's slightly far-fetched in that it
involves*committing crlf*  and
then using `autocrlf=true` (commit lf, check out crlf).

```
#!/bin/bash
set -ex

rm -rf foo
git init foo
cd foo

# Commit crlf into repository
git config --local core.autocrlf false
python3 -c 'open("foo", "wb").write(b"1\r\n2\r\n")'
git add foo
git commit -m "Initial commit with crlf"

# Change whitespace mode to autocrlf, "commit lf, checkout crlf"
git config --local core.autocrlf true
python3 -c 'open("foo", "wb").write(b"1\r\n2\r\n\r\n\r\n\r\n")'

# Generate a patch, check it out, restore it
git diff --ignore-submodules --binary --no-color --no-ext-diff > patch
python3 -c 'print(open("patch", "rb").read())'
git checkout -- .
# I expect this to succeed, it fails
git apply patch
```

And here's the output:

```
+ rm -rf foo
+ git init foo
Initialized empty Git repository in/tmp/foo/.git/
+ cd foo
+ git config --local core.autocrlf false
+ python3 -c 'open("foo", "wb").write(b"1\r\n2\r\n")'
+ git add foo
+ git commit -m 'Initial commit with crlf'
[master (root-commit) 02d3246] Initial commit with crlf
  1 file changed, 2 insertions(+)
  create mode 100644 foo
+ git config --local core.autocrlf true
+ python3 -c 'open("foo", "wb").write(b"1\r\n2\r\n\r\n\r\n\r\n")'
+ git diff --ignore-submodules --binary --no-color --no-ext-diff
+ python3 -c 'print(open("patch", "rb").read())'
b'diff --git a/foo b/foo\nindex bd956ea..fbf7936 100644\n---
a/foo\n+++ b/foo\n@@ -1,2 +1,5 @@\n 1\r\n 2\r\n+\r\n+\r\n+\r\n'
+ git checkout -- .
+ git apply patch
patch:8: trailing whitespace.

patch:9: trailing whitespace.

patch:10: trailing whitespace.

error: patch failed: foo:1
error: foo: patch does not apply
```

I also tried with `git apply --ignore-whitespace`, but this causes the
line endings of the existing contents to be changed to*lf*  (there may
be two bugs here?)

Thanks,

Anthony



I can reproduce you test case here.

The line

git apply patch

would succeed, if you temporally (for the runtime of the apply command) set
core.autocrlf to false:

git -c core.autocrlf=false apply patch

So this seems to be a bug (in a corner case ?):

Typically repos which had been commited with CRLF should be normalized,
which means that the CRLF in the repo are replaced by LF.
So you test script is a corner case, for which Git has not been designed,
It seems as if "git apply" gets things wrong here.
Especially, as the '\r' is not a whitespace as a white space. but part
of the line ending.
So in my understanding the "--ignore-whitespace" option shouldn't affect
the line endings at all.

Fixes are possible, does anyone have a clue, why the '\r' is handled
like this in apply ?

And out of interest: is this a real life problem ?

Re: Git on macOS shows committed files as untracked

2017-07-14 Thread Torsten Bögershausen



(Please no top-posting)
On 14/07/17 11:45, Elliot Chandler wrote:
For what it's worth, the file looks normal in Gentoo GNU/Linux (name appears 
"ḋἲ╓εﮯ⑏○╓Ӳ" and it seems to work like any other directory).



Thanks for testing -
Normal and Normal ;-)
For me the 6th code point does look strange
The "box" with 04142F:
Code point 4142F is unassigned and is outside any currently defined block range.

So this is not valid Unicode, so we have a problem here under MacOS -
not much Git can do about it.



On Fri, Jul 14, 2017 at 4:41 AM, Torsten Bögershausen <tbo...@web.de 
<mailto:tbo...@web.de>> wrote:




On 14/07/17 06:49, Lutz Roeder wrote:

Using precomposeunicode still reproduces the issue:

Repro steps:

1. Download
https://www.dropbox.com/s/0q5pbpqpckwzj7b/gitstatusrepro.zip?dl=0
<https://www.dropbox.com/s/0q5pbpqpckwzj7b/gitstatusrepro.zip?dl=0>
2. unzip gitstatusrepro.zip && cd gitstatusrepro
3. git reset --hard
4. git -c core.precomposeunicode=true status

On branch master
Untracked files:
(use "git add ..." to include in what will be committed)


A (short) investigation shows that this seems to be invalid unicode,
at least from a MacOSX point of view ?

Unzipping your repo shows this:
  git status -u
   deleted:

"\341\270\213\341\274\262\342\225\223\316\265\357\256\257\360\222\221\217\342\227\213\342\225\223\323\262/test.txt"


===
If I run this:
  xx=$(printf

"d\314\207\316\271\314\223\314\200\342\225\223\316\265\357\256\257\360\222\221\217\342\227\213\342\225\223\320\243\314\213/")

echo   $xx | iconv -f UTF-8-MAC -t UTF-16 | xxd

iconv: (stdin):1:5: cannot convert
000: feff 1e0b 1f32 2553 03b5 fbaf.2%S
===
So I don't know if we can do something in Git to improve your repo.
How did  you end up with such a directory name ?
And would it be possible to rename it ?

Re: Git on macOS shows committed files as untracked

2017-07-14 Thread Torsten Bögershausen




On 14/07/17 06:49, Lutz Roeder wrote:

Using precomposeunicode still reproduces the issue:

Repro steps:

1. Download https://www.dropbox.com/s/0q5pbpqpckwzj7b/gitstatusrepro.zip?dl=0
2. unzip gitstatusrepro.zip && cd gitstatusrepro
3. git reset --hard
4. git -c core.precomposeunicode=true status

On branch master
Untracked files:
   (use "git add ..." to include in what will be committed)


A (short) investigation shows that this seems to be invalid unicode,
at least from a MacOSX point of view ?

Unzipping your repo shows this:
 git status -u
  deleted: 
"\341\270\213\341\274\262\342\225\223\316\265\357\256\257\360\222\221\217\342\227\213\342\225\223\323\262/test.txt" 



===
If I run this:
 xx=$(printf 
"d\314\207\316\271\314\223\314\200\342\225\223\316\265\357\256\257\360\222\221\217\342\227\213\342\225\223\320\243\314\213/")


echo   $xx | iconv -f UTF-8-MAC -t UTF-16 | xxd

iconv: (stdin):1:5: cannot convert
000: feff 1e0b 1f32 2553 03b5 fbaf.2%S
===
So I don't know if we can do something in Git to improve your repo.
How did  you end up with such a directory name ?
And would it be possible to rename it ?

Re: Git on macOS shows committed files as untracked

2017-07-13 Thread Torsten Bögershausen

On 13/07/17 01:15, Jeff King wrote:

On Wed, Jul 12, 2017 at 06:21:28PM -0400, roeder@mailnull.com wrote:

In Git on macOS (git version 2.13.2 | brew install git) the status
command will show folders as untracked even though they are committed
and checked out from the repository. Does not reproduce on Windows and
Ubuntu.
[...]

"d\314\207\316\271\314\223\314\200\342\225\223\316\265\357\256\257\360\222\221\217\342\227\213\342\225\223\320\243\314\213/"

Probably the issue has to do with Unicode normalization, and you have
files in your repository that can't be represented on your filesystem.
For example, the first two code-points above are "d" followed by U+0307,
"COMBINING DOT ABOVE". That pair can also be represented as U+1E0B,
"LATIN SMALL LETTER D WITH DOT ABOVE".

I don't recall which form HFS+ normalizes to, but basically what happens
is that Git opens the file with some name, and the filesystem quietly
rewrites that under the hood to a different, normalized name. Then when
Git walks the directory later to ask which files are present, it sees
this other filename that it has no clue about.

Generally the solution is to commit the normalized name. There's some
logic inside Git to "precompose" names to the right normalization, but I
think that only affects new files you add. Existing committed files with
the wrong normalization run into this issue.

-Peff

Thanks for the fast analyzes -
in short:
what does
git -c core.precomposeunicode=true status
say ?

The easiest thing may be to set
git config --global core.precomposeunicode true

Re: Weirdness with git change detection

2017-07-11 Thread Torsten Bögershausen




On 11/07/17 09:34, Jeff King wrote:

On Tue, Jul 11, 2017 at 12:31:25AM -0700, Peter Eckersley wrote:


I did try to test that hypothesis by editing the filter to be a no-op,
but it's possible I go that wrong. My apologies for bugging the list!


Actually I like this kind of feedback, to see how people are using Git,
including the problems they have.



No problem. I actually think it would be interesting if Git could
somehow detect and warn about this situation. But the obvious way to do
that would be to re-run the clean filter directly after checkout. And
doing that all the time is expensive.


Would it be possible to limit the round-trip-check to "git reset --hard" ?
If yes, possibly many users are willing to pay the price, if Git tells
the "your filters don't round-trip". (Including CRLF conversions)



Perhaps some kind of "lint" program would be interesting to warn of
possible misconfigurations. Of course people would have to run it for it
to be useful. :)


What do you have in mind here ?
Don't we need to run some content through the filter(s)?

Re: Weirdness with git change detection

2017-07-10 Thread Torsten Bögershausen




On 11/07/17 01:45, Peter Eckersley wrote:

I have a local git repo that's in some weird state where changes
appear to be detected by "git diff" and prevent operations like "git
checkout" from switching branches, but those changes are not removed
by a "git reset --hard" or "git stash".

Here's an example of the behaviour, with "git reset --hard" failing to
clear a diff in the index:

https://paste.debian.net/975811/

Happy to collect additional debugging information if it's useful.


If possible, we need to clone the repo and debug ourselfs - in other
words, the problem must be reproducible for others.

It the repo public ?
Which OS, Git version are you using ?

Re: Help needed for solving a few issues with building git

2017-07-03 Thread Torsten Bögershausen

On 2017-07-03 15:18, Kaartic Sivaraam wrote:
> Hello all,
> 
> Building without localization support
> 
> I tried to build git from source without localization support by adding
> the following line to the Makefile,
> 
> NO_GETTEXT=1
> 

May be

make NO_GETTEXT=yes

Re: What's cooking in git.git (Jun 2017, #09; Fri, 30)

2017-07-01 Thread Torsten Bögershausen




On 01/07/17 09:39, Ævar Arnfjörð Bjarmason wrote:


On Fri, Jun 30 2017, Junio C. Hamano jotted:


* xz/send-email-batch-size (2017-05-23) 1 commit
  - send-email: --batch-size to work around some SMTP server limit

  "git send-email" learned to overcome some SMTP server limitation
  that does not allow many pieces of e-mails to be sent over a single
  session.

  Waiting for a response.
  cf. 
  cf. 

  """I thought your wish (which I found reasonable) was to record
  whatever information that would help us in the future in the log
  message?  I was waiting for that to happen."""


I think it's fine in lieu of xiaoqiang zhao not being responsive to just
merge this as-is. The info that can help us in the future is in the ML
archive, which should be good enough.


* ab/strbuf-addftime-tzname-boolify (2017-06-24) 3 commits
  - REWORD ONLY SQUASH
  - strbuf: change an always NULL/"" strbuf_addftime() param to bool
  - strbuf.h comment: discuss strbuf_addftime() arguments in order

  strbuf_addftime() is further getting tweaked.

  Waiting for a response.
  cf. 


Meant to get to this after the last "What's Cooking", will submit
another version soon.


* ab/wildmatch (2017-06-23) 1 commit
  - wildmatch: remove unused wildopts parameter

  Prepare the wildmatch API for future enhancements to allow a
  pattern that is repeatedly matched against many strings to be
  precompiled.


[...]


* ex/deprecate-empty-pathspec-as-match-all (2017-06-23) 2 commits
   (merged to 'next' on 2017-06-26 at d026281517)
  + pathspec: die on empty strings as pathspec
  + t0027: do not use an empty string as a pathspec element

  The final step to make an empty string as a pathspec element
  illegal.  We started this by first deprecating and warning a
  pathspec that has such an element in 2.11 (Nov 2016).

  Hopefully we can merge this down to the 'master' by the end of the
  year?  A deprecation warning period that is about 1 year does not
  sound too bad.

  Will cook in 'next'.


I have a longer patch series involving the "wildmatch: remove unused
wildopts parameter" patch (although not conflicting with it) which
assumes:

  1. We'll merge down my "wildmatch: remove unused wildopts parameter" to
 master (doesn't conflict with #3).

  2. This ex/deprecate-empty-pathspec-as-match-all series will get in

  3. My series, which among other things cleans up the wildmatch tests a
 lot (and adds that new wildmatch pre-compile API that was ejected)
 could be reviewed & merged down.

The reason I'm waiting on #3 is because one of the things I'm doing is
improving the wildmatch tests to not only test via calling raw
wildmatch(), but also roundtripping via git-ls-files, and this will
conflict in a very minor way with #2 (the test will need to be changed
from "this warns + works differently" -> "this dies + doesn't work").

But if #2 is something that's going to cook until the end of the year
I'm going to submit this sooner, and then we can just handle the minor
conflict. Is cooking it for that long really the plan?

Also, here's a minor RFC, as part of that series I need to create files
/ directories for each of the tests now in the wildmatch tests, this
involves creating files like "?a?b", "a[]b", "$" etc. I.e. this won't
work on all platforms.

So my WIP is, and I'd like feedback on the viability of the general approach:

 create_test_file() {
file=$1

# `touch .` will succeed but obviously not do what we intend
# here.
test "$file" = "." && return 1
# We cannot create a file with an empty filename.
test "$file" = "" && return 1
# The tests that are testing that e.g. foo//bar is matched by
# foo/*/bar can't be tested on filesystems since there's no
# way we're getting a double slash.
echo "$file" | grep -F '//' && return 1

dirs=$(echo "$file" | sed -r 's!/[^/]+$!!')


sed -r is a GNU extension, isn't it ?
http://pubs.opengroup.org/onlinepubs/7908799/xcu/sed.html

This may work:
sed -e 's!/[^/][^/]*$!!')


(The rest looks good - at first glance)

Re: [PATCH] status: suppress additional warning output in plumbing modes

2017-07-01 Thread Torsten Bögershausen




>On 30/06/17 18:28, Stefan Beller wrote:

The patch makes a lot of sense - thanks for the fast reply.
A question: does the header correspond to the patch ?

< [PATCH] status: suppress additional warning output in plumbing modes
> [PATCH] status: suppress CRLF warnings in porcelain modes

(And may be the comment in the code:)

< / * suppress all additional output in porcelain mode */
> / * suppress CRLF conversion warnings in porcelain mode */


When status is called with '--porcelain' (as implied by '-z'), we promise
to output only messages as described in the man page.

Suppress CRLF warnings.

Signed-off-by: Stefan Beller 
---

Maybe something like this?

  builtin/commit.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/builtin/commit.c b/builtin/commit.c
index 00a01f07c3..3705d5ec6f 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1126,6 +1126,11 @@ static void finalize_deferred_config(struct wt_status *s)
die(_("--long and -z are incompatible"));
}
  
+	/* suppress all additional output in porcelain mode */

+   if (status_format == STATUS_FORMAT_PORCELAIN ||
+   status_format == STATUS_FORMAT_PORCELAIN_V2)
+   safe_crlf = SAFE_CRLF_FALSE;
+
if (use_deferred_config && status_format == STATUS_FORMAT_UNSPECIFIED)
status_format = status_deferred_config.status_format;
if (status_format == STATUS_FORMAT_UNSPECIFIED)

Re: Bug with automated processing of git status results

2017-06-30 Thread Torsten Bögershausen




On 30/06/17 11:09, Matthieu Moy wrote:

Сергей Шестаков  writes:


I understand that we can turn off core.safecrlf, but it's
inconvinient.


Note that you can do that without actually changing the config file:

   git -c core.safecrlf=false status ...


Beside that, I would recommend to set up a .gitattributes file:

$ echo "*.xml text eol=lf" >>.gitattributes
$ git add .gitattributes
$ git commit -m "xml files are text with LF line endings"

Re: EOL LF Windows auto.crlf issue

2017-06-24 Thread Torsten Bögershausen




On 24/06/17 18:56, Filip Kucharczyk wrote:

I'm on Windows 10.
auto.crlf in .gitconfig is set to
[core]
autocrlf = true
I've got a git (git version 2.13.1.windows.2) repo.
A linux guy emails me a text with with line endings LF.
I paste this file into my repo.
Now every time I introduce changes to this file and stage it, git tell me:
"warning: LF will be replaced by CRLF in a.txt.


This conversion will happen if you do
rm a.txt
git checkout a.txt

After these 2 steps, the file willhave CRLF in the working tree,
see below.
But if you don't, the file stays as it is in the working tree.


The file will have its original line endings in your working directory."



But when I commit the file git does not replace anyline endings - they
stay LF in the commited file and in my working direcotry.


Yes, core.autocrlf=true tells Git that they should have CRLF in the working 
tree, at least after a clean checkout.

I'm sort of misleaded by this message.

Suggestions how to improve things are of course welcome -
for your case it may help to set
git config core.autocrlf input
for this very repo.



Filip

Re: [PATCH] pathspec: die on empty strings as pathspec

2017-06-23 Thread Torsten Bögershausen



On 23/06/17 20:04, Junio C Hamano wrote:

Junio C Hamano  writes:


Sorry for my mess, see below




I am not sure if we even want the dot there, but at least that is
what the original author of the test intended to do when s/he
decided to pass an empty string as the pathspec.

  t/t0027-auto-crlf.sh | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


I'll queue the following _before_ your "final step", so that the
whole thing can be advanced to 'next' and hopefully to 'master' in
some future releases.

Given that we ourselves did not even notice until now that one of
our scripts get broken by this "final step", even though we kept
issuing the warning message, we may want to re-think our overall
strategy for deprecation.  We've assumed that "keep behaving as
before but warn for a few years and then finally give a hard
failure" would be sufficient, but depending on how the script that
employ our programs hide the warnings from the end users, that may
not be a good enough transition strategy.

At the same time we re-think the deprecation strategy, we also need
to see if we can update our test framework to help us better.
Ideally, we could have caught this existing breakage of passing ""
as pathspec in the test soon after we went into "keep behaving as
before but warn" stage.  We didn't and found it out only when we
switched to "finally give a hard failure" stage.  That is very
unsatisfactory.

Needless to say, neither of the above two comes from _your_ change.
It's just something we need to improve in a larger scope of the
whole project I realized.

Thanks.

-- >8 --
Subject: [PATCH] t0027: do not use an empty string as a pathspec element

In an upcoming update, we will finally make an empty string illegal
as an element in a pathspec; it traditionally meant the same as ".",
i.e. include everything, so update this test that passes "" to pass
a dot instead.

At this point in the test sequence, there is no modified path that
need to be further added before committing; the working tree is
empty except for .gitattributes which was just added to the index.
So we could instead pass no pathspec, but this is a conversion more
faithful to the original.

Signed-off-by: Junio C Hamano 
---
  t/t0027-auto-crlf.sh | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh
index 93725895a4..e41c9b3bb2 100755
--- a/t/t0027-auto-crlf.sh
+++ b/t/t0027-auto-crlf.sh
@@ -322,7 +322,7 @@ test_expect_success 'setup master' '
echo >.gitattributes &&
git checkout -b master &&
git add .gitattributes &&
-   git commit -m "add .gitattributes" "" &&
+   git commit -m "add .gitattributes" . &&


Reading the context here, there shouldn't be a "pathspec" at all,
as .gitattributes had been added just one line above.

The line should have been from the very beginning:
git commit -m "add .gitattributes"  &&

# What do you think ?



printf "\$Id:  
\$\nLINEONE\nLINETWO\nLINETHREE" >LF &&
printf "\$Id:  
\$\r\nLINEONE\r\nLINETWO\r\nLINETHREE" >CRLF &&
printf "\$Id:  
\$\nLINEONE\r\nLINETWO\nLINETHREE"   >CRLF_mix_LF &&

Re: [PATCH v5 4/5] convert: move multiple file filter error handling to separate function

2017-06-19 Thread Torsten Bögershausen

On 2017-06-18 13:47, Lars Schneider wrote:
> 
>> On 18 Jun 2017, at 09:20, Torsten Bögershausen <tbo...@web.de> wrote:
>>
>>
>> On 2017-06-01 10:22, Lars Schneider wrote:
>>> This is useful for the subsequent patch 'convert: add "status=delayed" to
>>> filter process protocol'.
>>
>> May be
>> Collecting all filter error handling is useful for the subsequent patch
>> 'convert: add "status=delayed" to filter process protocol'.
> 
> I think I get your point. However, I feel "Collecting" is not the right word.
> 
> How about:
> "Refactoring filter error handling is useful..."?

OK


> 
>>>
>>> Signed-off-by: Lars Schneider <larsxschnei...@gmail.com>
>>> ---
>>> convert.c | 47 ++-
>>> 1 file changed, 26 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/convert.c b/convert.c
>>> index f1e168bc30..a5e09bb0e8 100644
>>> --- a/convert.c
>>> +++ b/convert.c
>>> @@ -565,6 +565,29 @@ static int start_multi_file_filter_fn(struct 
>>> subprocess_entry *subprocess)
>>> return err;
>>> }
>>>
>>> +static void handle_filter_error(const struct strbuf *filter_status,
>>> +   struct cmd2process *entry,
>>> +   const unsigned int wanted_capability) {
>>> +   if (!strcmp(filter_status->buf, "error")) {
>>> +   /* The filter signaled a problem with the file. */
>>> +   } else if (!strcmp(filter_status->buf, "abort") && wanted_capability) {
>>> +   /*
>>> +* The filter signaled a permanent problem. Don't try to filter
>>> +* files with the same command for the lifetime of the current
>>> +* Git process.
>>> +*/
>>> +entry->supported_capabilities &= ~wanted_capability;
>>> +   } else {
>>> +   /*
>>> +* Something went wrong with the protocol filter.
>>> +* Force shutdown and restart if another blob requires 
>>> filtering.
>>> +*/
>>> +   error("external filter '%s' failed", entry->subprocess.cmd);
>>> +   subprocess_stop(_map, >subprocess);
>>> +   free(entry);
>>> +   }
>>> +}
>>> +
>>> static int apply_multi_file_filter(const char *path, const char *src, 
>>> size_t len,
>>>int fd, struct strbuf *dst, const char *cmd,
>>>const unsigned int wanted_capability)
>>> @@ -656,28 +679,10 @@ static int apply_multi_file_filter(const char *path, 
>>> const char *src, size_t len
>>> done:
>>> sigchain_pop(SIGPIPE);
>>>
>>> -   if (err) {
>>> -   if (!strcmp(filter_status.buf, "error")) {
>>> -   /* The filter signaled a problem with the file. */
>>Note1: Do we need a line with a single ";" here ?
> 
> The single ";" wouldn't hurt but I don't think it is helpful either.
> I can't find anything in the coding guidelines about this...

OK about that. I was thinking to remove the {}, and the we -need- the ";"

> 
> 
>>Question: What should/will happen to the file ?
>>Will Git complain later ? Retry later ?
> 
> The file is not filtered and Git does not try, again. 
> That might be OK for certain filters:
> If "filter.foo.required = false" then this would be OK. 
> If "filter.foo.required = true" then this would cause an error.

Does it make sense to add it as a comment ?
I know, everything is documented otherwise, but when looking at the source
 code only, the context may be useful, it may be not.

> 
> 
>>> -   } else if (!strcmp(filter_status.buf, "abort")) {
>>> -   /*
>>> -* The filter signaled a permanent problem. Don't try 
>>> to filter
>>> -* files with the same command for the lifetime of the 
>>> current
>>> -* Git process.
>>> -*/
>>> -entry->supported_capabilities &= ~wanted_capability;
>> Hm, could this be clarified somewhat ?
>> Mapping "abort" to "permanent problem" makes sense as
>>

Re: [PATCH v5 4/5] convert: move multiple file filter error handling to separate function

2017-06-18 Thread Torsten Bögershausen


On 2017-06-01 10:22, Lars Schneider wrote:
> This is useful for the subsequent patch 'convert: add "status=delayed" to
> filter process protocol'.

May be
 Collecting all filter error handling is useful for the subsequent patch
 'convert: add "status=delayed" to filter process protocol'.

> 
> Signed-off-by: Lars Schneider 
> ---
>  convert.c | 47 ++-
>  1 file changed, 26 insertions(+), 21 deletions(-)
> 
> diff --git a/convert.c b/convert.c
> index f1e168bc30..a5e09bb0e8 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -565,6 +565,29 @@ static int start_multi_file_filter_fn(struct 
> subprocess_entry *subprocess)
>   return err;
>  }
>  
> +static void handle_filter_error(const struct strbuf *filter_status,
> + struct cmd2process *entry,
> + const unsigned int wanted_capability) {
> + if (!strcmp(filter_status->buf, "error")) {
> + /* The filter signaled a problem with the file. */
> + } else if (!strcmp(filter_status->buf, "abort") && wanted_capability) {
> + /*
> +  * The filter signaled a permanent problem. Don't try to filter
> +  * files with the same command for the lifetime of the current
> +  * Git process.
> +  */
> +  entry->supported_capabilities &= ~wanted_capability;
> + } else {
> + /*
> +  * Something went wrong with the protocol filter.
> +  * Force shutdown and restart if another blob requires 
> filtering.
> +  */
> + error("external filter '%s' failed", entry->subprocess.cmd);
> + subprocess_stop(_map, >subprocess);
> + free(entry);
> + }
> +}
> +
>  static int apply_multi_file_filter(const char *path, const char *src, size_t 
> len,
>  int fd, struct strbuf *dst, const char *cmd,
>  const unsigned int wanted_capability)
> @@ -656,28 +679,10 @@ static int apply_multi_file_filter(const char *path, 
> const char *src, size_t len
>  done:
>   sigchain_pop(SIGPIPE);
>  
> - if (err) {
> - if (!strcmp(filter_status.buf, "error")) {
> - /* The filter signaled a problem with the file. */
Note1: Do we need a line with a single ";" here ?
Question: What should/will happen to the file ?
Will Git complain later ? Retry later ?
> - } else if (!strcmp(filter_status.buf, "abort")) {
> - /*
> -  * The filter signaled a permanent problem. Don't try 
> to filter
> -  * files with the same command for the lifetime of the 
> current
> -  * Git process.
> -  */
> -  entry->supported_capabilities &= ~wanted_capability;
 Hm, could this be clarified somewhat ?
 Mapping "abort" to "permanent problem" makes sense as
 such, but the only action that is done is to reset
 a capablity.

/*
 * The filter signaled a missing capabilty. Don't try to filter
 * files with the same command for the lifetime of the current
 * Git process.
 */

 # And the there is a question why the answer is "abort" and not
 # "unsupported"
> - } else {
> - /*
> -  * Something went wrong with the protocol filter.
> -  * Force shutdown and restart if another blob requires 
> filtering.
> -  */
> - error("external filter '%s' failed", cmd);
> - subprocess_stop(_map, >subprocess);
> - free(entry);
> - }
> - } else {
> + if (err)
> + handle_filter_error(_status, entry, wanted_capability);
> + else
>   strbuf_swap(dst, );
> - }
>   strbuf_release();
>   return !err;
>  }
>

Re: What's cooking in git.git (Jun 2017, #04; Tue, 13)

2017-06-14 Thread Torsten Bögershausen

On 14.06.17 09:42, Lars Schneider wrote:

> 
>> * ls/filter-process-delayed (2017-06-01) 5 commits
>> - convert: add "status=delayed" to filter process protocol
>> - convert: move multiple file filter error handling to separate function
>> - t0021: write "OUT" only on success
>> - t0021: make debug log file name configurable
>> - t0021: keep filter log files on comparison
>>
>> The filter-process interface learned to allow a process with long
>> latency give a "delayed" response.
>>
>> Needs review.
> 
> I wonder if anyone has a few free cycles to review this:
> http://public-inbox.org/git/20170601082203.50397-1-larsxschnei...@gmail.com/

It's on my todo-list, may be this weekend or so

Re: [PATCH v3] doc: do not use `rm .git/index` when normalizing line endings

2017-06-13 Thread Torsten Bögershausen




On 14/06/17 00:15, Andreas Heiduk wrote:
Looks good to me, one minor typo below


When illustrating how to normalize the line endings, the
documentation in gitattributes tells the user to `rm .git/index`.

This is incorrect for two reasons:

  - Users shouldn't be instructed to mess around with the internal
implementation of Git using raw file system tools like `rm`.

  - Within a submodule or an additional working tree `.git` is just a
file containing a `gitdir: ` pointer into the real `.git`
directory.  Therefore `rm .git/index` does not work.

The purpose or `rm .git/index` instruction is to remove all entries

  ^^

from the index without touching the working tree.  The way to do this
with Git is to use `read-tree --empty`.

[]

Re: [PATCH v2] doc: fix location of index in worktree scenatio

2017-06-12 Thread Torsten Bögershausen


On 06/12/2017 06:06 PM, Junio C Hamano wrote:

Torsten Bögershausen <tbo...@web.de> writes:


Thanks for working on this (and keeping me in cc)

The commit head line does not fully match my expactions:
"doc: fix location of index in worktree scenatio"
"doc:" is OK, but is the "location of index" fixed ?
Actually something that includes the important stuff:

"doc"
"fix"
"normalize the line endings"
"worktree scenatio"

could be more helpful.

How about this as a header for the commit:
"doc: normalize the line endings in a worktree scenatio"

Andreas's patch does not "normalize" anything, though.

 doc: do not encourage `rm .git/index` in an example

 When illustrating how to force normalizing the line endings,
 gitattributes documentation tells the user to `rm .git/index`.

 This is incorrect for two reasons.  We shouldn't be encouraging
 users to futz with the internal implementation of Git using raw
 filesystem tools like "rm" too much.  Also, when ".git" is not a
 directory but a "gitfile" pointing at the real location of the
 real ".git" directory, `rm .git/index` would not work anyway.

 The point of the step in the illustration is to remove all
 entries from the index without touching the working tree, and
 the way to do it with Git is to use `read-tree --empty`.

perhaps?

You _could_ mention "worktree scenario" but that is not the sole
user of the gitfile facility (e.g. a submodule working tree also
uses ".git" that is a gitfile pointing at the real repository
location), and "worktree" is not the real root cause of the problem
("gitfile" is), so I do not think it is essential to do so.  If we
really want to, we can add to the second from the paragraph
something like this:

 ... would not work anyway (the use of ".git" that is "gitfile"
 is often seen in a secondary working tree managed by "git
 worktree" and in a working tree of a submodule).


That's better ... here is my attempt to improve


doc: do not use `rm .git/index` when normalizing line endings

When illustrating how to normalize the line endings, the
documentation in gitattributes tells the user to `rm .git/index`.

This is incorrect for two reasons.  Users shouldn't be instructed
to futz with the internal implementation of Git using raw
file system tools like "rm".
Second, when submodules or second working trees are used, ".git"
is a not a directory but a "gitfile" pointing at the location of the
real ".git" directory, `rm .git/index` does not work.

The point of the step in the illustration is to remove all
entries from the index without touching the working tree, and
the way to do it with Git is to use `read-tree --empty`.

Re: [PATCH v2] doc: fix location of index in worktree scenatio

2017-06-12 Thread Torsten Bögershausen


Thanks for working on this (and keeping me in cc)

The commit head line does not fully match my expactions:
"doc: fix location of index in worktree scenatio"
"doc:" is OK, but is the "location of index" fixed ?
Actually something that includes the important stuff:

"doc"
"fix"
"normalize the line endings"
"worktree scenatio"

could be more helpful.

How about this as a header for the commit:
"doc: normalize the line endings in a worktree scenatio"







On 10/06/17 19:38, Andreas Heiduk wrote:

When setting `.gitattributes` in a second worktree, a plain `rm .git/index`
does not actually delete the index.
This feels somewhat short. setting .gitattributes is (in general) independent 
of the index.
In normalizing line endings case the user needs to do both, fix attribiutes, 
and re-read the work tree, discarding the index.


How about this:

When line endings are normalized in a second worktree, a plain `rm .git/index`
does not actually delete the index.
Fix a long standing bug in the documentaton and use "git read-tree --empty" 
instead-








Signed-off-by: Andreas Heiduk 
Helped-by: Junio C Hamano 
---
  Documentation/gitattributes.txt | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 473648386..2a2d7e2a4 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -229,7 +229,7 @@ From a clean working directory:
  
  -

  $ echo "* text=auto" >.gitattributes
-$ rm .git/index # Remove the index to re-scan the working directory
+$ git read-tree --empty   # Clean index, force re-scan of working directory
  $ git add .
  $ git status# Show files that will be normalized
  $ git commit -m "Introduce end-of-line normalization"

Re: Git Daemon on Windows fatal error.

2017-06-01 Thread Torsten Bögershausen




On 31/05/17 21:10, Hector Santos wrote:
Hi, I am relatively new to GIT (coming from CVS and SVN) and I am trying to 
setup "Git Daemon" on windows.


I got it working for Local network communications:

d:\local\wc5\testgit>git clone git://localhost/http clone10
Cloning into 'clone10'...
remote: Counting objects: 526, done.
remote: Compressing objects: 100% (520/520), done.
Receiving objects: 100% (526/526), 1.38 MiB | 0 bytes/s, done.
remote: Total 526 (delta 81), reused 0 (delta 0)
Resolving deltas: 100% (81/81), done.

but it fails over the wire when using the public host domain:

d:\local\wc5\testgit>git clone git://public.example.dom/http clone11
Cloning into 'clone11'...
remote: Counting objects: 526, done.
remote: Compressing objects: 100% (520/520), done.
remote: Total 526 (delta 81), reused 0 (delta 0)
fatal: read error: Invalid argument
fatal: early EOF
fatal: index-pack failed

Sometimes its a different initial fatal error but generally the same. Once or 
twice, a repeat MAY work, but often not.

[] snip

First of all, welcome to Git.
Second, which version of Git are you using ? And which version of Windows ?
Third, I don't think that this has to do with Git Daemon.
   When I look at "read error: Invalid argument", it may be fixed in the
   latest version.
   And that why it is important to know the version you are using.

And, there is a special place to report problems with Git for Windows:
https://github.com/git-for-windows/git/wiki (hope I got it right),
but please feel free to continue here on the mailing list.

Re: Bug report: Corrupt pack file after committing a large file (>4 GB?)

2017-05-26 Thread Torsten Bögershausen

On 2017-05-26 07:51, Yu-Hsuan Chen wrote:
> Dear maintainer,
> 
> There is a bug where committing a large file corrupts the pack file in
> Windows. Steps to recreate are:
> 
> 1. git init
> 2. stage and commit a file larger than 4 GB (not entirely sure about this 
> size)
> 3. git checkout -f
> 
> The file checked out is much smaller than the original file size.
> 
> This behavior is surprising. If git does not support large files, I
> would at least expect an error message when staging or committing. I
> have post a question on StackOverflow regrading this issue, and has
> been confirmed by another user. (question id: 44022897)
> 
> Best regards,
> 
> David Chen
> 
Issues for Git for Windows should, in general, be reported here:
https://github.com/git-for-windows/git/

After 2 seconds of searching, we can find that the 4Gb problem
has already been reported:
https://github.com/git-for-windows/git/issues/1063

And, to my knowledge, it has not been fixed, since it is a
lot of effort to replace the "long" (or unsigned long) in the
Git code base with a better data type.

In other words, thanks for reminding us, more help is needed.

Re: [PATCH v6 6/6] clean: teach clean -d to preserve ignored paths

2017-05-23 Thread Torsten Bögershausen




diff --git a/builtin/clean.c b/builtin/clean.c
index d861f836a..937eb17b6 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -857,6 +857,38 @@ static void interactive_main_loop(void)
}
  }
  
+static void correct_untracked_entries(struct dir_struct *dir)

+{
+   int src, dst, ign;
+
+   for (src = dst = ign = 0; src < dir->nr; src++) {
+   /* skip paths in ignored[] that cannot be inside entries[src] */
+   while (ign < dir->ignored_nr &&
+  0 <= cmp_dir_entry(>entries[src], 
>ignored[ign]))
+   ign++;
+
+   if (ign < dir->ignored_nr &&
+   check_dir_entry_contains(dir->entries[src], 
dir->ignored[ign])) {
+   /* entries[src] contains an ignored path, so we drop it 
*/
+   free(dir->entries[src]);
+   } else {
+   struct dir_entry *ent = dir->entries[src++];
+
+   /* entries[src] does not contain an ignored path, so we 
keep it */
+   dir->entries[dst++] = ent;
+
+   /* then discard paths in entries[] contained inside 
entries[src] */
+   while (src < dir->nr &&
+  check_dir_entry_contains(ent, dir->entries[src]))
+   free(dir->entries[src++]);
+
+   /* compensate for the outer loop's loop control */
+   src--;
+   }
+   }
+   dir->nr = dst;
+}
+


Looking what the function does, would
drop_or_keep_untracked_entries()
make more sense ?

Re: [PATCH v4 4/4] convert: add "status=delayed" to filter process protocol

2017-05-22 Thread Torsten Bögershausen

On 2017-05-22 15:50, Lars Schneider wrote:
> +
> +int async_query_available_blobs(const char *cmd, struct string_list 
> *delayed_paths)
> +{
> + int err;
> + char *line;
> + struct cmd2process *entry;
> + struct child_process *process;
> + struct strbuf filter_status = STRBUF_INIT;
> +
> + entry = find_multi_file_filter_entry(_process_map, cmd);
> + if (!entry) {
> + error("external filter '%s' is not available anymore although "
> +   "not all paths have been filtered", cmd);
> + return 0;
> + }
> + process = >process;
> + sigchain_push(SIGPIPE, SIG_IGN);
> +
> + err = packet_write_fmt_gently(
> + process->in, "command=list_available_blobs\n");
> + if (err)
> + goto done;
> +
> + err = packet_flush_gently(process->in);
> + if (err)
> + goto done;
> +
> + for (;;) {
> + const char* pre = "pathname=";
> + const int pre_len = strlen(pre);
> + line = packet_read_line(process->out, NULL);
> + if (!line)
> + break;
> + err = strlen(line) <= pre_len || strncmp(line, pre, pre_len);
> + if (err)
> + goto done;
> + string_list_insert(delayed_paths, xstrdup(line+pre_len));
> + }
> +
> + read_multi_file_filter_status(process->out, _status);
> + err = strcmp(filter_status.buf, "success");
> +
> +done:
> + sigchain_pop(SIGPIPE);
> +
> + if (err || errno == EPIPE) {

This looks strange, at first glance.
Do we set errno to 0 before ?
Or is there a trick that EPIPE can only be reached,
if it is "our" error ?


> + if (!strcmp(filter_status.buf, "error")) {
> + /* The filter signaled a problem with the file. */
> + } else {
> + /*
> +  * Something went wrong with the protocol filter.
> +  * Force shutdown and restart if another blob requires
> +  * filtering.
> +  */
> + error("external filter '%s' failed", cmd);
> + kill_multi_file_filter(_process_map, entry);
> + }
> + }
> + return !err;
> +}
> +

Re: [PATCH v4 4/4] convert: add "status=delayed" to filter process protocol

2017-05-22 Thread Torsten Bögershausen

On 2017-05-22 15:50, Lars Schneider wrote:
> +After Git received the pathnames, it will request the corresponding
> +blobs again. These requests contain a pathname and an empty content
> +section. The filter is expected to respond with the smudged content
> +in the usual way as explained above.
> +
> +packet:  git> command=smudge
> +packet:  git> pathname=path/testfile.dat
> +packet:  git> 
> +packet:  git>   # empty content!
> +packet:  git< status=success
> +packet:  git< 
> +packet:  git< SMUDGED_CONTENT
> +packet:  git< 
> +packet:  git< 
> +

The documentation mentions "" 2 times.
Is this a bug in the docu ? Or a feature which may need a comment ?

Re: [PATCH v2 4/6] fsmonitor: add test cases for fsmonitor extension

2017-05-20 Thread Torsten Bögershausen

> diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
> new file mode 100755
> index 00..d3cffc758f
> --- /dev/null
> +++ b/t/t7519-status-fsmonitor.sh
> @@ -0,0 +1,153 @@
> +#!/bin/sh
> +
> +test_description='git status with file system watcher'
> +
> +. ./test-lib.sh
> +
> +clean_repo () {
> + git reset --hard HEAD
> + git clean -fd
> + rm marker -f

This Works under Linux, but not e.g. Mac OS X. Should be

rm -f marker

> +}

Re: [PATCH v2 1/1] t0027: tests are not expensive; remove t0025

2017-05-08 Thread Torsten Bögershausen




On 09/05/17 03:29, Junio C Hamano wrote:

Johannes Schindelin <johannes.schinde...@gmx.de> writes:


Recent "stress" tests show that t0025 if flaky, reported by Lars Schneider,
larsxschnei...@gmail.com

All tests from t0025 are covered in t0027 already, so that t0025 can be
retiered:


s/retiered/retired/


The execution time for t0027 is 14 seconds under Linux,
and 63 seconds under Mac Os X.
And in case you ask, things are not going significantly faster using a SSD
instead of a spinning disk.

Signed-off-by: Torsten Bögershausen <tbo...@web.de>


Thank you for this patch.

Apart from the tyop, would it be possible to fix the formatting to look
less strange? (Unless you use this to transport a super-secret message
steganographically to an alien planet or some such, of course.)


Ping?


I didn't had the time to send a patch yet -
so either you feel free to amend the commit message locally and queue
that - or I send a new version in  2 weeks.

Re: [PATCH v2 1/9] t7300: skip untracked dirs containing ignored files

2017-05-07 Thread Torsten Bögershausen

On 2017-05-05 12:46, Samuel Lijin wrote:
> If git sees a directory which contains only untracked and ignored
> files, clean -d should not remove that directory. It was recently
> discovered that this is *not* true of git clean -d, and it's possible
> that this has never worked correctly; this test and its accompanying
> patch series aims to fix that.
> 
> Signed-off-by: Samuel Lijin 
> ---
>  t/t7300-clean.sh | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
> index b89fd2a6a..252c75b40 100755
> --- a/t/t7300-clean.sh
> +++ b/t/t7300-clean.sh
> @@ -653,4 +653,14 @@ test_expect_success 'git clean -d respects pathspecs 
> (pathspec is prefix of dir)
>   test_path_is_dir foobar
>  '
>  
> +test_expect_failure 'git clean -d skips untracked dirs containing ignored 
> files' '
> + echo /foo/bar >.gitignore &&
> + rm -rf foo &&
> + mkdir -p foo &&
Minor remark:
Do we need the -p here, or can it be dropped?

> + touch foo/bar &&
> + git clean -df &&
> + test_path_is_file foo/bar &&
> + test_path_is_dir foo
> +'
> +
>  test_done
>

Re: not uptodate. Cannot merge

2017-05-05 Thread Torsten Bögershausen

On 2017-05-04 23:40, G. Sylvie Davies wrote:
> Hi,
> 
> My little bitbucket "cherry-pick" button is failing on Windows from a
> "git reset --hard" blowing up.
> 
> My situation:  Git-2.10.2.windows.1 / Bitbucket-4.14.3 / Windows
> 10-10.0-amd64.   But I suspect even more recent Git will have the same
> problem.
> 
> Now, I'm pretty far from Kansas here as you'll see from my "git clone"
> invocation:
> 
> git.exe clone -c core.checkStat=minimal -c core.fileMode=false -c
> core.ignoreStat=true -c core.trustctime=false -c
> core.logAllRefUpdates=false --shared
> C:\Users\gsylvie\dev\bb\target\bitbucket\home\shared\data\repositories\1
> C:\Users\gsylvie\dev\bb\target\bitbucket\home\caches\bbClones\1
> 
> 
> Right after cloning I create a ".git/info/attributes" file containing
> just this one line:
> 
> * -text
This -may- be part of the problem.
In general, it is possible to add attributes on your local copy like
this, but it is not recommendet, at least not from me.
In general, the project should have a .gitattributes file, which
belongs to the project and which travels together with push and pull.
And of course, files should have been "normalized" and have LF in the repo.

In your case:
What does
git config core.autocrlf
say?

[]

Re: [PATCH/RFC 1/1] t0027: Some tests are not expensive

2017-04-30 Thread Torsten Bögershausen




On 01/05/17 05:07, Junio C Hamano wrote:

tbo...@web.de writes:


From: Torsten Bögershausen <tbo...@web.de>
...
The execution time for the non-expansive part is 6..8 seconds under Linux,
and 32 seconds under Mac Os X.

Running the "expensive" version roughly doubles the time.


Hmph, perhaps none of these is truly expensive if that is the case?
I am not talking about ~10 or ~30 seconds in absolute terms, but
reacting to "excluding the ones only halves the time."

The files used in the test vector seem to be all small.



Is the expensiveness due to the fact that there are so many of them,

Yes


and not due to some specific conversion being very slow?

Yes

The easiest solution is probably to simply remove the EXPENSIVE
precondition from t0027 and run it always on all platforms.
At the same time t0025 can be removed.
Unless somebody objects, I will send a V2 the next days.

Re: [PATCH v3 0/5] archive-zip: support files and archives bigger than 4GB

2017-04-30 Thread Torsten Bögershausen

On 2017-04-30 09:53, René Scharfe wrote:
> Am 30.04.2017 um 07:31 schrieb Torsten Bögershausen:
>> Sorry, I was not looking careful enough, the macro `$GIT_UNZIP`
>> gave the impression that an unzip provided by Git (or the Git test 
>> framework) was used :-(
>>
>> $ which unzip
>> /usr/bin/unzip
>>
>> $ unzip -v
>> UnZip 5.52 of 28 February 2005, by Info-ZIP.  Maintained by C. Spieler.  
>> Send
>> bug reports using http://www.info-zip.org/zip-bug.html; see README for 
>> details.
>>
>> Latest sources and executables are at ftp://ftp.info-zip.org/pub/infozip/ ;
>> see ftp://ftp.info-zip.org/pub/infozip/UnZip.html for other sites.
>>
>> Compiled with gcc 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.1) 
>> for Unix on Aug  1 2015.
>>
>> UnZip special compilation options:
>>  COPYRIGHT_CLEAN (PKZIP 0.9x unreducing method not supported)
>>  SET_DIR_ATTRIB
>>  TIMESTAMP
>>  USE_EF_UT_TIME
>>  USE_UNSHRINK (PKZIP/Zip 1.x unshrinking method supported)
>>  USE_DEFLATE64 (PKZIP 4.x Deflate64(tm) supported)
>>  VMS_TEXT_CONV
>>  [decryption, version 2.9 of 05 May 2000]
>>
>> UnZip and ZipInfo environment options:
>> UNZIP:  [none]
>>  UNZIPOPT:  [none]
>>   ZIPINFO:  [none]
>>ZIPINFOOPT:  [none]
> 
> OK, so they indeed still ship the old version of unzip that doesn't
> support big files.
> 
>> ok 13 # skip zip archive with files bigger than 4GB (missing ZIPINFO of 
>> EXPENSIVE,UNZIP,ZIPINFO)
> 
> And if you had zipinfo then this test would certainly fail.
> 

After installing unzip via mac ports, I have:

$ which zipinfo
/opt/local/bin/zipinfo

$ which unzip
 /opt/local/bin/unzip

$ unzip -v
UnZip 6.00 of 20 April 2009, by Info-ZIP.  Maintained by C. Spieler.  Send
bug reports using http://www.info-zip.org/zip-bug.html; see README for details.

Latest sources and executables are at ftp://ftp.info-zip.org/pub/infozip/ ;
see ftp://ftp.info-zip.org/pub/infozip/UnZip.html for other sites.

Compiled with gcc 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81) for Unix
Mac OS X on Jan 31 2016.

UnZip special compilation options:
COPYRIGHT_CLEAN (PKZIP 0.9x unreducing method not supported)
SET_DIR_ATTRIB
SYMLINKS (symbolic links supported, if RTL and file system permit)
TIMESTAMP
UNIXBACKUP
USE_EF_UT_TIME
USE_UNSHRINK (PKZIP/Zip 1.x unshrinking method supported)
USE_DEFLATE64 (PKZIP 4.x Deflate64(tm) supported)
VMS_TEXT_CONV
[decryption, version 2.11 of 05 Jan 2007]

UnZip and ZipInfo environment options:
   UNZIP:  [none]
UNZIPOPT:  [none]
 ZIPINFO:  [none]
  ZIPINFOOPT:  [none]


> Anyway, thanks for running these expensive tests!  You could retry
> with unzip version 6.00 and its zipinfo if you want.  But we
> certainly need the following patch:
> 

The 6.0 is not compiled with ZIP64_SUPPORT
After manually doing a copy-paste of your patch from below, tests are skipped:

ok 11 - zip archive with many entries
ok 12 # skip zip archive bigger than 4GB (missing UNZIP_ZIP64_SUPPORT of
EXPENSIVE,UNZIP,UNZIP_ZIP64_SUPPORT)
ok 13 # skip zip archive with files bigger than 4GB (missing UNZIP_ZIP64_SUPPORT
of EXPENSIVE,UNZIP,UNZIP_ZIP64_SUPPORT,ZIPINFO)

[patch snipped]

Re: [PATCH v3 0/5] archive-zip: support files and archives bigger than 4GB

2017-04-29 Thread Torsten Bögershausen




On 30/04/17 00:28, René Scharfe wrote:

Am 29.04.2017 um 23:00 schrieb Torsten Bögershausen:

This fails here under Mac OS:
commit 4cdf3f9d84568da72f1dcade812de7a42ecb6d15
Author: René Scharfe <l@web.de>
Date:   Mon Apr 24 19:33:34 2017 +0200

 archive-zip: support files bigger than 4GB

---
Parts of t5004.log, hope this is helpful:

"$GIT_UNZIP" -t many-big.zip

Archive:  many-big.zip
warning [many-big.zip]:  577175 extra bytes at beginning or within zipfile
   (attempting to process anyway)
error [many-big.zip]:  start of central directory not found;
   zipfile corrupt.
   (please check that you have transferred or created the zipfile in the
   appropriate BINARY mode and that you have compiled UnZip properly)
not ok 12 - zip archive bigger than 4GB
#   
#   # build string containing 65536 characters


Which version of unzip do you have (unzip -v, look for ZIP64_SUPPORT)?
It seems that (some version of?) OS X ships with an older unzip which
can't handle big files:

   
https://superuser.com/questions/114011/extract-large-zip-file-50-gb-on-mac-os-x

Is the following check (zip archive with files bigger than 4GB) skipped,
e.g. because ZIPINFO is missing?  Otherwise I would expect it to fail as
well.

René



Sorry, I was not looking careful enough, the macro `$GIT_UNZIP`
gave the impression that an unzip provided by Git (or the Git test framework) 
was used :-(


$ which unzip
/usr/bin/unzip

$ unzip -v
UnZip 5.52 of 28 February 2005, by Info-ZIP.  Maintained by C. Spieler.  Send
bug reports using http://www.info-zip.org/zip-bug.html; see README for details.

Latest sources and executables are at ftp://ftp.info-zip.org/pub/infozip/ ;
see ftp://ftp.info-zip.org/pub/infozip/UnZip.html for other sites.

Compiled with gcc 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.1) for Unix 
on Aug  1 2015.


UnZip special compilation options:
COPYRIGHT_CLEAN (PKZIP 0.9x unreducing method not supported)
SET_DIR_ATTRIB
TIMESTAMP
USE_EF_UT_TIME
USE_UNSHRINK (PKZIP/Zip 1.x unshrinking method supported)
USE_DEFLATE64 (PKZIP 4.x Deflate64(tm) supported)
VMS_TEXT_CONV
[decryption, version 2.9 of 05 May 2000]

UnZip and ZipInfo environment options:
   UNZIP:  [none]
UNZIPOPT:  [none]
 ZIPINFO:  [none]
  ZIPINFOOPT:  [none]

---
And here is the longer log:
not ok 12 - zip archive bigger than 4GB
#
#   # build string containing 65536 characters
# 
s=0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef &&
# 
s=$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s &&
# 
s=$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s &&

#
#   # create blob with a length of 65536 + 1 bytes
#   blob=$(echo $s | git hash-object -w --stdin) &&
#
#   # create tree containing 65500 entries of that blob
#   for i in $(test_seq 1 65500)
#   do
#   echo "100644 blob $blob $i"
#   done >tree &&
#   tree=$(git mktree <tree) &&
#
#   # zip it, creating an archive a bit bigger than 4GB
#   git archive -0 -o many-big.zip $tree &&
#
#   "$GIT_UNZIP" -t many-big.zip  65500 &&
#   "$GIT_UNZIP" -t many-big.zip
#

skipping test: zip archive with files bigger than 4GB
# Pack created with:
#   dd if=/dev/zero of=file bs=1M count=4100 && git hash-object -w file
mkdir -p .git/objects/pack &&
(
cd .git/objects/pack &&
"$GIT_UNZIP" "$TEST_DIRECTORY"/t5004/big-pack.zip
) &&
blob=754a93d6fada4c6873360e6cb4b209132271ab0e &&
size=$(expr 4100 "*" 1024 "*" 1024) &&

# create a tree containing the file
tree=$(echo "100644 blob $blob  big-file" | git mktree) &&

# zip it, creating an archive with a file bigger than 4GB
git archive -o big.zip $tree &&

"$GIT_UNZIP" -t big.zip &&
"$ZIPINFO" big.zip >big.lst &&
grep $size big.lst

ok 13 # skip zip archive with files bigger than 4GB (missing ZIPINFO of 
EXPENSIVE,UNZIP,ZIPINFO)


# failed 1 among 13 test(s)
1..13

Re: [PATCH v3 0/5] archive-zip: support files and archives bigger than 4GB

2017-04-29 Thread Torsten Bögershausen

On 2017-04-24 19:22, René Scharfe wrote:
> The first patch adds (expensive) tests, the next two are cleanups which
> set the stage for the remaining two to actually implement zip64 support
> for offsets and file sizes.
> 
> Half of the series had been laying around for months, half-finished and
> forgotten because I got distracted by the holiday season. :-/
> 
>   archive-zip: add tests for big ZIP archives
>   archive-zip: use strbuf for ZIP directory
>   archive-zip: write ZIP dir entry directly to strbuf
>   archive-zip: support archives bigger than 4GB
>   archive-zip: support files bigger than 4GB
> 
>  archive-zip.c   | 211 
> 
>  t/t5004-archive-corner-cases.sh |  45 +
>  t/t5004/big-pack.zip| Bin 0 -> 7373 bytes
>  3 files changed, 172 insertions(+), 84 deletions(-)
>  create mode 100644 t/t5004/big-pack.zip
> 
This fails here under Mac OS:
commit 4cdf3f9d84568da72f1dcade812de7a42ecb6d15
Author: René Scharfe 
Date:   Mon Apr 24 19:33:34 2017 +0200

archive-zip: support files bigger than 4GB

---
Parts of t5004.log, hope this is helpful:

"$GIT_UNZIP" -t many-big.zip

Archive:  many-big.zip
warning [many-big.zip]:  577175 extra bytes at beginning or within zipfile
  (attempting to process anyway)
error [many-big.zip]:  start of central directory not found;
  zipfile corrupt.
  (please check that you have transferred or created the zipfile in the
  appropriate BINARY mode and that you have compiled UnZip properly)
not ok 12 - zip archive bigger than 4GB
#   
#   # build string containing 65536 characters

Re: Bug: Git rename does not work if folder naming was changed from lower to upper case on OSX

2017-04-29 Thread Torsten Bögershausen





after having committed folders with lower case naming, I decided to rename them
to upper-case names. Expecting git to detect them as renamings, it started a
new parallel hierarchy with new files, which I had to add/commit.

It was a kinda strange behavior, which I fixed by rename the folder to
something completely different, commit and rename the folder again to the
desired value.

Is this an actual desired behavior or is it a bug?


It is expected (but may be unexpected), I hope these hints are not too harsh:
You can blame the vendor of the OS,
who decided to design a file system which ignores the case of files.
(Or yourself, because you use it)
(Or yourself, because you can install a HFS+ partition under OSX which
is NOT case insensitive, but few people seem to know about this or use it)
(harsh mode off)

The correct way to rename a file under Git is to use Git:

git init
Initialized empty Git repository in /private/tmp/ttt/.git/
user@Mac:/tmp/ttt>mkdir dir1
user@Mac:/tmp/ttt>echo File >dir1/file
user@Mac:/tmp/ttt>git add dir1/file
user@Mac:/tmp/ttt>git commit -m "add dir1/file"
[master (root-commit) 14d3862] add dir1/file
 1 file changed, 1 insertion(+)
 create mode 100644 dir1/file
user@Mac:/tmp/ttt>git mv dir1/file DIR1/FILE
user@Mac:/tmp/ttt>git commit  "mv dir1/file DIR1/FILE"
user@Mac:/tmp/ttt>git ls-files
DIR1/FILE




Robert

Re: t0025 flaky on OSX

2017-04-25 Thread Torsten Bögershausen





So all in all it seams as if there is a very old race condition here,
which we "never" have seen yet.
Moving the file to a different inode number fixes the test case,
Git doesn't treat it as unchanged any more.


Thanks a lot for investigating this! Would you mind posting the
fix as patch?


That's ongoing.
TC #3 and #4 are fixable, but #5 resists to be cured so far.
I think we need a touch and sleep or so, more the next days (or weeks)

Re: t0025 flaky on OSX

2017-04-24 Thread Torsten Bögershausen

On 2017-04-24 19:00, Torsten Bögershausen wrote:
> On 2017-04-24 18:45, Lars Schneider wrote:
>> Hi,
>>
>> "t0025.3 - crlf=true causes a CRLF file to be normalized" failed 
>> sporadically on next and master recently: 
>> https://travis-ci.org/git/git/jobs/225084459#L2382
>> https://travis-ci.org/git/git/jobs/223830505#L2342
>>
>> Are you aware of a race condition in the code
>> or in the test?
> Not yet - I'll have a look
> 

So,
The test failed under Linux & pu of the day, using Peff's
stress script.

not ok 3 - crlf=true causes a CRLF file to be normalized

The good case (simplified):
$ git status
   modified:   CRLFonly

Untracked files:
.gitattributes


$ git diff | tr "\015" Q
warning: CRLF will be replaced by LF in CRLFonly.
The file will have its original line endings in your working directory.
diff --git a/CRLFonly b/CRLFonly
index 44fc21c..666dbf4 100644
--- a/CRLFonly
+++ b/CRLFonly
@@ -1,7 +1,7 @@
-IQ
-amQ
-veryQ
-veryQ
-fineQ
-thankQ
-youQ
+I
+am
+very
+very
+fine
+thank
+you


The failed case:
$ git status
Untracked files:
.gitattributes

---
$ ls -al -i
total 28
3430195 drwxr-xr-x 3 tb tb 4096 Apr 24 21:19 .
3427617 drwxr-xr-x 3 tb tb 4096 Apr 24 21:19 ..
3429958 -rw-r--r-- 1 tb tb   37 Apr 24 21:19 CRLFonly
3432574 drwxr-xr-x 8 tb tb 4096 Apr 24 21:27 .git
3425599 -rw-r--r-- 1 tb tb   14 Apr 24 21:19 .gitattributes
3430089 -rw-r--r-- 1 tb tb   24 Apr 24 21:19 LFonly
3430174 -rw-r--r-- 1 tb tb   36 Apr 24 21:19 LFwithNUL

-
#After
$ mv CRLFonly tmp
$ cp tmp CRLFonly
$ ls -al -i
3430195 drwxr-xr-x 3 tb tb 4096 Apr 24 21:36 .
3427617 drwxr-xr-x 3 tb tb 4096 Apr 24 21:19 ..
3401599 -rw-r--r-- 1 tb tb   37 Apr 24 21:36 CRLFonly
3432574 drwxr-xr-x 8 tb tb 4096 Apr 24 21:36 .git
3425599 -rw-r--r-- 1 tb tb   14 Apr 24 21:19 .gitattributes
3430089 -rw-r--r-- 1 tb tb   24 Apr 24 21:19 LFonly
3430174 -rw-r--r-- 1 tb tb   36 Apr 24 21:19 LFwithNUL
3429958 -rw-r--r-- 1 tb tb   37 Apr 24 21:19 tmp

$ git status
modified:   CRLFonly
Untracked files:
.gitattributes
tmp


So all in all it seams as if there is a very old race condition here,
which we "never" have seen yet.
Moving the file to a different inode number fixes the test case,
Git doesn't treat it as unchanged any more.

Re: t0025 flaky on OSX

2017-04-24 Thread Torsten Bögershausen

On 2017-04-24 18:45, Lars Schneider wrote:
> Hi,
> 
> "t0025.3 - crlf=true causes a CRLF file to be normalized" failed 
> sporadically on next and master recently: 
> https://travis-ci.org/git/git/jobs/225084459#L2382
> https://travis-ci.org/git/git/jobs/223830505#L2342
> 
> Are you aware of a race condition in the code
> or in the test?
Not yet - I'll have a look

Re: [PATCH] test-lib: abort when can't remove trash directory

2017-04-24 Thread Torsten Bögershausen


[]


-   cd "$(dirname "$remove_trash")" &&
-   rm -rf "$(basename "$remove_trash")" ||
-   error "Tests passed but test cleanup failed; aborting"
+   cd "$(dirname "$TRASH_DIRECTORY")" &&
+   rm -fr "$TRASH_DIRECTORY" ||
+   error "Tests passed but test cleanup failed; aborting"
+   fi


Yeah, that looks good to me.


Does it ?
Checking the error code of "rm -f" doesn't work as expected from the script:
rm -rf DoesNotExist ; echo $?
0

I think it should be


+   cd "$(dirname "$TRASH_DIRECTORY")" &&
+   rm -r "$TRASH_DIRECTORY" ||
+   error "Tests passed but test cleanup failed; aborting"

Re: [PATCH v3 1/2] use HOST_NAME_MAX to size buffers for gethostname(2)

2017-04-20 Thread Torsten Bögershausen




I think I meant to write "big pidfiles" there.

With xsize_t() gc would die when seeing a pidfile whose size doesn't fit into
size_t.  The version I sent just ignores such files.  However, it would choke
on slightly smaller files that happen to not fit into memory.  And no
reasonable pidfile can be big enough to trigger any of that, so dying on
conversion error wouldn't really be a problem.  Is that what you meant?  It
makes sense, in any case.


In short: Yes.



Thanks,
René

Re: [PATCH v3 1/2] use HOST_NAME_MAX to size buffers for gethostname(2)

2017-04-20 Thread Torsten Bögershausen

On 2017-04-19 22:02, René Scharfe wrote:
> Am 19.04.2017 um 21:09 schrieb Torsten Bögershausen:
>> On 2017-04-19 19:28, René Scharfe wrote:
>> []
>> One or two minor comments inline
>>> diff --git a/builtin/gc.c b/builtin/gc.c
>>> index 2daede7820..4c1c01e87d 100644
>>> --- a/builtin/gc.c
>>> +++ b/builtin/gc.c
>>> @@ -228,21 +228,99 @@ static int need_to_gc(void)
>>>   return 1;
>>>   }
>>>   +struct pidfile {
>>> +struct strbuf buf;
>>> +char *hostname;
>>> +};
>>> +
>>> +#define PIDFILE_INIT { STRBUF_INIT }
>>> +
>>> +static void pidfile_release(struct pidfile *pf)
>>> +{
>>> +pf->hostname = NULL;
>>> +strbuf_release(>buf);
>>> +}
>>> +
>>> +static int pidfile_read(struct pidfile *pf, const char *path,
>>> +unsigned int max_age_seconds)
>>> +{
>>> +int fd;
>>> +struct stat st;
>>> +ssize_t len;
>>> +char *space;
>>> +int rc = -1;
>>> +
>>> +fd = open(path, O_RDONLY);
>>> +if (fd < 0)
>>> +return rc;
>>> +
>>> +if (fstat(fd, ))
>>> +goto out;
>>> +if (time(NULL) - st.st_mtime > max_age_seconds)
>>> +goto out;
>>> +if (st.st_size > (size_t)st.st_size)
>>
>> Minor: we need xsize_t here ?
>> if (st.st_size > xsize_t(st.st_size))
> 
> No, xsize_t() would do the same check and die on overflow, and pidfile_read() 
> is
> supposed to handle big pids gracefully.
This about the file size, isn't it ?
And here xsize_t should be save to use and good practise.

Re: [PATCH v3 1/2] use HOST_NAME_MAX to size buffers for gethostname(2)

2017-04-19 Thread Torsten Bögershausen

On 2017-04-19 19:28, René Scharfe wrote:
[]
One or two minor comments inline
> diff --git a/builtin/gc.c b/builtin/gc.c
> index 2daede7820..4c1c01e87d 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -228,21 +228,99 @@ static int need_to_gc(void)
>   return 1;
>  }
>  
> +struct pidfile {
> + struct strbuf buf;
> + char *hostname;
> +};
> +
> +#define PIDFILE_INIT { STRBUF_INIT }
> +
> +static void pidfile_release(struct pidfile *pf)
> +{
> + pf->hostname = NULL;
> + strbuf_release(>buf);
> +}
> +
> +static int pidfile_read(struct pidfile *pf, const char *path,
> + unsigned int max_age_seconds)
> +{
> + int fd;
> + struct stat st;
> + ssize_t len;
> + char *space;
> + int rc = -1;
> +
> + fd = open(path, O_RDONLY);
> + if (fd < 0)
> + return rc;
> +
> + if (fstat(fd, ))
> + goto out;
> + if (time(NULL) - st.st_mtime > max_age_seconds)
> + goto out;
> + if (st.st_size > (size_t)st.st_size)

Minor: we need xsize_t here ?
if (st.st_size > xsize_t(st.st_size))

> + goto out;
> +
> + len = strbuf_read(>buf, fd, st.st_size);
> + if (len < 0)
> + goto out;
> +
> + space = strchr(pf->buf.buf, ' ');
> + if (!space) {
> + pidfile_release(pf);
> + goto out;
> + }
> + pf->hostname = space + 1;
> + *space = '\0';
> +
> + rc = 0;
> +out:
> + close(fd);
> + return rc;
> +}
> +
> +static int parse_pid(const char *value, pid_t *ret)
> +{
> + if (value && *value) {
> + char *end;
> + intmax_t val;
> +
> + errno = 0;
> + val = strtoimax(value, , 0);
> + if (errno == ERANGE)
> + return 0;
> + if (*end)
> + return 0;
> + if (labs(val) > maximum_signed_value_of_type(pid_t)) {
> + errno = ERANGE;
> + return 0;
> + }
> + *ret = val;
> + return 1;
> + }
> + errno = EINVAL;
> + return 0;
> +}
> +
> +static int pidfile_process_exists(const struct pidfile *pf)
> +{
> + pid_t pid;
> + return parse_pid(pf->buf.buf, ) &&
> + (!kill(pid, 0) || errno == EPERM);
> +}
> +
>  /* return NULL on success, else hostname running the gc */
> -static const char *lock_repo_for_gc(int force, pid_t* ret_pid)
> +static int lock_repo_for_gc(int force, struct pidfile *pf)
>  {
>   static struct lock_file lock;
>   char my_host[128];

Huh ?
should this be increased, may be in another path ?

Re: [PATCH v3 4/4] convert: add "status=delayed" to filter process protocol

2017-04-19 Thread Torsten Bögershausen


>> (Back to the roots)
>> Which criteria do you have in mind: When should a filter process the blob
>> and return it immediately, and when would it respond "delayed" ?
> 
> See above: it's up to the filter. In case of Git LFS: delay if a network call 
> is required.
> 
That make sense.
I try to understand the big picture, and from here try to review
the details.
Does it make sense to mention "git lfs" in the commit message,
and/or add some test code ?

Re: [PATCH v3 4/4] convert: add "status=delayed" to filter process protocol

2017-04-11 Thread Torsten Bögershausen

On 2017-04-11 21:50, Lars Schneider wrote:

[]
>> packet:  git> command=smudge
>> packet:  git> pathname=path/testfile.dat
>> packet:  git> delay-id=1
>> packet:  git> 
>> packet:  git> CONTENT
>> packet:  git> 
>> packet:  git< status=delayed # this means: Git, please feed more
>> packet:  git> 
> Actually, this is how I implemented it first.
> 
> However, I didn't like that because we associate a
> pathname with a delay-id. If the filter does not
> delay the item then we associate a different
> pathname with the same delay-id in the next request. 
> Therefore I think it is better to present the delay-id 
> *only* to the filter if the item is actually delayed.
> 
> I would be surprised if the extra round trip does impact
> the performance in any meaningful way.
> 

2 spontanous remarks:

- Git can simply use a counter which is incremented by each blob
  that is send to the filter.
  Regardless what the filter answers (delayed or not), simply increment a
  counter. (or is this too simple and I miss something?)

- I was thinking that the filter code is written as either "never delay" or
  "always delay".
  "Never delay" is the existing code.
  What is your idea, when should a filter respond with delayed ?
  My thinking was "always", silently assuming the more than one core can be
  used, so that blobs can be filtered in parallel.

>We could do this but I think this would only complicate
>the protocol. I expect the filter to spool results to the
>disk or something.
  Spooling things to disk was not part of my picture, to be honest.
  This means additional execution time when a SSD is used, the chips
  are more worn out...
  There may be situations, where this is OK for some users (but not for others)
  How can we prevent Git from (over-) flooding the filter?
  The protocol response from the filter would be just "delayed", and the filter
  would block Git, right ?
  But, in any case, it would still be nice if Git would collect converted blobs
  from the filter, to free resource here.
  This is more like the "streaming model", but on a higher level:
  Send 4 blobs to the filter, collect the ready one, send the 5th blob to
  the filter, collect the ready one, send the 6th blob to the filter, collect
  ready one


(Back to the roots)
Which criteria do you have in mind: When should a filter process the blob
and return it immediately, and when would it respond "delayed" ?

Re: [PATCH v3 4/4] convert: add "status=delayed" to filter process protocol

2017-04-10 Thread Torsten Bögershausen

On 2017-04-09 21:11, Lars Schneider wrote:
[]
> +
> +packet:  git> command=smudge
> +packet:  git> pathname=path/testfile.dat
> +packet:  git> delay-able=1
> +packet:  git> 
> +packet:  git> CONTENT
> +packet:  git> 
> +packet:  git< status=delayed
> +packet:  git< 
> +packet:  git> delay-id=1
> +packet:  git> 
> +packet:  git< status=success
> +packet:  git< 

(not sure if this was mentioned before)
If a filter uses the delayed feature, I would read it as
a response from the filter in the style:
"Hallo Git, I need some time to process it, but as I have
CPU capacity available, please send another blob,
so I can chew them in parallel."

Can we save one round trip ?

packet:  git> command=smudge
packet:  git> pathname=path/testfile.dat
packet:  git> delay-id=1
packet:  git> 
packet:  git> CONTENT
packet:  git> 
packet:  git< status=delayed # this means: Git, please feed more
packet:  git> 

# Git feeds the next blob.
# This may be repeated some rounds.
# (We may want to restrict the number of rounds for Git, see below)
# After these some rounds, the filter needs to signal:
# no more fresh blobs please, collect some data and I can free memory
# and after that I am able to get a fresh blob.
packet:  git> command=smudge
packet:  git> pathname=path/testfile.dat
packet:  git> delay-id=2
packet:  git> 
packet:  git> CONTENT
packet:  git> 
packet:  git< status=pleaseWait
packet:  git> 

# Now Git needs to ask for ready blobs.

> +
> +
> +If the filter supports the "delay" capability then it must support the
> +"list_available_blobs" command. If Git sends this command, then the
> +filter is expected to return a list of "delay_ids" of blobs that are
> +available. The list must be terminated with a flush packet followed
> +by a "success" status that is also terminated with a flush packet. If
> +no blobs for the delayed paths are available, yet, then the filter is
> +expected to block the response until at least one blob becomes
> +available. The filter can tell Git that it has no more delayed blobs
> +by sending an empty list.
> +
> +packet:  git> command=list_available_blobs
> +packet:  git> 
> +packet:  git< 7

Is the "7" the same as the "delay-id=1" from above?
It may be easier to understand, even if it costs some bytes, to answer instead
packet:  git< delay-id=1
(And at this point, may I suggest to change "delay-id" into "request-id=1" ?

> +packet:  git< 13

Same question here: is this the delay-id ?

> +packet:  git< 
> +packet:  git< status=success
> +packet:  git< 
> +
> +
> +After Git received the "delay_ids", it will request the corresponding
> +blobs again. These requests contain a "delay-id" and an empty content
> +section. The filter is expected to respond with the smudged content
> +in the usual way as explained above.
> +
> +packet:  git> command=smudge
> +packet:  git> pathname=test-delay10.a
> +packet:  git> delay-id=0

Minor question: Where does the "id=0" come from ?

> +packet:  git> 
> +packet:  git>   # empty content!
> +packet:  git< status=success
> +packet:  git< 
> +packet:  git< SMUDGED_CONTENT
> +packet:  git< 
> +packet:  git< 

OK, good.

The quest is: what happens next ?

2 things, kind of in parallel, but we need to prioritize and serialize:
- Send the next blob
- Fetch ready blobs
- And of course: ask for more ready blobs.
(it looks as if Peff and Jakub had useful comments already,
  so I can stop here?)

In general, Git should not have a unlimited number of blobs outstanding,
as memory constraints may apply.
There may be a config variable for the number of outstanding blobs,
(similar to the window size in other protocols) and a variable
for the number of "send bytes in outstanding blobs"
(similar to window size (again!) in e.g TCP)

The number of outstanding blobs is may be less important, and it is more
important to monitor the number of bytes we keep in memory in some way.

Something like "we set a limit to 500K of out standng data", once we are
above the limit, don't send any new blobs.

(No, I didn't look at the code at all, this protocol discussion
is much more interesting)

< 1 2 3 4 5 6 7 8 9 10 >

101 - 200 of 1221 matches

Mail list logo