from:"Torsten Bögershausen"

Re: [PATCH] tests: avoid syntax triggering old dash bug

2018-11-27 Thread Torsten Bögershausen

On Wed, Nov 28, 2018 at 01:47:41AM +, brian m. carlson wrote:
> On Tue, Nov 27, 2018 at 05:42:53PM +0100, Ævar Arnfjörð Bjarmason wrote:
> > Avoid a bug in dash that's been fixed ever since its
> > ec2c84d ("[PARSER] Fix clobbering of checkkwd", 2011-03-15)[1] first
> > released with dash v0.5.7 in July 2011.
> > 
> > This fixes 1/2 tests failing on Debian Lenny & Squeeze. The other
> > failure is due to 1b42f45255 ("git-svn: apply "svn.pathnameencoding"
> > before URL encoding", 2016-02-09).
> 
> Are people still using such versions of Debian?  I only see wheezy (7)
> on the mirrors, not squeeze (6) or lenny (5).  It might be better for us
> to encourage users to upgrade to an OS that has security support rather
> than work around bugs in obsolete OSes.

Yes, I have an old PowerPC box to test if code handle endians right.
And to ask people to upgrade does not conflict with supporting older
versions (if that is as easy as this patch).
I think we can have both.

t5570 shaky for anyone ?

2018-11-25 Thread Torsten Bögershausen

After running the  "Git 2.20-rc1" testsuite here on a raspi,
the only TC that failed was t5570.
When the "grep" was run on daemon.log, the file was empty (?).
When inspecting it later, it was filled, and grep would have found
the "extended.attribute" it was looking for.

The following fixes it, but I am not sure if this is the ideal
solution.


diff --git a/t/t5570-git-daemon.sh b/t/t5570-git-daemon.sh
index 7466aad111..e259fee0ed 100755
--- a/t/t5570-git-daemon.sh
+++ b/t/t5570-git-daemon.sh
@@ -192,6 +192,7 @@ test_expect_success 'daemon log records all attributes' '
GIT_OVERRIDE_VIRTUAL_HOST=localhost \
git -c protocol.version=1 \
ls-remote "$GIT_DAEMON_URL/interp.git" &&
+   sleep 1 &&
grep -i extended.attribute daemon.log | cut -d" " -f2- >actual &&
test_cmp expect actual
 '

A slightly better approach may be to use a "sleep on demand":

+   ( grep -i -q extended.attribute daemon.log || sleep 1 ) &&

Re: What's cooking in git.git (Sep 2018, #01; Tue, 4)

2018-11-25 Thread Torsten Bögershausen

On Sun, Nov 25, 2018 at 05:28:35AM +0100, Torsten Bögershausen wrote:
> On Sat, Nov 24, 2018 at 08:33:37PM +0100, Ævar Arnfjörð Bjarmason wrote:
> > 
> > On Wed, Sep 05 2018, Ævar Arnfjörð Bjarmason wrote:
> > 
> > > On Wed, Sep 05 2018, Eric Sunshine wrote:
> 
> []
> 
> > > SunCC used to be ahead of GCC & Clang when it came to certain classes of
> > > warnings, but e.g. now everything it complains about is because it
> > > doesn't understand C as well, e.g. we have quite a few compile warnings
> > > due to code like this, which it claims is unreachable (but isn't):
> > > https://github.com/git/git/blob/v2.19.0-rc2/read-cache.c#L950-L955
> > 
> 
> Wait a second - even if the compiler claims something (wrong)...
> there a still 1+1/2 questions from my side:
> 
> 
> int verify_path(const char *path, unsigned mode)
> {
>   char c;
>^
>   /* Q1: should  "c" be initialized like this: */
>   char c = *path;
> 
>   if (has_dos_drive_prefix(path))
>   return 0;
> 
>   goto inside;
>    /* Q2: and why do we need the "goto" here ? */
>   for (;;) {
>   if (!c)
>   return 1;
>   if (is_dir_sep(c)) {
> inside:

After some re-reading,
I think that the "goto inside" was just hard to read

Out of interest:
would the following make the compiler happy ?


diff --git a/read-cache.c b/read-cache.c
index 49add63fe1..d574d58b9d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -951,17 +951,15 @@ static int verify_dotfile(const char *rest, unsigned mode)
 
 int verify_path(const char *path, unsigned mode)
 {
-   char c;
+   char c = *path ? '/' : '\0';
 
if (has_dos_drive_prefix(path))
return 0;
 
-   goto inside;
for (;;) {
if (!c)
return 1;
if (is_dir_sep(c)) {
-inside:
if (protect_hfs) {
if (is_hfs_dotgit(path))
return 0;

Re: What's cooking in git.git (Sep 2018, #01; Tue, 4)

2018-11-24 Thread Torsten Bögershausen

On Sat, Nov 24, 2018 at 08:33:37PM +0100, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Sep 05 2018, Ævar Arnfjörð Bjarmason wrote:
> 
> > On Wed, Sep 05 2018, Eric Sunshine wrote:

[]

> > SunCC used to be ahead of GCC & Clang when it came to certain classes of
> > warnings, but e.g. now everything it complains about is because it
> > doesn't understand C as well, e.g. we have quite a few compile warnings
> > due to code like this, which it claims is unreachable (but isn't):
> > https://github.com/git/git/blob/v2.19.0-rc2/read-cache.c#L950-L955
> 

Wait a second - even if the compiler claims something (wrong)...
there a still 1+1/2 questions from my side:


int verify_path(const char *path, unsigned mode)
{
char c;
 ^
/* Q1: should  "c" be initialized like this: */
char c = *path;

if (has_dos_drive_prefix(path))
return 0;

goto inside;
 /* Q2: and why do we need the "goto" here ? */
for (;;) {
if (!c)
return 1;
if (is_dir_sep(c)) {
inside:

Re: BUG: CR marker ^M doesn't show up in '-' lines of diffs when the ending of the removed line is CR+LF

2018-11-24 Thread Torsten Bögershausen

On Sat, Nov 24, 2018 at 03:51:26PM +0100, Frank Schäfer wrote:
[]
> 
> Hmm... is CR-only line termination supported at all ?
> E.g. 'eol' can be set to 'lf' or 'crlf' but not 'cr'...
> 

No, CR-only is not supported, because:
Nobody was implementing it, and that is probably because
the only question abou CR-only (at least what I remember)
was a when an old Mac OS (not the Mac OS X)
was used (which used to use CR instead of LF).

And, such feature may be implemented by writing a filter,
replace CR with LF as "clean" and "LF" with "CR" for smudge.

Re: Cygwin Git with Windows paths

2018-11-20 Thread Torsten Bögershausen

On 20.11.18 01:17, Steven Penny wrote:
> On Sun, Nov 18, 2018 at 11:21 PM Torsten Bögershausen wrote:
>> If nothing works,
>> it may help to add some fprintf(stderr,...) in the functions used
>> by 05b458c104708141d2f:
>>
>> strip_last_component(),
>> get_next_component()
>> real_path_internal()
> 
> I didnt see any "real_path_internal" in the current codebase - however i added
> some "printf" to the other 2 and got this:
> 
> $ git clone git://github.com/benhoyt/goawk 'C:\cygwin64\tmp\goawk'
> get_next_component, next, []
> get_next_component, remaining, [C:\cygwin64\tmp\goawk]
> Cloning into 'C:\cygwin64\tmp\goawk'...
> get_next_component, next, []
> get_next_component, remaining, [C:\cygwin64\tmp\goawk/.git]
> fatal: Invalid path '/usr/local/cache/git/C:\cygwin64\tmp\goawk': No such file
> or directory
> 

Could you please post a "git diff" of your instrumented code,
so that I/we can follow the debugging, especially what the printouts mean?

I think we need to understand what is going on in abspath.c

Re: grep issues

2018-11-19 Thread Torsten Bögershausen

On Sun, Nov 11, 2018 at 03:17:50PM +0200, Orgad Shaneh wrote:
> Hi,
> 
> I found 2 bugs in grep, using Git for Windows 2.19.1 (but noticed
> these several versions ago):
> 
> 1. git grep --recursive on a worktree (without rev) always matches
> against the submodule's HEAD, not its worktree, as it should.
> 2. When core.autocrlf (or eol=crlf) is used, and a file in the
> worktree has CRLF, git grep fails to match $ against EOL.
> 
> For example:
> git init
> echo 'file eol=crlf' > .gitattributes
> echo ABCD > file
> git add file
> git commit -m 'CRLF'
> rm file
> git checkout -f file
> git grep 'D$' file # Nothing
> git grep 'D$' HEAD -- file # Found!
> 
> - Orgad

I can confirm the "2. When core.autocrlf" bug, or should we call
it a non-implemented feature.
It seems as if a convert_to_git() is needed in grep.c,
but I haven't found the time to dig deeper.
Does anybody wants to work on this ?

Re: [PATCH/RFC v2 1/1] Use size_t instead of 'unsigned long' for data in memory

2018-11-19 Thread Torsten Bögershausen

On Tue, Nov 20, 2018 at 06:04:54AM +0100, tbo...@web.de wrote:
> From: Torsten Bögershausen 
> 
> Currently the length of data which is stored in memory is stored
> in "unsigned long" at many places in the code base.
> This is OK when both "unsigned long" and size_t are 32 bits,
> (and is OK when both are 64 bits).
> On a 64 bit Windows system am "unsigned long" is 32 bit, and
> that may be too short to measure the size of objects in memory,
> a size_t is the natural choice.
> 
> Improve the code base in "small steps", as small as possible.
> The smallest step seems to be much bigger than expected.

Ops, it seems as if I send this message out twice -
please ignore the "PATCH/RFC v2 1/1"
Sorry for the noise.

Re: [PATCH/RFC v1 1/1] Use size_t instead of unsigned long

2018-11-19 Thread Torsten Bögershausen

On Sun, Nov 18, 2018 at 03:18:52PM -0500, Derrick Stolee wrote:
> On 11/17/2018 10:11 AM, tbo...@web.de wrote:
> >From: Torsten Bögershausen 
> >
> >Currently Git users can not commit files >4Gib under 64 bit Windows,
> >where "long" is 32 bit but size_t is 64 bit.
> >Improve the code base in small steps, as small as possible.
> >What started with a small patch to replace "unsigned long" with size_t
> >in one file (convert.c) ended up with a change in many files.
> >
> >Signed-off-by: Torsten Bögershausen 
> >---
> >
> >This needs to go on top of pu, to cover all the good stuff
> >   cooking here.
> 
> Better to work on top of 'master', as the work in 'pu' will be rewritten
> several times, probably.
> 
> >I have started this series on November 1st, since that 2 or 3 rebases
> >   had been done to catch up, and now it is on pu from November 15.
> >
> >I couldn't find a reason why changing "unsigned ling"
> >   into "size_t" may break anything, any thoughts, please ?
> 
> IIRC, the blocker for why we haven't done this already is that "size_t",
> "timestamp_t" and "off_t" are all 64-bit types that give more implied
> meaning, so we should swap types specifically to these as we want. One
> example series does a specific, small change [1].
> 
> If we wanted to do a single swap that removes 'unsigned long' with an
> unambiguously 64-bit type, I would recommend "uint64_t". Later replacements
> to size_t, off_t, and timestamp_t could happen on a case-by-case basis for
> type clarity.
> 
> It may benefit reviewers to split this change into multiple patches,
> starting at the lowest levels of the call stack (so higher 'unsigned long's
> can up-cast to the 64-bit types when calling a function) and focusing the
> changes to one or two files at a time (X.c and X.h, preferrably).
> 
> Since you are talking about the benefits for Git for Windows to accept 4GB
> files, I wonder if we can add a test that verifies such a file will work. If
> you have such a test, then I could help verify that the test fails before
> the change and succeeds afterward.
> 
> Finally, it may be good to add a coccinelle script that replaces 'unsigned
> long' with 'uint64_t' so we can easily fix any new introductions that happen
> in the future.

The plan was never to change "unsigned long" to a 64 bit value in general.
The usage of "unsigned long" (instead of size_t) was (and is) still good
for 32bit systems, where both are 32 bit. (at least all system I am aware of).

For 64 bit systems like Linux or Mac OS it is the same, both are 64 bit.

The only problematic system is Win64, where "unsigned long" is 32 bit,
and therefore we must use size_t to address data in memory.
This is not to be confused with off_t, which is used for "data on disk"
(and nothing else) or timestamp_t which is used for timestamps (and nothing 
else).

I haven't followed the "coccinelle script" development at all, if someone
makes a patch do replace "unsigned long" with size_t, that could replace
my whole patch. (Some of them may be downgraded to "unsigned int" ?)

However, as we need to let  tb/print-size-t-with-uintmax-format make it to
master, otherwise we are not able to print the variables in a portable way. 


> Thanks! I do think we should make this change, but we must be careful. It
> may be disruptive to topics in flight.
> 
> -Stolee
> 
> [1] https://public-inbox.org/git/20181112084031.11769-1-care...@gmail.com/
>

Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems

2018-11-19 Thread Torsten Bögershausen

On 2018-11-19 09:20, Carlo Marcelo Arenas Belón wrote:
> While I don't have an HFS+ volume to test, I suspect this patch should be
> needed for both, even if I have to say thay even the broken output was
> better than the current state.
> 
> Travis seems to be using a case sensitive filesystem so wouldn't catch this.
> 
> Was windows/cygwin tested?
> 
> Carlo
> -- >8 --
> Subject: [PATCH] entry: fix t5061 on macOS
> 
> b878579ae7 ("clone: report duplicate entries on case-insensitive filesystems",
> 2018-08-17) was tested on Linux with an excemption for Windows that needs
> to be expanded for macOS (using APFS), which then would show :
> 
> $ git clone git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
> warning: the following paths have collided (e.g. case-sensitive paths
> on a case-insensitive filesystem) and only one from the same
> colliding group is in the working tree:
> 
>   'man2/_Exit.2'
>   'man2/_exit.2'
>   'man3/NAN.3'
>   'man3/nan.3'
> 
> Signed-off-by: Carlo Marcelo Arenas Belón 
> ---
>  entry.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/entry.c b/entry.c
> index 5d136c5d55..3845f570f7 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -404,7 +404,7 @@ static void mark_colliding_entries(const struct checkout 
> *state,
>  {
>   int i, trust_ino = check_stat;
>  
> -#if defined(GIT_WINDOWS_NATIVE)
> +#if defined(GIT_WINDOWS_NATIVE) || defined(__APPLE__)
>   trust_ino = 0;
>  #endif
>  
> 

Sorry,
but I can't reproduce your problem here.

Did you test it on Mac ?
If I run t5601 on a case sensitive files system
(Mac, mounted NFS, exported from Linux)
I get:
ok 99 # skip colliding file detection (missing CASE_INSENSITIVE_FS of
!MINGW,!CYGWIN,CASE_INSENSITIVE_FS)

And if I run it on a case-insensitive HFS+,
I get
ok 99 - colliding file detection

So what exactly are you trying to fix ?

Re: [PATCH/RFC v1 1/1] Use size_t instead of unsigned long

2018-11-18 Thread Torsten Bögershausen

On 2018-11-19 00:40, Junio C Hamano wrote:
> Derrick Stolee  writes:
> 
>>> This needs to go on top of pu, to cover all the good stuff
>>>cooking here.
>>
>> Better to work on top of 'master', as the work in 'pu' will be
>> rewritten several times, probably.
> 
> We may not be able to find any good moment to update some codepaths
> with deep callchains that reaches a basic API function that take
> ulong that way, as things are always in motion, but hopefully a lot
> of areas that need changes are rather isolated.  
> 
> For example, the changes I see around "offset" (which is "ulong" and
> the patch wants to change it to "size_t") in archive-tar.c in the
> patch do not have any interaction with the changes in this patch
> outside that single file, and I do not think any topic in-flight
> would interact with this change badly, either.  I didn't carefully
> look at the remainder of the patches, but I have a feeling that many
> can be separated out into independent and focused set of smaller
> patches that can be evaluated on their own.
> 

The archive-tar.c is actually a good example, why a step-by-step update
is not ideal (the code would not work any more on Win64).

If we look here:

static int stream_blocked(const struct object_id *oid)
{
struct git_istream *st;
enum object_type type;
size_t sz;
char buf[BLOCKSIZE];
ssize_t readlen;

st = open_istream(oid, , , NULL);
  ^
if (!st)
return error(_("cannot stream blob %s"), oid_to_hex(oid));
for (;;) {

The sz variable must follow whatever open_istream() uses, so if we start
with archive-tar.c, we must use either size_t or ulong, whatever
open_istream() needs. Otherwise things will break:
archive-tar.c uses ulong, open_istream() size_t, but we are passing pointers
around, and here  != _t

If we only update open_istream(), but not archive-tar.c, then
things are not better:
_t != 

I don't have a good idea how to split the patch.
However, "add a coccinelle script" may be a solution.

Re: Cygwin Git with Windows paths

2018-11-18 Thread Torsten Bögershausen

On 2018-11-19 04:33, Junio C Hamano wrote:
> "Randall S. Becker"  writes:
> 
>>> Torsten Bögershausen  writes:
>>>
>>>> And it may even be that we need a special handling for the "\" to be
>>>> treated as "/".
>>>
>>> I do not do Windows, but is_dir_sep() needs to be tweaked if you want to do
>>> that.
>>
>> Heavy Cygwin user here. It is used in my environment for
>> cross-compilation. Everything should be done using / separators in
>> Cygwin, not \. So /cygdrive/c, /cygdrive/d always prefaces the
>> path rather than C:\ or D:\, which won't parse. It is,
>> essentially, a bash environment, including that git completions
>> work properly. Backslash ends up doing what it would in bash.
> 
> In short, in your opinion, the original message in this thread
> expresses an invalid wish, as C:\path\to\dir\ is not a valid way to
> spell the path to the directory, and it should be written as
> /cygdrive/c/path/to/dir instead?
> 
> How well does this argument work in the real world, when another
> claim in the original message
> 
> This causes problems for any non-Cygwin tools that might call Git:
> 
> http://github.com/golang/go/issues/23155
> 
> is taken into account, I wonder, though?
> 


Back to the original email, where the path embedded in ''
and the bash does not interpret the "\", I think.

>   $ git clone git://github.com/benhoyt/goawk 'C:\cygwin64\tmp\goawk'
>   Cloning into 'C:\cygwin64\tmp\goawk'...
>   fatal: Invalid path '/home/Steven/C:\cygwin64\tmp\goawk': No such file or
>   directory

>It seems the problem is that Git thinks the Windows form path is relative
>because it does not start with "/".

>A Git Bisect reveals this:
>05b458c104708141d2fad211d79703b3b99cc5a8 is the first bad commit
>commit 05b458c104708141d2fad211d79703b3b99cc5a8
>Author: Brandon Williams 
>Date:   Mon Dec 12 10:16:52 2016 -0800


The first question is, does this work under Git for Windows ?

Looking into 05b458c104708141d2fad, it seems as if the following functions
need to be "overridden" for cygwin, similar as we do it for mingw:
 is_dir_sep()
 offset_1st_component()
 find_last_dir_sep()


If nothing works,
it may help to add some fprintf(stderr,...) in the functions used
by 05b458c104708141d2f:

strip_last_component(),
get_next_component()
real_path_internal()

Re: Cygwin Git with Windows paths

2018-11-18 Thread Torsten Bögershausen

On Sun, Nov 18, 2018 at 11:34:04AM -0600, Steven Penny wrote:
> On Sun, Nov 18, 2018 at 11:15 AM Torsten Bögershausen wrote:
> > But it may be that we need to pull in more stuff, similar to mingw,
> > to get the C: stuff working, see
> > "skip_dos_drive_prefix"
> >
> > And it may even be that we need a special handling for the "\" to be treated
> > as "/".
> >
> > If you implement "skip_dos_drive_prefix" similar to mingw,
> > (rename mingw into cygwin) does
> >
> > git clone  C:/my/dir/
> > work ?
> 
> I added this to "compat/cygwin.h":
> 
> #define has_dos_drive_prefix(path) \
>   (isalpha(*(path)) && (path)[1] == ':' ? 2 : 0)
> int mingw_skip_dos_drive_prefix(char **path);
> #define skip_dos_drive_prefix mingw_skip_dos_drive_prefix
> 
> and added this to "compat/cygwin.c":
> 
> int mingw_skip_dos_drive_prefix(char **path) {
>   int ret = has_dos_drive_prefix(*path);
>   *path += ret;
>   return ret;
> }
> 
> but still, these dont work:
> 
> git clone  C:/my/dir
> git clone  'C:\my\dir'

Thanks for testing.
It looks as if there is more work to be done then just a simple patch.

My last question for today:
Does 

git clone  '/cgdrive/c/my/dir'

work ?

Re: Cygwin Git with Windows paths

2018-11-18 Thread Torsten Bögershausen

On Sun, Nov 18, 2018 at 10:23:19AM -0600, Steven Penny wrote:
> On Sun, Nov 18, 2018 at 9:41 AM Torsten Bögershausen wrote:
> > Thanks for the report
> > It seams as if "C:" is not recognized as an absolute path under
> > cygwin.
> > May be it should ?
> >
> > Does the following help ? (fully untested)
> 
> that looks promising - but its not getting pulled in where it needs to be.
> perhaps another file need to be modified to utilize that macro?

The macro should be utilized, see git-compat-util.h:

#if defined(__CYGWIN__)
#include "compat/cygwin.h"
#endif

And further down

#ifndef has_dos_drive_prefix
static inline int git_has_dos_drive_prefix(const char *path)
{
return 0;
}
#define has_dos_drive_prefix git_has_dos_drive_prefix
#endif

#ifndef skip_dos_drive_prefix
static inline int git_skip_dos_drive_prefix(char **path)
{
return 0;
}

But it may be that we need to pull in more stuff, similar to mingw,
to get the C: stuff working, see
"skip_dos_drive_prefix"

And it may even be that we need a special handling for the "\" to be treated
as "/".

If you implement "skip_dos_drive_prefix" similar to mingw,
(rename mingw into cygwin) does

git clone  C:/my/dir/
work ?

That would be a progress, I think.

Re: Cygwin Git with Windows paths

2018-11-18 Thread Torsten Bögershausen

On Sun, Nov 18, 2018 at 07:21:58AM -0800, Steven Penny wrote:
> Cygwin programs can handle Unix form paths:
> 
>$ ls /var
>cache  lib  log  run  tmp
> 
> and also Windows form paths:
> 
>$ ls 'C:\cygwin64\var'
>cache  lib  log  run  tmp
> 
> However current Cygwin Git cannot:
> 
>$ git clone git://github.com/benhoyt/goawk 'C:\cygwin64\tmp\goawk'
>Cloning into 'C:\cygwin64\tmp\goawk'...
>fatal: Invalid path '/home/Steven/C:\cygwin64\tmp\goawk': No such file or
>directory
> 
> It seems the problem is that Git thinks the Windows form path is relative
> because it does not start with "/". A Git Bisect reveals this:
> 
> 05b458c104708141d2fad211d79703b3b99cc5a8 is the first bad commit
> commit 05b458c104708141d2fad211d79703b3b99cc5a8
> Author: Brandon Williams 
> Date:   Mon Dec 12 10:16:52 2016 -0800
> 
>real_path: resolve symlinks by hand
> 
>The current implementation of real_path uses chdir() in order to resolve
>symlinks.  Unfortunately this isn't thread-safe as chdir() affects a
>process as a whole and not just an individual thread.  Instead perform
>the symlink resolution by hand so that the calls to chdir() can be
>removed, making real_path one step closer to being reentrant.
> 
>Signed-off-by: Brandon Williams 
>Signed-off-by: Junio C Hamano 
> 
> This causes problems for any non-Cygwin tools that might call Git:
> 
> http://github.com/golang/go/issues/23155
> 

Thanks for the report
It seams as if "C:" is not recognized as an absolute path under
cygwin.
May be it should ?

Does the following help ? (fully untested)


diff --git a/compat/cygwin.h b/compat/cygwin.h
index 8e52de4644..12814e1edb 100644
--- a/compat/cygwin.h
+++ b/compat/cygwin.h
@@ -1,2 +1,4 @@
 int cygwin_offset_1st_component(const char *path);
 #define offset_1st_component cygwin_offset_1st_component
+#define has_dos_drive_prefix(path) \
+   (isalpha(*(path)) && (path)[1] == ':' ? 2 : 0)

Re: [PATCH v2 1/1] remote-curl.c: xcurl_off_t is not portable (on 32 bit platfoms)

2018-11-11 Thread Torsten Bögershausen

On Mon, Nov 12, 2018 at 12:50:30PM +0900, Junio C Hamano wrote:
> tbo...@web.de writes:
> 
> >
> > This is a re-semd, the orignal patch was part of a 2
> > patch-series.
> > This patch needed some rework, and here should be
> > the polished version.
> 
> Will queue.

Thanks, is there a chance to kill a typo here ?
s/comopared/compared/
- A temporally variable "size" is used, promoted int uintmax_t and the comopared


> Next time, please refrain from saying "re-send", if you
> changed anything in the patch (or the log message), as the phrase
> implies you are sending the same thing as before (just in case the
> earlier one was not seen, for example).  Marking it as vN+1 like you
> did for this patch and having a reference to the original would make
> it clear, though ;-)

Sorry for the confusion.
The next time I will not send unrelated patches as a series,
so that we have a better "Message-ID:" and "In-Reply-To:"
flow (which should make live 3% easier).

https://public-inbox.org/git/20181029165914.2677-1-tbo...@web.de/
https://public-inbox.org/git/20181109174110.27630-1-tbo...@web.de/

Re: [PATCH v2 1/1] Upcast size_t variables to uintmax_t when printing

2018-11-11 Thread Torsten Bögershausen

On Sun, Nov 11, 2018 at 02:28:35AM -0500, Jeff King wrote:
> On Sun, Nov 11, 2018 at 08:05:04AM +0100, tbo...@web.de wrote:
> 
> > From: Torsten Bögershausen 
> > 
> > When printing variables which contain a size, today "unsigned long"
> > is used at many places.
> > In order to be able to change the type from "unsigned long" into size_t
> > some day in the future, we need to have a way to print 64 bit variables
> > on a system that has "unsigned long" defined to be 32 bit, like Win64.
> > 
> > Upcast all those variables into uintmax_t before they are printed.
> > This is to prepare for a bigger change, when "unsigned long"
> > will be converted into size_t for variables which may be > 4Gib.
> 
> I like the overall direction. I feel a little funny doing this step now,
> and not as part of a series to convert individual variables. But I
> cannot offhand think of any reason that it would behave badly even if
> the other part does not materialize
> 

Hej all,
There may be some background information missing:
- I did a 2-patch series based on this commit in pu:
commit 37c59c3e8fac8bae7ccc5baa148b0e9bae0c8d65
Author: Junio C Hamano 
Date:   Sat Oct 27 16:42:25 2018 +0900

treewide: apply cocci patch

(that patch was never send out, see below)

The week later, I tried to apply it on pu, but that was nearly hopeless,
as too much things had changed on pu.
I had the chance to compile & test it, but decided to take "part2" before
"part1", so to say:
Fix all the printing, and wait for the master branch to settle,
and then do the "unsigned long" -> size_t conversion.
That will probably happen after 2.20.

At the moment, the big "unsigned long" -> size_t conversion is dependend on
 - mk/use-size-t-in-zlib
 - sb/more-repo-in-api,
 - jk/xdiff-interface'
 (and probably more stuff from Duy and others)

How should we handle the big makeover ?

> -Peff

And thanks for reading my stuff

Re: git-rebase is ignoring working-tree-encoding

2018-11-08 Thread Torsten Bögershausen

On Wed, Nov 07, 2018 at 05:38:18AM +0100, Adrián Gimeno Balaguer wrote:
> Hello Torsten,
> 
> Thanks for answering.
> 
> Answering to your question, I removed the comments with "rebase" since
> my reported encoding issue happens on more simpler operations
> (described in the PR), and the problem is not directly related to
> rebasing, so I considered it better in order to avoid unrelated
> confusions.
> 
> Let's get back to the problem. Each system has a default endianness.
> Also, in .gitattributes's working-tree-encoding, Git behaves
> differently depending on the attribute's value and the contents of the
> referenced entry file. When I put the value "UTF-16", then the file
> must have a BOM, or Git complains. Otherwise, if I put the value
> "UTF-16BE" or "UTF-16LE", then Git prohibites operations if file has a
> BOM for that main encoding (UTF-16 here), which can be relate to any
> endianness.
> 
> My very initial goal was, given a UTF-16LE file, to be able to view
> human-readable diffs whenever I make a change on it (and yes, it must
> be Little Endian). Plus, this file had a BOM. Now, what are the
> options with Git currently (consider only working-tree-encoding)? If I
> put working-tree-encoding=UTF-16, then I could view readable diffs and
> commit the file, but here is the main problem: Git looses information
> about what initial endianness the file had, therefore, after
> staging/committing it re-encodes the file from UTF-8 (as stored
> internally) to UTF-16 and the default system endianness. In my case it
> did to Big Endian, thus affecting the project's requirement. That is
> why I ended up writing a fixup script to change the encoding back to
> UTF-16LE.

OK, I think I understand your problem now.
The file format which you ask for could be named "UTF-16-BOM-LE",
but that does not exist in reality.
If you use UTF-16, then there must be a BOM, and if there is a BOM,
then a Unicode-aware application -should- be able to handle it.

Why does your project require such a format ?

> 
> On the other hand, once I set working-tree-encoding=UTF-16LE, then Git
> prohibited me from committing the file and even viewing human-readable
> diffs (the output simply tells it's a binary file). In this sense, the
> internal location of these  errors is within the function of utf8.c I
> made changes to in the PR. I hope I was clearer!
> 
> Finally, Git behaviour around this is based on Unicode standards,
> which is why I acknowledged that my changes violated them after
> refering to a link which is present in the ut8.h file.

[]

Re: git-rebase is ignoring working-tree-encoding

2018-11-06 Thread Torsten Bögershausen

On Mon, Nov 05, 2018 at 07:10:14PM +0100, Torsten Bögershausen wrote:
> On Mon, Nov 05, 2018 at 05:24:39AM +0100, Adrián Gimeno Balaguer wrote:
> 
> []
> 
> > https://github.com/git/git/pull/550
>  
> []
>  
> > This is covered in the mentioned PR above. Thanks for feedback.
> 
> Thanks for the code,
> I will have a look (the next days)
> 
> > 
> > -- 
> > Adrián

Hej Adrián,

I still didn't manage to fully understand your problem.
I tried to convert your test into my understanding,
It can be fetched here (or copied from this message, see below)

https://github.com/tboegi/git/tree/tb.181106_UTF16LE_commit

The commit of an empty file seems to work for me, in the initial
report a "rebase" was mentioned, which is not in the TC ?

Is the following what you intended to test ?

#!/bin/sh
test_description='UTF-16 LE/BE file encoding using working-tree-encoding'


. ./test-lib.sh

# We specify the UTF-16LE BOM manually, to not depend on programs such as iconv.
utf16leBOM=$(printf '\377\376')

test_expect_success 'Stage empty UTF-16LE file as binary' '
>empty_0.txt &&
echo "empty_0.txt binary" >>.gitattributes &&
git add empty_0.txt
'


test_expect_success 'Stage empty file with enc=UTF.16BL' '
>utf16le_0.txt &&
echo "utf16le_0.txt text working-tree-encoding=UTF-16BE" 
>>.gitattributes &&
git add utf16le_0.txt
'


test_expect_success 'Create and stage UTF-16LE file with only BOM' '
printf "$utf16leBOM" >utf16le_1.txt &&
echo "utf16le_1.txt text working-tree-encoding=UTF-16" >>.gitattributes 
&&
git add utf16le_1.txt
'

test_expect_success 'Dont stage UTF-16LE file with only BOM with enc=UTF.16BE' '
printf "$utf16leBOM" >utf16le_2.txt &&
echo "utf16le_2.txt text working-tree-encoding=UTF-16BE" 
>>.gitattributes &&
test_must_fail git add utf16le_2.txt
'

test_expect_success 'commit all files' '
test_tick &&
git commit -m "Commit all 3 files"
'

test_expect_success 'All commited files have the same sha' '
git ls-files -s --eol >tmp1 &&
sed -e "s!  i/none.*!!" actual &&
>expect &&
test_cmp expect actual
'

test_done

Re: git-rebase is ignoring working-tree-encoding

2018-11-05 Thread Torsten Bögershausen

On Mon, Nov 05, 2018 at 05:24:39AM +0100, Adrián Gimeno Balaguer wrote:

[]

> https://github.com/git/git/pull/550
 
[]
 
> This is covered in the mentioned PR above. Thanks for feedback.

Thanks for the code,
I will have a look (the next days)

> 
> -- 
> Adrián

Re: git-rebase is ignoring working-tree-encoding

2018-11-04 Thread Torsten Bögershausen

On Fri, Nov 02, 2018 at 03:30:17AM +0100, Adrián Gimeno Balaguer wrote:
> I’m attempting to perform fixups via git-rebase of UTF-16 LE files
> (the project I’m working on requires that exact encoding on certain
> files). When the rebase is complete, Git changes that file’s encoding
> to UTF-16 BE. I have been using the newer working-tree-encoding
> attribute in .gitattributes. I’m using Git for Windows.
> 
> $ git version
> git version 2.19.1.windows.1
> 
> Here is a sample UTF-16 LE file (with BOM and LF endings) with
> following atributes in .gitattributes:
> 
> test.txt eol=lf -text working-tree-encoding=UTF-16
> 
> I put eol=lf and -text to tell Git to not change the encoding of the
> file on checkout, but that doesn’t even help. Asides, the newer
> working-tree-encoding allows me to view human-readable diffs of that
> file (in GitHub Desktop and Git Bash). Now, note that doing for
> example consecutive commits to the same file does not affect the
> UTF-16 LE encoding. And before I discovered this attribute, the whole
> thing was even worse when squash/fixup rebasing, as Git would modify
> the file with Chinese characters (when manually setting it as text via
> .gitattributes).
> 
> So, again the problem with the exposed .gitattributes line is that
> after fixup rebasing, UTF-16 LE files encoding change to UTF-16 BE.
> 
> For long, I have been working with the involved UTF-16 LE files set as
> binary via .gitattributes (e.g. “test.txt binary”), so that Git would
> not modify the file encoding, but this doesn’t allow me to view the
> diffs upon changes in GitHub Desktop, which I want (and neither via
> git diff).

Thanks for the report.
I have tried to follow the problem from your verbal descriptions
(and the PR) but I need to admit that I don't fully understand the
problem (yet).

Could you try to create some instructions how to reproduce it?
A numer of shell istructions would be great,
in best case some kind of "test case", like the tests in
the t/ directory in Git.

It would be nice to be able to re-produce it.
And if there is a bug, to get it fixed.

Re: [PATCH 3/3] cat-file: handle streaming failures consistently

2018-10-31 Thread Torsten Bögershausen

On Tue, Oct 30, 2018 at 07:23:38PM -0400, Jeff King wrote:
> There are three ways to convince cat-file to stream a blob:
> 
>   - cat-file -p $blob
> 
>   - cat-file blob $blob
> 
>   - echo $batch | cat-file --batch
> 
> In the first two, we simply exit with the error code of
> streaw_blob_to_fd(). That means that an error will cause us
> to exit with "-1" (which we try to avoid) without printing
> any kind of error message (which is confusing to the user).
> 
> Instead, let's match the third case, which calls die() on an
> error. Unfortunately we cannot be more specific, as
> stream_blob_to_fd() does not tell us whether the problem was
> on reading (e.g., a corrupt object) or on writing (e.g.,
> ENOSPC). That might be an opportunity for future work, but
> for now we will at least exit with a sane message and exit
> code.
> 
> Signed-off-by: Jeff King 
> ---
>  builtin/cat-file.c | 16 
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index 8d97c84725..0d403eb77d 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -50,6 +50,13 @@ static int filter_object(const char *path, unsigned mode,
>   return 0;
>  }
>  
> +static int stream_blob(const struct object_id *oid)

Sorry for nit-picking:
could this be renamed into stream_blob_to_stdout() ?

> +{
> + if (stream_blob_to_fd(1, oid, NULL, 0))

And I wonder if we could make things clearer:
 s/1/STDOUT_FILENO/
 
 (Stolen from fast-import.c)

> + die("unable to stream %s to stdout", oid_to_hex(oid));
> + return 0;
> +}
> +
[]

Re: [PATCH] Prevent warning

2018-10-29 Thread Torsten Bögershausen

Thanks for the patch.
Could you please sign it off ?
The other remark would be if the header line could be written longer than
just
"Prevent warning".
to give people digging into the Git history an initial information,
where the warning occured and from which module it was caused.
May be something like this as a head line ?

gitweb.perl: Fix Odd number of elements warning



On Mon, Oct 29, 2018 at 03:39:30PM +, Pete wrote:
> Prevent the following warning in the web server error log:
> gitweb.cgi: Odd number of elements in anonymous hash at 
> /usr/share/gitweb/gitweb.cgi line 3305
> ---
>  gitweb/gitweb.perl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 2594a4badb3d7..200647b683225 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -3302,7 +3302,7 @@ sub git_get_remotes_list {
>   next if $wanted and not $remote eq $wanted;
>   my ($url, $key) = ($1, $2);
>  
> - $remotes{$remote} ||= { 'heads' => () };
> + $remotes{$remote} ||= { 'heads' => [] };
>   $remotes{$remote}{$key} = $url;
>   }
>   close $fd or return;
> 
> --
> https://github.com/git/git/pull/548

Re: [PATCH] wildmatch: change behavior of "foo**bar" in WM_PATHNAME mode

2018-10-28 Thread Torsten Bögershausen

On Sat, Oct 27, 2018 at 10:48:23AM +0200, Nguyễn Thái Ngọc Duy wrote:
> In WM_PATHNAME mode (or FNM_PATHNAME), '*' does not match '/' and '**'
> can but only in three patterns:
> 
> - '**/' matches zero or more leading directories
> - '/**/' matches zero or more directories in between
> - '/**' matches zero or more trailing directories/files
> 
> When '**' is present but not in one of these patterns, the current
> behavior is consider the pattern invalid and stop matching. In other
> words, 'foo**bar' never matches anything, whatever you throw at it.
> 
> This behavior is arguably a bit confusing partly because we can't
> really tell the user their pattern is invalid so that they can fix
> it. So instead, tolerate it and make '**' act like two regular '*'s
> (which is essentially the same as a single asterisk). This behavior
> seems more predictable.

Nice analyzes.
I have one question here:
If the user specifies '**' and nothhing is found,
would it be better to die() with a useful message
instead of silently correcting it ?

See the the patch below:
> - } else
> - return WM_ABORT_MALFORMED;

Would it be possible to put in the die() here?
As it is outlined so nicely above, a '**' must have either a '/'
before, or behind, or both, to make sense.
When there is no '/' then the user specified something wrong.
Either a '/' has been forgotten, or the '*' key may be bouncing.
I don't think that Git should assume anything here.
(but I didn't follow the previous discussions, so I may have missed
some arguments.)

[]

Re: [PATCH v1 2/2] curl_off_t xcurl_off_t is not portable

2018-10-26 Thread Torsten Bögershausen

On Fri, Oct 26, 2018 at 11:48:38AM +0900, Junio C Hamano wrote:
> tbo...@web.de writes:
> 
> > From: Torsten Bögershausen 
> 
> > Subject: Re: [PATCH v1 2/2] curl_off_t xcurl_off_t is not portable
> 
> That title is misleading; it sounded as if the are these two
> typedefs and they do not work correctly on some platforms, but that
> is not what you are doing with the patch.

OK.

> 
> > Comparing signed and unsigned values is not always portable.
> 
> Is that what the compiler is complaining about?  There is this bit
> in git-compat-util.h:

No, not that either, see below.

> 
> /*
>  * Signed integer overflow is undefined in C, so here's a helper macro
>  * to detect if the sum of two integers will overflow.
>  *
>  * Requires: a >= 0, typeof(a) equals typeof(b)
>  */
> #define signed_add_overflows(a, b) \
> ((b) > maximum_signed_value_of_type(a) - (a))
> 
> which is designed to be fed signed a and signed b.  The macro is
> used in packfile codepaths to compare int, off_t, etc..
> 
> So the statement may be true but it does not seem to have much to do
> with the problem you are seeing with maximum_signed_value_of_type().
> 
> > When  setting
> > DEVELOPER = 1
> > DEVOPTS = extra-all
> >
> > "gcc (Raspbian 6.3.0-18+rpi1+deb9u1) 6.3.0 20170516" errors out with
> > "comparison is always false due to limited range of data type"
> > "[-Werror=type-limits]"
> 
> Then this sounds a bit different from "comparison between signed
> ssize_t len and unsigned maximum_signed_value_of_type() is bad".
> Isn't it saying that "No matter how big you make len, you can never
> go beyond maximum_signed_value_of_type(curl_off_t)"?

I digged a little bit deeper into the raspi, and this is what I find
under
/usr/include/arm-linux-gnueabihf/curl

curlbuild.h:#define CURL_TYPEOF_CURL_OFF_T int64_t
curlbuild.h:typedef CURL_TYPEOF_CURL_OFF_T curl_off_t;

> 
> > diff --git a/remote-curl.c b/remote-curl.c
> > index 762a55a75f..c89fd6d1c3 100644
> > --- a/remote-curl.c
> > +++ b/remote-curl.c
> > @@ -618,9 +618,10 @@ static int probe_rpc(struct rpc_state *rpc, struct 
> > slot_results *results)
> >  }
> >  
> >  static curl_off_t xcurl_off_t(ssize_t len) {
> > -   if (len > maximum_signed_value_of_type(curl_off_t))
> 
> Is the issue that len is signed and maximum_signed_value_of_type()
> gives an unsigned value, and these two are compared?  As we saw
> earlier, signed_add_overflows() is another example that wants a
> mixed comparison.
> 
> I am just wondering if casting len to uintmax_t before comparing
> with maximum_signed_value_of_type() is a simpler solution that can
> safely be cargo-culted to other places without much thinking.

I don't know.
Since ssize_t is 32 bit on the raspi, and curl_off_t is 64 bit,
the test seems not to be needed at all ;-)
I don't know if it makes sense to stop thinking here and if
casting to uintmax_t is the right solution here.

And, I like the easy-to-read xsize_t, which is safe and warm.
Agreed that the commit message is wrong.
I would like to keep the xsize_t aproach, are there more thoughts ?

> 
> "git grep maximum_signed_value_of_type" reports a handful
> comparisons in vcs-svn/, all of which does
> 
>   if (var > maximum_signed_value_of_type(off_t))
> 
> with var of type uintmax_t, which sounds like a sane thing to do.
> 
> Thanks.
> 
> > +   curl_off_t size = (curl_off_t) len;
> > +   if (len != (ssize_t) size)
> > die("cannot handle pushes this big");
> > -   return (curl_off_t) len;
> > +   return size;
> >  }
>

Re: [PATCH] builtin/receive-pack: dead initializer for retval in check_nonce

2018-10-20 Thread Torsten Bögershausen

On Sat, Oct 20, 2018 at 12:08:59AM -0700, Carlo Marcelo Arenas Belón wrote:
> NONCE_BAD is explicitly set when needed with the fallback
> instead as NONCE_SLOP
> 
> Signed-off-by: Carlo Marcelo Arenas Belón 
> ---
>  builtin/receive-pack.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
> index 95740f4f0e..ecce3d4043 100644
> --- a/builtin/receive-pack.c
> +++ b/builtin/receive-pack.c
> @@ -507,7 +507,7 @@ static const char *check_nonce(const char *buf, size_t 
> len)
>   char *nonce = find_header(buf, len, "nonce", NULL);
>   timestamp_t stamp, ostamp;
>   char *bohmac, *expect = NULL;
> - const char *retval = NONCE_BAD;
> + const char *retval;
>  
>   if (!nonce) {
>   retval = NONCE_MISSING;

Thanks for the patch.
The motivation feels a little bit weak, at least to me.
Initializing a variable to "BAD" in the beginning can be a good thing
for two reasons:
- There is a complex if-elseif chain, which should set retval
  in any case, this is at least what I expect taking a very quick look at the
  code:
const char *retval = NONCE_BAD;

if (!nonce) {
retval = NONCE_MISSING;
goto leave;
} else if (!push_cert_nonce) {
retval = NONCE_UNSOLICITED;
goto leave;
} else if (!strcmp(push_cert_nonce, nonce)) {
retval = NONCE_OK;
goto leave;
}
# And here I started to wonder if we should have an else or not.
# Having retval NONCE_BAD set to NONCE:BAD in the beginning makes
# it clear, that we are save without the else.
# As an alternative, we could have coded like this:

const char *retval;

if (!nonce) {
retval = NONCE_MISSING;
goto leave;
} else if (!push_cert_nonce) {
retval = NONCE_UNSOLICITED;
goto leave;
} else if (!strcmp(push_cert_nonce, nonce)) {
retval = NONCE_OK;
goto leave;
} else {
/* Set to BAD, until we know better further down */
retval = NONCE_BAD;
}

# The second reason is that some compilers don't understand this complex
# stuff either, and through out a warning, like
# "retval may be uninitialized" or something in that style.
# This is very compiler dependent.

So yes, the current code may seem to be over-eager and ask for optimization,
but we don't gain more that a couple of nano-seconds or so.
The good thing is that  we have the code a little bit more robust, when changes 
are done
in the future.

Re: [PATCH v2 1/1] zlib.c: use size_t for size

2018-10-14 Thread Torsten Bögershausen

On 15.10.18 06:22, Junio C Hamano wrote:
> Ramsay Jones  writes:
> 
>>>
>>> For the record, I am not opposed to including the comment _and_ using
>>> xsize_t() to do the cast, giving us an assertion that the comment is
>>> correct.
>>
>> Heh, I haven't found any enthusiasm tonight. Let's see if there
>> are any more comments/opinions.
> 
> OK, in the meantime, I've replaced the thread-starter partch I had
> in 'pu' with your v3.
> 
If that helps to move things forward:  I'm fully fine with v3.

Re: [PATCH v2 1/1] zlib.c: use size_t for size

2018-10-12 Thread Torsten Bögershausen

[]
> Neither v1 nor v2 of this patch compiles on 32 bit Linux; see
> 
>   https://travis-ci.org/git/git/jobs/440514375#L628
> 
> The fixup patch below makes it compile on 32 bit and the test suite
> passes as well, but I didn't thoroughly review the changes; I only
> wanted to get 'pu' build again.
> 

Oh, yes, I didn't test under Linux 32 bit (and neither Windows)
I will try to compose a proper v3 the next days.

[snip the good stuff]

Re: [PATCH v3 4/4] read-cache: speed up index load through parallelization

2018-09-06 Thread Torsten Bögershausen



> diff --git a/read-cache.c b/read-cache.c
> index fcc776aaf0..8537a55750 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1941,20 +1941,212 @@ static void *load_index_extensions(void *_data)
>   return NULL;
>  }
>  
> +/*
> + * A helper function that will load the specified range of cache entries
> + * from the memory mapped file and add them to the given index.
> + */
> +static unsigned long load_cache_entry_block(struct index_state *istate,
> + struct mem_pool *ce_mem_pool, int offset, int nr, void 
> *mmap,
> + unsigned long start_offset, struct strbuf 
> *previous_name)
> +{
> + int i;
> + unsigned long src_offset = start_offset;

I read an unsigned long here:
should that be a size_t instead ?

(And probably even everywhere else in this patch)

> +
> + for (i = offset; i < offset + nr; i++) {
> + struct ondisk_cache_entry *disk_ce;
> + struct cache_entry *ce;
> + unsigned long consumed;
> +
> + disk_ce = (struct ondisk_cache_entry *)((char *)mmap + 
> src_offset);
> + ce = create_from_disk(ce_mem_pool, disk_ce, , 
> previous_name);
> + set_index_entry(istate, i, ce);
> +
> + src_offset += consumed;
> + }
> + return src_offset - start_offset;
> +}
> +
> +static unsigned long load_all_cache_entries(struct index_state *istate,
> + void *mmap, size_t mmap_size, unsigned long src_offset)
> +{
> + struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
> + unsigned long consumed;
> +
> + if (istate->version == 4) {
> + previous_name = _name_buf;
> + mem_pool_init(>ce_mem_pool,
> + 
> estimate_cache_size_from_compressed(istate->cache_nr));
> + } else {
> + previous_name = NULL;
> + mem_pool_init(>ce_mem_pool,
> + estimate_cache_size(mmap_size, 
> istate->cache_nr));
> + }
> +
> + consumed = load_cache_entry_block(istate, istate->ce_mem_pool,
> + 0, istate->cache_nr, mmap, src_offset, 
> previous_name);
> + strbuf_release(_name_buf);
> + return consumed;
> +}
> +
> +#ifndef NO_PTHREADS
> +

Re: [PATCH v3 05/11] t0027: make hash size independent

2018-08-31 Thread Torsten Bögershausen

On Wed, Aug 29, 2018 at 12:56:36AM +, brian m. carlson wrote:
> We transform various object IDs into all-zero object IDs for comparison.
> Adjust the length as well so that this works for all hash algorithms.
> 
> Signed-off-by: brian m. carlson 
> ---
>  t/t0027-auto-crlf.sh | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh
> index beb5927f77..0f1235d9d1 100755
> --- a/t/t0027-auto-crlf.sh
> +++ b/t/t0027-auto-crlf.sh
> @@ -14,11 +14,13 @@ compare_files () {
>  compare_ws_file () {
>   pfx=$1
>   exp=$2.expect
> + tmp=$2.tmp
>   act=$pfx.actual.$3
> - tr '\015\000abcdef0123456789' QN0 <"$2" >"$exp" &&
> + tr '\015\000abcdef0123456789' QN0 <"$2" >"$tmp" &&
>   tr '\015\000abcdef0123456789' QN0 <"$3" >"$act" &&
> + sed -e "s/*/$ZERO_OID/" "$tmp" >"$exp" &&
>   test_cmp "$exp" "$act" &&
> - rm "$exp" "$act"
> + rm "$exp" "$act" "$tmp"
>  }
>  
>  create_gitattributes () {

I only managed to review the changes in t0027.
Out of interest: why do we use a "tmp" file here?
Would it make more sense  to chain the 'tr' with 'sed' and skip the
tmp file ?

tr '\015\000abcdef0123456789' QN0 <"$2" |
sed -e "s/*/$ZERO_OID/"  >"$exp" &&


Yes, we will loose the exit status of 'tr', I think.
How important is the exit status ?

I don't know, hopefully someone with more experience/knowledge
about shell scripting can help me out here.

Re: Automatic core.autocrlf?

2018-08-31 Thread Torsten Bögershausen

On Fri, Aug 31, 2018 at 07:57:04AM -0400, Randall S. Becker wrote:
> 
> On August 30, 2018 11:29 PM, Jonathan Nieder wrote:
> > Torsten Bögershausen wrote:
> > 
> > > My original plan was to try to "make obsolete"/retire and phase out
> > > core.autocrlf completely.
> > > However, since e.g. egit/jgit uses it
> > > (they don't have support for .gitattributes at all) I am not sure if
> > > this is a good idea either.
> > 
> > Interesting.  [1] appears to have intended to add this feature.
> > Do you remember when it is that you tested?
> > 
> > Feel free to file bugs using [2] for any missing features.
> > 
> > [1] https://git.eclipse.org/r/c/60635
> > [2] https://www.eclipse.org/jgit/support/

Oh, (I thought that I digged into the source recently...)
Thanks for the clarification

> 
> My testing was done on EGit 5.0.1 yesterday, which does not provide a default 
> to core.autocrlf.
> 
> Cheers,
> Randall
> 

OK, then I assume that jgit supports .gitattributes now.
Just to ask a possibly stupid question: did anybody test it ?

Re: Automatic core.autocrlf?

2018-08-30 Thread Torsten Bögershausen

On Thu, Aug 30, 2018 at 09:57:52AM -0500, Robert Dailey wrote:
> On Wed, Aug 29, 2018 at 11:54 PM Jonathan Nieder  wrote:
> >
> > Hi,
> >
> > Robert Dailey wrote:
> >
> > > Is there an 'auto' setting for the 'core.autocrlf' config? Reason I
> > > ask is, I want that setting to be 'input' on linux but 'true' on
> > > Windows.
> >
> > Others are exploring your question about the configuration language,
> > but I want to emphasize some other ramifications.
> >
> > Why do we still have 'core.autocrlf'?  Do 'core.eol' and related
> > settings take care of that need, or is autocrlf still needed?  If
> > core.eol etc do not take care of this need, what should we do to get
> > them to?
> >
> > Thanks, after having run into a few too many autocrlf-related messes,
> > Jonathan
> 
> From my perspective, the confusion is due to the evolution of the
> feature. There's multiple ways to control EOL handling but most of it
> is legacy/backward compatibility, I think. core.autocrlf is a
> fall-back for repos that do not have a .gitattributes. Because
> .gitattributes is optional by design, I'm not sure if getting rid of
> the config options is a good idea.

Good summary. My original plan was to try to "make obsolete"/retire
and phase out core.autocrlf completely.
However, since e.g. egit/jgit uses it
(they don't have support for .gitattributes at all) I am not sure if this
is a good idea either. Opinions are welcome.

> But your point did make me think
> about how `core.autocrlf = true` should probably be a system config
> default for the Git for Windows project. The default for that value
> should be platform-defined. That would make it automatically work the
> way I want, and might solve a lot of the issues where people are
> committing CRLF into repositories on Windows.

Unless I am wrong, that had been the default a long time ago:
Git for Windows (at that time msysgit) had core.autocrlf=true
by default.
While this is a good choice for many repos, some people prefer
core.autocrlf=input.
Others just commit files for Windows-based repos with CRLF,
and the advantage is, that "git diff" doesn't show "^M" somewhere.

I allways encourage people to set up a .gitattributes file.
Does anybody thinks that we can make core.autocrlf obsolete ?

Re: Automatic core.autocrlf?

2018-08-27 Thread Torsten Bögershausen

On Mon, Aug 27, 2018 at 09:10:33AM -0500, Robert Dailey wrote:
> Is there an 'auto' setting for the 'core.autocrlf' config? Reason I
> ask is, I want that setting to be 'input' on linux but 'true' on
> Windows. I have a global .gitconfig that I sync across different
> platforms with Google Drive, and I hate to manage 2 copies of it on
> each platform (linux and Windows). If there's some way to make the
> line ending configuration the same between both, that would be
> awesome.
> 
> Note that I do rely mostly on git attributes files for this, however
> not all upstream repositories have one, and I don't always have
> permission to create one there.

How many repos do you look at, and how big is the "pain level" here?
Would it be possible to submit pull-requests ?

> In those cases, when it falls back to
> configuration for line ending management, I want it to be
> automatically configured based on the host platform.

There is
git config core.eol native

But that is already the default, and it needs the 'text' attribute
to be set (or auto), the simplest solution is to have
* text=auto
in .gitattributes

> 
> Any advice is appreciated here. Unfortunately Google isn't much help
> on this topic, Stack Overflow is a swamp full of different information
> and none of it seems authoritative.


Out of interest, not to blame anybody:
Did you ever look at

https://git-scm.com/docs/git-config

The authoritative answer is in the Git project itself,
under Documentation/config.txt

In summary, if I understand you right,
the solution would be to use .gitattributes in every project.

HTH

Re: [PATCH v2 05/11] t0027: make hash size independent'

2018-08-20 Thread Torsten Bögershausen

On 20.08.18 00:10, Eric Sunshine wrote:
> On Sun, Aug 19, 2018 at 5:57 PM brian m. carlson
>  wrote:
>> On Sun, Aug 19, 2018 at 04:10:21PM -0400, Eric Sunshine wrote:
>>> On Sun, Aug 19, 2018 at 1:54 PM brian m. carlson
 -   tr '\015\000abcdef0123456789' QN0 <"$2" >"$exp" &&
 +   tr '\015\000abcdef0123456789' QN0 <"$2" >"$exp+" &&
>>>
>>> My immediate thought upon reading this was whether "+" is valid in
>>> Windows filenames. Apparently, it is, but perhaps (if you re-roll) it
>>> would make sense to use a character less likely to cause brain
>>> hiccups; for instance, "exp0'.
>>
>> The reason I picked that is that we use it quite a bit in the Makefile,
>> so it seemed like an obvious choice for a temporary file name.  If you
>> feel strongly I can pick something else, but I thought it would be
>> reasonably intuitive for other developers.  Maybe I was wrong, though.
> 
> I don't feel strongly about it. My brain tripped over it probably
> because it's not an idiom in Git tests. In fact, I see just one place
> where "+" has been used like this, in t6026.
> 

Probably "tmp" is a better name than "exp+" 
(Why the '+' ? Is it better that the non-'+' ?)

Anyway,
If we re-order a little bit, can we use a simple '|' instead ?

tr '\015\000abcdef0123456789' QN0 <"$2" |
sed -e "s/+/$ZERO_OID/"   >"$exp" &&
tr '\015\000abcdef0123456789' QN0 <"$3" >"$act" &&

Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems

2018-08-17 Thread Torsten Bögershausen

On Fri, Aug 17, 2018 at 06:16:45PM +0200, Nguyễn Thái Ngọc Duy wrote:

The whole patch looks good to me.
(I was just sending a different version, but your version is better :-)

One minor remark, should the line
warning: the following paths have collided 
start with a capital letter:
Warning: the following paths have collided 

> Paths that only differ in case work fine in a case-sensitive
> filesystems, but if those repos are cloned in a case-insensitive one,
> you'll get problems. The first thing to notice is "git status" will
> never be clean with no indication what exactly is "dirty".
> 
> This patch helps the situation a bit by pointing out the problem at
> clone time. Even though this patch talks about case sensitivity, the
> patch makes no assumption about folding rules by the filesystem. It
> simply observes that if an entry has been already checked out at clone
> time when we're about to write a new path, some folding rules are
> behind this.
> 
> In the case that we can't rely on filesystem (via inode number) to do
> this check, fall back to fspathcmp() which is not perfect but should
> not give false positives.
> 
> This patch is tested with vim-colorschemes and Sublime-Gitignore
> repositories on a JFS partition with case insensitive support on
> Linux.

Now even tested under Mac OS/HFS+

[]
>  '
>  
> +test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file 
> detection' '

My ambition is to run the test under Windows (both CYGWIN and native) next week,
so that we can remove !MINGW and !CYGWIN

Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems

2018-08-16 Thread Torsten Bögershausen

On Wed, Aug 15, 2018 at 12:38:49PM -0700, Junio C Hamano wrote:

This should answer Duys comments as well.
> Torsten Bögershausen  writes:
> 
[snip]
> > Should the following be protected by core.checkstat ? 
> > if (check_stat) {
> 
> I do not think such a if statement is strictly necessary.
> 
> Even if check_stat tells us "when checking if a cached stat
> information tells us that the path may have modified, use minimum
> set of fields from the 'struct stat'", we still capture and update
> the values from the same "full" set of fields when we mark a cache
> entry up-to-date.  So it all depends on why you are limiting with
> check_stat.  Is it because stdev is unusable?  Is it because nsec is
> unusable?  Is it because ino is unusable?  Only in the last case,
> paying attention to check_stat will reduce the false positive.
> 
> But then you made me wonder what value check_stat has on Windows.
> If it is false, perhaps we do not even need the conditional
> compilation, which is a huge plus.

Agreed:
check_stat is 0 on Windows, and inum is allways 0 in lstat().
I was thinking about systems which don't have inodes and inum,
and then generate an inum in memory, sometimes random.
After a reboot or a re-mount of the file systems those ino values
change.
However, for the initial clone we are fine in any case.

> 
> >> +  if (dup->ce_stat_data.sd_ino == st->st_ino) {
> >> +  dup->ce_flags |= CE_MATCHED;
> >> +  break;
> >> +  }
> >> +  }
> >> +#endif
> >
> > Another thing is that we switch of the ASCII case-folding-detection-logic
> > off for Windows users, even if we otherwise rely on icase.
> > I think we can use fspathcmp() as a fallback. when inodes fail,
> > because we may be on a network file system.
> >
> > (I don't have a test setup at the moment, but what happens with inodes
> > when a Windows machine exports a share to Linux or Mac ?)
> >
> > Is there a chance to get the fspathcmp() back, like this ?
> 
> If fspathcmp() never gives false positives, I do not think we would
> mind using it like your update.  False negatives are fine, as that
> is better than just punting the whole thing when there is no usable
> inum.  And we do not care all that much if it is more expensive;
> this is an error codepath after all.
> 
> And from code structure's point of view, I think it makes sense.  It
> would be even better if we can lose the conditional compilation.

The current implementation of fspathcmp() does not give false positvies,
and future versions should not either.
All case-insentive file systems have always treated 'a-z' equal to 'A-Z'.
In FAT MS/DOS there had only been uppercase letters as file names,
and `type file.txt` (the equivilant to ´cat file.txt´ in *nix)
simply resultet in `type FILE.TXT`
Later, with VFAT and later with HPFS/NTFS a file could be stored on
disk as "File.txt".
>From now on  ´type FILE.TXT´ still worked, (and all other upper-lowercase
combinations).
This all is probably nothing new.
The main point should be that fspathcmp() should never return a false positive,
and I think we all agree on that. 


Now back to the compiler switch:
Windows always set inum to 0 and I can't think about a situation where
a file in a working tree gets inum = 0, can we use the following:

static void mark_colliding_entries(const struct checkout *state,
   struct cache_entry *ce, struct stat *st)
{
int i;
ce->ce_flags |= CE_MATCHED;

for (i = 0; i < state->istate->cache_nr; i++) {
struct cache_entry *dup = state->istate->cache[i];
int folded = 0;

if (dup == ce)
break;

if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
continue;
/*
 * Windows sets ino to 0. On other FS ino = 0 will already be
 *  used, so we don't see it for a file in a Git working tree
 */
if (st->st_ino && (dup->ce_stat_data.sd_ino == st->st_ino))
folded = 1;

/*
 * Fallback for NTFS and other case insenstive FS,
 * which don't use POSIX inums
 */
if (!fspathcmp(dup->name, ce->name))
folded = 1;

if (folded) {
dup->ce_flags |= CE_MATCHED;
break;
}
}
}


> 
> Another thing we maybe want to see is if we can update the caller of
> this function so that we do not overwrite the earlier checkout with
> the data for th

Re: "Changes not staged for commit" after cloning a repo on macOS

2018-08-16 Thread Torsten Bögershausen

On Thu, Aug 16, 2018 at 12:25:24AM +0430, Hadi Safari wrote:
> Hi everyone!
> 
> I encountered a strange situation on OS X recently. I cloned a repository
> (https://github.com/kevinxucs/Sublime-Gitignore.git), went to folder, and
> saw "Changes not staged for commit" message for four specific files. It
> happened every time I repeated the procedure. I even was able to commit
> those to the repo.
> At first I thought there's something wrong with repo, but I cloned it on
> Ubuntu 16.04 and everything was OK; no "Changes not staged for commit"
> message.
> 
> Does anyone have any idea?
> 
> Thank you.
> 
> Log:
> 
> ```
> $ git clone https://github.com/kevinxucs/Sublime-Gitignore.git
> Cloning into 'Sublime-Gitignore'...
> remote: Counting objects: 515, done.
> remote: Total 515 (delta 0), reused 0 (delta 0), pack-reused 515
> Receiving objects: 100% (515/515), 102.14 KiB | 48.00 KiB/s, done.
> Resolving deltas: 100% (143/143), done.
> $ cd Sublime-Gitignore/
> $ git status
> On branch master
> Your branch is up to date with 'origin/master'.
> 
> Changes not staged for commit:
>   (use "git add ..." to update what will be committed)
>   (use "git checkout -- ..." to discard changes in working directory)
> 
> modified:   boilerplates/Nanoc.gitignore
> modified:   boilerplates/OpenCart.gitignore
> modified:   boilerplates/SASS.gitignore
> modified:   boilerplates/WordPress.gitignore
> 
> no changes added to commit (use "git add" and/or "git commit -a")
> ```
> 
> The rest of the log is available at
> https://github.com/kevinxucs/Sublime-Gitignore/issues/6. (I don't want this
> email to become too long.)
> 
> -- 
> Hadi Safari
> http://hadisafari.ir/

This repo seams not ment to be cloned onto a file system, which is 
case-insensitive.
For example, (see below), this 2 files a different in the repo, but the file 
system
can not have 'WordPress' and 'Wordpres's as different files or directories at 
the same
time.
This affects typically Mac OS and Windows users.

There is actually some work going on right now to inform the user about this 
problem.
(Thanks Duy !)
If I clone it with a patched Git, I get the following:

git clone https://github.com/kevinxucs/Sublime-Gitignore.git
Cloning into 'Sublime-Gitignore'...
remote: Counting objects: 515, done.
remote: Total 515 (delta 0), reused 0 (delta 0), pack-reused 515
Receiving objects: 100% (515/515), 102.16 KiB | 35.00 KiB/s, done.
Resolving deltas: 100% (143/143), done.
warning: the following paths have collided (e.g. case-sensitive paths
on a case-insensitive filesystem) and only one from the same
colliding group is in the working tree:

  'boilerplates/Gcov.gitignore'
  'boilerplates/Nanoc.gitignore'
  'boilerplates/OpenCart.gitignore'
  'boilerplates/SASS.gitignore'
  'boilerplates/Sass.gitignore'
  'boilerplates/Stella.gitignore'
  'boilerplates/WordPress.gitignore'
  'boilerplates/Wordpress.gitignore'
  'boilerplates/gcov.gitignore'
  'boilerplates/nanoc.gitignore'
  'boilerplates/opencart.gitignore'
  'boilerplates/stella.gitignore'

Would this text help you ?

I am asking because the development is still ongoing, so things can be improved.

Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems

2018-08-15 Thread Torsten Bögershausen

On Sun, Aug 12, 2018 at 11:07:14AM +0200, Nguyễn Thái Ngọc Duy wrote:
> Paths that only differ in case work fine in a case-sensitive
> filesystems, but if those repos are cloned in a case-insensitive one,
> you'll get problems. The first thing to notice is "git status" will
> never be clean with no indication what exactly is "dirty".
> 
> This patch helps the situation a bit by pointing out the problem at
> clone time. Even though this patch talks about case sensitivity, the
> patch makes no assumption about folding rules by the filesystem. It
> simply observes that if an entry has been already checked out at clone
> time when we're about to write a new path, some folding rules are
> behind this.
> 
> This patch is tested with vim-colorschemes repository on a JFS partition
> with case insensitive support on Linux. This repository has two files
> darkBlue.vim and darkblue.vim.
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  v4 removes nr_duplicates (and fixes that false warning Szeder
>  reported). It also hints about case insensitivity as a cause of
>  problem because it's most likely the case when this warning shows up.
> 
>  builtin/clone.c  |  1 +
>  cache.h  |  1 +
>  entry.c  | 28 
>  t/t5601-clone.sh |  8 +++-
>  unpack-trees.c   | 28 
>  unpack-trees.h   |  1 +
>  6 files changed, 66 insertions(+), 1 deletion(-)
> 
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 5c439f1394..0702b0e9d0 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -747,6 +747,7 @@ static int checkout(int submodule_progress)
>   memset(, 0, sizeof opts);
>   opts.update = 1;
>   opts.merge = 1;
> + opts.clone = 1;
>   opts.fn = oneway_merge;
>   opts.verbose_update = (option_verbosity >= 0);
>   opts.src_index = _index;
> diff --git a/cache.h b/cache.h
> index 8b447652a7..6d6138f4f1 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1455,6 +1455,7 @@ struct checkout {
>   unsigned force:1,
>quiet:1,
>not_new:1,
> +  clone:1,
>refresh_cache:1;
>  };
>  #define CHECKOUT_INIT { NULL, "" }
> diff --git a/entry.c b/entry.c
> index b5d1d3cf23..c70340df8e 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -399,6 +399,31 @@ static int check_path(const char *path, int len, struct 
> stat *st, int skiplen)
>   return lstat(path, st);
>  }
>  
> +static void mark_colliding_entries(const struct checkout *state,
> +struct cache_entry *ce, struct stat *st)
> +{
> + int i;
> +
> + ce->ce_flags |= CE_MATCHED;
> +
> +#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */
> + for (i = 0; i < state->istate->cache_nr; i++) {
> + struct cache_entry *dup = state->istate->cache[i];
> +
> + if (dup == ce)
> + break;
> +
> + if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
> + continue;
> +

Should the following be protected by core.checkstat ? 
if (check_stat) {


> + if (dup->ce_stat_data.sd_ino == st->st_ino) {
> + dup->ce_flags |= CE_MATCHED;
> + break;
> + }
> + }
> +#endif

Another thing is that we switch of the ASCII case-folding-detection-logic
off for Windows users, even if we otherwise rely on icase.
I think we can use fspathcmp() as a fallback. when inodes fail,
because we may be on a network file system.
(I don't have a test setup at the moment, but what happens with inodes
when a Windows machine exports a share to Linux or Mac ?)

Is there a chance to get the fspathcmp() back, like this ?

static void mark_colliding_entries(const struct checkout *state,
   struct cache_entry *ce, struct stat *st)
{
int i;
ce->ce_flags |= CE_MATCHED;

for (i = 0; i < state->istate->cache_nr; i++) {
struct cache_entry *dup = state->istate->cache[i];
int folded = 0;

if (dup == ce)
break;

if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
continue;

if (!fspathcmp(dup->name, ce->name))
folded = 1;

#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */
if (check_stat && (dup->ce_stat_data.sd_ino == st->st_ino))
folded = 1;
#endif
if (folded) {
dup->ce_flags |= CE_MATCHED;
break;
}
}
}

Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems

2018-08-03 Thread Torsten Bögershausen

On 2018-08-02 15:43, Duy Nguyen wrote:
> On Wed, Aug 1, 2018 at 11:20 PM Junio C Hamano  wrote:
>>
>> Junio C Hamano  writes:
>>
>>> Jeff King  writes:
>>>
> Presumably we are already in an error codepath, so if it is
> absolutely necessary, then we can issue a lstat() to grab the inum
> for the path we are about to create, iterate over the previously
> checked out paths issuing lstat() and see which one yields the same
> inum, to find the one who is the culprit.

 Yes, this is the cleverness I was missing in my earlier response.

 So it seems do-able, and I like that this incurs no cost in the
 non-error case.
>>>
>>> Not so fast, unfortunately.
>>>
>>> I suspect that some filesystems do not give us inum that we can use
>>> for that "identity" purpose, and they tend to be the ones with the
>>> case smashing characteristics where we need this code in the error
>>> path the most X-<.
>>
>> But even if inum is unreliable, we should be able to use other
>> clues, perhaps the same set of fields we use for cached stat
>> matching optimization we use for "diff" plumbing commands, to
>> implement the error report.  The more important part of the idea is
>> that we already need to notice that we need to remove a path that is
>> in the working tree while doing the checkout, so the alternative
>> approach won't incur any extra cost for normal cases where the
>> project being checked out does not have two files whose pathnames
>> are only different in case (or checking out such an offending
>> project to a case sensitive filesytem, of course).
>>
>> So I guess it still _is_ workable.  Any takers?
> 
> OK so we're going back to the original way of checking that we check
> out the different files on the same place (because fs is icase) and
> try to collect all paths for reporting, yes?

I would say: Yes.

> I can give it another go
> (but of course if anybody else steps up, I'd very gladly hand this
> over)
> 

Not at the moment.

Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems

2018-07-31 Thread Torsten Bögershausen

On Mon, Jul 30, 2018 at 05:27:55PM +0200, Nguyễn Thái Ngọc Duy wrote:
> Paths that only differ in case work fine in a case-sensitive
> filesystems, but if those repos are cloned in a case-insensitive one,
> you'll get problems. The first thing to notice is "git status" will
> never be clean with no indication what's exactly is "dirty".
> 
> This patch helps the situation a bit by pointing out the problem at
> clone time. I have not suggested any way to work around or fix this
> problem. But I guess we could probably have a section in
> Documentation/ dedicated to this problem and point there instead of
> a long advice in this warning.
> 
> Another thing we probably should do is catch in "git checkout" too,
> not just "git clone" since your linux/unix colleage colleague may
> accidentally add some files that your mac/windows machine is not very
> happy with. But then there's another problem, once the problem is
> known, we probably should stop spamming this warning at every
> checkout, but how?
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  builtin/clone.c | 41 +
>  1 file changed, 41 insertions(+)
> 
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 5c439f1394..32738c2737 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -711,6 +711,33 @@ static void update_head(const struct ref *our, const 
> struct ref *remote,
>   }
>  }
>  
> +static void find_duplicate_icase_entries(struct index_state *istate,
> +  struct string_list *dup)
> +{
> + struct string_list list = STRING_LIST_INIT_NODUP;
> + int i;
> +
> + for (i = 0; i < istate->cache_nr; i++)
> + string_list_append(, istate->cache[i]->name);
> +
> + list.cmp = fspathcmp;
> + string_list_sort();
> +
> + for (i = 1; i < list.nr; i++) {
> + const char *cur = list.items[i].string;
> + const char *prev = list.items[i - 1].string;
> +
> + if (dup->nr &&
> + !fspathcmp(cur, dup->items[dup->nr - 1].string)) {
> + string_list_append(dup, cur);
> + } else if (!fspathcmp(cur, prev)) {
> + string_list_append(dup, prev);
> + string_list_append(dup, cur);
> + }
> + }
> + string_list_clear(, 0);
> +}
> +
>  static int checkout(int submodule_progress)
>  {
>   struct object_id oid;
> @@ -761,6 +788,20 @@ static int checkout(int submodule_progress)
>   if (write_locked_index(_index, _file, COMMIT_LOCK))
>   die(_("unable to write new index file"));
>  
> + if (ignore_case) {
> + struct string_list dup = STRING_LIST_INIT_DUP;
> + int i;
> +
> + find_duplicate_icase_entries(_index, );
> + if (dup.nr) {
> + warning(_("the following paths in this repository only 
> differ in case and will\n"
> +   "cause problems because you have cloned it on 
> an case-insensitive filesytem:\n"));

Thanks for the patch.
I wonder if we can tell the users more about the "problems"
and how to avoid them, or to live with them.

This is more loud thinking:

"The following paths only differ in case\n"
"One a case-insensitive file system only one at a time can be present\n"
"You may rename one like this:\n"
"git checkout  && git mv  .1\n"

> + fprintf(stderr, "\t%s\n", dup.items[i].string);

Another question:
Do we need any quote_path() here ?
(This may be overkill, since typically the repos with conflicting names
only use ASCII.)

> + }
> + string_list_clear(, 0);
> + }
> +
>   err |= run_hook_le(NULL, "post-checkout", sha1_to_hex(null_sha1),
>  oid_to_hex(), "1", NULL);
>  
> -- 
> 2.18.0.656.gda699b98b3
>

Re: [PATCH 5/6] pass st.st_size as hint for strbuf_readlink()

2018-07-25 Thread Torsten Bögershausen

On Tue, Jul 24, 2018 at 06:51:39AM -0400, Jeff King wrote:
> When we initially added the strbuf_readlink() function in
> b11b7e13f4 (Add generic 'strbuf_readlink()' helper function,
> 2008-12-17), the point was that we generally have a _guess_
> as to the correct size based on the stat information, but we
> can't necessarily trust it.
> 
> Over the years, a few callers have grown up that simply pass
> in 0, even though they have the stat information. Let's have
> them pass in their hint for consistency (and in theory
> efficiency, since it may avoid an extra resize/syscall loop,
> but neither location is probably performance critical).
> 
> Note that st.st_size is actually an off_t, so in theory we
> need xsize_t() here. But none of the other callsites use it,
> and since this is just a hint, it doesn't matter either way
> (if we wrap we'll simply start with a too-small hint and
> then eventually complain when we cannot allocate the
> memory).

Thanks a lot for the series.

 For the last paragraph I would actually vote the other way around -
 how about something like this ?
 Note that st.st_size is actually an off_t, so we should use
 xsize_t() here. In pratise we don't expect links to be large like that,
 but let's give a good example in the source code and use xsize_t()
 whenever an off_t is converted into size_t.
 This will make live easier whenever someones diggs into 32/64 bit
 "conversion safetyness"

Re: [PATCH 0/1] t7406: avoid failures solely due to timing issues

2018-07-24 Thread Torsten Bögershausen

On Mon, Jul 23, 2018 at 12:10:24PM -0700, Junio C Hamano wrote:
> "Johannes Schindelin via GitGitGadget" 
> writes:
> 
> > This fixes a regression test that produces false positives occasionally: 
> > https://git-for-windows.visualstudio.com/git/_build/results?buildId=14035=logs
> >
> 
> [jc: forwrding to Torsten for <208b2ede-4833-f062-16f2-f35b8a8ce...@web.de>]

Yes,
thanks for a "fantastic fast fix" and the integration.

I downloaded and tested it OK (under MacOS) this morning.

t7406-submodule-update shaky ?

2018-07-22 Thread Torsten Bögershausen

It seems that t7406 is sometimes shaky - when checking stderr in test 
case 4.


The order of the submodules may vary, sorting the stderr output makes it

more reliable (and somewhat funny to read).

Does anybody have a better idea ?


[]

cat ../../actual 
2>../../actual2U &&

     sort <../../actual2U >../../actual2
    ) &&
    test_i18ncmp expect actual &&
    test_i18ncmp expect2 actual2
'

Re: [PATCH v4] Documentation: declare "core.ignoreCase" as internal variable

2018-06-28 Thread Torsten Bögershausen

On 28.06.18 13:21, Marc Strapetz wrote:
> The current description of "core.ignoreCase" reads like an option which
> is intended to be changed by the user while it's actually expected to
> be set by Git on initialization only. Subsequently, Git relies on the
> proper configuration of this variable, as noted by Bryan Turner [1]:
> 
> Git on a case-insensitive filesystem (APFS, HFS+, FAT32, exFAT,
> vFAT, NTFS, etc.) is not designed to be run with anything other
> than core.ignoreCase=true.
> 
> [1] https://marc.info/?l=git=152998665813997=2
> mid:cagyf7-gee8jrgpkme9rhkptheq6p1+ebpmmwatmh01uo3bf...@mail.gmail.com
> 
> Signed-off-by: Marc Strapetz 
> ---
>  Documentation/config.txt | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 1cc18a828..c70cfe956 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -390,16 +390,19 @@ core.hideDotFiles::
>   default mode is 'dotGitOnly'.
>  
>  core.ignoreCase::
> - If true, this option enables various workarounds to enable
> + Internal variable which enables various workarounds to enable
>   Git to work better on filesystems that are not case sensitive,
> - like FAT. For example, if a directory listing finds
> - "makefile" when Git expects "Makefile", Git will assume
> + like APFS, HFS+, FAT, NTFS, etc. For example, if a directory listing
> + finds "makefile" when Git expects "Makefile", Git will assume
>   it is really the same file, and continue to remember it as
>   "Makefile".
>  +
>  The default is false, except linkgit:git-clone[1] or linkgit:git-init[1]
>  will probe and set core.ignoreCase true if appropriate when the repository
>  is created.
> ++
> +Git relies on the proper configuration of this variable for your operating
> +and file system. Modifying this value may result in unexpected behavior.
>  
>  core.precomposeUnicode::
>   This option is only used by Mac OS implementation of Git.
> 

Looks good to me

Re: Use of new .gitattributes working-tree-encoding attribute across different platform types

2018-06-27 Thread Torsten Bögershausen

On 27.06.18 09:54, Steve Groeger wrote:
> Hi, 
> 
> Sorry for incomplete post earlier. Here is the full post:
> 
> 
> In the latest version of git a new attribute has been added, 
> working-tree-encoding. The release notes states: 
> 
> 'The new "working-tree-encoding" attribute can ask Git to convert the
>contents to the specified encoding when checking out to the working
>tree (and the other way around when checking in).'
>  We have been using this attribute on our z/OS systems using a version of git 
> from Rocket software to convert files to EBCDIC for quite a while now. On 
> other platforms (Linux, AIX etc) git ignored this attribute and therefore 
> left the files in ASCII.
> 
> We have common code that is supposed to be usable across different platforms 
> and hence different file encodings. With the full support of the 
> working-tree-encoding in the latest version of git on all platforms, how do 
> we have files converted to different encodings on different platforms?
> I could not find anything that would allow us to say 'if platform = z/OS then 
> encoding=EBCDIC else encoding=ASCII'.   Is there a way this can be done?
>  
>  
>   
>  
> Thanks
>  Steve Groeger
[]

Did you consider to put a gitattributes file on machine level ?

https://git-scm.com/docs/gitattributes

[snipped the other places where to put gitattributes]
...
Attributes for all users on a system should be placed in the 
$(prefix)/etc/gitattributes file.
















>  Java Runtimes Development
>  IBM Hursley
>  IBM United Kingdom Ltd
>  Tel: (44) 1962 816911 Mobex: 279990 Mobile: 07718 517 129
>  Fax (44) 1962 816800
>  Lotus Notes: Steve Groeger/UK/IBM
>  Internet: groe...@uk.ibm.com  
>
>  
> Unless stated otherwise above:
>  IBM United Kingdom Limited - Registered in England and Wales with number 
> 741598.
>  Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU   
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number 
> 741598. 
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>

Re: [PATCH v2] Documentation: declare "core.ignorecase" as internal variable

2018-06-24 Thread Torsten Bögershausen

On Sun, Jun 24, 2018 at 12:44:26PM +0200, Marc Strapetz wrote:
> The current description of "core.ignoreCase" reads like an option which
> is intended to be changed by the user while it's actually expected to
> be set by Git on initialization only. This is especially important for
> Git for Windows, as noted by Bryan Turner [1]:
> 
> Git on Windows is not designed to run with anything other than
> core.ignoreCase=true, and attempting to do so will cause
> unexpected behavior. In other words, it's not a behavior toggle so
> user's can request the functionality to work one way or the other;
> it's an implementation detail that `git init` and `git clone` set
> when a repository is created purely so they don't have to probe
> the file system each time you run a `git` command.

This is a nice explanation, thanaks for that,
Some users of Mac OS or SAMBA will see core.ignoreCase=true, and are not
supposed to change it.

The same explanation (Git for Windows)
is alse valid for HFS+ and APFS under Mac OS and VFAT under all OS.
(or even an ext4 file system under Linux exported to Mac OS using SAMBA)

May be something like this?

 Git on a case insensitve file system (Windows, Mac OS, VFAT, SAMBA)
 is not designed to run with anything other than
 core.ignoreCase=true, and attempting to do so will cause
 unexpected behavior. In other words, it's not a behavior toggle so
.



> 
> [1] https://marc.info/?l=git=152972992729761=2
> 
> Signed-off-by: Marc Strapetz 
> ---
>  Documentation/config.txt | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index ab641bf5a..c25693828 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -390,7 +390,7 @@ core.hideDotFiles::
>   default mode is 'dotGitOnly'.
> 
>  core.ignoreCase::
> - If true, this option enables various workarounds to enable
> + Internal variable which enables various workarounds to enable
>   Git to work better on filesystems that are not case sensitive,
>   like FAT. For example, if a directory listing finds
>   "makefile" when Git expects "Makefile", Git will assume
> @@ -399,7 +399,7 @@ core.ignoreCase::
>  +
>  The default is false, except linkgit:git-clone[1] or linkgit:git-init[1]
>  will probe and set core.ignoreCase true if appropriate when the repository
> -is created.
> +is created. Modifying this value afterwards may result in unexpected
> behavior.
> 
>  core.precomposeUnicode::
>   This option is only used by Mac OS implementation of Git.
> -- 
> 2.17.0.rc0.3.gb1b5a51b2

Re: t5562: gzip -k is not portable

2018-06-20 Thread Torsten Bögershausen

On Wed, Jun 20, 2018 at 08:40:06AM -0400, Jeff King wrote:
> On Wed, Jun 20, 2018 at 08:13:06AM +0200, Torsten Bögershausen wrote:
> 
> > Good eyes, thanks.
> > The "-f -c" combo works:
> > 
> > -   gzip -k fetch_body &&
> > +   gzip -f -c fetch_body >fetch_body.gz &&
> > test_copy_bytes 10 fetch_body.gz.trunc &&
> > -   gzip -k push_body &&
> > +   gzip -f -c push_body >push_body.gz &&
> 
> Do we still need "-f"?  With "-c" gzip is only writing to stdout, so we
> should not need to force anything.
> 
> -Peff

You are rigth, -f is not needed.

I was confusing stdout with terminal :-( 
Most often stdout is the terminal, but not here of course.

Re: t5562: gzip -k is not portable

2018-06-20 Thread Torsten Bögershausen

On Tue, Jun 19, 2018 at 04:53:10PM -0400, Jeff King wrote:
> On Tue, Jun 19, 2018 at 10:40:16PM +0200, Torsten Bögershausen wrote:
> 
> > 
> > 
> > On 06/19/2018 08:22 PM, Eric Sunshine wrote:
> > > On Tue, Jun 19, 2018 at 2:06 PM Junio C Hamano  wrote:
> > > > Torsten Bögershausen  writes:
> > > > > t5562 fails here under MacOS:
> > > > > "gzip -k"  is not portable.
> > > Very odd. Stock /usr/bin/gzip on my MacOS 10.12.6 _does_ recognize -k,
> > > and the test does pass.
> > 
> > This is the test box running Mac OS X 10.6 speaking.
> > The -c seems to need even -f.
> > But this doesn't work either:
> > 
> > gzip 1.3.12
> > Copyright (C) 2007 Free Software Foundation, Inc.
> > Copyright (C) 1993 Jean-loup Gailly.
> > This is free software.  You may redistribute copies of it under the terms of
> > the GNU General Public License <http://www.gnu.org/licenses/gpl.html>.
> > There is NO WARRANTY, to the extent permitted by law.
> 
> Ah, that's it. "-k" came about in gzip v1.6. That's 5 years old, but
> obviously some people are still running it.
> 
> "-c" goes back quite a while and should be safe, I think.
> 
> > expecting success:
> >     gzip -f -c fetch_body >fetch_body.gz &&
> >     test_copy_bytes 10 fetch_body.gz.trunc &&
> >     gzip -f -c push_body >push_body,gz &&
> >     test_copy_bytes 10 push_body.gz.trunc
> > 
> > ./test-lib.sh: line 632: push_body.gz: No such file or directory
> 
> Typo in the ">" redirect (comma instead of period)?
> 
> -Peff

Good eyes, thanks.
The "-f -c" combo works:

-   gzip -k fetch_body &&
+   gzip -f -c fetch_body >fetch_body.gz &&
test_copy_bytes 10 fetch_body.gz.trunc &&
-   gzip -k push_body &&
+   gzip -f -c push_body >push_body.gz &&

Re: t5562: gzip -k is not portable

2018-06-19 Thread Torsten Bögershausen





On 06/19/2018 08:22 PM, Eric Sunshine wrote:

On Tue, Jun 19, 2018 at 2:06 PM Junio C Hamano  wrote:

Torsten Bögershausen  writes:

t5562 fails here under MacOS:
"gzip -k"  is not portable.

Very odd. Stock /usr/bin/gzip on my MacOS 10.12.6 _does_ recognize -k,
and the test does pass.


This is the test box running Mac OS X 10.6 speaking.
The -c seems to need even -f.
But this doesn't work either:

gzip 1.3.12
Copyright (C) 2007 Free Software Foundation, Inc.
Copyright (C) 1993 Jean-loup Gailly.
This is free software.  You may redistribute copies of it under the terms of
the GNU General Public License <http://www.gnu.org/licenses/gpl.html>.
There is NO WARRANTY, to the extent permitted by law.

Written by Jean-loup Gailly.
prerequisite GZIP ok
expecting success:
    gzip -f -c fetch_body >fetch_body.gz &&
    test_copy_bytes 10 fetch_body.gz.trunc &&
    gzip -f -c push_body >push_body,gz &&
    test_copy_bytes 10 push_body.gz.trunc

./test-lib.sh: line 632: push_body.gz: No such file or directory
not ok 2 - setup, compression related
#
#        gzip -f -c fetch_body >fetch_body.gz &&
#        test_copy_bytes 10 fetch_body.gz.trunc &&
#        gzip -f -c push_body >push_body,gz &&
#        test_copy_bytes 10 push_body.gz.trunc
#



Sigh.  Perhaps -c would help.  Or do BSD implementations also lack -c?

MacOS and BSD do support -c, so this solution would also work (and is
"cleaner" the the other proposal).


diff --git a/t/t5562-http-backend-content-length.sh 
b/t/t5562-http-backend-content-length.sh
@@ -61,9 +61,9 @@ test_expect_success 'setup' '
  test_expect_success GZIP 'setup, compression related' '
-   gzip -k fetch_body &&
+   gzip -c fetch_body >fetch_body.gz &&
 test_copy_bytes 10 fetch_body.gz.trunc &&
-   gzip -k push_body &&
+   gzip -c push_body >push_body.gz &&
 test_copy_bytes 10 push_body.gz.trunc
  '

Re: t5310 broken under Mac OS

2018-06-19 Thread Torsten Bögershausen





On 06/19/2018 07:35 PM, Eric Sunshine wrote:

On Tue, Jun 19, 2018 at 1:33 PM Torsten Bögershausen  wrote:

expecting success:
  git repack -ad &&
  git rev-list --use-bitmap-index --count --all >expect &&
  bitmap=$(ls .git/objects/pack/*.bitmap) &&
  test_when_finished "rm -f $bitmap" &&
  head -c 512 <$bitmap >$bitmap.tmp &&
  mv $bitmap.tmp $bitmap &&
  git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
  test_cmp expect actual &&
  test_i18ngrep corrupt stderr

override r--r--r--  tb/staff for
.git/objects/pack/pack-8886db3fce4f9657c1a43fee7d3ea4f2a4b5be2d.bitmap?
(y/n [n]) not overwritten
error: 'grep corrupt stderr' didn't find a match in:

This is fixed by [1], isn't it?

[1]: https://public-inbox.org/git/20180616191400.gb8...@sigill.intra.peff.net/

Most probably: yes.
I just ran pu if the day (which is 15th of June)
Thanks for a fast response, sorry for the noise

t5310 broken under Mac OS

2018-06-19 Thread Torsten Bögershausen


One test case fails here,
but I am to tired to dig further.


ok 42 - pack reuse respects --incremental

expecting success:
    git repack -ad &&
    git rev-list --use-bitmap-index --count --all >expect &&
    bitmap=$(ls .git/objects/pack/*.bitmap) &&
    test_when_finished "rm -f $bitmap" &&
    head -c 512 <$bitmap >$bitmap.tmp &&
    mv $bitmap.tmp $bitmap &&
    git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
    test_cmp expect actual &&
    test_i18ngrep corrupt stderr

override r--r--r--  tb/staff for 
.git/objects/pack/pack-8886db3fce4f9657c1a43fee7d3ea4f2a4b5be2d.bitmap? 
(y/n [n]) not overwritten

error: 'grep corrupt stderr' didn't find a match in:

not ok 43 - truncated bitmap fails gracefully
#
#        git repack -ad &&
#        git rev-list --use-bitmap-index --count --all >expect &&
#        bitmap=$(ls .git/objects/pack/*.bitmap) &&
#        test_when_finished "rm -f $bitmap" &&
#        head -c 512 <$bitmap >$bitmap.tmp &&
#        mv $bitmap.tmp $bitmap &&
#        git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
#        test_cmp expect actual &&
#        test_i18ngrep corrupt stderr
#

# failed 1 among 43 test(s)
1..43

t5562: gzip -k is not portable

2018-06-19 Thread Torsten Bögershausen


Hej Max,

t5562 fails here under MacOS:
"gzip -k"  is not portable.

The following works (there may be better solutions, I didn't dig into 
the test code)


diff --git a/t/t5562-http-backend-content-length.sh 
b/t/t5562-http-backend-content-length.sh

index 8040d80e04..7befe3885c 100755
--- a/t/t5562-http-backend-content-length.sh
+++ b/t/t5562-http-backend-content-length.sh
@@ -61,9 +61,13 @@ test_expect_success 'setup' '
 '

 test_expect_success GZIP 'setup, compression related' '
-   gzip -k fetch_body &&
+   cp fetch_body f &&
+   gzip fetch_body &&
+   mv f fetch_body &&
    test_copy_bytes 10 fetch_body.gz.trunc &&
-   gzip -k push_body &&
+   cp push_body f &&
+   gzip push_body &&
+   mv f push_body &&
    test_copy_bytes 10 push_body.gz.trunc

Re: [PATCH] config.c: fix regression for core.safecrlf false

2018-06-12 Thread Torsten Bögershausen

On Mon, Jun 11, 2018 at 06:46:34PM -0700, Anthony Sottile wrote:
[]
> Anything else for me to do here? (sorry! not super familiar with the process)

Your patch has been picked up by Junio, and is currently merged into the
"pu" branch (proposed updates):

  commit bc8ff8aec33836af3fefe1bcd3f533a1486b793f
  Merge: e69b544a38 6cb09125be
  Author: Junio C Hamano 
  Date:   Tue Jun 12 10:15:13 2018 -0700

  Merge branch 'as/safecrlf-quiet-fix' into jch
  
  Fix for 2.17-era regression.
  
  * as/safecrlf-quiet-fix:
config.c: fix regression for core.safecrlf false

>From there, it will typically progress into next and master,
unless reviewers come with comments and improvements.
You can watch out for "What's cooking in git" messages here on the list
to follow the progress.

>From my experience it will end up in a Git release, but I don't know,
if it will be 2.18 or a later one.

Re: Why is there no force pull?

2018-06-10 Thread Torsten Bögershausen

On Sat, Jun 09, 2018 at 09:01:54PM +0200, Christoph Böhmwalder wrote:
> Hi,
> 
> Since this is a use case that actually comes up quite often in
> day-to-day use, especially among git beginners, I was wondering: is
> there a specific reason why a command like "fetch changes from remote,
> overwriting everything in my current working directory including all
> commits I've made" doesn't exist? Now, I'm quite aware that something
> like
> 
> $ git fetch origin/branch
> $ git reset --hard origin/branch

This is not exactly what you askeded for, but I tend not to recommend
people using "git reset --hard" at all.
Either use a "stash", just in case.

Or, in your case:
$ git fetch origin  && git checkout origin/branch

This will put your working tree onto origin/branch.

As a bonus, in case that you have done commits, which are now no longer
visible, "git reflog" is typically able to find them.

Re: [PATCH] config.c: fix regression for core.safecrlf false

2018-06-06 Thread Torsten Bögershausen

On Mon, Jun 04, 2018 at 01:17:42PM -0700, Anthony Sottile wrote:
> A regression introduced in 8462ff43e42ab67cecd16fdfb59451a53cc8a945 caused
> autocrlf rewrites to produce a warning message despite setting safecrlf=false.
> 
> Signed-off-by: Anthony Sottile 
> ---
>  config.c|  2 +-
>  t/t0020-crlf.sh | 10 ++
>  2 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/config.c b/config.c
> index fbbf0f8..de24e90 100644
> --- a/config.c
> +++ b/config.c
> @@ -1233,7 +1233,7 @@ static int git_default_core_config(const char *var, 
> const char *value)
>   }
>   eol_rndtrp_die = git_config_bool(var, value);
>   global_conv_flags_eol = eol_rndtrp_die ?
> - CONV_EOL_RNDTRP_DIE : CONV_EOL_RNDTRP_WARN;
> + CONV_EOL_RNDTRP_DIE : 0;
>   return 0;
>   }
>  
> diff --git a/t/t0020-crlf.sh b/t/t0020-crlf.sh
> index 71350e0..5f05698 100755
> --- a/t/t0020-crlf.sh
> +++ b/t/t0020-crlf.sh
> @@ -98,6 +98,16 @@ test_expect_success 'safecrlf: git diff demotes 
> safecrlf=true to warn' '
>  '
>  
>  
> +test_expect_success 'safecrlf: no warning with safecrlf=false' '
> + git config core.autocrlf input &&
> + git config core.safecrlf false &&
> +
> + for w in I am all CRLF; do echo $w; done | append_cr >allcrlf &&
> + git add allcrlf 2>err &&
> + test_must_be_empty err
> +'
> +
> +
>  test_expect_success 'switch off autocrlf, safecrlf, reset HEAD' '
>   git config core.autocrlf false &&
>   git config core.safecrlf false &&
> -- 
> 2.7.4
> 

Looks good to me, thanks for cleaning my mess.

Acked-By: Torsten Bögershausen

Re: [PATCH 04/10] t0027: use $ZERO_OID

2018-06-06 Thread Torsten Bögershausen

On Mon, Jun 04, 2018 at 11:52:23PM +, brian m. carlson wrote:
> Use the ZERO_OID variable to express the all-zeros object ID so that it
> works with hash algorithms of all sizes.
> 
> Signed-off-by: brian m. carlson 
> ---
>  t/t0027-auto-crlf.sh | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh
> index beb5927f77..14fcd3f49f 100755
> --- a/t/t0027-auto-crlf.sh
> +++ b/t/t0027-auto-crlf.sh
> @@ -371,13 +371,13 @@ test_expect_success 'setup master' '
>   git checkout -b master &&
>   git add .gitattributes &&
>   git commit -m "add .gitattributes" . &&
> - printf "\$Id:  
> \$\nLINEONE\nLINETWO\nLINETHREE" >LF &&
> - printf "\$Id:  
> \$\r\nLINEONE\r\nLINETWO\r\nLINETHREE" >CRLF &&
> - printf "\$Id:  
> \$\nLINEONE\r\nLINETWO\nLINETHREE"   >CRLF_mix_LF &&
> - printf "\$Id:  
> \$\nLINEONE\nLINETWO\rLINETHREE" >LF_mix_CR &&
> - printf "\$Id:  
> \$\r\nLINEONE\r\nLINETWO\rLINETHREE"   >CRLF_mix_CR &&
> - printf "\$Id:  
> \$\r\nLINEONEQ\r\nLINETWO\r\nLINETHREE" | q_to_nul >CRLF_nul &&
> - printf "\$Id:  
> \$\nLINEONEQ\nLINETWO\nLINETHREE" | q_to_nul >LF_nul &&
> + printf "\$Id: $ZERO_OID \$\nLINEONE\nLINETWO\nLINETHREE" >LF &&
> + printf "\$Id: $ZERO_OID \$\r\nLINEONE\r\nLINETWO\r\nLINETHREE" >CRLF &&
> + printf "\$Id: $ZERO_OID \$\nLINEONE\r\nLINETWO\nLINETHREE"   
> >CRLF_mix_LF &&
> + printf "\$Id: $ZERO_OID \$\nLINEONE\nLINETWO\rLINETHREE" >LF_mix_CR 
> &&
> + printf "\$Id: $ZERO_OID \$\r\nLINEONE\r\nLINETWO\rLINETHREE"   
> >CRLF_mix_CR &&
> + printf "\$Id: $ZERO_OID \$\r\nLINEONEQ\r\nLINETWO\r\nLINETHREE" | 
> q_to_nul >CRLF_nul &&
> + printf "\$Id: $ZERO_OID \$\nLINEONEQ\nLINETWO\nLINETHREE" | q_to_nul 
> >LF_nul &&
>   create_NNO_MIX_files &&
>   git -c core.autocrlf=false add NNO_*.txt MIX_*.txt &&
>   git commit -m "mixed line endings" &&

Nothing wrong with the patch.
There is, however, a trick in t0027 to transform the different IDs back to a 
bunch of '0'.
The content of the file use only uppercase letters, and all lowercase ad digits
are converted like this:

compare_ws_file () {
pfx=$1
exp=$2.expect
act=$pfx.actual.$3
tr '\015\000abcdef0123456789' QN0 <"$2" >"$exp" &&
tr '\015\000abcdef0123456789' QN0 <"$3" >"$act" &&
test_cmp "$exp" "$act" &&
rm "$exp" "$act"
}

In the long term the 'tr' may need an additional 'sed' expression.

Re: [PATCH 01/10] t: add tool to translate hash-related values

2018-06-06 Thread Torsten Bögershausen

Some style nite inline

On Mon, Jun 04, 2018 at 11:52:20PM +, brian m. carlson wrote:
> Add a test function helper, test_translate, that will produce its first
> argument if the hash in use is SHA-1 and the second if its argument is
> NewHash.  Implement a mode that can read entries from a file as well for
> reusability across tests.
> 
> For the moment, use the length of the empty blob to determine the hash
> in use.  In the future, we can change this code so that it can use the
> configuration and learn about the difference in input, output, and
> on-disk formats.
> 
> Implement two basic lookup charts, one for common invalid or synthesized
> object IDs, and one for various facts about the hash function in use.
> 
> Signed-off-by: brian m. carlson 
> ---
>  t/test-lib-functions.sh | 40 
>  t/translate/hash-info   |  9 +
>  t/translate/oid | 15 +++
>  3 files changed, 64 insertions(+)
>  create mode 100644 t/translate/hash-info
>  create mode 100644 t/translate/oid
> 
> diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
> index 2b2181dca0..0e7067460b 100644
> --- a/t/test-lib-functions.sh
> +++ b/t/test-lib-functions.sh
> @@ -1147,3 +1147,43 @@ depacketize () {
>   }
>   '
>  }
> +
> +test_translate_f_ () {
> + local file="$TEST_DIRECTORY/translate/$2" &&

Unless I'm wrong, we don't use the "local" keyword ?

> + perl -e '

The bare "perl" is better spelled as "$PERL_PATH"


> + $delim = "\t";
> + ($hoidlen, $file, $arg) = @ARGV;
> + open($fh, "<", $file) or die "open: $!";
> + while (<$fh>) {
> + # Allow specifying other delimiters.
> + $delim = $1 if /^#!\sdelimiter\s(.)/;
> + next if /^#/;
> + @fields = split /$delim/, $_, 3;
> + if ($fields[0] eq $arg) {
> + print($hoidlen == 40 ? $fields[1] : $fields[2]);
> + last;
> + }
> + }
> + ' "$1" "$file" "$3"
> +}
> +
> +# Without -f, print the first argument if we are using SHA-1 and the second 
> if
> +# we're using NewHash.
> +# With -f FILE ARG, read the (by default) tab-delimited file from
> +# t/translate/FILE, finding the first field matching ARG and printing either 
> the
> +# second or third field depending on the hash in use.
> +test_translate () {
> + local hoidlen=$(printf "%s" "$EMPTY_BLOB" | wc -c) &&
> + if [ "$1" = "-f" ]

Style nit, please avoid [] and use test:
if test "$1" = "-f"

And more [] below

> + then
> + shift &&
> + test_translate_f_ "$hoidlen" "$@"
> + else
> + if [ "$hoidlen" -eq 40 ]
> + then
> + printf "%s" "$1"
> + else
> + printf "%s" "$2"
> + fi
> + fi
> +}
> diff --git a/t/translate/hash-info b/t/translate/hash-info
> new file mode 100644
> index 00..36cbd9a8eb
> --- /dev/null
> +++ b/t/translate/hash-info
> @@ -0,0 +1,9 @@
> +# Various facts about the hash algorithm in use for easy access in tests.
> +#
> +# Several aliases are provided for easy recall.
> +rawsz20  32
> +oidlen   20  32
> +hexsz40  64
> +hexoidlen40  64
> +zero 
> 
> +zero-oid 
> 
> diff --git a/t/translate/oid b/t/translate/oid
> new file mode 100644
> index 00..8de0fd64af
> --- /dev/null
> +++ b/t/translate/oid
> @@ -0,0 +1,15 @@
> +# These are some common invalid and partial object IDs used in tests.
> +001  0001
> 0001
> +002  0002
> 0002
> +003  0003
> 0003
> +004  0004
> 0004
> +005  0005
> 0005
> +006  0006
> 0006
> +007  0007
> 0007
> +# All zeros or Fs missing one or two hex segments.
> +zero-1   000 
> 000
> +zero-2   00  
>

Re: Regression (?) in core.safecrlf=false messaging

2018-06-04 Thread Torsten Bögershausen

On 04.06.18 17:43, Anthony Sottile wrote:
> On Mon, Jun 4, 2018 at 12:55 AM, Torsten Bögershausen  wrote:
>>
>> Does the following patch fix your problem ?
>>
>> diff --git a/config.c b/config.c
>> index 6f8f1d8c11..c625aec96a 100644
>> --- a/config.c
>> +++ b/config.c
>> @@ -1233,7 +1233,7 @@ static int git_default_core_config(const char *var, 
>> const char *value)
>> }
>> eol_rndtrp_die = git_config_bool(var, value);
>> global_conv_flags_eol = eol_rndtrp_die ?
>> -   CONV_EOL_RNDTRP_DIE : CONV_EOL_RNDTRP_WARN;
>> +   CONV_EOL_RNDTRP_DIE : 0;
>> return 0;
>> }
>>
> 
> 
> Yes!  After applying that patch:
> 
> ```
> 
> $ PATH=$PWD/prefix/bin:$PATH bash test.sh
> + git --version
> git version 2.18.0.rc1.dirty
> + rm -rf repo
> + git init repo
> Initialized empty Git repository in /tmp/git/repo/.git/
> + cd repo
> + git config core.autocrlf input
> + git config core.safecrlf false
> + echo -en 'foo\r\nbar\r\n'
> + git add f
> 
> ```
> 
> Anthony
> 
Good.
Do you want to send the patch to the list ?
(You don't have too, if you don't want,
but as you already did all the work...)

Re: Regression (?) in core.safecrlf=false messaging

2018-06-04 Thread Torsten Bögershausen



Does the following patch fix your problem ?

diff --git a/config.c b/config.c
index 6f8f1d8c11..c625aec96a 100644
--- a/config.c
+++ b/config.c
@@ -1233,7 +1233,7 @@ static int git_default_core_config(const char *var, const 
char *value)
}
eol_rndtrp_die = git_config_bool(var, value);
global_conv_flags_eol = eol_rndtrp_die ?
-   CONV_EOL_RNDTRP_DIE : CONV_EOL_RNDTRP_WARN;
+   CONV_EOL_RNDTRP_DIE : 0;
return 0;
}

Re: Regression (?) in core.safecrlf=false messaging

2018-06-04 Thread Torsten Bögershausen

(as usual, no top-posting here, please see my answers at the end)

On Sun, Jun 03, 2018 at 10:54:00PM -0700, Anthony Sottile wrote:
> I'm a bit unclear if I was depending on undocumented behaviour or not
> here but it seems the messaging has recently changed with respect to
> `core.safecrlf`
> 
> My reading of the documentation
> https://git-scm.com/docs/git-config#git-config-coresafecrlf (I might
> be wrong here)
> 
> - core.safecrlf = true -> fail hard when converting
> - core.safecrlf = warn -> produce message when converting
> - core.safecrlf = false -> convert silently
> 
> (note that these are all only relevant when `core.autocrlf = true`)
> 
> I've created a small script to demonstrate:
> 
> ```
> set -euxo pipefail
> 
> git --version
> 
> rm -rf repo
> git init repo
> cd repo
> git config core.autocrlf input
> git config core.safecrlf false
> echo -en 'foo\r\nbar\r\n' > f
> git add f
> ```
> 
> When run against 2.16.4:
> 
> ```
> $ PATH=$PWD/prefix/bin:$PATH bash test.sh
> + git --version
> git version 2.16.4
> + rm -rf repo
> + git init repo
> Initialized empty Git repository in /tmp/git/repo/.git/
> + cd repo
> + git config core.autocrlf input
> + git config core.safecrlf false
> + echo -en 'foo\r\nbar\r\n'
> + git add f
> ```
> 
> (notice how there are no messages about crlf conversion while adding
> -- this is what I expect given I have core.safecrlf=false)
> 
> 
> When run against master:
> 
> ```console
> $ PATH=$PWD/prefix/bin:$PATH bash test.sh
> + git --version
> git version 2.18.0.rc0.42.gc2c7d17
> + rm -rf repo
> + git init repo
> Initialized empty Git repository in /tmp/git/repo/.git/
> + cd repo
> + git config core.autocrlf input
> + git config core.safecrlf false
> + echo -en 'foo\r\nbar\r\n'
> + git add f
> warning: CRLF will be replaced by LF in f.
> The file will have its original line endings in your working directory.
> ```
> 
> A `git bisect` shows this as the first commit which breaks this
> behaviour: 8462ff43e42ab67cecd16fdfb59451a53cc8a945
> 
> https://github.com/git/git/commit/8462ff43e42ab67cecd16fdfb59451a53cc8a945
> 
> The commit appears to be a refactor (?) that shouldn't have changed behaviour?
> 
> Here's the script I was using to bisect:
> https://i.fluffy.cc/2L4ZtqV3cHfzNRkKPbHgTcz43HMQJxdK.html
> 
> ```
> git bisect start
> git bisect bad v2.17.0
> git bisect good v2.16.4
> git bisect run ./test.sh
> ```
> 
> Noticed this due to test failures here:
> https://github.com/pre-commit/pre-commit/pull/762#issuecomment-394236625
> 
> Thanks,
> 
> Anthony

Thanks so much for the report, and the detailed analyzes that has been done.
Good work, I would say.

This looks very much as an (unplanned) regression.
I will have a look within the next days, as soon as my time allows it.

Re: git diff: meaning of ^M at line ends ?

2018-05-18 Thread Torsten Bögershausen

On 15.05.18 20:04, Frank Schäfer wrote:
> Am 14.05.2018 um 20:13 schrieb Torsten Bögershausen:
>> ^M is the representation of a "Carriage Return" or CR.
>> Under Linux/Unix/Mac OS X a line is terminated with a single
>> "line feed", LF.
>>
>> Windows typically uses CRLF at the end of the line.
>> "git diff" uses the LF to detect the end of line,
>> leaving the CR alone.
>>
>> Nothing to worry about.
> Thanks, I already suspected something like that.
> Has this behavior been changed/added recently ?

That is a good question.
There is, to my knowledge, no intentional change.

> I didn't observe it before, although the project I'm currently looking
> into has always been using CR+LF...

Do you mean that older versions did behave differently ?
Do you have a version number for the "old" handling ?

> 
> Why does the ^M only show up in '+' lines ?
> When changing the line end from CR+LF to LF, the diff looks like this:

> 
> -blahblah
> +blahblah
> 
> But I would expect it to be
> 
> -blahblah^M
> +blahblah

May be this helps (I haven't tested it) ?
git config  core.whitespace cr-at-eol

Re: git diff: meaning of ^M at line ends ?

2018-05-14 Thread Torsten Bögershausen

On 14.05.18 18:08, Frank Schäfer wrote:
> What does ^M at the end of lines in the output of 'git diff' mean ?
> 
> Thanks,
> Frank
> 

^M is the representation of a "Carriage Return" or CR.
Under Linux/Unix/Mac OS X a line is terminated with a single
"line feed", LF.

Windows typically uses CRLF at the end of the line.
"git diff" uses the LF to detect the end of line,
leaving the CR alone.

Nothing to worry about.

If you want, you can commit those files with
CRLF in the working tree, and LF in the repo.

More information may be found here:

https://git-scm.com/docs/gitattributes

(Or ask more questions here, if needed)

Re: Can not save changes into stash

2018-05-09 Thread Torsten Bögershausen

[Please no top posting here]
On 09.05.18 15:27, KES wrote:
> I even can not drop local changes:
> 
> $ git checkout local.conf 
> error: pathspec 'local.conf' did not match any file(s) known to git.
> 
> $ git log local.conf
> commit 6df8bab88fd703c6859954adc51b2abaad8f59ec
> Author: Eugen Konkov 
> Date:   Wed May 9 15:31:02 2018 +0300
> 
> Implement module which allow override target option
> 
> Most usefull to configure application on developer host
> 
> 
It may be help, if we have some more information,
either to re-produce your issue or to help with debugging.

Is this a public repo ?
Which Git version do you use ?
Which OS (Linux, Mac OS, Windows anything else ?)
What does "git status" say ?
What does "git diff" say ?

> 
> 09.05.2018, 16:25, "KES" :
>> How to reproduce:
>>
>> $ git update-index --skip-worktree conf/local.conf
>> $ git pull
>> Updating 0cd50c7..bde58f8
>> error: Your local changes to the following files would be overwritten by 
>> merge:
>> conf/local.conf
>> Please commit your changes or stash them before you merge.
>> Aborting
>> $ git stash save
>> No local changes to save

Re: [PATCH v1 1/1] test: Correct detection of UTF8_NFD_TO_NFC for APFS

2018-04-30 Thread Torsten Bögershausen

On 30.04.18 17:33, Elijah Newren wrote:
> Hi,
> 
> On Sun, Apr 29, 2018 at 11:35 PM,  <tbo...@web.de> wrote:
>> From: Torsten Bögershausen <tbo...@web.de>
>>
>> On HFS (which is the default Mac filesystem prior to High Sierra),
>> unicode names are "decomposed" before recording.
>> On APFS, which appears to be the new default filesystem in Mac OS High
>> Sierra, filenames are recorded as specified by the user.
>>
>> APFS continues to allow the user to access it via any name
>> that normalizes to the same thing.
>>
>> This difference causes t0050-filesystem.sh to fail two tests.
>>
>> Improve the test for a NFD/NFC in test-lib.sh:
>> Test if the same file can be reached in pre- and decomposed unicode.
>>
>> Reported-By: Elijah Newren <new...@gmail.com>
>> Signed-off-by: Torsten Bögershausen <tbo...@web.de>
>> ---
>>  t/test-lib.sh | 7 +--
>>  1 file changed, 1 insertion(+), 6 deletions(-)
>>
>> diff --git a/t/test-lib.sh b/t/test-lib.sh
>> index ea2bbaaa7a..e206250d1b 100644
>> --- a/t/test-lib.sh
>> +++ b/t/test-lib.sh
>> @@ -1106,12 +1106,7 @@ test_lazy_prereq UTF8_NFD_TO_NFC '
> 
> I'm not sure what "NFD" and "NFC" stand for, but I suspect the test
> prerequisite name may be specific to how HFS handled things.  If so,
> should it be renamed from UTF8_NFD_TO_NFC to something else, such as
> UTF8_NORMALIZATION?

NFD and NFC both come from the unicode standard, and are just taken
"as is" into the Git world:
https://en.wikipedia.org/wiki/Unicode_equivalence

If you are otherwise happy with the patch, would it be possible
to run it on your system ?
(I don't have a High Sierra box, but I am confident that the test work
 for you).

The other comments may be addressed later, may be.
In any case, they should go into a different commit.

Re: [PATCH v1 1/1] test: Correct detection of UTF8_NFD_TO_NFC for APFS

2018-04-30 Thread Torsten Bögershausen

On 30.04.18 09:56, Junio C Hamano wrote:
> tbo...@web.de writes:
> 
>> From: Torsten Bögershausen <tbo...@web.de>
>>
>> On HFS (which is the default Mac filesystem prior to High Sierra),
>> unicode names are "decomposed" before recording.
>> On APFS, which appears to be the new default filesystem in Mac OS High
>> Sierra, filenames are recorded as specified by the user.
>>
>> APFS continues to allow the user to access it via any name
>> that normalizes to the same thing.
>>
>> This difference causes t0050-filesystem.sh to fail two tests.
>>
>> Improve the test for a NFD/NFC in test-lib.sh:
>> Test if the same file can be reached in pre- and decomposed unicode.
>>
>> Reported-By: Elijah Newren <new...@gmail.com>
>> Signed-off-by: Torsten Bögershausen <tbo...@web.de>
>> ---
>>  t/test-lib.sh | 7 +--
>>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> Thanks.  
> 
> Wouldn't it logically make more sense to check for the target being
> an existing file with "-f"?  It is not an essential part of the test
> for the target to be "readable", but "can be stat(2)ed with the
> other UTF-8 representation" is.

That make sense.
Would you like to amend the patch ?

Re: BUG report: unicode normalization on APFS (Mac OS High Sierra)

2018-04-26 Thread Torsten Bögershausen

On 26.04.18 18:48, Elijah Newren wrote:
> On HFS (which appears to be the default Mac filesystem prior to High
> Sierra), unicode names are "normalized" before recording.  Thus with a
> script like:
> 
> mkdir tmp
> cd tmp
> 
> auml=$(printf "\303\244")
> aumlcdiar=$(printf "\141\314\210")
> >"$auml"
> 
> echo "auml:  " $(echo -n "$auml" | xxd)
> echo "aumlcdiar: " $(echo -n "$aumlcdiar" | xxd)
> echo "Dir contents:  " $(echo -n * | xxd)
> 
> echo "Stat auml: " "$(stat -f "%i   %Sm   %Su %N" "$auml")"
> echo "Stat aumlcdiar:" "$(stat -f "%i   %Sm   %Su %N" "$aumlcdiar")"
> 
> We see output like:
> 
> auml:   : c3a4 ..
> aumlcdiar:  : 61cc 88 a..
> Dir contents:   : 61cc 88 a..
> Stat auml:  857473   Apr 26 09:40:40 2018   newren ä
> Stat aumlcdiar: 857473   Apr 26 09:40:40 2018   newren ä
> 
> On APFS, which appears to be the new default filesystem in Mac OS High
> Sierra, we instead see:
> 
> auml:   : c3a4 ..
> aumlcdiar:  : 61cc 88 a..
> Dir contents:   : c3a4 ..
> Stat auml:  8591766636   Apr 26 09:40:59 2018   newren ä
> Stat aumlcdiar: 8591766636   Apr 26 09:40:59 2018   newren ä
> 
> i.e. APFS appears to record the filename as specified by the user, but
> continues to allow the user to access it via any name that normalizes
> to the same thing.  This difference causes t0050-filesystem.sh to fail
> the final two tests.  I could change the "UTF8_NFD_TO_NFC" flag
> checking in test-lib.sh to instead test the exit code of stat to make
> it pass these two tests, but I have no idea if there are problems
> elsewhere that this would just be papering over.
> 
> I dislike Mac OS and avoid it, so I'd prefer to find someone else
> motivated to fix this.  If no one is, I may eventually try to fix this
> up...in a year or three from now.  But is someone else interested?
> Would this serve as a good microproject for our microprojects list (or
> are the internals hairy enough that this is too big of a project for
> that list)?
> 
> 
> Elijah
> 

Hm,
thanks for the report.
I don't have a high sierra box, but I can probably get one.
t0050 -should- pass automagically, so I feel that I can do something.
Unless someone is faster of course.

Is it possible that  you run
debug=t verbose=t ./t0050-filesystem.sh 
and send the output to me ?

Re: [PATCH v11 06/10] convert: add 'working-tree-encoding' attribute

2018-04-05 Thread Torsten Bögershausen

On 01.04.18 15:24, Lars Schneider wrote:
>> TRUE or false are values, but just wrong ones.
>> If this test is removed, the user will see "failed to encode "TRUE" to 
>> "UTF-8",
>> which should give enough information to fix it.
> 
> I see your point. However, I would like to stop the processing right
> there for these invalid values. How about 
> 
>   error(_("true/false are no valid working-tree-encodings"));
> 
> I think that is the most straight forward/helpful error message
> for the enduser (I consider the term "boolean" but dismissed it
> as potentially confusing to folks not familiar with the term).
> 
> OK with you?

Yes.

Another thing that came up recently, independent of your series:

What should happen if a user specifies "UTF-8" and the file
has an UTF-8 encoded BOM ?
I ask because I stumbled over such a file coming from a Windows
which the java compiler under Linux didn't accept.

And because some tools love to put an UTF-8 encoded BOM
into text files.

The clearest thing would be to extend the BOM check in 5/9
to cover UTF-32, UTF-16 and UTF-8.

Are there any plans to do so?

And thanks for the work.

Re: [PATCH v11 06/10] convert: add 'working-tree-encoding' attribute

2018-03-18 Thread Torsten Bögershausen

Some comments inline

On Fri, Mar 09, 2018 at 06:35:32PM +0100, lars.schnei...@autodesk.com wrote:
> From: Lars Schneider 
> 
> Git recognizes files encoded with ASCII or one of its supersets (e.g.
> UTF-8 or ISO-8859-1) as text files. All other encodings are usually
> interpreted as binary and consequently built-in Git text processing
> tools (e.g. 'git diff') as well as most Git web front ends do not
> visualize the content.
> 
> Add an attribute to tell Git what encoding the user has defined for a
> given file. If the content is added to the index, then Git converts the

Minor comment:
"Git converts the content"
Everywhere else (?) "encodes or reencodes" is used.
"Git reencodes the content" may be more consistent.


[No comments on the .gitattributes]

>  
> diff --git a/convert.c b/convert.c
> index b976eb968c..aa59ecfe49 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -7,6 +7,7 @@
>  #include "sigchain.h"
>  #include "pkt-line.h"
>  #include "sub-process.h"
> +#include "utf8.h"
>  
>  /*
>   * convert.c - convert a file when checking it out and checking it in.
> @@ -265,6 +266,78 @@ static int will_convert_lf_to_crlf(size_t len, struct 
> text_stat *stats,
>  
>  }
>  
> +static const char *default_encoding = "UTF-8";
> +
> +static int encode_to_git(const char *path, const char *src, size_t src_len,
> +  struct strbuf *buf, const char *enc, int conv_flags)
> +{
> + char *dst;
> + int dst_len;
> + int die_on_error = conv_flags & CONV_WRITE_OBJECT;
> +
> + /*
> +  * No encoding is specified or there is nothing to encode.
> +  * Tell the caller that the content was not modified.
> +  */
> + if (!enc || (src && !src_len))
> + return 0;

(This may have been discussed before.
 As we checked (enc != NULL) I think we can add here:)
if (is_encoding_utf8(enc))
return 0;

> +
> + /*
> +  * Looks like we got called from "would_convert_to_git()".
> +  * This means Git wants to know if it would encode (= modify!)
> +  * the content. Let's answer with "yes", since an encoding was
> +  * specified.
> +  */
> + if (!buf && !src)
> + return 1;
> +
> + dst = reencode_string_len(src, src_len, default_encoding, enc,
> +   _len);
> + if (!dst) {
> + /*
> +  * We could add the blob "as-is" to Git. However, on checkout
> +  * we would try to reencode to the original encoding. This
> +  * would fail and we would leave the user with a messed-up
> +  * working tree. Let's try to avoid this by screaming loud.
> +  */
> + const char* msg = _("failed to encode '%s' from %s to %s");
> + if (die_on_error)
> + die(msg, path, enc, default_encoding);
> + else {
> + error(msg, path, enc, default_encoding);
> + return 0;
> + }
> + }
> +
> + strbuf_attach(buf, dst, dst_len, dst_len + 1);
> + return 1;
> +}
> +
> +static int encode_to_worktree(const char *path, const char *src, size_t 
> src_len,
> +   struct strbuf *buf, const char *enc)
> +{
> + char *dst;
> + int dst_len;
> +
> + /*
> +  * No encoding is specified or there is nothing to encode.
> +  * Tell the caller that the content was not modified.
> +  */
> + if (!enc || (src && !src_len))
> + return 0;

 Same as above:
if (is_encoding_utf8(enc))
return 0;

> +
> + dst = reencode_string_len(src, src_len, enc, default_encoding,
> +   _len);
> + if (!dst) {
> + error("failed to encode '%s' from %s to %s",
> + path, default_encoding, enc);
> + return 0;
> + }
> +
> + strbuf_attach(buf, dst, dst_len, dst_len + 1);
> + return 1;
> +}
> +
>  static int crlf_to_git(const struct index_state *istate,
>  const char *path, const char *src, size_t len,
>  struct strbuf *buf,
> @@ -978,6 +1051,25 @@ static int ident_to_worktree(const char *path, const 
> char *src, size_t len,
>   return 1;
>  }
>  
> +static const char *git_path_check_encoding(struct attr_check_item *check)
> +{
> + const char *value = check->value;
> +
> + if (ATTR_UNSET(value) || !strlen(value))
> + return NULL;
> +


> + if (ATTR_TRUE(value) || ATTR_FALSE(value)) {
> + error(_("working-tree-encoding attribute requires a value"));
> + return NULL;
> + }

TRUE or false are values, but just wrong ones.
If this test is removed, the user will see "failed to encode "TRUE" to "UTF-8",
which should give enough information to fix it.

> +
> + /* Don't encode to the default encoding */
> + if (!strcasecmp(value, default_encoding))
> + return NULL;
 Same as above ?:

Re: [PATCH v9 4/8] utf8: add function to detect a missing UTF-16/32 BOM

2018-03-07 Thread Torsten Bögershausen

On Tue, Mar 06, 2018 at 03:37:16PM -0800, Junio C Hamano wrote:
> Lars Schneider  writes:
> 
> > After thinking about it I wonder if we should barf on "utf16" without
> > dash. Your Linux iconv would handle this correctly. My macOS iconv would 
> > not.
> > That means the repo would checkout correctly on your machine but not on 
> > mine.
> >
> > What do you think?
> 
> To be bluntly honest, I prefer not to have excess "sanity checks";
> there is no need to barf on utf16 in a project run by those who are
> without macOS friends, for example.  For that matter, while I do not
> hate it so much to reject it, I am not all that supportive of this
> "The consortium says without -LE or -BE suffix there must be BOM,
> and this lacks one, so barf loudly" step in this topic myself.

Loosing or adding a BOM couild render a file useless, depending
on the SW that reads and processes it.

The main reason for being critacal is to make sure that the
material in the working-tree does a safe roundtrip when commited,
pushed, pulled, checked out...

The best thing we can do is probably to follow the specification as
good as possible.

Having a clear specification (UTF-16LE/BE has no BOM, UTF-16 has none,
UTF-16 has a BOM) would even open the chance to work around a buggy
iconv library, because Git can check the BOM.
If, and only if, a platform messes it up, and we want to
make a (compile time) workaround in Git for this very platform.

A consistant behavior across all platforms is a good thing,
sometimes I have a network share mounted to Linux, MacOS and
Windows and it doesn't matter under which Git on which
machine I checkout or commit.

Oh, I see, there is a new patch coming...

Re: [PATCH v7 0/7] convert: add support for different encodings

2018-03-04 Thread Torsten Bögershausen

On 2018-02-28 14:21, Jeff King wrote:
> On Wed, Feb 28, 2018 at 09:20:05AM +0100, Torsten Bögershausen wrote:
> 
>>>   2. auto-detect utf-16 (your patch)
>>>  - Just Works for existing repositories storing utf-16
>>>
>>>  - carries some risk of kicking in when people would like it not to
>>>(e.g., when they really do want a binary patch that can be
>>>applied).
>>
>> The binary patch is still supported, but that detail may need some more 
>> explanation
>> in the commit message. Please see  t4066-diff-encoding.sh
> 
> Yeah, but if you don't have binary-patches enabled we'd generate a bogus
> patch. Which, granted, without that you wouldn't be able to apply the
> patch either. But somehow it feels funny to me to generate something
> that _looks_ like a patch but you can't actually apply.
> 
> I also think we'd want a plan for this to be used consistently in other
> diff-like tools. E.g., "git blame" uses textconv for the starting file
> content, and it would be nice for this to kick in then, too. Ditto for
> things like grep, pickaxe, etc.
> 
> I have some patches that reuse some of the textconv infrastructure for
> this, which should mostly make it "just work" everywhere. They need a
> little more polishing before I post them, but you can take a look at:
> 
>   https://github.com/peff/git.git jk/textconv-utf16
> 
> if you want.
> 
> -Peff
> 

Thanks for your work (I actually found some time to take look)

I am looking at the code to put 2 or 3 things on top of it:
- test case(s)
- documentation
- teach diff to add a line "b is converted to UTF-8 from UTF-16"
- teach apply to reads & understands the encoding line and throws
  in a "reencode_string_len() like your patch does

This would keep "git diff | git apply" happy.
All in all the changes do not look too invasive, at least from my point of view.

t006 broken under Mac OS

2018-03-03 Thread Torsten Bögershausen

Hej,
Beside that t1006 has a broken indentation (mixed spaces and TABs at
 the beginning of the line, I get 4 errors here under Mac OS:

not ok 15 - Check %(refname) gives empty output
not ok 36 - Check %(refname) gives empty output
not ok 58 - Check %(refname) gives empty output
not ok 89 - Check %(refname) gives empty output


Running with debug and verbose shows that the empty files are not empty.
The characters in the non-empty file are outside the ASCII range,
so I copy  the stuff in here after running `od -c`  on the log file.
And I don't have a clue, where this stuff comes from - but I get different
"crap" with each run - seams as if there is a read behind a buffer ?



15:
00060000  \n   @   @   -   1   +   1   @   @  \n   -  \n
0006020+ 361   r   0 360 024 254  \n   n   o   t   o   k   1
00060405   -   C   h   e   c   k   %   (   r   e   f   n
0006060a   m   e   )   g   i   v   e   s   e   m   p   t   y
0006100o   u   t   p   u   t  \n   #  \t  \n   #  \t  \t   e   c


36:
0016220-   1   +   1   @   @  \n   -  \n   + 225   x   <   e
0016240  240   @  \n   n   o   t   o   k   3   6   -   C
0016260h   e   c   k   %   (   r   e   f   n   a   m   e   )
0016300g   i   v   e   s   e   m   p   t   y   o   u   t   p
0016320u   t  \n   #  \t  \n   #  \t  \t   e   c   h   o   "   $

58:
0027120   \n   @   @   -   1   +   1   @   @  \n   -  \n   +
0027140  302 206  \a   4 220 311  \n   n   o   t   o   k   5   8
0027160-   C   h   e   c   k   %   (   r   e   f   n   a
0027200m   e   )   g   i   v   e   s   e   m   p   t   y
0027220o   u   t   p   u   t  \n   #  \t  \n   #  \t  \t   e   c   h

89:
00431600   0   0  \n   @   @   -   1   +   1   @   @  \n
0043200-  \n   +   p 034 276 034   !   ;  \n   n   o   t   o   k
00432208   9   -   C   h   e   c   k   %   (   r   e
0043240f   n   a   m   e   )   g   i   v   e   s   e   m   p
0043260t   y   o   u   t   p   u   t  \n   #  \t  \n   #  \t  \t

Re: [PATCH v7 0/7] convert: add support for different encodings

2018-02-28 Thread Torsten Bögershausen

On Tue, Feb 27, 2018 at 04:25:38PM -0500, Jeff King wrote:
> On Tue, Feb 27, 2018 at 10:05:17PM +0100, Torsten Bögershausen wrote:
> 
> > The other question is:
> > Would this help showing diffs of UTF-16 encoded files on a "git hoster",
> > github/bitbucket/ ?
> 
> Almost. There's probably one more thing needed. We don't currently read
> in-tree .gitattributes when doing a diff in a bare repository. And most
> hosting sites will store bare repositories.
> 
> And of course it would require the users to actually set the attributes
> themselves.
> 
> > Or would the auto-magic UTF-16 avoid binary patch that I send out be more 
> > helpful ?
> > Or both ?
> > Or the w-t-e encoding ?
> 
> Of the three solutions, I think the relative merits are something like
> this:
> 
>   1. baked-in textconv (my patch)
> 
>  - reuses an existing diff feature, so minimal code and not likely to
>break things
> 
>  - requires people to add a .gitattributes entry
> 
>  - needs work to make bare-repo .gitattributes work (though I think
>this would be useful for other features, too)
> 
>  - has a run-time cost at each diff to do the conversion
> 
>  - may sometimes annoy people when it doesn't kick in (e.g.,
>emailed patches from format-patch won't have a readable diff)
> 
>  - doesn't combine with other custom-diff config (e.g., utf-16
>storing C code should still use diff=c funcname rules, but
>wouldn't with my patch)
> 
>   2. auto-detect utf-16 (your patch)
>  - Just Works for existing repositories storing utf-16
> 
>  - carries some risk of kicking in when people would like it not to
>(e.g., when they really do want a binary patch that can be
>applied).

The binary patch is still supported, but that detail may need some more 
explanation
in the commit message. Please see  t4066-diff-encoding.sh
  test_expect_success 'diff --binary against local change' '
 cp file2 file &&
 test_tick &&
 cat >expect <<-\EOF &&
 diff --git a/file b/file
 index 
26acf09b0aad19fb22566956d1a39cb4e2a3b420..e98d27acfb90cfcfc84fcc5173baa4aa7828290f
 100644
 GIT binary patch
 literal 28
 ecmezW?;ArgLn;Fo!ykquAe{qbJq3!C0BHb{ln3Pi

 literal 32
 icmezW?+HT@Lpnn$kmO?c#!w7oaWVX1NCMJ1Ko$VA_z0~4

  EOF
 git diff --binary file >actual &&
 test_cmp expect actual

> 
>I think it would probably be OK if this kicked in only when
>ALLOW_TEXTCONV is set (the default for porcelain), and --binary
>is not (i.e., when we would otherwise just say "binary
>files differ").

The user can still use "git diff" (Where auto-detection of UTF-16 kicks in
and replaces "binary files differ" with an UTF-8 diff.
When the user wants a patch, "git diff --binary" will generate a binary patch,
as before.
The only thing which is missing is the line "binary files differ", which may be 
a
regression. I can re-add it in V2.

> 
>  - similar to (1), carries a run-time cost for each diff, and users
>may sometimes still see binary diffs
> 
>   3. w-t-e (Lars's patch)
> 
>  - requires no server-side modifications; the diff is plain vanilla
> 
>  - works everywhere you diff, plumbing and porcelain
> 
>  - does require people to add a .gitattributes entry
> 
>  - run-time cost is per-checkout, not per-diff
> 
> So I can see room for (3) to co-exist alongside the others. Between (1)
> and (2), I think (2) is probably the better direction.
> 
> -Peff

Re: [PATCH v7 0/7] convert: add support for different encodings

2018-02-27 Thread Torsten Bögershausen

On Mon, Feb 26, 2018 at 03:46:35PM -0500, Jeff King wrote:
> On Mon, Feb 26, 2018 at 06:35:33PM +0100, Torsten Bögershausen wrote:
> 
> > > diff --git a/userdiff.c b/userdiff.c
> > > index dbfb4e13cd..48fa7e8bdd 100644
> > > --- a/userdiff.c
> > > +++ b/userdiff.c
> > > @@ -161,6 +161,7 @@ IPATTERN("css",
> > >"-?[_a-zA-Z][-_a-zA-Z0-9]*" /* identifiers */
> > >"|-?[0-9]+|\\#[0-9a-fA-F]+" /* numbers */
> > >  ),
> > > +{ "utf16", NULL, -1, { NULL, 0 }, NULL, "iconv:utf16" },
> > >  { "default", NULL, -1, { NULL, 0 } },
> > >  };
> > >  #undef PATTERNS
> > 
> > The patch looks like a possible step into the right direction -
> > some minor notes: "utf8" is better written as "UTF-8", when talking
> > to iconv.h, same for utf16.
> > 
> > But, how do I activate the diff ?
> > I have in .gitattributes
> > XXXenglish.txt diff=UTF-16
> > 
> > and in .git/config
> > [diff "UTF-16"]
> >   command = iconv:UTF-16
> > 
> > 
> > What am I doing wrong ?
> 
> After applying the patch, if I do:
> 
>   git init
>   echo hello | iconv -f utf8 -t utf16 >file
>   git add file
>   git commit -m one
>   echo goodbye | iconv -f utf8 -t utf16 >file
>   git add file
>   git commit -m two
> 
> then:
> 
>   git log -p
> 
> shows "binary files differ" but:
> 
>   echo "file diff=utf16" >.gitattributes
>   git log -p
> 
> shows text diffs. I assume you tweaked the patch before switching to
> the UTF-16 spelling in your example. Did you use a plumbing command to
> show the diff? textconv isn't enabled for plumbing, because the
> resulting patches cannot actually be applied (in that sense an encoding
> switch is potentially special, since in theory one could convert to the
> canonical text format, apply the patch, and then convert back).
> 
> -Peff

Thanks for helping me out.
I didn't use "git log -p", but a simple "git diff".
(And after re-using utf16 with lowercase, it works as you described it)

I wasn't aware of "git log -p", something learned (or re-learned)

The other question is:
Would this help showing diffs of UTF-16 encoded files on a "git hoster",
github/bitbucket/ ?

Or would the auto-magic UTF-16 avoid binary patch that I send out be more 
helpful ?
Or both ?
Or the w-t-e encoding ?

Questions over questions.

Re: [PATCH v7 0/7] convert: add support for different encodings

2018-02-26 Thread Torsten Bögershausen

On Sun, Feb 25, 2018 at 08:44:46PM -0500, Jeff King wrote:
> On Sat, Feb 24, 2018 at 04:18:36PM +0100, Lars Schneider wrote:
> 
> > > We always use the in-repo contents when generating 'diff'.  I think
> > > by "attribute to be used in diff", what you are reallying after is
> > > to convert the in-repo contents to that encoding _BEFORE_ running
> > > 'diff' on it.  E.g. in-repo UTF-16 that can have NUL bytes all over
> > > the place will not diff well with the xdiff machinery, but if you
> > > first convert it to UTF-8 and have xdiff work on it, you can get
> > > reasonable result out of it.  It is unclear what encoding you want
> > > your final diff output in (it is equally valid in such a set-up to
> > > desire your patch output in UTF-16 or UTF-8), but assuming that you
> > > want UTF-8 in your patch output, perhaps we do not have to break
> > > gitk users by hijacking the 'encoding' attribute.  Instead what you
> > > want is a single bit that says between in-repo or working tree which
> > > representation should be given to the xdiff machinery.
> > 
> > I fear that we could confuse users with an additional knob/bit that
> > defines what we diff against. Git always diff'ed against in-repo 
> > content and I feel it should stay that way.
> 
> Well, except for textconv. You can already do this:
> 
>   echo "foo diff=utf16" >.gitattributes
>   git config diff.utf16.textconv 'iconv -f utf16 -t utf8'
> 
> We could make that easier to use and much more efficient by:
> 
>   1. Allowing a special syntax for textconv filters that kicks off an
>  internal iconv.
> 
>   2. Providing baked-in config for utf16.
> 
> The patch below provides a sketch. But I think Torsten raised a good
> point that you might want the encoding conversion to be independent of
> other diff characteristics (so, e.g., you might say "this is utf16 but
> once converted treat it like C code for finding funcnames, etc").
> 
> ---
> diff --git a/diff.c b/diff.c
> index 21c3838b25..04032e059c 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -5968,6 +5968,21 @@ struct diff_filepair *diff_unmerge(struct diff_options 
> *options, const char *pat
>   return pair;
>  }
>  
> +static char *iconv_textconv(const char *encoding, struct diff_filespec *spec,
> + size_t *outsize)
> +{
> + char *ret;
> + int outsize_int; /* this really should be a size_t */
> +
> + if (diff_populate_filespec(spec, 0))
> + die("unable to load content for %s", spec->path);
> + ret = reencode_string_len(spec->data, spec->size,
> +   "utf-8", /* should be log_output_encoding? */
> +   encoding, _int);
> + *outsize = outsize_int;
> + return ret;
> +}
> +
>  static char *run_textconv(const char *pgm, struct diff_filespec *spec,
>   size_t *outsize)
>  {
> @@ -5978,6 +5993,9 @@ static char *run_textconv(const char *pgm, struct 
> diff_filespec *spec,
>   struct strbuf buf = STRBUF_INIT;
>   int err = 0;
>  
> + if (skip_prefix(pgm, "iconv:", ))
> + return iconv_textconv(pgm, spec, outsize);
> +
>   temp = prepare_temp_file(spec->path, spec);
>   *arg++ = pgm;
>   *arg++ = temp->name;
> diff --git a/userdiff.c b/userdiff.c
> index dbfb4e13cd..48fa7e8bdd 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -161,6 +161,7 @@ IPATTERN("css",
>"-?[_a-zA-Z][-_a-zA-Z0-9]*" /* identifiers */
>"|-?[0-9]+|\\#[0-9a-fA-F]+" /* numbers */
>  ),
> +{ "utf16", NULL, -1, { NULL, 0 }, NULL, "iconv:utf16" },
>  { "default", NULL, -1, { NULL, 0 } },
>  };
>  #undef PATTERNS

The patch looks like a possible step into the right direction -
some minor notes: "utf8" is better written as "UTF-8", when talking
to iconv.h, same for utf16.

But, how do I activate the diff ?
I have in .gitattributes
XXXenglish.txt diff=UTF-16

and in .git/config
[diff "UTF-16"]
  command = iconv:UTF-16


What am I doing wrong ?

Re: Duplicate safecrlf warning for racily clean index entry

2018-02-20 Thread Torsten Bögershausen

On Tue, Feb 20, 2018 at 08:42:26AM -0500, Matt McCutchen wrote:
> I noticed that if a file subject to a safecrlf warning is added to the
> index in the same second that it is created, resulting in a "racily
> clean" index entry, then a subsequent "git add" command prints another
> safecrlf warning.  I reproduced this on the current "next"
> (499d7c4f91).  The procedure:
> 
> $ git init
> $ git config core.autocrlf true
> $ echo foo >file1 && git add file1 && git add file1
> warning: LF will be replaced by CRLF in file1.
> The file will have its original line endings in your working directory.
> warning: LF will be replaced by CRLF in file1.
> The file will have its original line endings in your working directory.
> $ echo bar >file2 && sleep 1 && git add file2 && git add file2
> warning: LF will be replaced by CRLF in file2.
> The file will have its original line endings in your working directory.
> 
> This came up when I ran the test suite for Braid on Windows
> (https://github.com/cristibalan/braid/issues/77).

I think a .gitattributes file could/should be used. I'll answer
there seperatly.

> 
> The phenomenon actually seems to be more general: touching the file
> causes the next "git add" to print a safecrlf warning, suggesting that
> the warning occurs whenever the index entry is dirty.  One could argue
> that a new warning is reasonable after touching the file, but it seems
> clear that "racy cleanliness" is an implementation detail that
> shouldn't have user-visible nondeterministic effects.
> 
> In either case, if "git update-index --refresh" (or "git status") is
> run before "git add", then "git add" does not print the warning.  On
> the other hand, if line endings in the working tree file are changed,
> then git shows the file as having an unstaged change, even though the
> content that would be added to the index after CRLF conversion is
> identical.  So it seems that git remembers the pre-conversion file
> content and uses it for "git update-index --refresh" and would just
> need to use it for "git add" as well.
> 
> Thoughts about the proposed change?  Does someone want to work on it or
> give me a pointer to where to get started?

Good analyzes, thanks for that.

I don't hava a pointer, but what should happen ?
2 warnings for 2 "git add" should be OK, I think.

1 warning is part of the optimization, that Git does to handle
hundrets and thousands of files efficciently.

Is the 1/2 warning  real live problem  ?

> 
> Thanks,
> Matt

Re: [PATCH v7 0/7] convert: add support for different encodings

2018-02-16 Thread Torsten Bögershausen

On Fri, Feb 16, 2018 at 03:42:35PM +0100, Lars Schneider wrote:
[]
> 
> Agreed. However, people using ShiftJIS are not my target audience.
> My target audience are:
> 
> (1) People that have to encode their text files in UTF-16 (for 
> whatever reason - usually because of legacy processes or tools).
> 
> (2) People that want to see textual diffs of their UTF-16 encoded 
> files in their Git tools without special adjustments (on the 
> command line, on the web, etc).
> 
> That was my primary motivation. The fact that w-t-e supports any
> other encoding too is just a nice side effect. I don't foresee people
> using other w-t-encodings other than UTF-16 in my organization.
> 
> I have the suspicion that the feature could be useful for the Git
> community at large. Consider this Stack Overflow question:
> https://stackoverflow.com/questions/777949/can-i-make-git-recognize-a-utf-16-file-as-text
> 
> This question was viewed 42k times and there is no good solution.
> I believe w-t-e could be a good solution.
> 

If it was only about a diff of UTF-16 files, I may suggest a patch.
I simply copy-paste it here for review, if someone thinks that it may
be useful, I can send it as a real patch/RFC.

git show HEAD


commit 9f7d43f29eaf6017b7b16261ce91d8ef182cf415
Author: Torsten Bögershausen <tbo...@web.de>
Date:   Fri Feb 2 15:35:23 2018 +0100

Auto diff of UTF-16 files in UTF-8

When an UTF-16 file is commited and later changed, `git diff` shows
"Binary files XX and YY differ".

When the user wants a diff in UTF-8, a textconv needs to be specified
in .gitattributes and the textconv must be configured.

A more user-friendly diff can be produced for UTF-16 if
- the user did not use `git diff --binary`
- the blob is identified as binary
- the blob has an UTF-16 BOM
- the blob can be converted into UTF-8

Enhance the diff machinery to auto-detect UTF-16 blobs and show them
as UTF-8, unless the user specifies `git diff --binary` which creates
a binary diff.

diff --git a/diff.c b/diff.c
index fb22b19f09..51831ee94d 100644
--- a/diff.c
+++ b/diff.c
@@ -3192,6 +3192,10 @@ static void builtin_diff(const char *name_a,
strbuf_reset();
}
 
+   if (one && one->reencoded_from_utf16)
+   strbuf_addf(, "a is converted to UTF-8 from 
UTF-16\n");
+   if (two && two->reencoded_from_utf16)
+   strbuf_addf(, "b is converted to UTF-8 from 
UTF-16\n");
mf1.size = fill_textconv(textconv_one, one, );
mf2.size = fill_textconv(textconv_two, two, );
 
@@ -3611,8 +3615,25 @@ int diff_populate_filespec(struct diff_filespec *s, 
unsigned int flags)
s->size = size;
s->should_free = 1;
}
-   }
-   else {
+   if (!s->binary && buffer_is_binary(s->data, s->size) &&
+   buffer_has_utf16_bom(s->data, s->size)) {
+   int outsz = 0;
+   char *outbuf;
+   outbuf = reencode_string_len(s->data, (int)s->size,
+"UTF-8", "UTF-16", );
+   if (outbuf) {
+   if (s->should_free)
+   free(s->data);
+   if (s->should_munmap)
+   munmap(s->data, s->size);
+   s->should_munmap = 0;
+   s->data = outbuf;
+   s->size = outsz;
+   s->reencoded_from_utf16 = 1;
+   s->should_free = 1;
+   }
+   }
+   } else {
enum object_type type;
if (size_only || (flags & CHECK_BINARY)) {
type = sha1_object_info(s->oid.hash, >size);
@@ -3629,6 +3650,19 @@ int diff_populate_filespec(struct diff_filespec *s, 
unsigned int flags)
s->data = read_sha1_file(s->oid.hash, , >size);
if (!s->data)
die("unable to read %s", oid_to_hex(>oid));
+   if (!s->binary && buffer_is_binary(s->data, s->size) &&
+   buffer_has_utf16_bom(s->data, s->size)) {
+   int outsz = 0;
+   char *buf;
+   buf = reencode_string_len(s->data, (int)s->size,
+ "UTF-8", "UTF-16", );
+   if (buf) {
+

Re: Line ending normalization doesn't work as expected

2018-02-16 Thread Torsten Bögershausen

On Thu, Feb 15, 2018 at 09:24:40AM -0600, Robert Dailey wrote:
> On Tue, Oct 3, 2017 at 9:00 PM, Junio C Hamano  wrote:

[]
> 
> Sorry to bring this old thread back to life, but I did notice that
> this causes file modes to reset back to 644 (from 755) on Windows
> version of Git. Is there a way to `$ git read-tree --empty && git add
> .` without mucking with file permissions?

No problem with the delay, under the time we had the chance to improve Git:

>Git 2.16 Release Notes
>==
>[]
>* "git add --renormalize ." is a new and safer way to record the fact
>   that you are correcting the end-of-line convention and other
>   "convert_to_git()" glitches in the in-repository data.

Could you upgrade to Git 2.16.1 (or higher, just take the latest)
and try with
git add --renormalize .
?

Re: [PATCH v6 5/7] convert: add 'working-tree-encoding' attribute

2018-02-10 Thread Torsten Bögershausen

On Fri, Feb 09, 2018 at 02:28:28PM +0100, lars.schnei...@autodesk.com wrote:
> From: Lars Schneider 
> 
> Git recognizes files encoded with ASCII or one of its supersets (e.g.
> UTF-8 or ISO-8859-1) as text files. All other encodings are usually
> interpreted as binary and consequently built-in Git text processing
> tools (e.g. 'git diff') as well as most Git web front ends do not
> visualize the content.
> 
> Add an attribute to tell Git what encoding the user has defined for a
> given file. If the content is added to the index, then Git converts the
> content to a canonical UTF-8 representation. On checkout Git will
> reverse the conversion.
> 
> Signed-off-by: Lars Schneider 
> ---
>  Documentation/gitattributes.txt  |  66 
>  convert.c| 157 -
>  convert.h|   1 +
>  sha1_file.c  |   2 +-
>  t/t0028-working-tree-encoding.sh | 210 
> +++
>  5 files changed, 434 insertions(+), 2 deletions(-)
>  create mode 100755 t/t0028-working-tree-encoding.sh
> 
> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index 30687de81a..4ecdcd4859 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -272,6 +272,72 @@ few exceptions.  Even though...
>catch potential problems early, safety triggers.
>  
>  
> +`working-tree-encoding`
> +^^^
> +
> +Git recognizes files encoded with ASCII or one of its supersets (e.g.
> +UTF-8 or ISO-8859-1) as text files.  All other encodings are usually
> +interpreted as binary and consequently built-in Git text processing
> +tools (e.g. 'git diff') as well as most Git web front ends do not
> +visualize the content.
> +
> +In these cases you can tell Git the encoding of a file in the working
> +directory with the `working-tree-encoding` attribute. If a file with this
> +attribute is added to Git, then Git reencodes the content from the
> +specified encoding to UTF-8. Finally, Git stores the UTF-8 encoded
> +content in its internal data structure (called "the index"). On checkout
> +the content is reencoded back to the specified encoding.
> +
> +Please note that using the `working-tree-encoding` attribute may have a
> +number of pitfalls:
> +
> +- Git clients that do not support the `working-tree-encoding` attribute

A client to Git ?
Or may be "third party Git implementations"

> +  will checkout the respective files UTF-8 encoded and not in the
> +  expected encoding. Consequently, these files will appear different
> +  which typically causes trouble. This is in particular the case for
> +  older Git versions and alternative Git implementations such as JGit
> +  or libgit2 (as of February 2018).
> +
> +- Reencoding content requires resources that might slow down certain
> +  Git operations (e.g 'git checkout' or 'git add').
> +
> +Use the `working-tree-encoding` attribute only if you cannot store a file
> +in UTF-8 encoding and if you want Git to be able to process the content
> +as text.
> +
> +As an example, use the following attributes if your '*.proj' files are
> +UTF-16 encoded with byte order mark (BOM) and you want Git to perform
> +automatic line ending conversion based on your platform.
> +
> +
> +*.proj   text working-tree-encoding=UTF-16
> +
> +
> +Use the following attributes if your '*.proj' files are UTF-16 little
> +endian encoded without BOM and you want Git to use Windows line endings
> +in the working directory. Please note, it is highly recommended to
> +explicitly define the line endings with `eol` if the `working-tree-encoding`
> +attribute is used to avoid ambiguity.
> +
> +
> +*.proj   working-tree-encoding=UTF-16LE text eol=CRLF
> +
> +
> +You can get a list of all available encodings on your platform with the
> +following command:

One question:
 +*.projtext working-tree-encoding=UTF-16
vs
*.proj  working-tree-encoding=UTF-16LE text eol=CRLF

Technically the order of attributes doesn't matter, but that is not what we
want to demonstrate here and now.
I would probably move the "text" attribute to the end of the line.
So that readers don't start to wonder if the order is important.

> +
> +
> +iconv --list
> +
> +
> +If you do not know the encoding of a file, then you can use the `file`
> +command to guess the encoding:
> +
> +
> +file foo.proj
> +
> +
> +
>  `ident`
>  ^^^
>  
> diff --git a/convert.c b/convert.c
> index b976eb968c..dc9e2db6b5 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -7,6 +7,7 @@
>  #include "sigchain.h"
>  #include "pkt-line.h"
>  #include "sub-process.h"
> +#include "utf8.h"
>  
>  /*
>   * convert.c - convert a file

Re: Crash when clone includes magic filenames on Windows

2018-02-10 Thread Torsten Bögershausen

On Sat, Feb 10, 2018 at 03:55:58AM -0500, Jeffrey Walton wrote:
> Hi Everyone,
> 
> I'm seeing this issue on Windows: https://pastebin.com/YfB25E4T . It
> seems the filename AUX is the culprit. Also see
> https://blogs.msdn.microsoft.com/oldnewthing/20031022-00/?p=42073 .
> (Thanks to Milleneumbug on Stack Overflow).
> 
> I did not name the file, someone else did. I doubt the filename will be 
> changed.
> 
> Searching is not turning up much information:
> https://www.google.com/search?q=git+"magic+filenames"+windows
> 
> Does anyone know how to sidestep the issue on Windows?
> 
> Jeff

Thanks for the report.

(Typically nobody (tm) here on the list opens a web-browser to look at external
 material, so here is a shortened version of the pastebin:)

error: unable to create file crypto_stream/lexv2/e/v2/schwabe/sparc-2/e/aux.c: 
No such file or directory
error: unable to create file crypto_stream/lexv2/e/v2/schwabe/sparc-2/e/aux.s: 
No such file or directory
Segmentation fault:  99% (26526/26793)

There are actually 2 problems:
- The filenames named aux.c
  It could be that git -c core.longpaths=true clone xxx
  works, but I don't have a Windows box to test at the moment-

- The crash
  Which Git version do you use?
  It may be a good idea to report it here
  https://github.com/git-for-windows/git

Re: [PATCH v2] name-hash: properly fold directory names in adjust_dirname_case()

2018-02-08 Thread Torsten Bögershausen

On Thu, Feb 08, 2018 at 02:23:33PM -0500, Ben Peart wrote:

[]

> -
> +test_expect_success CASE_INSENSITIVE_FS 'add directory (with different 
> case)' '
> + git reset --hard initial &&
> + mkdir -p dir1/dir2 &&
> + echo > dir1/dir2/a &&
> + echo > dir1/dir2/b &&

Thanks for working on this-
One and a half style nits:
The ">" should be close to the file name:
echo >dir1/dir2/a &&
But then we don't even look into the file, so that the "\" produced by echo is 
not needed.
A simple 
>dir1/dir2/a &&
is all that is needed.

Re: [PATCH v1] name-hash: properly fold directory names in adjust_dirname_case()

2018-02-08 Thread Torsten Bögershausen

On Wed, Feb 07, 2018 at 07:41:56PM -0500, Ben Peart wrote:
[]

> diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh
> index b29d749bb7..219c96594c 100755
> --- a/t/t0050-filesystem.sh
> +++ b/t/t0050-filesystem.sh
> @@ -80,7 +80,17 @@ test_expect_success 'merge (case change)' '
>   git merge topic
>  '
>  
> -
> +test_expect_success CASE_INSENSITIVE_FS 'add directory (with different 
> case)' '
> + git reset --hard initial &&
> + mkdir -p dir1 &&
> + mkdir -p dir1/dir2 &&
> + touch dir1/dir2/a &&
> + touch dir1/dir2/b &&
> + git add dir1/dir2/a &&
> + git add dir1/DIR2/b &&
> + camel=$(git ls-files | grep dir2) &&
> + test $(echo "$camel" | wc -l) = 2
> +'
>  

There is nothing wrong with with "wc -l", but:
a more new-style would probably use test_line_count() here.

My personal favorite would be to spell out what we expect and run a diff.
When it fails, we can see what fails, and the function would look
like this:


test_expect_success CASE_INSENSITIVE_FS 'add directory (with different case)' '
git reset --hard initial &&
mkdir -p dir1 &&
mkdir -p dir1/dir2 &&
touch dir1/dir2/a &&
touch dir1/dir2/b &&
git add dir1/dir2/a &&
git add dir1/DIR2/b &&
git ls-files | grep dir2 | sort >actual &&
cat >expected <<-\EOF &&
dir1/dir2/a
dir1/dir2/b
EOF
test_cmp expected actual
'

Re: [PATCH/RFC v5 7/7] Careful with CRLF when using e.g. UTF-16 for working-tree-encoding

2018-02-06 Thread Torsten Bögershausen

On Fri, Feb 02, 2018 at 11:17:04AM -0800, Junio C Hamano wrote:
> Torsten Bögershausen <tbo...@web.de> writes:
> 
> > There are 2 opposite opionions/user expectations here:
> >
> > a) They are binary in the working tree, so git should leave the line endings
> >as is. (Unless specified otherwise in the .attributes file)
> > ...
> > b) They are text files in the index. Git will convert line endings
> >if core.autocrlf is true (or the .gitattributes file specifies "-text")
> 
> I sense that you seem to be focusing on the distinction between "in
> the working tree" vs "in the index" while contrasting.  The "binary
> vs text" in your "binary in wt, text in index" is based on the
> default heuristics without any input from end-users or the project
> that uses Git that happens to contain such files.  If the users and
> the project that uses Git want to treat contents in a path as text,
> it is text even when it is (re-)encoded to UTF-16, no?
> 
> Such files may be (mis)classified as binary with the default
> heuristics when there is no help from what is written in the
> .gitattributes file, but here we are talking about the case where
> the user explicitly tells us it is in UTF-16, right?  Is there such a
> thing as UTF-16 binary?

I don't think so, by definiton UTF-16 is ment to be text.
(this means that git ls-files --eol needs some update, I can have a look)

Do we agree that UTF-16 is text ?
If yes, could Git assume that the "text" attribute is set when
working-tree-encoding is set ?

I would even go a step further and demand that the user -must- make a decision
about the line endings for working-tree-encoded files:
working-tree-encoding=UTF-16 # illegal, die()
working-tree-encoding=UTF-16 text=auto   # illegal, die()
working-tree-encoding=UTF-16 -text   # no eol conversion
working-tree-encoding=UTF-16 text# eol according to core.eol
working-tree-encoding=UTF-16 text eol=lf # LF
working-tree-encoding=UTF-16 text eol=crlf   # CRLF

What do you think ?

contrib/completion/git-completion.bash: declare -g is not portable

2018-02-03 Thread Torsten Bögershausen

Hej Duy,
After running t9902-completion.sh on Mac OS I got a failure
in this style:

.../projects/git/git.pu/t/../contrib/completion/git-completion.bash: line 300:
declare: -g: invalid option
declare: usage: declare [-afFirtx] [-p] [name[=value] ...]
--- expected2018-02-03 17:10:18.0 +
+++ out 2018-02-03 17:10:18.0 +
@@ -1,15 +1,2 @@
---quiet
---detach
---track
---orphan=
---ours
---theirs
---merge
---conflict=
---patch
---ignore-skip-worktree-bits
---ignore-other-worktrees
---recurse-submodules
---progress
 --no-track
 --no-recurse-submodules
not ok 92 - double dash "git checkout"

What is "declare -g" good for ?

Re: [PATCH/RFC v5 7/7] Careful with CRLF when using e.g. UTF-16 for working-tree-encoding

2018-01-31 Thread Torsten Bögershausen

[]
> > That is a good one.
> > If you ever plan a re-roll (I don't at the moment) the *.proj extemsion
> > make much more sense in Documentation/gitattributes that *.tx
> > There no text files encoded in UTF-16 wich are called xxx.txt, but those
> > are non-ideal examples. *.proj makes good sense as an example.
> 
> OK, I'll do that. Would that fix the problem which this patch tries to 
> address for you?
> (I would also explicitly add a paragraph to discuss line endings)

Please let me see the patch first, before I can have a comment.

But back to the more general question:

How should Git handle the line endings of UTF-16 files in the woring-tree,
which are UTF-8 in the index?


There are 2 opposite opionions/user expectations here:

a) They are binary in the working tree, so git should leave the line endings
   as is. (Unless specified otherwise in the .attributes file)
b) They are text files in the index. Git will convert line endings
   if core.autocrlf is true (or the .gitattributes file specifies "-text")

My feeling is that both arguments are valid, so let's ask for opinions
and thoughts of others.
Erik, Junio, Johannes, Johannes, Jeff, Ramsay, everybody:
What do yo think ?

Re: [PATCH/RFC v5 7/7] Careful with CRLF when using e.g. UTF-16 for working-tree-encoding

2018-01-30 Thread Torsten Bögershausen

On Tue, Jan 30, 2018 at 12:23:47PM +0100, Lars Schneider wrote:
> 
> > On 29 Jan 2018, at 21:19, tbo...@web.de wrote:
> > 
> > From: Torsten Bögershausen <tbo...@web.de>
> > 
> > UTF-16 encoded files are treated as "binary" by Git, and no CRLF
> > conversion is done.
> > When the UTF-16 encoded files are converted into UF-8 using the new
> s/UF-8/UTF-8/
> 
> 
> > "working-tree-encoding", the CRLF are converted if core.autocrlf is true.
> > 
> > This may lead to confusion:
> > A tool writes an UTF-16 encoded file with CRLF.
> > The file is commited with core.autocrlf=true, the CLRF are converted into 
> > LF.
> > The repo is pushed somewhere and cloned by a different user, who has
> > decided to use core.autocrlf=false.
> > He uses the same tool, and now the CRLF are not there as expected, but LF,
> > make the file useless for the tool.
> > 
> > Avoid this (possible) confusion by ignoring core.autocrlf for all files
> > which have "working-tree-encoding" defined.
> 
> Maybe I don't understand your use case but I think this will generate even 
> more confusion because that's not what I would expect as a user. I think Git 
> should behave consistently independent of the used encoding. Here are my 
> arguments:

To start with: I have probably seen too many repos with CRLF messed up.

> 
>   (1) Legacy users are *not* affected. If you don't use the 
> "working-tree-encoding"
>   attribute then nothing changes for you.

People who don't use "working-tree-encoding" are not affected,
I never ment to state that.

I am thinking about people who use "working-tree-encoding" without thinking
about line endings.
Or the ones that have in mind that core.autocrlf=true will leave the
line endings for UTF-16 encoded files as is, but that changes as soon as they
are converted into UTF-8 and the "auto" check is now done
-after- the conversion. I would find that confusing.

> 
>   (2) If you use the "working-tree-encoding" attribute *and* you want to 
> ensure 
>   your file keeps CRLF then you can define that in the attributes too. 
> E.g.:
>   
>   *.proj textworking-tree-encoding=UTF-16 eol=crlf

That is a good one.
If you ever plan a re-roll (I don't at the moment) the *.proj extemsion
make much more sense in Documentation/gitattributes that *.tx
There no text files encoded in UTF-16 wich are called xxx.txt, but those
are non-ideal examples. *.proj makes good sense as an example.


> 
> - Lars
> 
> 
> 
> > The user can still use a .gitattributes file and specify the line endings
> > like "text=auto", "text", or "text eol=crlf" and let that .gitattribute
> > file travel together with push and clone.
> > 
> > Change convert.c to e more careful, simplify the initialization when
> > attributes are retrived (and none are specified) and update the 
> > documentation.
> > 
> > Signed-off-by: Torsten Bögershausen <tbo...@web.de>
> > ---
> > Documentation/gitattributes.txt |  9 ++---
> > convert.c   | 15 ---
> > 2 files changed, 18 insertions(+), 6 deletions(-)
> > 
> > diff --git a/Documentation/gitattributes.txt 
> > b/Documentation/gitattributes.txt
> > index a8dbf4be3..3665c4677 100644
> > --- a/Documentation/gitattributes.txt
> > +++ b/Documentation/gitattributes.txt
> > @@ -308,12 +308,15 @@ Use the `working-tree-encoding` attribute only if you 
> > cannot store a file in
> > UTF-8 encoding and if you want Git to be able to process the content as
> > text.
> > 
> > +Note that when `working-tree-encoding` is defined, core.autocrlf is 
> > ignored.
> > +Set the `text` attribute (or `text=auto`) to enable CRLF conversions.
> > +
> > Use the following attributes if your '*.txt' files are UTF-16 encoded
> > -with byte order mark (BOM) and you want Git to perform automatic line
> > -ending conversion based on your platform.
> > +with byte order mark (BOM) and you want Git to perform line
> > +ending conversion based on core.eol.
> > 
> > 
> > -*.txt  text working-tree-encoding=UTF-16
> > +*.txt  working-tree-encoding=UTF-16 text
> > 
> > 
> > Use the following attributes if your '*.txt' files are UTF-16 little
> > diff --git a/convert.c b/convert.c
> > index 13fad490c..e7f11d1db 100644
> > --- a/convert.c
> > +++ b/convert.c
> > @@ -1264,15 +1264,24 @@ static void convert_attrs(struct conv_attrs *ca, 
> > const char *

Re: Cloned repository has file changes -> bug?

2018-01-27 Thread Torsten Bögershausen

On Sat, Jan 27, 2018 at 08:59:50PM +0100, Ævar Arnfjörð Bjarmason wrote:
> 
> On Sat, Jan 27 2018, Filip Jorissen jotted:
> 
> > I think our git repository is bugged. The reason why I say this is the
> > following. When cloning the repository, the newly cloned repository
> > immediately has file changes[...].
> 
> If you run this:
> 
> git ls-files | tr '[:upper:]' '[:lower:]' | sort | uniq -D | grep '^'
> 
> You'll see that the reason is that you have files that differ only in
> case.
> 
> You are using a Mac, and Macs by default think that files that are
> different binary strings are the same file, since they don't consider
> case to be relevant. The file FOO, foo and FoO and fOo are all the same
> file as far as your Mac is concerned, but would be 4 different files on
> Linux.
> 
> > How can I fix the repository?
> 
> You could check it out on a OS that considers files that differ in case
> to be different files, e.g. on Linux, move them around, push it, and new
> clones should work on your Mac.
> 
> Alternatively I hear that you can create a loopback case-sensitive FS
> image on Macs.

You can even fix the repo locally.
There are 2 files with uppercase/lowercase collisions.
I show you how to fix one off these, the other one goes similar.
After that, do a commit and a push and pull request.



Changes not staged for commit:
  (use "git add ..." to update what will be committed)
  (use "git checkout -- ..." to discard changes in working directory)

modified:   
IDEAS/Resources/ReferenceResults/Dymola/IDEAS_Fluid_HeatExchangers_GroundHeatExchangers_Borefield_Examples_MultipleBoreholesWithHeatPump.txt
modified:   
IDEAS/Resources/ReferenceResults/Dymola/IDEAS_Utilities_Psychrometrics_Functions_Examples_saturationPressure.txt

no changes added to commit (use "git add" and/or "git commit -a")
user@mac:/tmp/IDEAS> git ls-files -s | grep -i 
IDEAS/Resources/ReferenceResults/Dymola/IDEAS_Fluid_HeatExchangers_GroundHeatExchangers_Borefield_Examples_MultipleBoreholesWithHeatPump.txt
100644 f56cfcf14aa4b53dfc5ecfb488366f721c94c8e2 0   
IDEAS/Resources/ReferenceResults/Dymola/IDEAS_Fluid_HeatExchangers_GroundHeatExchangers_Borefield_Examples_MultipleBoreholesWithHeatPump.txt
100644 e345e1372111d034b4c5a1c75eb791340b93f55e 0   
IDEAS/Resources/ReferenceResults/Dymola/ideas_Fluid_HeatExchangers_GroundHeatExchangers_Borefield_Examples_MultipleBoreholesWithHeatPump.txt
user@mac:/tmp/IDEAS> git mv 
IDEAS/Resources/ReferenceResults/Dymola/ideas_Fluid_HeatExchangers_GroundHeatExchangers_Borefield_Examples_MultipleBoreholesWithHeatPump.txt
 
IDEAS/Resources/ReferenceResults/Dymola/ideas_Fluid_HeatExchangers_GroundHeatExchangers_Borefield_Examples_MultipleBoreholesWithHeatPump2.txt
user@mac:/tmp/IDEAS> git checkout  
IDEAS/Resources/ReferenceResults/Dymola/ideas_Fluid_HeatExchangers_GroundHeatExchangers_Borefield_Examples_MultipleBoreholesWithHeatPump2.txt
user@mac:/tmp/IDEAS> git checkout 
IDEAS/Resources/ReferenceResults/Dymola/IDEAS_Fluid_HeatExchangers_GroundHeatExchangers_Borefield_Examples_MultipleBoreholesWithHeatPump.txt
user@mac:/tmp/IDEAS> git status
On branch master
Your branch is up to date with 'origin/master'.

Changes to be committed:
  (use "git reset HEAD ..." to unstage)

renamed:
IDEAS/Resources/ReferenceResults/Dymola/ideas_Fluid_HeatExchangers_GroundHeatExchangers_Borefield_Examples_MultipleBoreholesWithHeatPump.txt
 -> 
IDEAS/Resources/ReferenceResults/Dymola/ideas_Fluid_HeatExchangers_GroundHeatExchangers_Borefield_Examples_MultipleBoreholesWithHeatPump2.txt

Changes not staged for commit:
  (use "git add ..." to update what will be committed)
  (use "git checkout -- ..." to discard changes in working directory)

modified:   
IDEAS/Resources/ReferenceResults/Dymola/IDEAS_Utilities_Psychrometrics_Functions_Examples_saturationPressure.txt

user@mac:/tmp/IDEAS>

Re: [PATCH v4 0/6] convert: add support for different encodings

2018-01-23 Thread Torsten Bögershausen

On Sat, Jan 20, 2018 at 04:24:12PM +0100, lars.schnei...@autodesk.com wrote:
> From: Lars Schneider 
> 
> Hi,
> 
> Patches 1-4 and 6 are preparation and helper functions.
> Patch 5 is the actual change.

I (still) have 2 remarks on convert.c - to make live easier,
I will send a small "on top" patch the next days.

Re: [PATCH v2 0/6] Force pipes to flush immediately on NonStop platform

2018-01-20 Thread Torsten Bögershausen

On Fri, Jan 19, 2018 at 12:33:59PM -0500, randall.s.bec...@rogers.com wrote:
> From: "Randall S. Becker" 
> 
> * wrapper.c: called setbuf(stream,0) to force pipe flushes not enabled by
>   default on the NonStop platform.
> 
> Signed-off-by: Randall S. Becker 
> ---
>  wrapper.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/wrapper.c b/wrapper.c
> index d20356a77..671cbb4b4 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -403,6 +403,9 @@ FILE *xfdopen(int fd, const char *mode)
>   FILE *stream = fdopen(fd, mode);
>   if (stream == NULL)
>   die_errno("Out of memory? fdopen failed");
> +#ifdef __TANDEM
> + setbuf(stream,0);
> +#endif

Reading the commit message, I would have expected someting similar to

#ifdef FORCE_PIPE_FLUSHES
setbuf(stream,0);
#endif

(Because other systems may need the tweak as well, some day)
Of course you need to change that in the Makefile and config.mak.uname

>   return stream;
>  }
>  
> -- 
> 2.16.0.31.gf1a482c
>

Re: [PATCH v3 5/7] convert_to_git(): safe_crlf/checksafe becomes int conv_flags

2018-01-08 Thread Torsten Bögershausen

On Mon, Jan 08, 2018 at 11:47:20PM +0100, Lars Schneider wrote:
> 
> > On 08 Jan 2018, at 22:28, Junio C Hamano  wrote:
> > 
> > lars.schnei...@autodesk.com writes:
> > 
> >> diff --git a/sha1_file.c b/sha1_file.c
> >> index afe4b90f6e..dcb02e9ffd 100644
> >> --- a/sha1_file.c
> >> +++ b/sha1_file.c
> >> @@ -75,14 +75,14 @@ static struct cached_object *find_cached_object(const 
> >> unsigned char *sha1)
> >> }
> >> 
> >> 
> >> -static enum safe_crlf get_safe_crlf(unsigned flags)
> >> +static int get_conv_flags(unsigned flags)
> >> {
> >>if (flags & HASH_RENORMALIZE)
> >> -  return SAFE_CRLF_RENORMALIZE;
> >> +  return CONV_EOL_RENORMALIZE;
> >>else if (flags & HASH_WRITE_OBJECT)
> >> -  return safe_crlf;
> >> +  return global_conv_flags_eol | CONV_WRITE_OBJECT;
> > 
> > This macro has not yet introduced at this point (it appears in 6/7
> > if I am not mistaken).
> 
> Nice catch. I'll fix that in the next iteration.
> 
> Is it OK if I send the next iteration soon or would you prefer
> it if I wait until after 2.16 release?
> 
> Plus, is it ok to keep the base of the series or would you prefer
> it if I rebase it to the latest master (because of a minor conflict)?
> 
> Thanks,
> Lars

I noticed the missing macro as well, while doing the rebase
to git.git/master, but forget to mention it here on the list

Lars, if you want, please have a look here:
https://github.com/tboegi/git/tree/180108-1858-For-lars-schneider-encode-V3C

Re: [PATCH v3 0/7] convert: add support for different encodings

2018-01-08 Thread Torsten Bögershausen

On Mon, Jan 08, 2018 at 03:38:48PM +0100, Lars Schneider wrote:
[]
> > Some comments:
> > 
> > I would like to have the CRLF conversion a little bit more strict -
> > many users tend to set core.autocrlf=true or write "* text=auto"
> > in the .gitattributes.
> > Reading all the effort about BOM markers and UTF-16LE, I think there
> > should ne some effort to make the line endings round trip.
> > Therefore I changed convert.c to demand that the "text" attribute
> > is set to enable CRLF conversions.
> > (If I had submitted the patch, I would have demanded
> > "text eol=lf" or "text eol=crlf", but the test case t0028 indicates
> > that there is a demand to produce line endings as configured in core.eol)
> 
> But wouldn't that be inconvenient for the users? E.g. if I add a UTF-16
> file on Windows with CRLF then it would be nice if Git would automatically
> convert the line endings to LF on Linux, no?
> 
> IOW: Why should we handle text files that have a defined checkout-encoding
> differently compared to UTF-8 encoded text files? Wouldn't that be unexpected
> to the user?
> 
> Thanks,
> Lars

The problem is, if user A has core.autocrlf=true and user B
core.autocrlf=false.
(The line endings don't show up as expected at user B)

Having said that in all shortness, you convinced me:
If text=auto, we care about line endings.
If text,  we care about line endings.

If the .gitattributes don't say anything about text,
we don't convert eol.
(In other words: we don't look at core.autocrlf, when checkout-encoding
is defined)

A new branch is push to github/tboegi

Re: [PATCH v3 0/7] convert: add support for different encodings

2018-01-07 Thread Torsten Bögershausen

On Sat, Jan 06, 2018 at 01:48:01AM +0100, lars.schnei...@autodesk.com wrote:
> From: Lars Schneider 
> 
> Hi,
> 
> Patches 1-5 and 6 are helper functions and preparation.
> Patch 6 is the actual change.
> 
> I am still torn between "checkout-encoding" and "working-tree-encoding"
> as attribute name. I am happy to hear arguments for/against one or the
> other.

checkout-encoding is probably misleading, as it is even the checkin-encoding.

What is wrong with working-tree-encoding ?
I think the 2 "-".

What was wrong with workingtree-encoding ?
Or
workdir-encoding ?



> 
> Changes since v2:
> 
> * Added Torsten's crlfsave refactoring patch (patch 5)
>   @Torsten: I tried to make the commit message more clean, added
> some comments to and renamed conv_flags_eol to
> global_conv_flags_eol.
> 
> * Improved documentation and commit message (Torsten)

Good, thanks.
> 
> * Removed unnecessary NUL assignment in xstrdup_tolower() (Torsten)
> 
> * Set "git config core.eol lf" to made the test run on Windows (Dscho)
> 
> * Made BOM arrays static (Ramsay)


Some comments:

I would like to have the CRLF conversion a little bit more strict -
many users tend to set core.autocrlf=true or write "* text=auto"
in the .gitattributes.
Reading all the effort about BOM markers and UTF-16LE, I think there
should ne some effort to make the line endings round trip.
Therefore I changed convert.c to demand that the "text" attribute
is set to enable CRLF conversions.
(If I had submitted the patch, I would have demanded
"text eol=lf" or "text eol=crlf", but the test case t0028 indicates
that there is a demand to produce line endings as configured in core.eol)

Anyway, I rebased it onto git.git/master, changed the docu, and pushed it to
https://github.com/tboegi/git/tree/180107-0935-For-lars-schneider-encode-V3B

Here is a inter-diff against your version:

 diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
 index 1bc03e69c..b8d9f91c8 100644
 --- a/Documentation/gitattributes.txt
 +++ b/Documentation/gitattributes.txt
 @@ -281,7 +281,7 @@ interpreted as binary and consequently built-in Git text 
processing
  tools (e.g. 'git diff') as well as most Git web front ends do not
  visualize the content.
  
 -In these cases you can teach Git the encoding of a file in the working
 +In these cases you can tell Git the encoding of a file in the working
  directory with the `checkout-encoding` attribute. If a file with this
  attributes is added to Git, then Git reencodes the content from the
  specified encoding to UTF-8 and stores the result in its internal data
 @@ -308,17 +308,20 @@ Use the `checkout-encoding` attribute only if you cannot 
store a file in
  UTF-8 encoding and if you want Git to be able to process the content as
  text.
  
 +Note that when `checkout-encoding` is defined, by default the line
 +endings are not converted. `text=auto` and core.autocrlf are ignored.
 +Set the `text` attribute to enable CRLF conversions.
 +
  Use the following attributes if your '*.txt' files are UTF-16 encoded
 -with byte order mark (BOM) and you want Git to perform automatic line
 -ending conversion based on your platform.
 +with byte order mark (BOM).
  
  
 -*.txt text checkout-encoding=UTF-16
 +*.txt checkout-encoding=UTF-16
  
  
  Use the following attributes if your '*.txt' files are UTF-16 little
 -endian encoded without BOM and you want Git to use Windows line endings
 -in the working directory.
 +endian encoded without BOM and you want Git to use LF in the repo and
 +CRLF in the working directory.
  
  
  *.txt checkout-encoding=UTF-16LE text eol=CRLF
 diff --git a/convert.c b/convert.c
 index 13f766d2a..1e29f515e 100644
 --- a/convert.c
 +++ b/convert.c
 @@ -221,18 +221,27 @@ static void check_global_conv_flags_eol(const char 
*path, enum crlf_action crlf_
}
  }
  
  
  static int will_convert_lf_to_crlf(size_t len, struct text_stat *stats,
 @@ -432,7 +441,7 @@ static int crlf_to_git(const struct index_state *istate,
 * cherry-pick.
 */
if ((!(conv_flags & CONV_EOL_RENORMALIZE)) &&
 -  has_cr_in_index(istate, path))
 +  has_crlf_in_index(istate, path))
convert_crlf_into_lf = 0;
}
if (((conv_flags & CONV_EOL_RNDTRP_WARN) ||
 @@ -1214,9 +1223,28 @@ static void convert_attrs(struct conv_attrs *ca, const 
char *path)
ca->crlf_action = git_path_check_crlf(ccheck + 0);
ca->ident = git_path_check_ident(ccheck + 1);
ca->drv = git_path_check_convert(ccheck + 2);
 +  ca->checkout_encoding = git_path_check_encoding(ccheck + 5);
if (ca->crlf_action != CRLF_BINARY) {
enum eol eol_attr = git_path_check_eol(ccheck + 3);
 -

Re: [PATCH v3 1/1] convert_to_git(): checksafe becomes int conv_flags

2018-01-05 Thread Torsten Bögershausen

On 2018-01-05 20:00, Lars Schneider wrote:
> 
>> On 01 Jan 2018, at 22:59, tbo...@web.de wrote:
>>
>> From: Torsten Bögershausen <tbo...@web.de>
>>
>> When calling convert_to_git(), the checksafe parameter has been used to
>> check if commit would give a non-roundtrip conversion of EOL.
>>
>> When checksafe was introduced, 3 values had been in use:
>> SAFE_CRLF_FALSE: no warning
>> SAFE_CRLF_FAIL:  reject the commit if EOL do not roundtrip
>> SAFE_CRLF_WARN:  warn the user if EOL do not roundtrip
>>
>> Already today the integer value 0 is passed as the parameter checksafe
>> instead of the correct enum value SAFE_CRLF_FALSE.
>>
>> Turn the whole call chain to use an integer with single bits, which
>> can be extended in the next commits:
>> - The global configuration variable safe_crlf is now conv_flags_eol.
>> - The parameter checksafe is renamed into conv_flags.
>>
>> Helped-By: Lars Schneider <larsxschnei...@gmail.com>
>> Signed-off-by: Torsten Bögershausen <tbo...@web.de>
>> ---
>> This is my suggestion.
>> (1) The flag bits had been renamed.
>> (2) The (theoretical ?) mix of WARN/FAIL is still there,
>>I am not sure if this is a real problem.
>>
>> (3) There are 2 reasons that CONV_EOL_RENORMALIZE is set.
>>Either in a renormalizing merge, or by running
>>git add --renormalize .
>>Therefor HASH_RENORMALIZE is not the same as CONV_EOL_RENORMALIZE.
> 
> Can you elaborate a bit? I am diving into the code but I am still confused.
> 
> I also noticed that the "flags" integer is potentially double booked with
> the following values (see read-cache.c:add_to_index()):
> 
> #define ADD_CACHE_VERBOSE 1
> #define ADD_CACHE_PRETEND 2
> #define ADD_CACHE_IGNORE_ERRORS   4
> 
> #define HASH_WRITE_OBJECT 1
> #define HASH_FORMAT_CHECK 2
> #define HASH_RENORMALIZE  4
> 
> Is this intentional? 

All these flags have a different context,
and the right #define must be used in the right context/function call.
So there is no intention.
You start with 1, then use 2 and so on.

The same way as family Schmidt and family Meier both call
their child "Hans".
There is no connection.


> 
> Thanks,
> Lars
> 
> 
> More context:
>   https://public-inbox.org/git/96b6cd4c-0a0c-47f5-922d-b8bafb832...@gmail.com/
>   (3) We kind of replicate some flags defined in cache.h:
>  #define HASH_WRITE_OBJECT 1
>  #define HASH_RENORMALIZE  4
> 
>

Re: [PATCH v2 4/5] convert: add support for 'checkout-encoding' attribute

2017-12-29 Thread Torsten Bögershausen

I probably need to look at convert.c more closer, some other comments inline.


On Fri, Dec 29, 2017 at 04:22:21PM +0100, lars.schnei...@autodesk.com wrote:
> From: Lars Schneider 
> 
> Git and its tools (e.g. git diff) expect all text files in UTF-8
> encoding. Git will happily accept content in all other encodings, too,
> but it might not be able to process the text (e.g. viewing diffs or
> changing line endings).

UTF-8 is too strict, the text from below is more correct:
 +Git recognizes files encoded with ASCII or one of its supersets (e.g.
 +UTF-8 or ISO-8859-1) as text files.  All other encodings are usually
 +interpreted as binary and consequently built-in Git text processing
 +tools (e.g. 'git diff') as well as most Git web front ends do not
 +visualize the content.


> 
> Add an attribute to tell Git what encoding the user has defined for a
> given file. If the content is added to the index, then Git converts the
> content to a canonical UTF-8 representation. On checkout Git will

Minor question about "canonical":
Would this mean the same ?
 ...then Git converts the content into  UTF-8.


> reverse the conversion.
> 
> Signed-off-by: Lars Schneider 
> ---
>  Documentation/gitattributes.txt |  59 
>  apply.c |   2 +-
>  blame.c |   2 +-
>  combine-diff.c  |   2 +-
>  convert.c   | 196 ++-
>  convert.h   |   8 +-
>  diff.c  |   2 +-
>  sha1_file.c |   5 +-
>  t/t0028-checkout-encoding.sh| 197 
> 
>  9 files changed, 460 insertions(+), 13 deletions(-)
>  create mode 100755 t/t0028-checkout-encoding.sh
> 
> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index 30687de81a..0039bd38c3 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -272,6 +272,65 @@ few exceptions.  Even though...
>catch potential problems early, safety triggers.
>  
>  
> +`checkout-encoding`
> +^^^
> +
> +Git recognizes files encoded with ASCII or one of its supersets (e.g.
> +UTF-8 or ISO-8859-1) as text files.  All other encodings are usually
> +interpreted as binary and consequently built-in Git text processing
> +tools (e.g. 'git diff') as well as most Git web front ends do not
> +visualize the content.
> +

> +In these cases you can teach Git the encoding of a file in the working
  teach ? tell ? 
> +directory with the `checkout-encoding` attribute. If a file with this
> +attributes is added to Git, then Git reencodes the content from the
> +specified encoding to UTF-8 and stores the result in its internal data
> +structure.

Minor Q:
> its internal data structure.
Should we simply write "stores the result in the index" ?

> On checkout the content is encoded back to the specified
> +encoding.

> +
> +Please note that using the `checkout-encoding` attribute has a number
> +of drawbacks:
> +
> +- Reencoding content to non-UTF encodings (e.g. SHIFT-JIS) can cause
> +  errors as the conversion might not be round trip safe.
> +
> +- Reencoding content requires resources that might slow down certain
> +  Git operations (e.g 'git checkout' or 'git add').
> +
> +- Git clients that do not support the `checkout-encoding` attribute or
> +  the used encoding will checkout the respective files as UTF-8 encoded.
> +  That means the content appears to be different which could cause
> +  trouble. Affected clients are older Git versions and alternative Git
> +  implementations such as JGit or libgit2 (as of January 2018).
> +
> +Use the `checkout-encoding` attribute only if you cannot store a file in
> +UTF-8 encoding and if you want Git to be able to process the content as
> +text.
> +

I would maybe rephrase a little bit (first things first):

Please note that using the `checkout-encoding` attribute may have a number
of pitfalls:

- Git clients that do not support the `checkout-encoding` attribute
  will checkout the respective files as UTF-8 encoded,  which typically
  causes trouble.
  Escpecialy when older Git versions are used or alternative Git
  implementations such as JGit or libgit2 (as of January 2018).

- Reencoding content to non-UTF encodings (e.g. SHIFT-JIS) can cause
  errors as the conversion might not be round trip safe.

- Reencoding content requires resources that might slow down certain
  Git operations (e.g 'git checkout' or 'git add').

Use the `checkout-encoding` attribute only if you cannot store a file in
UTF-8 encoding and if you want Git to be able to process the content as
text.

-
Side question: What happens if "the used encoding" is not supported by
the underlying iconv lib ?
Will Git fail, delete the file from the working tree ?
That may be worth to mention. (And I need to do the code-review)

> +Use the

Re: [PATCH v2 1/5] strbuf: add xstrdup_toupper()

2017-12-29 Thread Torsten Bögershausen

On Fri, Dec 29, 2017 at 04:22:18PM +0100, lars.schnei...@autodesk.com wrote:
> From: Lars Schneider 
> 
> Create a copy of an existing string and make all characters upper case.
> Similar xstrdup_tolower().
> 
> This function is used in a subsequent commit.
> 
> Signed-off-by: Lars Schneider 
> ---
>  strbuf.c | 13 +
>  strbuf.h |  1 +
>  2 files changed, 14 insertions(+)
> 
> diff --git a/strbuf.c b/strbuf.c
> index 323c49ceb3..54276e96e7 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -760,6 +760,19 @@ char *xstrdup_tolower(const char *string)
>   return result;
>  }
>  
> +char *xstrdup_toupper(const char *string)
> +{
> + char *result;
> + size_t len, i;
> +
> + len = strlen(string);
> + result = xmallocz(len);
> + for (i = 0; i < len; i++)
> + result[i] = toupper(string[i]);
> + result[i] = '\0';

Isn't this already done by xmallocz()

> + return result;
> +}
> +
>  char *xstrvfmt(const char *fmt, va_list ap)
>  {
>   struct strbuf buf = STRBUF_INIT;
> diff --git a/strbuf.h b/strbuf.h
> index 0a74acb236..2bc148526f 100644
> --- a/strbuf.h
> +++ b/strbuf.h
> @@ -616,6 +616,7 @@ __attribute__((format (printf,2,3)))
>  extern int fprintf_ln(FILE *fp, const char *fmt, ...);
>  
>  char *xstrdup_tolower(const char *);
> +char *xstrdup_toupper(const char *);
>  
>  /**
>   * Create a newly allocated string using printf format. You can do this 
> easily
> -- 
> 2.15.1
>

Re: [PATCH v1] convert: add support for 'encoding' attribute

2017-12-29 Thread Torsten Bögershausen

On 2017-12-28 17:14, Lars Schneider wrote:> 
>> On 17 Dec 2017, at 18:14, Torsten Bögershausen <tbo...@web.de> wrote:
>>
>> On Mon, Dec 11, 2017 at 04:50:23PM +0100, lars.schnei...@autodesk.com wrote:
>>> From: Lars Schneider <larsxschnei...@gmail.com>
>>>
> 
>>> +`encoding`
>>> +^^
>>> +
>>> +By default Git assumes UTF-8 encoding for text files.  The `encoding`
>>
>> This is probably "ASCII" and it's supersets like ISO-8859-1 or UTF-8.
> 
> I am not sure I understand what you want to tell me here.
> Do you think UTF-8 encoding is not correct in the sentence above?

Git itself was designed to handle source code in ASCII.
Text files which are encoded in ISO-8859-1, UTF-8 or any
super-set of ASCII are handled as well, and what exactly to do
with bytes >0x80 is outside the responsibility of Git.
E.g. the interpretation and rendering on the screen may be
dependent on the locale or being guessed.
All in all, saying that Git expects UTF-8 may be "overdriven".
Any kind of file that uses an '\n' as an end of line
and has no NUL bytes '\0' works as a text file in Git.
(End of BlaBla ;-)

We can probably replace:
"By default Git assumes UTF-8 encoding for text files"

with something like
"Git handles UTF-8 encoded files as text files, but UTF-16 encoded
 files are handled as binary files"

>>
>>> +attribute sets the encoding to be used in the working directory.
>>> +If the path is added to the index, then Git encodes the content to
>>> +UTF-8.  On checkout the content is encoded back to the original
>>> +encoding.  Consequently, you can use all built-in Git text processing
>>> +tools (e.g. git diff, line ending conversions, etc.) with your
>>> +non-UTF-8 encoded file.
>>> +
>>> +Please note that re-encoding content can cause errors and requires
>>> +resources. Use the `encoding` attribute only if you cannot store
>>> +a file in UTF-8 encoding and if you want Git to be able to process
>>> +the text.
>>> +
>>> +
>>> +*.txt  text encoding=UTF-16
>>> +
>>
>> I think that encoding (or worktree-encoding as said elsewhere) must imply 
>> text.
>> (That is not done in convert.c, but assuming binary or even auto
>> does not make much sense to me)
> 
> "text" only means "LF". "-text" means CRLF which would be perfectly fine
> for UTF-16. Therefore I don't think we need to imply text.
> Does this make sense?
Yes and now.

"text" means convert CRLF at "git add" (or commit) into LF,
And potentially LF into CRLF at checkout,
depending on the EOL attribute (if set), core.autocrlf and/or core.eol.

"-text" means don't touch CRLF or LF at all. Files with CRLF are commited
with CRLF.
Running a Unix like "diff" tool will often show "^M" to indicate that there
is a CR before the LF, which may be distracting or confusing.

If someone ever wants to handle the UTF-16 files on a Linux/Mac/Unix system,
it makes sense to convert the line endings into LF before storing them
into the index (at least this is my experience).

(Not specifying "text" or "-text" at all will trigger the auto-handling,
 which is not needed at all).
So if we want to be sure that line endings are CRLF in the worktree we
should ask the user to specify things like this:

*.ext worktree-encoding=UTF-16 text eol=CRLF

It may be enough to give this example in the documentation.
and I would rather be over-careful here, to avoid future problems.

> 
>>
>>
>>> +
>>> +All `iconv` encodings with a stable round-trip conversion to and from
>>> +UTF-8 are supported.  You can see a full list with the following command:
>>> +
>>> +
>>> +iconv --list
>>> +
>>> +
>>> `eol`
>>> ^
>>>
>>> diff --git a/convert.c b/convert.c
>>> index 20d7ab67bd..ee19c17104 100644
>>> --- a/convert.c
>>> +++ b/convert.c
>>> @@ -7,6 +7,7 @@
>>> #include "sigchain.h"
>>> #include "pkt-line.h"
>>> #include "sub-process.h"
>>> +#include "utf8.h"
>>>
>>> /*
>>>  * convert.c - convert a file when checking it out and checking it in.
>>> @@ -256,6 +257,149 @@ static int will_convert_lf_to_crlf(size_t len, struct 
>>> text_stat *stats,
>>>
>>> }
>>
>> I would avoid to use these #ifdefs here.
>> All functions can be in strbuf.c (and may have #ifdefs there), b

Re: [PATCH] status: handle worktree renames

2017-12-26 Thread Torsten Bögershausen

On 2017-12-25 11:37, Nguyễn Thái Ngọc Duy wrote:
[]
>  wt-status.c   | 24 +++-
>  wt-status.h   |  1 +
>  3 files changed, 35 insertions(+), 5 deletions(-)
> 
> diff --git a/t/t2203-add-intent.sh b/t/t2203-add-intent.sh
> index 1bdf38e80d..41a8874e60 100755
> --- a/t/t2203-add-intent.sh
> +++ b/t/t2203-add-intent.sh
> @@ -150,5 +150,20 @@ test_expect_success 'commit: ita entries ignored in 
> empty commit check' '
>   )
>  '
>  
> +test_expect_success 'rename detection finds the right names' '
> + git init rename-detection &&
> + (
> + cd rename-detection &&
> + echo contents > original-file &&
Micro-nit, please no " " after ">":
echo contents >original-file

Re: [PATCH v1] convert: add support for 'encoding' attribute

2017-12-23 Thread Torsten Bögershausen

On Mon, Dec 18, 2017 at 08:12:49AM -0500, Jeff King wrote:
> On Mon, Dec 18, 2017 at 11:13:34AM +0100, Torsten Bögershausen wrote:
> 
> > Just to confirm my missing knowledge here:
> > Does this mean, that git-gui and gitk can decode/reencode
> > the content of a file/blob, when the .gitattributes say so ?
> 
> That's my impression, yes.
> 
> > If yes, would it make sense to enhance the "git diff" instead ?
> > "git diff --encoding" will pick up the commited encoding from
> > .attributes, convert it into UTF-8, and run the diff ?
> 
> You can do something like this already:
> 
>   git config diff.utf16.textconv 'iconv -f utf16 -t utf8'
>   echo file diff=utf16 >.gitattributes
> 
> I have no objection to teaching it that "encoding" means roughly the
> same thing, which would have two advantages:
> 
>  - we could do it in-process, which would be much faster
> 
>  - we could skip the extra config step, which is a blocker for
>server-side diffs on sites like GitHub
> 
> I don't think you'd really need "--encoding". This could just be
> triggered by the normal "--textconv" rules (i.e., just treat this "as
> if" the textconv above was configured when we see an encoding
> attribute).

I can probably come up with a patch the next weeks or so.

> 
> > We actually could enhance the "git diff" output with a single
> > line saying
> > "Git index-encoding=cp1251"
> > or so, which can be picked up by "git apply".
> 
> That extends it beyond what textconv can do (we assume that textconv
> patches are _just_ for human consumption, and can't be applied). And it
> would mean you'd potentially want to trigger it in more places. On the
> other hand, I don't know that we're guaranteed that a round-trip
> between encodings will produce a byte-wise identical result. The nice
> thing about piggy-backing on textconv is that it's already dealt with
> that problem.
> 
> -Peff

Re: [PATCH v3 1/1] check-non-portable-shell.pl: Quoted `wc -l` is not portable

2017-12-22 Thread Torsten Bögershausen

On Fri, Dec 22, 2017 at 01:07:59PM -0800, Junio C Hamano wrote:
> tbo...@web.de writes:
> 
> >
> > Reviewed-by: Johannes Schindelin <johannes.schinde...@gmx.de>
> > Signed-off-by: Torsten Bögershausen <tbo...@web.de>
> 

> I'll flip these and add a helped-by to credit Eric.
...
> Don't try to apply this patch to your tree yourself ;-)

Thanks so much for cleaning up my mess.

Re: [PATCH v1] convert: add support for 'encoding' attribute

2017-12-18 Thread Torsten Bögershausen

On Mon, Dec 11, 2017 at 09:47:24PM +0100, Johannes Sixt wrote:
> Am 11.12.2017 um 16:50 schrieb lars.schnei...@autodesk.com:
> >From: Lars Schneider 
> >
> >Git and its tools (e.g. git diff) expect all text files in UTF-8
> >encoding. Git will happily accept content in all other encodings, too,
> >but it might not be able to process the text (e.g. viewing diffs or
> >changing line endings).
> >
> >Add an attribute to tell Git what encoding the user has defined for a
> >given file. If the content is added to the index, then Git converts the
> >content to a canonical UTF-8 representation. On checkout Git will
> >reverse the conversion.
> >
> >Reviewed-by: Patrick Lühne 
> >Signed-off-by: Lars Schneider 
> >---
> >
> >Hi,
> >
> >here is a WIP patch to add text encoding support for files encoded with
> >something other than UTF-8 [RFC].
> >
> >The 'encoding' attribute is already used to view blobs in gitk. That
> >could be a problem as the content is stored in Git with the defined
> >encoding. This patch would interpret the content as UTF-8 encoded and
> 
> This will be a major drawback for me because my code base stores text files
> that are not UTF-8 encoded. And I do use the existing 'encoding' attribute
> to view the text in git-gui and gitk. Repurposing this attribute name is not
> an option, IMO.

Just to confirm my missing knowledge here:
Does this mean, that git-gui and gitk can decode/reencode
the content of a file/blob, when the .gitattributes say so ?

If yes, would it make sense to enhance the "git diff" instead ?
"git diff --encoding" will pick up the commited encoding from
.attributes, convert it into UTF-8, and run the diff ?
We actually could enhance the "git diff" output with a single
line saying
"Git index-encoding=cp1251"
or so, which can be picked up by "git apply".

The advantage would be that we could continue to commit in UTF-16
as before, and avoid the glitches with .gitattributes, that Peff
pointed out.

Does this make sense ?

> 
> >it would try to reencode it to the defined encoding on checkout > Plus,
> >many repos define the attribute very broad (e.g. "*
> encoding=cp1251").

Is this a user mistake ?

> >These folks would see errors like these with my patch:
> > error: failed to encode 'foo.bar' from utf-8 to cp1251
> 
> -- Hannes

Re: [PATCH v1] convert: add support for 'encoding' attribute

2017-12-17 Thread Torsten Bögershausen

On Mon, Dec 11, 2017 at 04:50:23PM +0100, lars.schnei...@autodesk.com wrote:
> From: Lars Schneider 
> 
> Git and its tools (e.g. git diff) expect all text files in UTF-8
> encoding. Git will happily accept content in all other encodings, too,
> but it might not be able to process the text (e.g. viewing diffs or
> changing line endings).
> 
> Add an attribute to tell Git what encoding the user has defined for a
> given file. If the content is added to the index, then Git converts the
> content to a canonical UTF-8 representation. On checkout Git will
> reverse the conversion.
> 
> Reviewed-by: Patrick Lühne 
> Signed-off-by: Lars Schneider 
> ---
> 
> Hi,
> 
> here is a WIP patch to add text encoding support for files encoded with
> something other than UTF-8 [RFC].
> 
> The 'encoding' attribute is already used to view blobs in gitk. That
> could be a problem as the content is stored in Git with the defined
> encoding. This patch would interpret the content as UTF-8 encoded and
> it would try to reencode it to the defined encoding on checkout.
> Plus, many repos define the attribute very broad (e.g. "* encoding=cp1251").
> These folks would see errors like these with my patch:
> error: failed to encode 'foo.bar' from utf-8 to cp1251
> 
> A quick search on GitHub reveals 2,722 repositories that use the
> 'encoding' attribute [1]. Using the GitHub API [2], I identified the
> following encodings in all publicly accessible repositories:
> 
> ANSI<-- garbage (ignore by my implementation)
> cp1251
> cp866
> cp950
> iso8859-1
> koi8-r
> shiftjis<-- garbage (ignore by my implementation)
> UTF-8   <-- unnecessary (ignore by my implementation)
> utf8<-- garbage (ignore by my implementation)
> 
> TODOs:
> - The iconv docs mention that "errno == EINVAL" is harmless but
>   we don't handle that case in utf8.c:reencode_string_iconv()
> - Git does not compile with NO_ICONV=1 right now because of
>   compat/precompose_utf8.c. I will send a patch to fix that.

Minor: Does Git not compile under MacOS when setting NO_ICONV=1 ?
This is possible not introduced by the patch. 

> 
> Questions:
> - Should I use warning() or error() in the patch?
>   (currently I use the warning)

Errors, see below.

> - Do you agree with the approach in general?

Yes, some comments inline.

> 
> Thanks,
> Lars
> 
> 
> [RFC] 
> http://public-inbox.org/git/bdb9b884-6d17-4be3-a83c-f67e2afa2...@gmail.com
>   [1] 
> https://github.com/search?p=1=encoding+filename%3Agitattributes=Code=%E2%9C%93
>   [2] curl --user larsxschneider:secret -H 'Accept: 
> application/vnd.github.v3.text-match+json' 
> 'https://api.github.com/search/code?q=encoding+in:file+filename:gitattributes'
>  | jq -r '.items[].text_matches[].fragment' | sed 's/.*encoding=//' | sort | 
> uniq
>   [3] 
> https://www.gnu.org/software/libc/manual/html_node/iconv-Examples.html#iconv-Examples
> 
> 
> 
> 
> Notes:
> Base Ref: master
> Web-Diff: https://github.com/larsxschneider/git/commit/afc9e88a4d
> Checkout: git fetch https://github.com/larsxschneider/git encoding-v1 && 
> git checkout afc9e88a4d
> 
>  Documentation/gitattributes.txt |  27 ++
>  convert.c   | 192 
> +++-
>  t/t0028-encoding.sh |  60 +
>  3 files changed, 278 insertions(+), 1 deletion(-)
>  create mode 100755 t/t0028-encoding.sh
> 
> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index 30687de81a..84de2fa49c 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -146,6 +146,33 @@ Unspecified::
>  Any other value causes Git to act as if `text` has been left
>  unspecified.
> 
> +`encoding`
> +^^
> +
> +By default Git assumes UTF-8 encoding for text files.  The `encoding`

This is probably "ASCII" and it's supersets like ISO-8859-1 or UTF-8.

> +attribute sets the encoding to be used in the working directory.
> +If the path is added to the index, then Git encodes the content to
> +UTF-8.  On checkout the content is encoded back to the original
> +encoding.  Consequently, you can use all built-in Git text processing
> +tools (e.g. git diff, line ending conversions, etc.) with your
> +non-UTF-8 encoded file.
> +
> +Please note that re-encoding content can cause errors and requires
> +resources. Use the `encoding` attribute only if you cannot store
> +a file in UTF-8 encoding and if you want Git to be able to process
> +the text.
> +
> +
> +*.txttext encoding=UTF-16
> +

I think that encoding (or worktree-encoding as said elsewhere) must imply text.
(That is not done in convert.c, but assuming binary or even auto
does not make much sense to me)


> +
> +All `iconv` encodings with a stable round-trip conversion to and from
> +UTF-8

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 1221 matches

Mail list logo