[PATCH] t4205: don't rely on en_US.UTF-8 locale existing

2013-07-03 Thread John Keeping
My system doesn't have the en_US.UTF-8 locale (or plain en_US), which
causes t4205 to fail by counting bytes instead of UTF-8 codepoints.

Instead of using sed for this, use Perl which behaves predictably
whatever locale is in use.

Signed-off-by: John Keeping <j...@keeping.me.uk>
---
This patch is on top of 'as/log-output-encoding-in-user-format'.

 t/t4205-log-pretty-formats.sh | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/t/t4205-log-pretty-formats.sh b/t/t4205-log-pretty-formats.sh
index 3cfb744..5864f5b 100755
--- a/t/t4205-log-pretty-formats.sh
+++ b/t/t4205-log-pretty-formats.sh
@@ -20,9 +20,7 @@ commit_msg () {
 		# cut string, replace cut part with two dots
 		# $2 - chars count from the beginning of the string
 		# $3 - trailing chars
-		# LC_ALL is set to make `sed` interpret . as a UTF-8 char not a byte
-		# as it does with C locale
-		msg=$(echo $msg | LC_ALL=en_US.UTF-8 sed -e "s/^\(.\{$2\}\)$3/\1../")
+		msg=$(echo $msg | $PERL_PATH -CIO -pe "s/^(.{$2})$3/\1../")
 	fi
 	echo $msg
 }
@@ -205,7 +203,7 @@ test_expect_success 'left alignment formatting with ltrunc' 

 ..sage two
 ..sage one
 add bar  Z
-$(commit_msg "" "0" ".\{11\}")
+$(commit_msg "" "0" ".{11}")
 EOF
test_cmp expected actual
 
@@ -218,7 +216,7 @@ test_expect_success 'left alignment formatting with mtrunc' 

 mess.. two
 mess.. one
 add bar  Z
-$(commit_msg "" "4" ".\{11\}")
+$(commit_msg "" "4" ".{11}")
 EOF
test_cmp expected actual
 
-- 
1.8.3.1.747.g77f7d3a

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] t4205: don't rely on en_US.UTF-8 locale existing

2013-07-03 Thread Alexey Shumkin
CCing this to Johannes Sixt.

On Wed, Jul 03, 2013 at 09:18:08PM +0100, John Keeping wrote:


Re: [PATCH] t4205: don't rely on en_US.UTF-8 locale existing

2013-07-03 Thread Alexey Shumkin
http://thread.gmane.org/gmane.comp.version-control.git/229291

This is why I CCed Johannes.


Re: [PATCH] t4205: don't rely on en_US.UTF-8 locale existing

2013-07-03 Thread Junio C Hamano
John Keeping <j...@keeping.me.uk> writes:

> My system doesn't have the en_US.UTF-8 locale (or plain en_US), which
> causes t4205 to fail by counting bytes instead of UTF-8 codepoints.
>
> Instead of using sed for this, use Perl which behaves predictably
> whatever locale is in use.
>
> Signed-off-by: John Keeping <j...@keeping.me.uk>
> ---
> This patch is on top of 'as/log-output-encoding-in-user-format'.

Thanks.  I think Alexey is going to send incremental updates to the
topic so I won't interfere by applying this patch on top of the
version I have in my tree.

But I do agree that using Perl may be a workable solution.

An alternative might be not to use this cryptic 3-arg form of
commit_msg at all.  It is used only in these three places:

	$(commit_msg "" "8" "..*$")
	$(commit_msg "" "0" ".\{11\}")
	$(commit_msg "" "4" ".\{11\}")

I find them hard to read when trying to figure out what is going on.

Just using three variables to hold what is expected would be far
more portable and readable.

	# "anfänglich" whatever it means.
	sample_utf8_part=$(printf "anf\303\244ng")

	commit_msg () {
		msg="initial. ${sample_utf8_part}lich";
		if test -n "$1"
		then
			echo "$msg" | iconv -f utf-8 -t "$1"
		else
			echo "$msg"
		fi
	}

And then, instead of writing these in the expected test output:

	$(commit_msg "" "8" "..*$")
	$(commit_msg "" "0" ".\{11\}")
	$(commit_msg "" "4" ".\{11\}")

we can just say

	initial...
	..an${sample_utf8_part}lich
	init..lich

It is no worse than those cryptic 0, 4, 8 and 11 magic numbers we
see in the test, no?
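
A minimal sketch of what the expected output could look like with that approach. Two things here are assumptions for illustration only: sample_utf8_part holds just the bytes for "fäng", so that the second line spells the whole word, and the lines are written to a file named expected outside the test harness.

```shell
# Sketch only: the variable split below is an assumption.
sample_utf8_part=$(printf 'f\303\244ng')	# "fäng" as UTF-8 bytes

# The expected lines are spelled out literally, so no sed, perl or
# locale is needed to count characters.
cat >expected <<EOF
initial...
..an${sample_utf8_part}lich
init..lich
EOF
```

Because the truncation points are written out by hand, the result does not depend on how any tool counts characters in the current locale.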


Re: [PATCH] t4205: don't rely on en_US.UTF-8 locale existing

2013-07-03 Thread John Keeping
On Wed, Jul 03, 2013 at 02:41:06PM -0700, Junio C Hamano wrote:

That's probably better since we don't need to rely on some other tool
getting it right.

Alexey, will you incorporate this change in your incremental updates?


Re: [PATCH] t4205: don't rely on en_US.UTF-8 locale existing

2013-07-03 Thread Alexey Shumkin
On Wed, Jul 03, 2013 at 02:41:06PM -0700, Junio C Hamano wrote:
Yep!
When I was thinking about Johannes's suggestions, I finally came to a
decision similar to yours.


Re: [PATCH] t4205: don't rely on en_US.UTF-8 locale existing

2013-07-03 Thread Alexey Shumkin
On Wed, Jul 03, 2013 at 10:53:03PM +0100, John Keeping wrote:
> That's probably better since we don't need to rely on some other tool
> getting it right.
>
> Alexey, will you incorporate this change in your incremental updates?
Yes, of course!
Thank you for your additions.