Re: Use of new .gitattributes working-tree-encoding attribute across different platform types

2018-07-03 Thread Junio C Hamano
Jeff King  writes:

> One thing I almost did in the example I gave above was to literally call
> the encoding name by a "real" one. I.e.:
>
>   echo '*.txt working-tree-encoding=iso-8859-1' >.gitattributes
>   git config encoding.iso-8859-1.replace utf-8
>
> or something. But I wondered if it was a little crazy as a practice,
> since mapping "iso-8859-1" to "utf-8" is probably going to lead to
> headaches.
>
> But your example above of semantically equivalent variants with
> different spellings would be a good use of that trick.

Yeah, I think the above looks quite sensible.


Re: Use of new .gitattributes working-tree-encoding attribute across different platform types

2018-07-02 Thread Jeff King
On Mon, Jul 02, 2018 at 04:09:32PM +0200, Lars Schneider wrote:

> Brian had a good argument [1] for an even more flexible system
> proposed by Peff:
> 
> 
> 1) We allow users to define custom encoding mappings in their Git config. 
> Example:
> 
> git config --global core.encoding.myenc UTF-16

I think this should be encoding.myenc.something. In Git's config format,
only the subsection names (the middle of a three-level name) are
unconstrained. So even if encoding.myenc only ever has one key
("replace" or "useInstead" or whatever you want to call it), there's
value in organizing the namespace that way.

And as a bonus, it leaves room for extending the feature later if we do
need more keys.
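
To illustrate the naming point, here is a minimal sketch. The "replace" key
and the scratch config file are assumptions for illustration only; no such
setting exists in Git yet. It relies only on how "git config" validates key
names: the subsection may contain nearly any character, while the final key
may contain only letters, digits and "-".

```shell
# Scratch config file so no real repository or user config is touched.
cfg=$(mktemp)

# Three-level name: section "encoding", subsection "ISO_8859-1:1987",
# key "replace" (hypothetical). The subsection accepts "_" and ":".
git config --file "$cfg" 'encoding.ISO_8859-1:1987.replace' ISO-8859-1
git config --file "$cfg" --get 'encoding.ISO_8859-1:1987.replace'

# In the core.encoding.<name> layout the encoding name becomes the key
# itself, and Git rejects keys containing "_" or ":".
if ! git config --file "$cfg" 'core.encoding.ISO_8859-1:1987' ISO-8859-1 2>/dev/null; then
    echo "rejected: invalid key"
fi
```

A simple name such as "myenc" would pass as a key, but many real charset
names would not, which is the practical argument for the subsection layout.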

-Peff


Re: Use of new .gitattributes working-tree-encoding attribute across different platform types

2018-07-02 Thread Jeff King
On Sun, Jul 01, 2018 at 05:56:58PM +0000, brian m. carlson wrote:

> On Thu, Jun 28, 2018 at 01:27:07PM -0400, Jeff King wrote:
> > Yeah, that was along the lines that I was thinking. I wonder if anybody
> > would ever need two such auto-encodings, though. Probably not. But
> > another way to think about it would be to allow something like:
> > 
> >   working-tree-encoding=foo
> > 
> > and then in your config "foo" to map to some encoding.
> > 
> > But that may be over-engineering, I dunno. utf8 has always been enough
> > for me. :)
> 
> I had a thought the other day about why this solution might be valuable.
> Different platforms support different names for iconv character sets.
> So, for example, one may have platforms supporting some disjoint sets of
> the following:
> 
> * LATIN-1
> * LATIN1
> * ISO8859-1
> * ISO-8859-1
> * ISO_8859-1
> * ISO_8859-1:1987
> * some lowercase variants of these
> 
> Therefore, specifying a working-tree-encoding value that works across a
> wide variety of systems may be non-trivial.  This is less of a problem
> with UTF-8, but having the ability to pick an encoding and remap it to a
> supported value may be useful nevertheless.

One thing I almost did in the example I gave above was to literally call
the encoding name by a "real" one. I.e.:

  echo '*.txt working-tree-encoding=iso-8859-1' >.gitattributes
  git config encoding.iso-8859-1.replace utf-8

or something. But I wondered if it was a little crazy as a practice,
since mapping "iso-8859-1" to "utf-8" is probably going to lead to
headaches.

But your example above of semantically equivalent variants with
different spellings would be a good use of that trick.

It also makes me wonder if there's another layer of indirection
somewhere in the iconv machinery we could be taking advantage of to
accomplish the same thing.  Probably not conveniently or portably, I
guess.

-Peff


Re: Use of new .gitattributes working-tree-encoding attribute across different platform types

2018-07-02 Thread Lars Schneider
> -Lars Schneider  wrote: -
> To: Jeff King 
> From: Lars Schneider 
> Date: 06/28/2018 18:21
> Cc: "brian m. carlson" , Steve Groeger 
> , git@vger.kernel.org
> Subject: Re: Use of new .gitattributes working-tree-encoding attribute across 
> different platform types
> 
> 
>> On Jun 28, 2018, at 4:34 PM, Jeff King  wrote:
>> 
>> On Thu, Jun 28, 2018 at 02:44:47AM +0000, brian m. carlson wrote:
>> 
>>> On Wed, Jun 27, 2018 at 07:54:52AM +0000, Steve Groeger wrote:
>>>> We have common code that is supposed to be usable across different 
>>>> platforms and hence different file encodings. With the full support of the 
>>>> working-tree-encoding in the latest version of git on all platforms, how 
>>>> do we have files converted to different encodings on different platforms?
>>>> I could not find anything that would allow us to say 'if platform = z/OS 
>>>> then encoding=EBCDIC else encoding=ASCII'.   Is there a way this can be 
>>>> done?
>>> 
>>> I don't believe there is such functionality.  Git doesn't have
>>> attributes that are conditional on the platform in that sort of way.
>>> You could use a smudge/clean filter and adjust the filter for the
>>> platform you're on, which might meet your needs.
>> 
>> We do have prior art in the line-ending code, though. There the
>> attributes say either that a file needs a specific line-ending type
>> (which is relatively rare), or that it should follow the system type,
>> which is then set separately in the config.
>> 
>> I have the impression that the working-tree-encoding stuff was made to
>> handle the first case, but not the second. It doesn't seem like an
>> outrageous thing to eventually add.
>> 
>> (Though I agree that clean/smudge filters would work, and can even
>> implement the existing working-tree-encoding feature, albeit less
>> efficiently and conveniently).
> 
> Thanks for the suggestion Peff! 
> How about this:
> 
> 1) We allow users to set the encoding "auto". Example:
> 
>   *.txt working-tree-encoding=auto
> 
> 2) We define a new variable `core.autoencoding`. By default the value is 
> UTF-8 (== no re-encoding) but users can set it to any value in their Git config. 
> Example:
> 
>   git config --global core.autoencoding UTF-16
> 
> All files marked with the value "auto" will use the encoding defined in
> `core.autoencoding`.
> 
> Would that work?
> 
> @steve: Would that fix your problem?


On Jul 2, 2018, at 2:13 PM, Steve Groeger  wrote:
> 
> I think this proposed solution may resolve my issue.

Thanks for the confirmation!

Brian had a good argument [1] for an even more flexible system
proposed by Peff:


1) We allow users to define custom encoding mappings in their Git config. 
Example:

git config --global core.encoding.myenc UTF-16


2) Users can reuse these mappings in their .gitattributes files:

*.txt working-tree-encoding=myenc


Does this idea look good to everyone?

Thanks,
Lars


[1] https://public-inbox.org/git/20180701175657.gc7...@genre.crustytoothpaste.net/


Re: Use of new .gitattributes working-tree-encoding attribute across different platform types

2018-07-02 Thread Steve Groeger
Lars, 

I think this proposed solution may resolve my issue.
 
 
  
 
Thanks
 Steve Groeger
 Java Runtimes Development
 IBM Hursley
 IBM United Kingdom Ltd
 Tel: (44) 1962 816911 Mobex: 279990 Mobile: 07718 517 129
 Fax (44) 1962 816800
 Lotus Notes: Steve Groeger/UK/IBM
 Internet: groe...@uk.ibm.com  
   
 
Unless stated otherwise above:
 IBM United Kingdom Limited - Registered in England and Wales with number 
741598.
 Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU 
 

-Lars Schneider  wrote: -
To: Jeff King 
From: Lars Schneider 
Date: 06/28/2018 18:21
Cc: "brian m. carlson" , Steve Groeger 
, git@vger.kernel.org
Subject: Re: Use of new .gitattributes working-tree-encoding attribute across 
different platform types


> On Jun 28, 2018, at 4:34 PM, Jeff King  wrote:
> 
> On Thu, Jun 28, 2018 at 02:44:47AM +0000, brian m. carlson wrote:
> 
>> On Wed, Jun 27, 2018 at 07:54:52AM +0000, Steve Groeger wrote:
>>> We have common code that is supposed to be usable across different 
>>> platforms and hence different file encodings. With the full support of the 
>>> working-tree-encoding in the latest version of git on all platforms, how do 
>>> we have files converted to different encodings on different platforms?
>>> I could not find anything that would allow us to say 'if platform = z/OS 
>>> then encoding=EBCDIC else encoding=ASCII'.   Is there a way this can be 
>>> done?
>> 
>> I don't believe there is such functionality.  Git doesn't have
>> attributes that are conditional on the platform in that sort of way.
>> You could use a smudge/clean filter and adjust the filter for the
>> platform you're on, which might meet your needs.
> 
> We do have prior art in the line-ending code, though. There the
> attributes say either that a file needs a specific line-ending type
> (which is relatively rare), or that it should follow the system type,
> which is then set separately in the config.
> 
> I have the impression that the working-tree-encoding stuff was made to
> handle the first case, but not the second. It doesn't seem like an
> outrageous thing to eventually add.
> 
> (Though I agree that clean/smudge filters would work, and can even
> implement the existing working-tree-encoding feature, albeit less
> efficiently and conveniently).

Thanks for the suggestion Peff! 
How about this:

1) We allow users to set the encoding "auto". Example:

*.txt working-tree-encoding=auto

2) We define a new variable `core.autoencoding`. By default the value is 
UTF-8 (== no re-encoding) but users can set it to any value in their Git config. 
Example:

git config --global core.autoencoding UTF-16

All files marked with the value "auto" will use the encoding defined in
`core.autoencoding`.

Would that work?

@steve: Would that fix your problem?

- Lars



Re: Use of new .gitattributes working-tree-encoding attribute across different platform types

2018-07-01 Thread brian m. carlson
On Thu, Jun 28, 2018 at 01:27:07PM -0400, Jeff King wrote:
> Yeah, that was along the lines that I was thinking. I wonder if anybody
> would ever need two such auto-encodings, though. Probably not. But
> another way to think about it would be to allow something like:
> 
>   working-tree-encoding=foo
> 
> and then in your config "foo" to map to some encoding.
> 
> But that may be over-engineering, I dunno. utf8 has always been enough
> for me. :)

I had a thought the other day about why this solution might be valuable.
Different platforms support different names for iconv character sets.
So, for example, one may have platforms supporting some disjoint sets of
the following:

* LATIN-1
* LATIN1
* ISO8859-1
* ISO-8859-1
* ISO_8859-1
* ISO_8859-1:1987
* some lowercase variants of these

Therefore, specifying a working-tree-encoding value that works across a
wide variety of systems may be non-trivial.  This is less of a problem
with UTF-8, but having the ability to pick an encoding and remap it to a
supported value may be useful nevertheless.
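
One rough way to see which spelling a given machine accepts is to probe the
local iconv with each candidate. This is only a sketch: the function name is
made up, and it assumes an iconv command that exits non-zero when it does not
recognise a charset name.

```shell
# Hypothetical helper: print the first charset name the local iconv accepts.
first_supported_charset() {
    for name in "$@"; do
        # Convert empty input; only the charset lookup can fail here.
        if printf '' | iconv -f "$name" -t UTF-8 >/dev/null 2>&1; then
            echo "$name"
            return 0
        fi
    done
    return 1
}

first_supported_charset LATIN-1 LATIN1 ISO8859-1 ISO-8859-1 ISO_8859-1
```

The name it prints is the one a per-machine remapping would need to point at.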
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


Re: Use of new .gitattributes working-tree-encoding attribute across different platform types

2018-06-28 Thread Jeff King
On Thu, Jun 28, 2018 at 07:21:18PM +0200, Lars Schneider wrote:

> How about this:
> 
> 1) We allow users to set the encoding "auto". Example:
> 
>   *.txt working-tree-encoding=auto
> 
> 2) We define a new variable `core.autoencoding`. By default the value is 
> UTF-8 (== no re-encoding) but users can set it to any value in their Git config. 
> Example:
> 
> git config --global core.autoencoding UTF-16
> 
> All files marked with the value "auto" will use the encoding defined in
> `core.autoencoding`.
> 
> Would that work?

Yeah, that was along the lines that I was thinking. I wonder if anybody
would ever need two such auto-encodings, though. Probably not. But
another way to think about it would be to allow something like:

  working-tree-encoding=foo

and then in your config map "foo" to some encoding.

But that may be over-engineering, I dunno. utf8 has always been enough
for me. :)

-Peff


Re: Use of new .gitattributes working-tree-encoding attribute across different platform types

2018-06-28 Thread Lars Schneider



> On Jun 28, 2018, at 4:34 PM, Jeff King  wrote:
> 
> On Thu, Jun 28, 2018 at 02:44:47AM +0000, brian m. carlson wrote:
> 
>> On Wed, Jun 27, 2018 at 07:54:52AM +0000, Steve Groeger wrote:
>>> We have common code that is supposed to be usable across different 
>>> platforms and hence different file encodings. With the full support of the 
>>> working-tree-encoding in the latest version of git on all platforms, how do 
>>> we have files converted to different encodings on different platforms?
>>> I could not find anything that would allow us to say 'if platform = z/OS 
>>> then encoding=EBCDIC else encoding=ASCII'.   Is there a way this can be 
>>> done?
>> 
>> I don't believe there is such functionality.  Git doesn't have
>> attributes that are conditional on the platform in that sort of way.
>> You could use a smudge/clean filter and adjust the filter for the
>> platform you're on, which might meet your needs.
> 
> We do have prior art in the line-ending code, though. There the
> attributes say either that a file needs a specific line-ending type
> (which is relatively rare), or that it should follow the system type,
> which is then set separately in the config.
> 
> I have the impression that the working-tree-encoding stuff was made to
> handle the first case, but not the second. It doesn't seem like an
> outrageous thing to eventually add.
> 
> (Though I agree that clean/smudge filters would work, and can even
> implement the existing working-tree-encoding feature, albeit less
> efficiently and conveniently).

Thanks for the suggestion Peff! 
How about this:

1) We allow users to set the encoding "auto". Example:

*.txt working-tree-encoding=auto

2) We define a new variable `core.autoencoding`. By default the value is 
UTF-8 (== no re-encoding) but users can set it to any value in their Git config. 
Example:

git config --global core.autoencoding UTF-16

All files marked with the value "auto" will use the encoding defined in
`core.autoencoding`.
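
The lookup this implies could be sketched roughly as below. Note that
`core.autoencoding` is only the proposal above, not an existing Git setting;
`resolve_encoding` and the scratch config file are hypothetical and merely
emulate the intended behaviour with plain `git config` calls.

```shell
# Hypothetical helper: resolve the effective encoding for one attribute value.
#   $1 = value of working-tree-encoding from .gitattributes
#   $2 = config file to consult (a scratch file here, not a real Git config)
resolve_encoding() {
    if [ "$1" = "auto" ]; then
        # Fall back to UTF-8 when the proposed key is unset.
        git config --file "$2" --get core.autoencoding || echo "UTF-8"
    else
        echo "$1"
    fi
}

cfg=$(mktemp)
resolve_encoding auto "$cfg"          # key unset: falls back to the default
git config --file "$cfg" core.autoencoding UTF-16
resolve_encoding auto "$cfg"          # now follows the configured value
resolve_encoding ISO-8859-1 "$cfg"    # explicit values pass through unchanged
```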

Would that work?

@steve: Would that fix your problem?

- Lars

Re: Use of new .gitattributes working-tree-encoding attribute across different platform types

2018-06-28 Thread Jeff King
On Thu, Jun 28, 2018 at 02:44:47AM +0000, brian m. carlson wrote:

> On Wed, Jun 27, 2018 at 07:54:52AM +0000, Steve Groeger wrote:
> > We have common code that is supposed to be usable across different 
> > platforms and hence different file encodings. With the full support of the 
> > working-tree-encoding in the latest version of git on all platforms, how do 
> > we have files converted to different encodings on different platforms?
> > I could not find anything that would allow us to say 'if platform = z/OS 
> > then encoding=EBCDIC else encoding=ASCII'.   Is there a way this can be 
> > done?
> 
> I don't believe there is such functionality.  Git doesn't have
> attributes that are conditional on the platform in that sort of way.
> You could use a smudge/clean filter and adjust the filter for the
> platform you're on, which might meet your needs.

We do have prior art in the line-ending code, though. There the
attributes say either that a file needs a specific line-ending type
(which is relatively rare), or that it should follow the system type,
which is then set separately in the config.

I have the impression that the working-tree-encoding stuff was made to
handle the first case, but not the second. It doesn't seem like an
outrageous thing to eventually add.

(Though I agree that clean/smudge filters would work, and can even
implement the existing working-tree-encoding feature, albeit less
efficiently and conveniently).
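
For reference, the line-ending split looks roughly like this. The sketch
uses a throwaway repository and made-up file names; `eol=lf` pins a specific
type in the attributes, while a plain `text` leaves the choice to `core.eol`
in each user's config.

```shell
# Throwaway repository so nothing real is modified.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

# Attributes: shell scripts always get LF; other text follows the config.
printf '%s\n' '*.sh  text eol=lf' '*.txt text' >.gitattributes

# Config: this machine wants CRLF for files that do not pin a type.
git config core.eol crlf

# Show which paths carry an explicit eol attribute.
git check-attr text eol -- build.sh notes.txt
```

An encoding equivalent would need the same two halves: an attribute value
meaning "follow the system" and a config key saying what the system uses.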

-Peff


Re: Use of new .gitattributes working-tree-encoding attribute across different platform types

2018-06-27 Thread brian m. carlson
On Wed, Jun 27, 2018 at 07:54:52AM +0000, Steve Groeger wrote:
> We have common code that is supposed to be usable across different platforms 
> and hence different file encodings. With the full support of the 
> working-tree-encoding in the latest version of git on all platforms, how do 
> we have files converted to different encodings on different platforms?
> I could not find anything that would allow us to say 'if platform = z/OS then 
> encoding=EBCDIC else encoding=ASCII'.   Is there a way this can be done?

I don't believe there is such functionality.  Git doesn't have
attributes that are conditional on the platform in that sort of way.
You could use a smudge/clean filter and adjust the filter for the
platform you're on, which might meet your needs.
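
A rough sketch of that approach follows. The filter name "platform-enc",
both helper functions and the charset names are assumptions for
illustration (it assumes z/OS reports "OS/390" from `uname -s` and that its
iconv knows "IBM-1047"); only the filter.<name>.clean and
filter.<name>.smudge config keys are standard Git.

```shell
# Hypothetical helper: map a `uname -s` value to a working-tree encoding.
platform_encoding() {
    case "$1" in
        OS/390) echo "IBM-1047" ;;  # assumption: z/OS, EBCDIC code page 1047
        *)      echo "UTF-8" ;;     # everywhere else: leave content as UTF-8
    esac
}

# Register clean/smudge commands for this platform in the current repository.
# clean runs on check-in (working tree -> repository), smudge on checkout.
setup_platform_filter() {
    enc=$(platform_encoding "$(uname -s)")
    git config filter.platform-enc.clean  "iconv -f $enc -t UTF-8" &&
    git config filter.platform-enc.smudge "iconv -f UTF-8 -t $enc"
}

# Usage, inside a clone whose .gitattributes has "*.txt filter=platform-enc":
#   setup_platform_filter
```

The .gitattributes line stays identical on every platform; only the local
filter definition differs, so it has to be set up once per clone.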
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


Re: Use of new .gitattributes working-tree-encoding attribute across different platform types

2018-06-27 Thread Torsten Bögershausen
On 27.06.18 09:54, Steve Groeger wrote:
> Hi, 
> 
> Sorry for incomplete post earlier. Here is the full post:
> 
> 
> In the latest version of git a new attribute has been added, 
> working-tree-encoding. The release notes state: 
> 
> 'The new "working-tree-encoding" attribute can ask Git to convert the
>    contents to the specified encoding when checking out to the working
>    tree (and the other way around when checking in).'
> 
> We have been using this attribute on our z/OS systems using a version of git 
> from Rocket Software to convert files to EBCDIC for quite a while now. On 
> other platforms (Linux, AIX, etc.) git ignored this attribute and therefore 
> left the files in ASCII.
> 
> We have common code that is supposed to be usable across different platforms 
> and hence different file encodings. With the full support of the 
> working-tree-encoding in the latest version of git on all platforms, how do 
> we have files converted to different encodings on different platforms?
> I could not find anything that would allow us to say 'if platform = z/OS then 
> encoding=EBCDIC else encoding=ASCII'.   Is there a way this can be done?
>  
>  
>   
>  
> Thanks
>  Steve Groeger
[]

Did you consider putting a gitattributes file at the machine level?

https://git-scm.com/docs/gitattributes

[snipped the other places where to put gitattributes]
...
Attributes for all users on a system should be placed in the 
$(prefix)/etc/gitattributes file.
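
A small sketch of the idea, using `core.attributesFile` in a throwaway
repository as a stand-in for the system-wide file. The file name and the
"IBM-1047" value are assumptions for illustration.

```shell
# Throwaway repository so nothing real is modified.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

# Per-machine attributes file kept outside the repository.
machine_attrs=$(mktemp)
printf '%s\n' '*.txt working-tree-encoding=IBM-1047' >"$machine_attrs"
git config core.attributesFile "$machine_attrs"

# Ask Git which encoding it would apply to a .txt path on this machine.
git check-attr working-tree-encoding -- readme.txt
```

This only helps if the repository's own .gitattributes does not set the
attribute for the same paths, since in-tree attributes take precedence over
the per-user and system-wide files.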



















Use of new .gitattributes working-tree-encoding attribute across different platform types

2018-06-27 Thread Steve Groeger
Hi, 

Sorry for incomplete post earlier. Here is the full post:


In the latest version of git a new attribute has been added, 
working-tree-encoding. The release notes state: 

'The new "working-tree-encoding" attribute can ask Git to convert the
   contents to the specified encoding when checking out to the working
   tree (and the other way around when checking in).'

We have been using this attribute on our z/OS systems using a version of git 
from Rocket Software to convert files to EBCDIC for quite a while now. On other 
platforms (Linux, AIX, etc.) git ignored this attribute and therefore left the 
files in ASCII.

We have common code that is supposed to be usable across different platforms 
and hence different file encodings. With the full support of the 
working-tree-encoding in the latest version of git on all platforms, how do we 
have files converted to different encodings on different platforms?
I could not find anything that would allow us to say 'if platform = z/OS then 
encoding=EBCDIC else encoding=ASCII'.   Is there a way this can be done?
 
 
  
 
Thanks
 Steve Groeger
 Java Runtimes Development
 IBM Hursley
 IBM United Kingdom Ltd
 Tel: (44) 1962 816911 Mobex: 279990 Mobile: 07718 517 129
 Fax (44) 1962 816800
 Lotus Notes: Steve Groeger/UK/IBM
 Internet: groe...@uk.ibm.com  
   
 



Use of new .gitattributes working-tree-encoding attribute across different platform types

2018-06-27 Thread Steve Groeger
I could not find anything that would allow us to say 'if platform = z/OS then 
encoding=EBCDIC else encoding=ASCII'.   Is there a way this can be done? 
 
Thanks
 Steve Groeger
 Java Runtimes Development
 IBM Hursley
 IBM United Kingdom Ltd
 Tel: (44) 1962 816911 Mobex: 279990 Mobile: 07718 517 129
 Fax (44) 1962 816800
 Lotus Notes: Steve Groeger/UK/IBM
 Internet: groe...@uk.ibm.com  
   
 