Re: Proposal: Define UTF-8 to be the encoding of Haskell source files
> Ok, I can give you permissions on the wiki. What is your username on the haskell-prime wiki?

Great! My haskell-prime username is roelvandijk.

___ Haskell-prime mailing list Haskell-prime@haskell.org http://www.haskell.org/mailman/listinfo/haskell-prime
Re: Proposal: Define UTF-8 to be the encoding of Haskell source files
On Thu, 2011-04-07 at 15:44 +0200, Roel van Dijk wrote:
> On 7 April 2011 14:11, Duncan Coutts duncan.cou...@googlemail.com wrote: I would be happy to work with you and others to develop the report text for such a proposal. I posted my first draft already :-) What would be a good way to proceed? Looking at the process I think we should create a wiki page and a ticket for this proposal. If necessary I'll volunteer to be the proposal owner.

Ok, I can give you permissions on the wiki. What is your username on the haskell-prime wiki?

Duncan
Re: Proposal: Define UTF-8 to be the encoding of Haskell source files
On 6 April 2011 15:13, Duncan Coutts duncan.cou...@googlemail.com wrote:
> So since the goal is interoperability of source files then perhaps we should also have a section somewhere with interoperability guidelines for implementations that do store Haskell programs as OS files.

I think a set of interoperability guidelines is a great idea. It seems these guidelines are already followed by GHC, Cabal, Hackage, Jhc and possibly others. Shall we consider this the proposal instead of just the encoding part?
Re: Proposal: Define UTF-8 to be the encoding of Haskell source files
Hi,

Jason Reich wrote:
> Tillmann Rendel wrote:
> > How would that affect the non-code parts of literate Haskell (*.lhs) files? In particular, would it place any burden on third-party tools processing these files?
>
> lhs2TeX already has limited support for UTF-8 for the rendering of Literate Agda files.

My point is that literate Haskell programs are not just Haskell files, but also, for example, Markdown or LaTeX files, or even database entries representing a wiki page or a blog entry. Such programs are therefore processed by third-party tools outside of the Haskell ecosystem, and it seems unrealistic that the Haskell report could unilaterally mandate how they are encoded.

I think the Haskell report should not discourage Haskell implementations from being flexible about encoding.

Tillmann
Re: [Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files
Roel == Roel van Dijk vandijk.r...@gmail.com writes:

Roel> On 6 April 2011 20:42, Colin Paul Adams co...@colina.demon.co.uk wrote:
Roel> It seems you have a problem with the word allowed. What do you think of the interoperability guidelines as proposed by Duncan? They are less stringent while having the same intention as my original proposal.

I think they are fine.

--
Colin Adams
Preston Lancashire
Re: [Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files
On 7 April 2011 11:29, Christian Maeder christian.mae...@dfki.de wrote:
> I agree that Haskell files should be UTF-8, but I also agree that it is only relevant for Hackage (and Cabal) and already enforced by ghc-6.12 or higher.

It is relevant for all tools and systems which process Haskell sources.

> The motivation for this proposal can only be that future cabal packages will use more and more non-ASCII characters as is possible via http://hackage.haskell.org/package/base-unicode-symbols-0.2.1.4 and LANGUAGE pragma UnicodeSyntax (that happens to have no support for \ as lambda symbol - probably because lambda is a letter and no symbol!)

The motivation for this proposal is interoperability of all tools and systems which process Haskell source files. Perhaps I could have made that clearer.

> However, I think, these extra characters only make sense for corner cases and should not be recommended for general purposes.

Please take a look at the following file: http://code.haskell.org/numerals/src/Text/Numeral/Language/ZH.hs

I have many more like that. I do not consider Chinese a corner case. Nor the vast number of languages which cannot be represented using ASCII.

> So my view is: Stick to ASCII and only if you must (not just for casual reasons) use UTF-8.

When to use certain characters is not part of the proposal.
Re: [Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files
On 07.04.2011 13:09, Roel van Dijk wrote:
> Please take a look at the following file: http://code.haskell.org/numerals/src/Text/Numeral/Language/ZH.hs

Great, that file made my Firefox open infinitely many tabs (so that I had to close it).

C.
Re: Proposal: Define UTF-8 to be the encoding of Haskell source files
On 7 April 2011 14:33, Doug McIlroy d...@cs.dartmouth.edu wrote:
> This supposition is unwarranted. We have all seen relative naming systems that run both ways: a.b.c versus c(b(a)). And Haskellites would simplify the latter to c$b$a. Secondary storage may be organized by files, segments, objects, etc. Combinations of these notions have been created in order to cater for legacy languages that depend on particular models. It is a step too far to try to predict how Haskell modules will be adopted into every possible naming environment.

The proposal doesn't try to regulate the use of Haskell modules in every possible naming environment. Just file systems. And there only as a set of guidelines. To quote Duncan Coutts previously in this thread:

> I hope I was clear in the example text that the interoperability guidelines were not forcing implementations to use files etc, just that if they do, and if they use these conventions, then sources will be portable between implementations. It doesn't stop an implementation using URLs, sticking multiple modules in a file or keeping modules in a database.
Re: [Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files
On 07.04.2011 13:09, Roel van Dijk wrote:
> Please take a look at the following file: http://code.haskell.org/numerals/src/Text/Numeral/Language/ZH.hs

The code would not suffer much if it were pure ASCII. I would prefer (ASCII) haddock links to explain the various code points.

C.
Re: [Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files
On 7 April 2011 15:03, Christian Maeder christian.mae...@dfki.de wrote:
> The code would not suffer much if it were pure ASCII. I would prefer (ascii) haddock links to explain the various code points.

The code in question contains Chinese characters like '三', which in a US-ASCII encoded Haskell file must be written as '\x4e09'. I do not consider these escape sequences an acceptable substitute.

But this discussion is tangential to the proposal. I am interested in having a common set of guidelines to ensure interoperability of Haskell sources. An important part of that is having a common method of decoding files containing Haskell code. The easiest way to achieve that is to use only one encoding. UTF-8 is the best candidate for that role.
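The escape-sequence point above can be made concrete with a small sketch. It is not from the thread: the names `san`, `sanEscaped` and `encodeCharUtf8` are made up for illustration, and the encoder is a hand-written approximation of UTF-8 that assumes a valid, non-surrogate code point. The block itself has to be stored as UTF-8 (which is what GHC expects) because it contains the literal character.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- The same character written directly (needs a non-ASCII source encoding)
-- and as an escape sequence (works in a pure US-ASCII file).
san, sanEscaped :: Char
san        = '三'
sanEscaped = '\x4e09'

-- | Illustrative UTF-8 encoder for a single code point.
-- Assumption: the input is not a surrogate; no validation is done.
encodeCharUtf8 :: Char -> [Word8]
encodeCharUtf8 c
  | n < 0x80    = [fromIntegral n]                           -- 1 byte (ASCII)
  | n < 0x800   = [lead 0xC0 6, cont 0]                      -- 2 bytes
  | n < 0x10000 = [lead 0xE0 12, cont 6, cont 0]             -- 3 bytes
  | otherwise   = [lead 0xF0 18, cont 12, cont 6, cont 0]    -- 4 bytes
  where
    n = ord c
    -- Leading byte: length tag combined with the top bits of the code point.
    lead tag s = fromIntegral (tag .|. (n `shiftR` s))
    -- Continuation byte: 10xxxxxx carrying six bits of the code point.
    cont s = fromIntegral (0x80 .|. ((n `shiftR` s) .&. 0x3F))
```

Both spellings denote the same `Char`, and under UTF-8 that character occupies the three bytes E4 B8 89 on disk, whereas the escaped form stays within ASCII at the cost of readability.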
Re: Proposal: Define UTF-8 to be the encoding of Haskell source files
On 7 April 2011 14:11, Duncan Coutts duncan.cou...@googlemail.com wrote: I would be happy to work with you and others to develop the report text for such a proposal. I posted my first draft already :-) What would be a good way to proceed? Looking at the process I think we should create a wiki page and a ticket for this proposal. If necessary I'll volunteer to be the proposal owner.
Re: Proposal: Define UTF-8 to be the encoding of Haskell source files
Tillmann Rendel wrote:
> How would that affect the non-code parts of literate Haskell (*.lhs) files? In particular, would it place any burden on third-party tools processing these files?

lhs2TeX already has limited support for UTF-8 for the rendering of Literate Agda files.

Jason
Re: Proposal: Define UTF-8 to be the encoding of Haskell source files
On 4 April 2011 23:48, Roel van Dijk vandijk.r...@gmail.com wrote:
> * Proposal
>
> The Haskell 2010 language specification states that: Haskell uses the Unicode character set [2]. It does not state what encoding should be used. This means, strictly speaking, it is not possible to reliably exchange Haskell source files on the byte level.
>
> I propose to make UTF-8 the only allowed encoding for Haskell source files. Implementations must discard an initial Byte Order Mark (BOM) if present [3].
>
> * Next step
>
> Discussion! There was already some discussion on the haskell-cafe mailing list [7].

This is a simple and obviously sensible proposal. I'm certainly in favour.

I think the only area where there might be some issue to discuss is the language of the report. As far as I can see, the report does not require that modules exist as files, does not require the .hs extension and does not give the standard mapping from module name to file name.

So since the goal is interoperability of source files then perhaps we should also have a section somewhere with interoperability guidelines for implementations that do store Haskell programs as OS files. The section would describe the one module per file convention, the .hs extension (this is already obliquely mentioned in the section on literate Haskell syntax) and the mapping of module names to file names in common OS file systems. Then this UTF-8 stipulation could go there (and it would be clear that it applies only to conventional implementations that store Haskell programs as files).

e.g.

Interoperability Guidelines

This Report does not specify how Haskell programs are represented or stored. There is however a conventional representation using OS files. Implementations that conform to these guidelines will benefit from the portability of Haskell program representations.

Haskell modules are stored as files, one module per file. These Haskell source files are given the file extension .hs for usual Haskell files and .lhs for literate Haskell files (see section 10.4).

Source files must be encoded as UTF-8 \cite{utf8}. Implementations must discard an initial Byte Order Mark (BOM) if present.

To find a source file corresponding to a module name used in an import declaration, the following mapping from module name to OS file name is used. The '.' character is mapped to the OS's directory separator string while all other characters map to themselves. The .hs or .lhs extension is added. Where both .hs and .lhs files exist for the same module, the .lhs one should be used. The OS's standard convention for representing Unicode file names should be used.

For example, on a UNIX based OS, the module A.B would map to the file name A/B.hs for a normal Haskell file or to A/B.lhs for a literate Haskell file.

Note that because it is rare for a Main module to be imported, there is no restriction on the name of the file containing the Main module. It is conventional, but not strictly necessary, that the Main module use the .hs or .lhs extension.

Duncan
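The mapping and the BOM rule in the draft guidelines above might be sketched as follows. This is only an illustration of the draft text, not code from any implementation: `moduleBasePath`, `candidateFiles` and `dropBOM` are hypothetical names, the directory separator is passed in as a parameter, and the BOM is removed from already-decoded text (where it appears as the single character U+FEFF).

```haskell
-- | Map a module name to a path without extension: every '.' becomes the
-- directory separator string, all other characters map to themselves.
moduleBasePath :: String -> String -> FilePath
moduleBasePath sep = concatMap (\c -> if c == '.' then sep else [c])

-- | Candidate source files in order of preference. The literate file is
-- listed first because the draft says the .lhs one should be used when
-- both exist.
candidateFiles :: String -> String -> [FilePath]
candidateFiles sep modName = [base ++ ".lhs", base ++ ".hs"]
  where
    base = moduleBasePath sep modName

-- | Discard one initial Byte Order Mark, if present, from decoded source.
dropBOM :: String -> String
dropBOM ('\xFEFF' : rest) = rest
dropBOM source            = source
```

With "/" as the separator, the module A.B yields the candidates A/B.lhs and A/B.hs, matching the UNIX example in the draft. Checking which candidate actually exists on disk is left out on purpose, since the guidelines do not say how a search path is handled.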
Re: Proposal: Define UTF-8 to be the encoding of Haskell source files
On Wed, Apr 6, 2011 at 2:13 PM, Duncan Coutts duncan.cou...@googlemail.com wrote:
> Interoperability Guidelines [...]
>
> To find a source file corresponding to a module name used in an import declaration, the following mapping from module name to OS file name is used. The '.' character is mapped to the OS's directory separator string while all other characters map to themselves. The .hs or .lhs extension is added. Where both .hs and .lhs files exist for the same module, the .lhs one should be used. The OS's standard convention for representing Unicode file names should be used.

This standard isn't quite universal. For example, jhc will look for Data.Foo in Data/Foo.hs but also Data.Foo.hs [1]. We could take this as an opportunity to discuss that practice, or we could try to make the changes to the report orthogonal to that issue.

In some sense I think it's cute that the Report doesn't specify anything about how Haskell modules are stored or represented, but I don't think that freedom is actually used, so I'm happy to see it go. I'd think, though, that in that case there would be more to discuss than just the encoding, so if we could separate out the issues here, I think that would be useful.

[1]: http://repetae.net/computer/jhc/manual.html#module-search-path
[Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files
I forgot to CC the list:

Roel == Roel van Dijk vandijk.r...@gmail.com writes:

Roel> I propose to make UTF-8 the only allowed encoding for Haskell source files. Implementations must discard an initial Byte Order Mark (BOM) if present [3].

Roel> * Pros
Roel> - Ensures that Haskell source can be reliably exchanged on the byte level.
Roel> - Disallows implicit ISO-8859-* encodings in source code, ensuring portability.
Roel> - Little or no implementation burden for compiler writers.

Having thought this over a bit more, I don't think it's a good idea.

Allowed? Allowed for what?

What does it achieve? Nothing, as far as I can see. Authors will still be able to write their Haskell code in any encoding they like. And any compiler can have a front-end script with an option to specify the encoding used by source files, which simply uses iconv on the fly to translate.

I think the real place to mandate UTF-8 would be for Hackage. That's where it matters (an alternative design would be to add an encoding field in the .cabal file, but I don't think this has much merit).

--
Colin Adams
Preston Lancashire
Re: Proposal: Define UTF-8 to be the encoding of Haskell source files
On Wed, 2011-04-06 at 16:09 +0100, Ben Millwood wrote:
> On Wed, Apr 6, 2011 at 2:13 PM, Duncan Coutts duncan.cou...@googlemail.com wrote:
> > Interoperability Guidelines [...]
> >
> > To find a source file corresponding to a module name used in an import declaration, the following mapping from module name to OS file name is used. The '.' character is mapped to the OS's directory separator string while all other characters map to themselves. The .hs or .lhs extension is added. Where both .hs and .lhs files exist for the same module, the .lhs one should be used. The OS's standard convention for representing Unicode file names should be used.
>
> This standard isn't quite universal. For example, jhc will look for Data.Foo in Data/Foo.hs but also Data.Foo.hs [1]. We could take this as an opportunity to discuss that practice, or we could try to make the changes to the report orthogonal to that issue.

Indeed. But it's true to say that if you do support the common convention then you get portability. This does not preclude JHC from supporting something extra, but sources that take advantage of JHC's extension are not portable to implementations that just use the common convention.

> In some sense I think it's cute that the Report doesn't specify anything about how Haskell modules are stored or represented, but I don't think that freedom is actually used, so I'm happy to see it go. I'd think, though, that in that case there would be more to discuss than just the encoding, so if we could separate out the issues here, I think that would be useful.

It's not going. I hope I was clear in the example text that the interoperability guidelines were not forcing implementations to use files etc, just that if they do, and if they use these conventions, then sources will be portable between implementations. It doesn't stop an implementation using URLs, sticking multiple modules in a file or keeping modules in a database.

Duncan
Re: [Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files
On 6 April 2011 17:34, Colin Paul Adams co...@colina.demon.co.uk wrote:
> I forgot to CC the list:
>
> Roel == Roel van Dijk vandijk.r...@gmail.com writes:
>
> Roel> I propose to make UTF-8 the only allowed encoding for Haskell source files. Implementations must discard an initial Byte Order Mark (BOM) if present [3].
>
> Roel> * Pros
> Roel> - Ensures that Haskell source can be reliably exchanged on the byte level.
> Roel> - Disallows implicit ISO-8859-* encodings in source code, ensuring portability.
> Roel> - Little or no implementation burden for compiler writers.
>
> Having thought this over a bit more, I don't think it's a good idea. Allowed? Allowed for what?

Allowed to be called a Haskell file. If the report doesn't specify what a Haskell file is then we can't reliably exchange Haskell source files by only looking at the files themselves.

> What does it achieve? Nothing, as far as I can see. Authors will still be able to write their Haskell code in any encoding they like. And any compiler can have a front-end script with an option to specify the encoding used by source files, which simply uses iconv on the fly to translate.

Suppose I give you MyHaskellFile.hs. But before telling you how it's encoded I go gliding (a hobby of mine). Unfortunately I crash my glider and die :-(. Now what encoding option do you give to your front-end script?

> I think the real place to mandate UTF-8 would be for Hackage. That's where it matters (an alternative design would be to add an encoding field in the .cabal file, but I don't think this has much merit).

That would only allow users of Hackage and Cabal to reliably exchange their Haskell files. If we specify it in the report every user can benefit.

Regards,

Bas
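The "which encoding option?" question can be illustrated with a short sketch. It is not from the thread and does not model iconv: `readSourceWith` is a hypothetical helper built on the text encodings that GHC's `System.IO` exports (`utf8`, `latin1`, `utf8_bom`), and it simply decodes a file under whichever encoding the caller claims.

```haskell
import System.IO

-- | Read a whole file, decoding its bytes with the given text encoding.
-- Assumption: the file is small enough to hold in memory as a String.
readSourceWith :: TextEncoding -> FilePath -> IO String
readSourceWith enc path =
  withFile path ReadMode $ \h -> do
    hSetEncoding h enc
    contents <- hGetContents h
    -- Force the lazy string before withFile closes the handle.
    length contents `seq` return contents
```

Given a file whose bytes are E4 B8 89, `readSourceWith utf8` yields the single character '三', while `readSourceWith latin1` yields three unrelated Latin-1 characters, and nothing in the file says which reading was intended. Passing `utf8_bom` instead of `utf8` would also skip an initial Byte Order Mark on input, which is the behaviour the proposal asks for.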
Re: [Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files
Bas == Bas van Dijk v.dijk@gmail.com writes:

Bas> On 6 April 2011 17:34, Colin Paul Adams co...@colina.demon.co.uk wrote:
>> Allowed? Allowed for what?

Bas> Allowed to be called a Haskell file.

Well, what the report says on that is irrelevant. If I see a file containing Haskell code, I shall call it a Haskell file, irrespective. I suspect I will be in the majority.

Bas> If the report doesn't specify what a Haskell file is then we can't reliably exchange Haskell source files by only looking at the files themselves.

Sure we can.

>> What does it achieve? Nothing, as far as I can see. Authors will still be able to write their Haskell code in any encoding they like. And any compiler can have a front-end script with an option to specify the encoding used by source files, which simply uses iconv on the fly to translate.

Bas> Suppose I give you MyHaskellFile.hs. But before telling you how it's encoded I go gliding (a hobby of mine). Unfortunately I crash my glider and die :-(. Now what encoding option do you give to your front-end script?

Whatever the encoding happens to be. That won't be hard to find out. And presumably Haskell programmers don't die so very frequently that it will become a time-consuming affair.

>> I think the real place to mandate UTF-8 would be for Hackage. That's where it matters (an alternative design would be to add an encoding field in the .cabal file, but I don't think this has much merit).

Bas> That would only allow users of Hackage and Cabal to reliably exchange their Haskell files. If we specify it in the report every user can benefit.

There is no benefit that I see. Anyone is free to write Haskell code in whatever encoding they fancy. Irrespective of what the report says. It's not going to have the force of law.

--
Colin Adams
Preston Lancashire