Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

2011-04-18 Thread Roel van Dijk
 Ok, I can give you permissions on the wiki. What is your username on the
 haskell-prime wiki?

Great! My haskell-prime username is roelvandijk.

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

2011-04-17 Thread Duncan Coutts
On Thu, 2011-04-07 at 15:44 +0200, Roel van Dijk wrote:
 On 7 April 2011 14:11, Duncan Coutts duncan.cou...@googlemail.com wrote:
  I would be happy to work with you and others to develop the report text
  for such a proposal. I posted my first draft already :-)
 
 What would be a good way to proceed? Looking at the process I think we
 should create a wiki page and a ticket for this proposal. If necessary
 I'll volunteer to be the proposal owner.

Ok, I can give you permissions on the wiki. What is your username on the
haskell-prime wiki?

Duncan




Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

2011-04-07 Thread Roel van Dijk
On 6 April 2011 15:13, Duncan Coutts duncan.cou...@googlemail.com wrote:
 So since the goal is interoperability of source files then perhaps we
 should also have a section somewhere with interoperability guidelines
 for implementations that do store Haskell programs as OS files.

I think a set of interoperability guidelines is a great idea. It seems
these guidelines are already followed by GHC, Cabal, Hackage, Jhc and
possibly others.

Shall we consider this the proposal instead of just the encoding part?



Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

2011-04-07 Thread Tillmann Rendel

Hi,

Jason Reich wrote:
 Tillmann Rendel wrote:
  How would that affect the non-code parts of literate Haskell (*.lhs)
  files? In particular, would it place any burden on third-party tools
  processing these files?

 lhs2TeX already has limited support for UTF-8 for the rendering of
 Literate Agda files.


My point is that literate Haskell programs are not just Haskell files, 
but also, for example, markdown or latex files, or even database entries 
representing a wiki page or a blog entry. Such programs are therefore 
processed by third-party tools outside of the Haskell eco-system, and it 
seems unrealistic that the Haskell report could unilaterally mandate how
they are encoded.


I think the Haskell report should not discourage Haskell implementations 
from being flexible about encoding.


  Tillmann



Re: [Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

2011-04-07 Thread Colin Paul Adams
 Roel == Roel van Dijk vandijk.r...@gmail.com writes:

Roel On 6 April 2011 20:42, Colin Paul Adams co...@colina.demon.co.uk wrote:

Roel It seems you have a problem with the word allowed. What do
Roel you think of the interoperability guidelines as proposed by
Roel Duncan? They are less stringent while having the same
Roel intention as my original proposal.

I think they are fine.
-- 
Colin Adams
Preston Lancashire
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments



Re: [Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

2011-04-07 Thread Roel van Dijk
On 7 April 2011 11:29, Christian Maeder christian.mae...@dfki.de wrote:
 I agree that Haskell files should be UTF-8, but I also agree that it is only
 relevant for Hackage (and Cabal) and already enforced by ghc-6.12 or
 higher.

It is relevant for all tools and systems which process Haskell sources.

 The motivation for this proposal can only be that future cabal packages will
 use more and more non-ASCII characters as is possible via
 http://hackage.haskell.org/package/base-unicode-symbols-0.2.1.4 and
 LANGUAGE pragma UnicodeSyntax (that happens to have no support for \ as
 lambda symbol - probably because lambda is a letter and no symbol!)

The motivation for this proposal is interoperability of all tools and
systems which process Haskell source files. Perhaps I could have made
that more clear.

 However, I think, these extra characters only make sense for corner cases
 and should not be recommended for general purposes.

Please take a look at the following file:
http://code.haskell.org/numerals/src/Text/Numeral/Language/ZH.hs

I have many more like that. I do not consider Chinese a corner case,
nor the vast number of languages which cannot be represented using
ASCII.

 So my view is: Stick to ASCII and only if you must (not just for casual
 reasons) use UTF-8.

When to use certain characters is not part of the proposal.



Re: [Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

2011-04-07 Thread Christian Maeder

On 07.04.2011 13:09, Roel van Dijk wrote:
 Please take a look at the following file:
 http://code.haskell.org/numerals/src/Text/Numeral/Language/ZH.hs


Great, that file made my Firefox open infinitely many tabs (so that I
had to close it).


C.



Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

2011-04-07 Thread Roel van Dijk
On 7 April 2011 14:33, Doug McIlroy d...@cs.dartmouth.edu wrote:
 This supposition is unwarranted.  We have all seen relative naming
 systems that run both ways: a.b.c versus c(b(a)). And Haskellites
 would simplify the latter to c$b$a.  Secondary storage may be
 organized by files, segments, objects, etc.  Combinations of these
 notions have been created in order to cater for legacy languages
 that depend on particular models.

 It is a step too far to try to predict how Haskell modules will
 be adopted into every possible naming environment.

The proposal doesn't try to regulate the use of Haskell modules in
every possible naming environment. Just file systems. And there only
as a set of guidelines.

To quote Duncan Coutts previously in this thread:

 I hope I was clear in the example text that the
 interoperability guidelines were not forcing implementations to use
 files etc, just that if they do, and if they use these conventions, then
 sources will be portable between implementations.

 It doesn't stop an implementation using URLs, sticking multiple modules
 in a file or keeping modules in a database.



Re: [Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

2011-04-07 Thread Christian Maeder

On 07.04.2011 13:09, Roel van Dijk wrote:
 Please take a look at the following file:
 http://code.haskell.org/numerals/src/Text/Numeral/Language/ZH.hs


The code would not suffer much if it were pure ASCII. I would prefer 
(ascii) haddock links to explain the various code points.


C.



Re: [Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

2011-04-07 Thread Roel van Dijk
On 7 April 2011 15:03, Christian Maeder christian.mae...@dfki.de wrote:
 The code would not suffer much if it were pure ASCII. I would prefer (ascii)
 haddock links to explain the various code points.

The code in question contains Chinese characters like '三', which in a
US-ASCII encoded Haskell file must be written as '\x4e09'. I do not
consider these escape sequences an acceptable substitute.
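
As a minimal sketch of the two spellings (assuming a compiler that reads
this file as UTF-8, as GHC does; the names san, sanEscaped and asciiEscape
are made up for illustration), both literals denote the same Char, U+4E09:

```haskell
import Data.Char (ord)
import Numeric (showHex)

-- The character written directly; requires a UTF-8 encoded source file.
san :: Char
san = '三'

-- The same character written with an ASCII-only escape sequence.
sanEscaped :: Char
sanEscaped = '\x4e09'

-- | Render one character as ASCII-only Haskell source text.
-- ASCII characters are returned unchanged; everything else becomes \x<hex>.
asciiEscape :: Char -> String
asciiEscape c
  | ord c < 128 = [c]
  | otherwise = "\\x" ++ showHex (ord c) ""
```

In GHCi, san == sanEscaped evaluates to True, and asciiEscape san yields
the six characters \x4e09. The helper works one character at a time;
escaping a whole string would also need the \& separator wherever an
escape is directly followed by a hexadecimal digit.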

But this discussion is tangential to the proposal. I am interested in
having a common set of guidelines to ensure interoperability of
Haskell sources. An important part of that is having a common method
of decoding files containing Haskell code. The easiest way to achieve
that is using only 1 encoding. UTF-8 is the best candidate for that
role.



Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

2011-04-07 Thread Roel van Dijk
On 7 April 2011 14:11, Duncan Coutts duncan.cou...@googlemail.com wrote:
 I would be happy to work with you and others to develop the report text
 for such a proposal. I posted my first draft already :-)

What would be a good way to proceed? Looking at the process I think we
should create a wiki page and a ticket for this proposal. If necessary
I'll volunteer to be the proposal owner.



Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

2011-04-06 Thread Jason Reich

Tillmann Rendel wrote:
 How would that affect the non-code parts of literate Haskell (*.lhs)
 files? In particular, would it place any burden on third-party tools
 processing these files?

lhs2TeX already has limited support for UTF-8 for the rendering of
Literate Agda files.


Jason



Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

2011-04-06 Thread Duncan Coutts
On 4 April 2011 23:48, Roel van Dijk vandijk.r...@gmail.com wrote:
 * Proposal

 The Haskell 2010 language specification states that: Haskell uses the
 Unicode character set [2]. It does not state what encoding should be
 used. This means, strictly speaking, it is not possible to reliably
 exchange Haskell source files on the byte level.

 I propose to make UTF-8 the only allowed encoding for Haskell source
 files. Implementations must discard an initial Byte Order Mark (BOM)
 if present [3].

 * Next step

 Discussion! There was already some discussion on the haskell-cafe
 mailing list [7].

This is a simple and obviously sensible proposal. I'm certainly in favour.

I think the only area where there might be some issue to discuss is
the language of the report. As far as I can see, the report does not
require that modules exist as files, does not require the .hs
extension and does not give the standard mapping from module name to
file name.

So since the goal is interoperability of source files then perhaps we
should also have a section somewhere with interoperability guidelines
for implementations that do store Haskell programs as OS files. The
section would describe the one module per file convention, the .hs
extension (this is already obliquely mentioned in the section on
literate Haskell syntax) and the mapping of module names to file names
in common OS file systems. Then this UTF-8 stipulation could go there
(and it would be clear that it applies only to conventional
implementations that store Haskell programs as files).

e.g.

Interoperability Guidelines


This Report does not specify how Haskell programs are represented or
stored. There is however a conventional representation using OS files.
Implementations that conform to these guidelines will benefit from the
portability of Haskell program representations.

Haskell modules are stored as files, one module per file. These
Haskell source files are given the file extension .hs for usual
Haskell files and .lhs for literate Haskell files (see section
10.4).

Source files must be encoded as UTF-8 \cite{utf8}. Implementations
must discard an initial Byte Order Mark (BOM) if present.
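
A minimal sketch of this rule using only the base library, assuming the
file is opened with the utf8 encoding from System.IO. The helper names
dropBom and readUtf8Source are hypothetical:

```haskell
import System.IO (IOMode (ReadMode), hGetContents, hSetEncoding, openFile, utf8)

-- | Discard an initial Byte Order Mark (U+FEFF) from decoded source text.
-- Text without a leading BOM is returned unchanged.
dropBom :: String -> String
dropBom ('\xFEFF' : rest) = rest
dropBom source = source

-- | Read a source file as UTF-8, regardless of the locale encoding,
-- and strip a leading BOM if one is present.
readUtf8Source :: FilePath -> IO String
readUtf8Source path = do
  handle <- openFile path ReadMode
  hSetEncoding handle utf8
  dropBom <$> hGetContents handle
```

System.IO also exports utf8_bom, which ignores a leading BOM on input but
writes one on output, so it is only a drop-in choice for reading.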

To find a source file corresponding to a module name used in an import
declaration, the following mapping from module name to OS file name is
used. The '.' character is mapped to the OS's directory separator
string while all other characters map to themselves. The .hs or
.lhs extension is added. Where both .hs and .lhs files exist for
the same module, the .lhs one should be used. The OS's standard
convention for representing Unicode file names should be used.

For example, on a UNIX based OS, the module A.B would map to the file
name A/B.hs for a normal Haskell file or to A/B.lhs for a literate
Haskell file. Note that because it is rare for a Main module to be
imported, there is no restriction on the name of the file containing
the Main module. It is conventional, but not strictly necessary, that
the Main module use the .hs or .lhs extension.
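
The mapping described above can be sketched as a pure function. This is
an illustration only: the names moduleToPath and sourceCandidates are
hypothetical, and the directory separator is passed in explicitly rather
than queried from the OS.

```haskell
-- | Map a module name to a relative file name: each '.' becomes the
-- directory separator string, all other characters map to themselves,
-- and the extension is appended.
moduleToPath :: String -> String -> String -> FilePath
moduleToPath separator extension moduleName =
  concatMap mapChar moduleName ++ "." ++ extension
  where
    mapChar c = if c == '.' then separator else [c]

-- | Candidate files for a module, in order of preference. The .lhs file
-- comes first because the draft says it wins when both exist.
sourceCandidates :: String -> String -> [FilePath]
sourceCandidates separator moduleName =
  [moduleToPath separator extension moduleName | extension <- ["lhs", "hs"]]
```

For example, moduleToPath "/" "hs" "A.B" gives "A/B.hs", and
sourceCandidates "/" "A.B" lists "A/B.lhs" before "A/B.hs".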


Duncan



Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

2011-04-06 Thread Ben Millwood
On Wed, Apr 6, 2011 at 2:13 PM, Duncan Coutts
duncan.cou...@googlemail.com wrote:

 Interoperability Guidelines
 

 [...]

 To find a source file corresponding to a module name used in an import
 declaration, the following mapping from module name to OS file name is
 used. The '.' character is mapped to the OS's directory separator
 string while all other characters map to themselves. The .hs or
 .lhs extension is added. Where both .hs and .lhs files exist for
 the same module, the .lhs one should be used. The OS's standard
 convention for representing Unicode file names should be used.


This standard isn't quite universal. For example, jhc will look for
Data.Foo in Data/Foo.hs but also Data.Foo.hs [1]. We could take this
as an opportunity to discuss that practice, or we could try to make
the changes to the report orthogonal to that issue.
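
The two lookups mentioned here can be sketched as follows. This only
illustrates the naming difference, assuming a '/' separator and the .hs
extension; it does not reproduce jhc's actual search algorithm, and the
function names are hypothetical.

```haskell
-- | Path under the common convention: every '.' becomes a directory separator.
nestedPath :: String -> FilePath
nestedPath moduleName = map dotToSlash moduleName ++ ".hs"
  where
    dotToSlash c = if c == '.' then '/' else c

-- | Flat path: the module name is used unchanged as the file name.
flatPath :: String -> FilePath
flatPath moduleName = moduleName ++ ".hs"

-- | Both candidates described above, nested form first (the order is assumed).
candidates :: String -> [FilePath]
candidates moduleName = [nestedPath moduleName, flatPath moduleName]
```

For Data.Foo this yields Data/Foo.hs and Data.Foo.hs. Only the first is
found by an implementation that follows just the common convention.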

In some sense I think it's cute that the Report doesn't specify
anything about how Haskell modules are stored or represented, but I
don't think that freedom is actually used, so I'm happy to see it go.
I'd think, though, that in that case there would be more to discuss
than just the encoding, so if we could separate out the issues here, I
think that would be useful.

[1]: http://repetae.net/computer/jhc/manual.html#module-search-path



[Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

2011-04-06 Thread Colin Paul Adams
I forgot to CC the list:

 Roel == Roel van Dijk vandijk.r...@gmail.com writes:

Roel I propose to make UTF-8 the only allowed encoding for Haskell
Roel source files. Implementations must discard an initial Byte
Roel Order Mark (BOM) if present [3].


Roel * Pros
Roel - Ensures that Haskell source can be reliably exchanged on the
Roel   byte level.
Roel - Disallows implicit ISO-8859-* encodings in source code,
Roel   ensuring portability.
Roel - Little or no implementation burden for compiler writers.

Having thought this over a bit more, I don't think it's a good idea.

Allowed? Allowed for what?

What does it achieve? Nothing, as far as I can see. Authors will still
be able to write their Haskell code in any encoding they like. And any
compiler can have a front-end script with an option to specify the
encoding used by source files, which simply uses iconv on the fly to
translate. 
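
A rough sketch of such a front end, using only System.IO from base in
place of iconv. mkTextEncoding looks up an encoding by name, and which
names it accepts beyond the UTF family is system dependent. The function
name transcodeToUtf8 and its arguments are hypothetical, and the caller
still has to know the source encoding in advance.

```haskell
import System.IO

-- | Re-encode a source file as UTF-8.
-- The first argument names the encoding of the input file, for example
-- "UTF-16LE". The input and output paths must differ.
transcodeToUtf8 :: String -> FilePath -> FilePath -> IO ()
transcodeToUtf8 encodingName inputPath outputPath = do
  sourceEncoding <- mkTextEncoding encodingName
  input <- openFile inputPath ReadMode
  hSetEncoding input sourceEncoding
  contents <- hGetContents input
  output <- openFile outputPath WriteMode
  hSetEncoding output utf8
  -- Writing forces the lazily read input to be consumed completely.
  hPutStr output contents
  hClose output
  hClose input
```

A call such as transcodeToUtf8 "UTF-16LE" "Old.hs" "New.hs" would write a
UTF-8 copy of Old.hs to New.hs.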

I think the real place to mandate UTF-8 would be for Hackage. That's
where it matters (an alternative design would be to add an encoding
field in the .cabal file, but I don't think this has much merit).

-- 
Colin Adams
Preston Lancashire
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments



Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

2011-04-06 Thread Duncan Coutts
On Wed, 2011-04-06 at 16:09 +0100, Ben Millwood wrote:
 On Wed, Apr 6, 2011 at 2:13 PM, Duncan Coutts
 duncan.cou...@googlemail.com wrote:
 
  Interoperability Guidelines
  
 
  [...]
 
  To find a source file corresponding to a module name used in an import
  declaration, the following mapping from module name to OS file name is
  used. The '.' character is mapped to the OS's directory separator
  string while all other characters map to themselves. The .hs or
  .lhs extension is added. Where both .hs and .lhs files exist for
  the same module, the .lhs one should be used. The OS's standard
  convention for representing Unicode file names should be used.
 
 
 This standard isn't quite universal. For example, jhc will look for
 Data.Foo in Data/Foo.hs but also Data.Foo.hs [1]. We could take this
 as an opportunity to discuss that practice, or we could try to make
 the changes to the report orthogonal to that issue.

Indeed. But it's true to say that if you do support the common
convention then you get portability. This does not preclude JHC from
supporting something extra, but sources that take advantage of JHC's
extension are not portable to implementations that just use the common
convention.

 In some sense I think it's cute that the Report doesn't specify
 anything about how Haskell modules are stored or represented, but I
 don't think that freedom is actually used, so I'm happy to see it go.
 I'd think, though, that in that case there would be more to discuss
 than just the encoding, so if we could separate out the issues here, I
 think that would be useful.

It's not going. I hope I was clear in the example text that the
interoperability guidelines were not forcing implementations to use
files etc, just that if they do, and if they use these conventions, then
sources will be portable between implementations.

It doesn't stop an implementation using URLs, sticking multiple modules
in a file or keeping modules in a database.

Duncan




Re: [Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

2011-04-06 Thread Bas van Dijk
On 6 April 2011 17:34, Colin Paul Adams co...@colina.demon.co.uk wrote:
 I forgot to CC the list:

 Roel == Roel van Dijk vandijk.r...@gmail.com writes:

    Roel I propose to make UTF-8 the only allowed encoding for Haskell
    Roel source files. Implementations must discard an initial Byte
    Roel Order Mark (BOM) if present [3].


     Roel * Pros
     Roel - Ensures that Haskell source can be reliably exchanged on the
     Roel   byte level.
     Roel - Disallows implicit ISO-8859-* encodings in source code,
     Roel   ensuring portability.
     Roel - Little or no implementation burden for compiler writers.

 Having thought this over a bit more, I don't think it's a good idea.

 Allowed? Allowed for what?

Allowed to be called a Haskell file.

If the report doesn't specify what a Haskell file is then we can't
reliably exchange Haskell source files by only looking at the files
themselves.

 What does it achieve? Nothing, as far as I can see. Authors will still
 be able to write their Haskell code in any encoding they like. And any
 compiler can have a front-end script with an option to specify the
 encoding used by source files, which simply uses iconv on the fly to
 translate.

Suppose I give you MyHaskellFile.hs. But before telling you how it's
encoded I go gliding (a hobby of mine). Unfortunately I crash my
glider and die :-(. Now what encoding option do you give to your
front-end script?
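
The difficulty can be made concrete with a small sketch: the same three
bytes are valid input for both a Latin-1 and a UTF-8 decoder but yield
different text, so the bytes alone do not settle the question. The UTF-8
decoder below is a deliberately simplified, hand-written illustration (it
does not reject overlong forms or surrogates), and the function names are
hypothetical.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- | Latin-1 decoding: every byte is one character, so it never fails.
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

-- | Simplified UTF-8 decoding; Nothing signals malformed input.
decodeUtf8 :: [Word8] -> Maybe String
decodeUtf8 [] = Just []
decodeUtf8 (b : bs)
  | b < 0x80 = (chr (fromIntegral b) :) <$> decodeUtf8 bs
  | b .&. 0xE0 == 0xC0 = multi 1 (b .&. 0x1F)
  | b .&. 0xF0 == 0xE0 = multi 2 (b .&. 0x0F)
  | b .&. 0xF8 == 0xF0 = multi 3 (b .&. 0x07)
  | otherwise = Nothing
  where
    -- Combine the payload of the lead byte with n continuation bytes.
    multi n lead =
      let (cont, rest) = splitAt n bs
          code = foldl step (fromIntegral lead) cont :: Int
       in if length cont == n && all isCont cont && code <= 0x10FFFF
            then (chr code :) <$> decodeUtf8 rest
            else Nothing
    isCont c = c .&. 0xC0 == 0x80
    step acc c = (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F)
```

With bytes = [0xE4, 0xB8, 0x89], decodeUtf8 bytes is Just "\x4e09" (the
single character 三), while decodeLatin1 bytes is a three-character
string. Both results are well formed, and nothing in the file says which
one the author meant.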

 I think the real place to mandate UTF-8 would be for Hackage. That's
 where it matters (an alternative design would be to add an encoding
 field in the .cabal file, but I don't think this has much merit).

That would only allow users of Hackage and Cabal to reliably exchange
their Haskell files. If we specify it in the report every user can
benefit.

Regards,

Bas



Re: [Colin Paul Adams] Re: Proposal: Define UTF-8 to be the encoding of Haskell source files

2011-04-06 Thread Colin Paul Adams
 Bas == Bas van Dijk v.dijk@gmail.com writes:

Bas On 6 April 2011 17:34, Colin Paul Adams co...@colina.demon.co.uk wrote:

 Allowed? Allowed for what?

Bas Allowed to be called a Haskell file.

Well, what the report says on that is irrelevant. If I see a file
containing Haskell code, I shall call it a Haskell file, irrespective. I
suspect I will be in the majority.

Bas If the report doesn't specify what a Haskell file is then we
Bas can't reliably exchange Haskell source files by only looking at
Bas the files themselves.

Sure we can.

 What does it achieve? Nothing, as far as I can see. Authors will
 still be able to write their Haskell code in any encoding they
 like. And any compiler can have a front-end script with an option
 to specify the encoding used by source files, which simply uses
 iconv on the fly to translate.

Bas Suppose I give you MyHaskellFile.hs. But before telling you how
Bas it's encoded I go gliding (a hobby of mine). Unfortunately I
Bas crash my glider and die :-(. Now what encoding option do you
Bas give to your front-end script?

Whatever the encoding happens to be. That won't be hard to find out. And
presumably Haskell programmers don't die so very frequently that it
will become a time-consuming affair.

 I think the real place to mandate UTF-8 would be for
 Hackage. That's where it matters (an alternative design would be
 to add an encoding field in the .cabal file, but I don't think
 this has much merit).

Bas That would only allow users of Hackage and Cabal to reliably
Bas exchange their Haskell files. If we specify it in the report
Bas every user can benefit.

There is no benefit that I see. Anyone is free to write Haskell code in
whatever encoding they fancy. Irrespective of what the report says. It's
not going to have the force of law.
-- 
Colin Adams
Preston Lancashire
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments
