Re: Picky details about Unicode (was RE: Haskell 98 Report possible errors, part one)

2001-07-24 Thread Marcin 'Qrczak' Kowalczyk

Mon, 23 Jul 2001 11:23:30 -0700, Mark P Jones [EMAIL PROTECTED] writes:

> I guess the intention here is that:
>
>   symbol  -> ascSymbol | uniSymbol_<special | _ | : | " | '>

Right.

> In fact, since all the characters in ascSymbol are either
> punctuation or symbols in Unicode, the inclusion of ascSymbol
> is redundant, and a better specification might be:
>
>   symbol  -> uniSymbol_<special | _ | : | " | '>

It would still be nice to explicitly list ASCII symbols, so one
doesn't need to look at Unicode specs to use ASCII-only source.

There are two places where character predicates are used in Haskell:
program source and module Char. I'm sure that we all agree that they
should be consistent with each other.

Some predicates in module Char are wrong, i.e. I don't agree with
their meaning: for example, isSpace is restricted to ISO-8859-1,
and caseless letters are considered uppercase.

It's not clear what good definitions are, or even what set of
predicates is useful, because there is no single official source
with an unambiguous and complete set of predicates. There are Unicode
character categories, Unicode property lists, and implementations of
C character predicates - all with different data. I guess the Java
specs have something to say here too.

I have implemented a proposal for improved Char predicates in QForeign
http://sf.net/projects/qforeign/. The definitions are based on both
Unicode character categories and PropList.txt from Unicode.
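
The two complaints above can be made concrete with a small sketch. It
uses Data.Char from a current GHC base library, which is an assumption:
it is neither the Haskell 98 Char module nor the QForeign code, and the
names isUpperH98, isUpperByCategory and isSpaceByCategory are invented
for illustration only.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isAlpha, isLower)

-- The rule quoted from the Haskell 98 Char description: any alphabetic
-- character that is not lower case counts as upper case.
isUpperH98 :: Char -> Bool
isUpperH98 c = isAlpha c && not (isLower c)

-- A category-based alternative: only Lu and Lt count as upper case,
-- so caseless letters (category Lo) are neither upper nor lower.
isUpperByCategory :: Char -> Bool
isUpperByCategory c =
  generalCategory c `elem` [UppercaseLetter, TitlecaseLetter]

-- Whitespace not limited to ISO-8859-1: the ASCII control characters
-- plus every character in category Zs.
isSpaceByCategory :: Char -> Bool
isSpaceByCategory c = c `elem` "\t\n\r\f\v" || generalCategory c == Space
```

For the caseless Hebrew letter alef ('\x05D0') the first predicate
answers True and the second False, which is the disagreement described
above.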

-- 
 __(  Marcin Kowalczyk * [EMAIL PROTECTED] http://qrczak.ids.net.pl/
 \__/
  ^^  SYGNATURA ZASTĘPCZA
QRCZAK


___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell



Re: Haskell 98 Report possible errors, part one

2001-07-24 Thread Lars Henrik Mathiesen

> From: Dylan Thurston [EMAIL PROTECTED]
> Date: Mon, 23 Jul 2001 19:57:54 -0400
>
> On Mon, Jul 23, 2001 at 06:30:30AM -0700, Simon Peyton-Jones wrote:
> > Someone else, quoted by Simon, attribution elided by Dylan, wrote:
> > | 2.2. Identifiers can use small and large Unicode letters.
> > | What about caseless scripts where letters are neither small
> > | nor large? The description of module Char says: "For the
> > | purposes of Haskell, any alphabetic character which is not
> > | lower case is treated as upper case (Unicode actually has
> > | three cases: upper, lower and title)." This suggests that the
> > | only anomaly is that titlecase letters are considered
> > | uppercase. But what is actually specified is that caseless
> > | scripts can be used to write constructor names, but not
> > | variable names. I don't know how to solve this.
> >
> > I am woefully ignorant of Unicode, and I have no idea what to do
> > about this one.  I therefore propose to do nothing on the grounds
> > that I might easily make matters worse.
>
> In this case, what about requiring identifiers to start with an upper
> or lower case alphabetic character?

I'm not sure that makes things better. It just makes it impossible to
have identifiers in caseless scripts (some of which are alphabetic).

And whether you choose your upper or lower case alphabetic character
from Latin, Greek, Coptic, Cyrillic, Armenian, Georgian, or Deseret,
it will probably look silly in front of a variable name spelled in
Hangul.

What would make sense to me is to define that caseless letters
(Unicode class Lo) behave as lowercase, and to choose some easily
visible, culturally neutral, symbol as the official 'conid marker'.
Since the problem only arises on Unicode-capable systems, there should
be plenty of those to choose from, even outside Latin-1.

To fix Haskell 98, the least intrusive way might be to allow only
classes Ll, Lt, and Lu in identifiers, with Lt (titlecase) and Lu
counting as uppercase --- it looks like that may actually have been
the intention. And then add a note explaining that caseless scripts
can't be used because they weren't considered initially.
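
The two rules discussed above can be sketched as follows. This is only
an illustration using Data.Char from a current GHC base library (an
assumption); it looks at the first character of an identifier only, it
ignores the underscore, and the names IdentStart, classifyStrict and
classifyCaseless are made up for the sketch.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

data IdentStart = VarStart | ConStart | NotAllowed
  deriving (Eq, Show)

-- "Least intrusive" rule: only Ll, Lt and Lu are accepted, with Lt and
-- Lu counting as uppercase.
classifyStrict :: Char -> IdentStart
classifyStrict c = case generalCategory c of
  LowercaseLetter -> VarStart
  UppercaseLetter -> ConStart
  TitlecaseLetter -> ConStart
  _               -> NotAllowed

-- Alternative rule: caseless letters (class Lo) behave as lowercase.
classifyCaseless :: Char -> IdentStart
classifyCaseless c = case generalCategory c of
  OtherLetter -> VarStart
  _           -> classifyStrict c
```

Under the strict rule a Hangul syllable such as '\xD55C' is rejected;
under the alternative it can start a variable name, and a separate
'conid marker' would still be needed for constructors.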

Lars Mathiesen (U of Copenhagen CS Dep) [EMAIL PROTECTED] (Humour NOT marked)




RE: Haskell 98 Report possible errors, part one

2001-07-23 Thread Simon Marlow


> 3. A precedence table says that case (rightwards) has higher
> precedence than operators and right associativity. If it's meaningful
> to talk about precedence of such syntactic constructs as case at all,
> it should probably be said to have a lower precedence, so
> case x+1 of ... is valid as case (x+1) of ... At least I don't see a
> difference between case (rightwards) and if (rightwards). I'm not
> sure if it makes sense to explain parsing of case in terms of
> precedence.

Interesting.  The table seems to say that case (rightwards) has a higher
precedence than infix operators, so that e.g.

    case x of p -> x + y

would parse as

    (case x of p -> x) + y

which is in conflict with the longest parse rule.  I have no idea why
case (rightwards) is given a different precedence, and the inclusion of
'case alternative' in the list is confusing.
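
A small sketch makes the two readings distinguishable. The names
longestParse and tableReading are invented here; the inner pattern
variable p deliberately shadows an outer p, so the two parses give
different values.

```haskell
-- Longest parse: the alternative's body is (0 + p), and p is the
-- pattern variable bound to 1.
longestParse :: Int
longestParse = let p = 100 in case (1 :: Int) of p -> 0 + p

-- The reading the table seems to imply, written with explicit
-- parentheses: here the trailing p is the outer binding.
tableReading :: Int
tableReading = let p = 100 in (case (1 :: Int) of p -> 0) + p
```

A compiler that follows the longest parse rule evaluates longestParse
to 1, whereas tableReading is 100.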

> 4.3.1. A class declaration with no where part [...]
> The instance declaration must be given explicitly with no where part.
> Actually the where part may be present but empty, with the same
> meaning as no where part.
>
> Generally I'm not sure that having a layout rule which says that {} is
> inserted when the next indentation level is about to start and the new
> indent is smaller than the outer one is necessary; in all useful cases
> the keyword which triggered the layout could be omitted, and writing
>     let x = case x of
>       foo -> ...
> should be either an error or it should be allowed to have the next
> indent smaller than the previous one - it's not useful to let it mean
>     let {x = case x of {}}
>       foo -> ...
> and in case one really wants to have empty alts in case, he can
> write {} explicitly.

I agree, but this isn't really a bug so there's no need to change the
report.  Besides, GHC is the only compiler which actually implements the
layout rule as specified :-)
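
As an illustration of the 4.3.1 point, here is a sketch with a
hypothetical class Describe and types A, B and C. The three instance
declarations are meant to be equivalent: no where part, an explicitly
empty one, and an empty one supplied by the layout rule.

```haskell
class Describe a where
  describe :: a -> String
  describe _ = "no description"   -- default method

data A = A
data B = B
data C = C

-- No where part at all.
instance Describe A

-- A where part that is present but empty, with explicit braces.
instance Describe B where {}

-- A where part whose empty {} is inserted by the layout rule, because
-- the next declaration is not indented further than the enclosing block.
instance Describe C where

allSame :: Bool
allSame = describe A == describe B && describe B == describe C
```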

Cheers,
Simon




RE: Haskell 98 Report possible errors, part one

2001-07-23 Thread Simon Peyton-Jones

Marcin

Thanks for your careful read.  Many of your suggestions I will
implement.  I'll send separate email about any others.

[Haskell mailing list folk: I hope you'll forgive email about the
minutiae of the Haskell Report.  But I don't want to let changes, or
even clarifications, go by without giving you all a chance to yell.

It is amazing how the act of saying "here's the final version" has an
uncanny ability to stimulate new, and entirely well-founded, feedback.
I propose to continue this process, though.  I continue to make
strenuous efforts to change the report only (a) to clarify what is
obscure, (b) to fix grievous errors or inconsistencies.]

Simon




RE: Haskell 98 Report possible errors, part one

2001-07-23 Thread Simon Peyton-Jones

Folks

Marcin is right about this.  It is inconsistent as it stands.
I propose to delete the sentence "The Prelude module
is always available as a qualified import..." in the first
para of 5.6.1.

The situation will then be:
  if you don't import Prelude explicitly, you implicitly get
        import Prelude
  if you do import Prelude explicitly, you get no implicit imports

Nice and simple
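
A minimal sketch of what the proposed rule means in practice. The
module contents are made up for illustration (the alias P and the
function isEmptyList are not from the report).

```haskell
-- An explicit Prelude import, so nothing else is imported implicitly.
import Prelude hiding (null)
-- Qualified access to the hidden name has to be requested explicitly.
import qualified Prelude as P

-- A local definition that reuses the hidden Prelude name.
null :: Int
null = 0

-- The Prelude's null, reached through the explicit qualified import.
isEmptyList :: [a] -> Bool
isEmptyList = P.null
```

Without the second import, a reference to the Prelude's null would
simply not be in scope in this module.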

Simon

| 5.6.1. an implicit `import qualified Prelude' is part of 
| every module and names prefixed by `Prelude.' can always be 
| used to refer to entities in the Prelude. So what happens in 
| the following?
| 
|     module Test (null) where
|     import Prelude hiding (null)
|     null :: Int
|     null = 0
|
|     module Test2 where
|     import Test as Prelude
|     import Prelude hiding (null)
|     x :: Int
|     x = Prelude.null
|
| ghc allows that; it doesn't seem to implement the qualified
| part of the implicit Prelude import. The report is
| contradictory: adding `import qualified Prelude' makes 
| Prelude.null ambiguous, and thus names prefixed by `Prelude.' 
| can't always be used to refer to entities in the Prelude.




RE: Haskell 98 Report possible errors, part one

2001-07-23 Thread Simon Peyton-Jones

| 3. A precedence table says that case (rightwards) has higher 
| precedence than operators and right associativity. If it's 
| meaningful to talk about precedence of such syntactic 
| constructs as case at all, it should probably be told to have 
| a lower precedence, so case x+1 of ... is valid as case 
| (x+1) of ... At least I don't see a difference between
| case (rightwards) and if (rightwards). I'm not sure if it 
| makes sense to explain parsing of case in terms of precedence.

I can't make head or tail of what Table 1 (Section 3, beginning) is
trying to say.   It claims to be an aid to understanding the grammar,
but it seems downright confusing to me.

Proposal: remove Table 1 and its associated paragraph.

Does anyone like it? 

Simon




RE: Haskell 98 Report possible errors, part one

2001-07-23 Thread Simon Peyton-Jones

| 2.2. Identifiers can use small and large Unicode letters. 
| What about caseless scripts where letters are neither small 
| nor large? The description of module Char says: For the 
| purposes of Haskell, any alphabetic character which is not 
| lower case is treated as upper case (Unicode actually has 
| three cases: upper, lower and title). This suggests that the 
| only anomaly is that titlecase letters are considered 
| uppercase. But what is actually specified is that caseless 
| scripts can be used to write constructor names, but not to 
| variable names. I don't know how to solve this.

I am woefully ignorant of Unicode, and I have no idea what to do about
this one.  I therefore propose to do nothing on the grounds that I might
easily make matters worse.

Simon




Re: Haskell 98 Report possible errors, part one

2001-07-23 Thread Olaf Chitil


Unfortunately both the old and the new situation are not so nice.
Both don't allow a simple translation of Haskell into the Haskell
kernel, e.g. you cannot translate [1..] into Prelude.enumFrom 1,
because the latter may be ambiguous.

The following remark at the beginning of Section 3 is misleading:

  "Free variables and constructors used in these translations refer to
  entities defined by the Prelude. To avoid clutter, we use True instead
  of Prelude.True or map instead of Prelude.map. (Prelude.True is a
  qualified name as described in Section 5.3.)"

It implicitly suggests that a simple translation is possible.

Unfortunately I don't see any simple way to regain a simple translation.
Hence I just suggest changing the remark at the beginning of Section 3.
Just say that all free variables and constructors refer to entities
defined by the Prelude and warn that full qualification is in general
not sufficient to achieve this (because the entity may not be imported
and because of import .. as).
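
To illustrate why a purely textual translation is not enough, here is a
sketch in which the name enumFrom is rebound locally. The definitions
are invented for the example; the behaviour shown is that of the
builtin syntax without any rebinding extension.

```haskell
import Prelude hiding (enumFrom)

-- A local function that happens to reuse the Prelude's name.
enumFrom :: Int -> [Int]
enumFrom _ = []

-- The builtin syntax still refers to the Prelude's enumFrom.
viaSyntax :: [Int]
viaSyntax = take 3 [1 ..]

-- Writing the name out picks up the local definition instead.
viaLocalName :: [Int]
viaLocalName = take 3 (enumFrom 1)
```

Here viaSyntax is [1,2,3] while viaLocalName is [], so replacing [1 ..]
by the text "enumFrom 1" would change the meaning of the program.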

Ciao,
Olaf


> Marcin is right about this.  It is inconsistent as it stands.
> I propose to delete the sentence "The Prelude module
> is always available as a qualified import..." in the first
> para of 5.6.1.
>
> The situation will then be:
>   if you don't import Prelude explicitly, you implicitly get
>         import Prelude
>   if you do import Prelude explicitly, you get no implicit imports
>
> Nice and simple
>
> | 5.6.1. an implicit `import qualified Prelude' is part of
> | every module and names prefixed by `Prelude.' can always be
> | used to refer to entities in the Prelude. So what happens in
> | the following?
> |
> |     module Test (null) where
> |     import Prelude hiding (null)
> |     null :: Int
> |     null = 0
> |
> |     module Test2 where
> |     import Test as Prelude
> |     import Prelude hiding (null)
> |     x :: Int
> |     x = Prelude.null
> |
> | ghc allows that; it doesn't seem to implement the qualified
> | part of the implicit Prelude import. The report is
> | contradictory: adding `import qualified Prelude' makes
> | Prelude.null ambiguous, and thus names prefixed by `Prelude.'
> | can't always be used to refer to entities in the Prelude.

-- 
OLAF CHITIL, 
 Dept. of Computer Science, University of York, York YO10 5DD, UK. 
 URL: http://www.cs.york.ac.uk/~olaf/
 Tel: +44 1904 434756; Fax: +44 1904 432767




Re: Haskell 98 Report possible errors, part one

2001-07-23 Thread Marcin 'Qrczak' Kowalczyk

Mon, 23 Jul 2001 15:11:32 +0100, Olaf Chitil [EMAIL PROTECTED] writes:

> Both don't allow a simple translation of Haskell into the Haskell
> kernel, e.g. you cannot translate [1..] into Prelude.enumFrom 1,
> because the latter may be ambiguous.

That's why I was proposing that importing another module as Prelude
should be the way to change the meaning of builtin syntax in ghc,
instead of -fno-implicit-prelude combined with importing some names
to be available unqualified.

It's not a change in the report, which doesn't support changing the
meaning of builtin syntax and should only be clarified that entities
refer to standard Prelude. But as an extension some builtin syntax in
ghc might be defined as textual expansion to Prelude-qualified names,
so they may come either from original Prelude or from a replacement.

To complete that extension it should be legal to self-import:

    module MyPrelude where
    import MyPrelude as Prelude
    import Prelude as P

so that 5 used in this very module expands to Prelude.fromInteger 5,
i.e. MyPrelude.fromInteger 5.
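
For comparison only: GHC's RebindableSyntax extension gives a similar
effect by a different mechanism. It is not the self-import scheme
proposed above. With it, an integer literal stands for fromInteger
applied to the Integer value, using whichever fromInteger is in scope.
The replacement fromInteger below is a made-up example.

```haskell
{-# LANGUAGE RebindableSyntax #-}
import Prelude hiding (fromInteger)
import qualified Prelude as P

-- A replacement for the function that integer literals expand to.
fromInteger :: Integer -> String
fromInteger n = "literal " ++ P.show n

-- The literal expands to fromInteger (5 :: Integer), which resolves
-- to the definition above rather than to the Prelude's.
five :: String
five = 5
```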

-- 
 __(  Marcin Kowalczyk * [EMAIL PROTECTED] http://qrczak.ids.net.pl/
 \__/
  ^^  SYGNATURA ZASTĘPCZA
QRCZAK





RE: Haskell 98 Report possible errors, part one

2001-07-23 Thread Simon Peyton-Jones

| Unfortunately both the old and the new situation are not so 
| nice. Both don't allow a simple translation of Haskell into 
| the Haskell kernel, e.g. you cannot translate [1..] into 
| Prelude.enumFrom 1, because the latter may be ambiguous.
| 
| The following remark at the beginning of Section 3 is misleading:
| 
| Free variables and constructors used in these translations 
| refer to entities defined by the Prelude. To avoid clutter, 
| we use True instead of Prelude.True or map instead of 
| Prelude.map. (Prelude.True is a qualified name as described 
| in Section 5.3.)

The report is vainly trying to say that, regardless of what is
lexically in scope, the builtin syntax refers to Prelude entities.

Perhaps I should reword the offending paragraph to say:

    Free variables and constructors used in these translations
    refer to entities defined by the Prelude, regardless of what
    variables or constructors are actually in scope.  For example,
    concatMap used in the translation of list comprehensions (Section
    3.11) means the concatMap defined by the Prelude, regardless of
    whether or not concatMap or Prelude.concatMap are in scope.
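
The translation referred to can be sketched on one small comprehension.
The function names are invented; the second definition follows the
shape of the Report's list comprehension translation, written out by
hand.

```haskell
-- A list comprehension with one generator and one guard.
viaComprehension :: [Int] -> [Int]
viaComprehension xs = [x * x | x <- xs, odd x]

-- The same function after the translation: the generator becomes a
-- concatMap and the guard becomes an if inside the helper.
viaConcatMap :: [Int] -> [Int]
viaConcatMap xs = concatMap ok xs
  where
    ok x = if odd x then [x * x] else []
```

The concatMap in the second definition is the one that the reworded
paragraph says must always be the Prelude's.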

Would that be better?

Simon




Re: Haskell 98 Report possible errors, part one

2001-07-23 Thread Olaf Chitil


> The report is vainly trying to say that, regardless of what is
> lexically in scope, the builtin syntax refers to Prelude entities.
>
> Perhaps I should reword the offending paragraph to say:
>
>     Free variables and constructors used in these translations
>     refer to entities defined by the Prelude, regardless of what
>     variables or constructors are actually in scope.  For example,
>     concatMap used in the translation of list comprehensions (Section
>     3.11) means the concatMap defined by the Prelude, regardless of
>     whether or not concatMap or Prelude.concatMap are in scope.
>
> Would that be better?

You can probably delete the first ", regardless ..." clause for
better readability. Maybe you should add "and refer unambiguously to the
Prelude." to the end of the last sentence?

Or does the report state somewhere that being in scope includes not
being ambiguous? Unfortunately, as far as I see, the report does not
even explain what it means for an identifier to be ambiguous.

-- 
OLAF CHITIL, 
 Dept. of Computer Science, University of York, York YO10 5DD, UK. 
 URL: http://www.cs.york.ac.uk/~olaf/
 Tel: +44 1904 434756; Fax: +44 1904 432767




Picky details about Unicode (was RE: Haskell 98 Report possible errors, part one)

2001-07-23 Thread Mark P Jones

| 2.2. Identifiers can use small and large Unicode letters ...

If we're picking on the report's handling of Unicode, here's
another minor quibble to add to the list.  In describing the
lexical syntax of operator symbols, the report uses:

   varsym    -> (symbol {symbol | :})_<reservedop>
   symbol    -> ascSymbol | uniSymbol
   uniSymbol -> any Unicode symbol or punctuation

The last line seems to include more characters than I'd expect.
Specifically:

  ()[]{}  are punctuation (Unicode type Pe, Ps)
  `   is a symbol, modifier (Unicode type Sk)
  ':;,   are punctuation, other (Unicode type Po)
  _   is punctuation, connector (Unicode type Pc)

And, so, if I read the report correctly, I should be able to
define :-) as a consym and `div`, [], and hello as varsyms!
(Not to mention some altogether more bizarre choices!)

I guess the intention here is that:

  symbol  -> ascSymbol | uniSymbol_<special | _ | : | " | '>

In fact, since all the characters in ascSymbol are either
punctuation or symbols in Unicode, the inclusion of ascSymbol
is redundant, and a better specification might be:

  symbol  -> uniSymbol_<special | _ | : | " | '>
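
As a rough check of the reasoning above, the two readings of symbol can
be written as predicates. This sketch assumes Data.Char from a current
GHC base library, and the names isUniSymbol, excluded and
isOperatorSymbol are invented for illustration.

```haskell
import Data.Char (isPunctuation, isSymbol)

-- uniSymbol as the report words it: any Unicode symbol or punctuation.
isUniSymbol :: Char -> Bool
isUniSymbol c = isSymbol c || isPunctuation c

-- Characters to subtract: the report's `special` set, followed by the
-- four extra characters _ : " '
excluded :: String
excluded = "(),;[]`{}" ++ "_:\"'"

-- The narrower production: uniSymbol minus the excluded characters.
isOperatorSymbol :: Char -> Bool
isOperatorSymbol c = isUniSymbol c && c `notElem` excluded
```

With the unrestricted definition, characters such as '(' and '`' count
as operator symbols; with the subtraction they do not, while '+' and
non-ASCII operators such as '\x2218' (ring operator) still do.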

All the best,
Mark

P.S.  A caveat: I'm not a Unicode expert!  Perhaps Marcin can
advise ...





Re: Haskell 98 Report possible errors, part one

2001-07-23 Thread Dylan Thurston

On Mon, Jul 23, 2001 at 06:30:30AM -0700, Simon Peyton-Jones wrote:
> | 2.2. Identifiers can use small and large Unicode letters.
> | What about caseless scripts where letters are neither small
> | nor large? The description of module Char says: For the
> | purposes of Haskell, any alphabetic character which is not
> | lower case is treated as upper case (Unicode actually has
> | three cases: upper, lower and title). This suggests that the
> | only anomaly is that titlecase letters are considered
> | uppercase. But what is actually specified is that caseless
> | scripts can be used to write constructor names, but not to
> | variable names. I don't know how to solve this.
>
> I am woefully ignorant of Unicode, and I have no idea what to do
> about this one.  I therefore propose to do nothing on the grounds
> that I might easily make matters worse.

In general, the situation with claimed support of Unicode in Haskell
is pretty shaky; probably much more needs to be said in many places.
I will continue to be dubious until some compiler actually provides
reasonable support.

In this case, what about requiring identifiers to start with an upper
or lower case alphabetic character?

--Dylan

