Re: RFC: Unicode primes and super/subscript characters in GHC

2014-06-27 Thread John Meacham
>> that can appear at the end of both symbols and ids. >> >> currently it consists of >> >> $trailing = [₀₁₂₃₄₅₆₇₈₉⁰¹²³⁴⁵⁶⁷⁸⁹₍₎⁽⁾₊₋] >> >> John >> >> On Sat, Jun 14, 2014 at 7:48 AM, Mikhail Vorozhtsov >> wrote: >>> >>> Hello lists, >>> >>> A

Re: RFC: Unicode primes and super/subscript characters in GHC

2014-06-25 Thread Mikhail Vorozhtsov
₀₁₂₃₄₅₆₇₈₉⁰¹²³⁴⁵⁶⁷⁸⁹₍₎⁽⁾₊₋] John On Sat, Jun 14, 2014 at 7:48 AM, Mikhail Vorozhtsov wrote: Hello lists, As some of you may know, GHC's support for Unicode characters in lexemes is rather crude and hence prone to inconsistencies in their handling versus the ASCII counterparts. For example,

Re: RFC: Unicode primes and super/subscript characters in GHC

2014-06-17 Thread John Meacham
sible and doesn't entail CPP concerns. John On Sun, Jun 15, 2014 at 5:26 PM, Mateusz Kowalczyk wrote: > On 06/14/2014 04:48 PM, Mikhail Vorozhtsov wrote: >> Hello lists, >> >> As some of you may know, GHC's support for Unicode characters in lexemes

Re: RFC: Unicode primes and super/subscript characters in GHC

2014-06-17 Thread Mikhail Vorozhtsov
On 06/17/2014 03:13 AM, Tsuyoshi Ito wrote: Hello, Mikhail Vorozhtsov wrote: I also worry (although not based on anything particular you said) whether this will not change meaning of any existing programs. Does it only allow new programs? As far as I can see, no change in meaning. Some hacky

Re: RFC: Unicode primes and super/subscript characters in GHC

2014-06-16 Thread Tsuyoshi Ito
Hello, Mikhail Vorozhtsov wrote: >> I also worry (although not based on anything particular you said) >> whether this will not change meaning of any existing programs. Does it >> only allow new programs? > > As far as I can see, no change in meaning. Some hacky operators and some > hacky identifi

Re: RFC: Unicode primes and super/subscript characters in GHC

2014-06-16 Thread Mikhail Vorozhtsov
On 06/16/2014 04:26 AM, Mateusz Kowalczyk wrote: On 06/14/2014 04:48 PM, Mikhail Vorozhtsov wrote: Hello lists, As some of you may know, GHC's support for Unicode characters in lexemes is rather crude and hence prone to inconsistencies in their handling versus the ASCII counterparts

Re: RFC: Unicode primes and super/subscript characters in GHC

2014-06-16 Thread Herbert Valerio Riedel
On 2014-06-16 at 02:26:49 +0200, Mateusz Kowalczyk wrote: [...] > While personally I like the proposal (wanted prime and sub/sup scripts > way too many times), I worry what this means for compatibility reasons: > suddenly we'll have code that fails to build on 7.8 and before because > someone usi

Re: RFC: Unicode primes and super/subscript characters in GHC

2014-06-15 Thread Mateusz Kowalczyk
On 06/14/2014 04:48 PM, Mikhail Vorozhtsov wrote: > Hello lists, > > As some of you may know, GHC's support for Unicode characters in lexemes > is rather crude and hence prone to inconsistencies in their handling > versus the ASCII counterparts. For example, APOSTROPHE is tr

Re: RFC: Unicode primes and super/subscript characters in GHC

2014-06-14 Thread John Meacham
As some of you may know, GHC's support for Unicode characters in lexemes is > rather crude and hence prone to inconsistencies in their handling versus the > ASCII counterparts. For example, APOSTROPHE is treated differently from > PRIME: > > λ> data a +' b = Plus a b >

RFC: Unicode primes and super/subscript characters in GHC

2014-06-14 Thread Mikhail Vorozhtsov
Hello lists, As some of you may know, GHC's support for Unicode characters in lexemes is rather crude and hence prone to inconsistencies in their handling versus the ASCII counterparts. For example, APOSTROPHE is treated differently from PRIME: λ> data a +' b = Plus a b :3:9:

Re: Bug with unicode characters in file names

2012-03-13 Thread Brent Yorgey
On Tue, Mar 13, 2012 at 06:06:49PM +0100, Volker Wysk wrote: > > I'm sending this to glasgow-haskell-users instead of glasgow-haskell-bugs, > because the latter does not seem to accept my messages. I receive nothing, > neither the message in the mailing list, nor any error message. As I underst

Re: Bug with unicode characters in file names

2012-03-13 Thread Johan Tibell
Hi, Your best option is to file a bug at http://hackage.haskell.org/trac/ghc/ -- Johan ___ Glasgow-haskell-users mailing list [email protected] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Bug with unicode characters in file names

2012-03-13 Thread Volker Wysk
Hi This is some file "äöü.hs" with three German umlauts in the file name: main = putStrLn "äöü" Now I want to get the dependency information. Therefore I call: ghc -M äöü.hs The following gets added to the Makefile: # DO NOT DELETE: Beginning of Haskell dependencies äöü.o :

Re: Unicode windows console output.

2010-11-04 Thread David Sankel
ayer, i.e. using write() and > pseudo-file-descriptors. More than a few problems have been caused by this, > and it's totally unnecessary except that we get to share some code between > the POSIX and Windows backends. We ought to be using the native Win32 APIs > and HANDLE directly, t

Re: Unicode windows console output.

2010-11-04 Thread Simon Marlow
On 04/11/2010 02:35, David Sankel wrote: On Wed, Nov 3, 2010 at 9:00 AM, Simon Marlow wrote: On 03/11/2010 10:36, Bulat Ziganshin wrote: Hello Max, Wednesday, November 3, 2010, 1:26:50 PM, you wrote: 1. You need to use "chcp 65001" t

Re: Unicode windows console output.

2010-11-03 Thread David Sankel
On Wed, Nov 3, 2010 at 9:00 AM, Simon Marlow wrote: > On 03/11/2010 10:36, Bulat Ziganshin wrote: > >> Hello Max, >> >> Wednesday, November 3, 2010, 1:26:50 PM, you wrote: >> >> 1. You need to use "chcp 65001" to set the console code page to UTF8 >>> 2. It is very likely that your Windows consol

Re: Unicode windows console output.

2010-11-03 Thread Simon Marlow
de page to use by default - see libraries/base/GHC/IO/Encoding/CodePage.hs. Windows Consoles use Unicode internally. I presume at some point between WriteFile() and the console some decoding is supposed to happen, but I don't know where that is, or how well it works (other evidence on

Re[2]: Unicode windows console output.

2010-11-03 Thread Bulat Ziganshin
Hello Max, Wednesday, November 3, 2010, 1:26:50 PM, you wrote: > 1. You need to use "chcp 65001" to set the console code page to UTF8 > 2. It is very likely that your Windows console won't have the fonts > required to actually make sense of the output. Pipe the output to > foo.txt. If you open th

Re: Unicode windows console output.

2010-11-03 Thread Max Bolingbroke
On 2 November 2010 21:05, David Sankel wrote: > Is there a ghc "wontfix" bug ticket for this? Perhaps we can make a small C > test case and send it to the Microsoft people. Some[1] are reporting success > with Unicode console output. I confirmed that I can output Chinese u

Re: Unicode windows console output.

2010-11-03 Thread Krasimir Angelov
we can make a small C > test case and send it to the Microsoft people. Some[1] are reporting success > with Unicode console output. > David > > [1] http://www.codeproject.com/KB/cpp/unicode_console_output.aspx > > On Tue, Nov 2, 2010 at 3:49 AM, Krasimir Angelov > wrote:

Re: Unicode windows console output.

2010-11-02 Thread David Sankel
Is there a ghc "wontfix" bug ticket for this? Perhaps we can make a small C test case and send it to the Microsoft people. Some[1] are reporting success with Unicode console output. David [1] http://www.codeproject.com/KB/cpp/unicode_console_output.aspx On Tue, Nov 2, 2010 at 3:49 AM

Re: Unicode windows console output.

2010-11-02 Thread Krasimir Angelov
This is evidence for the broken Unicode support in the Windows terminal and not a problem with GHC. I experienced the same many times. 2010/11/2 David Sankel : > > On Mon, Nov 1, 2010 at 10:20 PM, David Sankel wrote: >> >> Hello all, >> I'm attempting to outpu

Re: Unicode windows console output.

2010-11-01 Thread David Sankel
On Mon, Nov 1, 2010 at 10:20 PM, David Sankel wrote: > Hello all, > > I'm attempting to output some Unicode on the windows console. I set my > windows console code page to utf-8 using "chcp 65001". > > The program: > > -- Test.hs > main = putStr

Unicode windows console output.

2010-11-01 Thread David Sankel
Hello all, I'm attempting to output some Unicode on the windows console. I set my windows console code page to utf-8 using "chcp 65001". The program: -- Test.hs main = putStr "λ.x→x" The output of `runghc Test.hs`: λ.x→ From within ghci, typing `main`:

Re: unicode characters in operator name

2010-09-10 Thread Greg
aste. =) Thanks -- Greg On Sep 10, 2010, at 06:49 PM, Brandon S Allbery KF8NH wrote: On 9/10/10 21:12 , Greg wrote: > unicode symbol (defined as any Unicode symbol or punctuation). I'm pretty > sure º is a unicode symbol or punctuation. No,

Re: unicode characters in operator name

2010-09-10 Thread Brandon S Allbery KF8NH
On 9/10/10 21:12 , Greg wrote: > unicode symbol (defined as any Unicode symbol or punctuation). I'm pretty > sure º is a unicode symbol or punctuation. No, it's a raised lowercase "o" used by convention to indicate gend
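The point of this reply can be checked with Data.Char. The following is only a rough sketch: the name isUniSymbol is made up for illustration, and it approximates the Report's "any Unicode symbol or punctuation" wording with the standard isSymbol and isPunctuation predicates, which may not match GHC's lexer exactly.

```haskell
import Data.Char (isPunctuation, isSymbol)

-- | Hypothetical helper: approximates the Haskell Report's notion of a
-- Unicode operator symbol as "any Unicode symbol or punctuation".
-- This is an assumption about how to read the Report, not GHC's lexer.
isUniSymbol :: Char -> Bool
isUniSymbol c = isSymbol c || isPunctuation c
```

Under this reading, isUniSymbol '\186' (º, U+00BA) is False because Unicode classifies that character as a letter, while isUniSymbol '\8594' (→, U+2192) is True.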

Re: unicode characters in operator name

2010-09-10 Thread Brandon S Allbery KF8NH
On 9/10/10 21:39 , Daniel Fischer wrote: > On Saturday 11 September 2010 03:12:11, Greg wrote: >> a unicode symbol (defined as any Unicode symbol or punctuation). I'm >> pretty sure º is a unicode symbol or punctuation.

Re: unicode characters in operator name

2010-09-10 Thread Daniel Fischer
On Saturday 11 September 2010 03:12:11, Greg wrote: > > If I read the Haskell Report correctly, operators are named by (symbol > {symbol | : }), where symbol is either an ascii symbol (including *) or > a unicode symbol (defined as any Unicode symbol or punctuation).  I'm >

unicode characters in operator name

2010-09-10 Thread Greg
claration is fine. If I read the Haskell Report correctly, operators are named by (symbol {symbol | : }), where symbol is either an ascii symbol (including *) or a unicode symbol (defined as any Unicode symbol or punctuation). I'm pretty sure º is a unicode symbol or punctuation. I know I could ge

Re: Unicode alternative for '..' (ticket #3894)

2010-04-21 Thread Roel van Dijk
On Wed, Apr 21, 2010 at 12:51 AM, Yitzchak Gale wrote: > Yes, sorry. Either use TWO DOT LEADER, or remove > this Unicode alternative altogether > (i.e. leave it the way it is *without* the UnicodeSyntax extension). > > I'm happy with either of those. I just don't like mo

Re: Unicode alternative for '..' (ticket #3894)

2010-04-20 Thread Yitzchak Gale
I wrote: >> My opinion is that we should either use TWO DOT LEADER, >> or just leave it as it is now, two FULL STOP characters. Simon Marlow wrote: > Just to be clear, you're suggesting *removing* the Unicode alternative for > '..' from GHC's UnicodeSyntax

Re: Unicode alternative for '..' (ticket #3894)

2010-04-19 Thread Simon Marlow
On 15/04/2010 18:12, Yitzchak Gale wrote: My opinion is that we should either use TWO DOT LEADER, or just leave it as it is now, two FULL STOP characters. Just to be clear, you're suggesting *removing* the Unicode alternative for '..' from GHC's UnicodeSyntax extension

Re: Unicode alternative for '..' (ticket #3894)

2010-04-15 Thread Roel van Dijk
That is very interesting. I didn't know the history of those characters. > If we can't find a Unicode character that everyone agrees upon, > I also don't see any problem with leaving it as two FULL STOP > characters. I agree. I don't like the current Unicode

Re: Unicode alternative for '..' (ticket #3894)

2010-04-15 Thread Yitzchak Gale
My opinion is that we should either use TWO DOT LEADER, or just leave it as it is now, two FULL STOP characters. Two dots indicating a range is not the same symbol as a three dot ellipsis. Traditional non-Unicode Haskell will continue to be around for a long time to come. It would be very

Re: Unicode alternative for '..' (ticket #3894)

2010-04-15 Thread Jason Dusek
I think the baseline ellipsis makes much more sense; it's hard to see how the midline ellipsis was chosen. -- Jason Dusek

Unicode alternative for '..' (ticket #3894)

2010-04-14 Thread Roel van Dijk
feedback on this since it is a change that breaks backwards compatibility (even though it is a really small change). Regards, Roel van Dijk 1 - http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#unicode-syntax 2 - http://en.wikipedia.org/wiki/Ellipsis#In_mathematical_

Re: Ready for testing: Unicode support for Handle I/O

2009-02-04 Thread Paolo Losi
Max Vasin wrote: Wouldn't it be more correct to separate binary IO, which return [Word8] (or ByteString) and text IO which return [Char] and deal with text encoding? IIRC that was done in Bulat Ziganshin's streams library. That's exactly what I meant. Text IO could be then implemented on to

Re: Ready for testing: Unicode support for Handle I/O

2009-02-04 Thread Paolo Losi
Simon Marlow wrote: The only change to the existing behaviour is that by default, text IO is done in the prevailing encoding of the system. Handles created by openBinaryFile use the Latin-1 encoding, as do Handles placed in binary mode using hSetBinaryMode. wouldn't be semantically correct fo
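The behaviour described here can be tried out with the System.IO API as it exists in current GHC. This is a minimal sketch, not the patchset itself: the helper names writeUtf8, readUtf8 and readBinary are invented for illustration, and it assumes hSetEncoding, utf8 and withBinaryFile behave as documented in base.

```haskell
import System.IO

-- | Write a String to a file, encoding it as UTF-8 (hypothetical helper).
writeUtf8 :: FilePath -> String -> IO ()
writeUtf8 path s = withFile path WriteMode $ \h -> do
  hSetEncoding h utf8
  hPutStr h s

-- | Read a file as text, decoding it as UTF-8 (hypothetical helper).
readUtf8 :: FilePath -> IO String
readUtf8 path = withFile path ReadMode $ \h -> do
  hSetEncoding h utf8
  s <- hGetContents h
  -- Force the whole string before withFile closes the handle.
  length s `seq` return s

-- | Read a file in binary mode: each byte comes back as one Char
-- in the range '\0'..'\255', i.e. a Latin-1 view of the bytes.
readBinary :: FilePath -> IO String
readBinary path = withBinaryFile path ReadMode $ \h -> do
  s <- hGetContents h
  length s `seq` return s
```

Writing "λ" with writeUtf8 and reading it back with readUtf8 gives the same one-character string, while readBinary on that file gives the two characters '\206' and '\187', one per UTF-8 byte.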

Re: [Haskell-cafe] Ready for testing: Unicode support for Handle I/O

2009-02-03 Thread John Goerzen
Duncan Coutts wrote: > Sorry, I think we've been talking at cross purposes. I think so. >> There always has to be *some* conversion from a 32-bit Char to the >> system's selection, right? > > Yes. In text mode there is always some conversion going on. Internally > there is a byte buffer and a ch

Re: [Haskell-cafe] Ready for testing: Unicode support for Handle I/O

2009-02-03 Thread Duncan Coutts
On Tue, 2009-02-03 at 17:39 -0600, John Goerzen wrote: > On Tue, Feb 03, 2009 at 10:56:13PM +, Duncan Coutts wrote: > > > > Thanks to suggestions from Duncan Coutts, it's possible to call > > > > hSetEncoding even on buffered read Handles, and the right thing > > > > happens. So we can read fr

Re: [Haskell-cafe] Ready for testing: Unicode support for Handle I/O

2009-02-03 Thread John Goerzen
On Tue, Feb 03, 2009 at 10:56:13PM +, Duncan Coutts wrote: > > > Thanks to suggestions from Duncan Coutts, it's possible to call > > > hSetEncoding even on buffered read Handles, and the right thing > > > happens. So we can read from text streams that include multiple > > > encodings, such as

Re: [Haskell-cafe] Ready for testing: Unicode support for Handle I/O

2009-02-03 Thread Duncan Coutts
On Tue, 2009-02-03 at 11:03 -0600, John Goerzen wrote: > Will there also be something to handle the UTF-16 BOM marker? I'm not > sure what the best API for that is, since it may or may not be present, > but it should be considered -- and could perhaps help autodetect encoding. I think someone el

Re: [Haskell-cafe] Ready for testing: Unicode support for Handle I/O

2009-02-03 Thread John Goerzen
Simon Marlow wrote: > I've been working on adding proper Unicode support to Handle I/O in GHC, > and I finally have something that's ready for testing. I've put a patchset > here: Yay! Comments below. > Comments/discussion please! Do you expect Hugs will be

Ready for testing: Unicode support for Handle I/O

2009-02-03 Thread Simon Marlow
I've been working on adding proper Unicode support to Handle I/O in GHC, and I finally have something that's ready for testing. I've put a patchset here: http://www.haskell.org/~simonmar/base-unicode.tar.gz That is a set of patches against a GHC repo tree: unpack the tarbal

Re: Future plans: unicode and line editing

2008-11-25 Thread Ian Lynagh
On Tue, Nov 25, 2008 at 01:28:48PM -0800, Donald Bruce Stewart wrote: > > Can we construct a set of tests that determines if a given line editing > code base works to our satisfaction? If you can make some tests then that would be great. You need to be careful though, e.g. input had better look l

Re: Future plans: unicode and line editing

2008-11-25 Thread Don Stewart
igloo: > > Hi all, > > We've been weighing up the options to solve the recent problems that > editline has given us, and we think that this is the best way forward: > > For 6.12: > > * http://hackage.haskell.org/trac/ghc/ticket/2811 > Implement unicode

Future plans: unicode and line editing

2008-11-25 Thread Ian Lynagh
Hi all, We've been weighing up the options to solve the recent problems that editline has given us, and we think that this is the best way forward: For 6.12: * http://hackage.haskell.org/trac/ghc/ticket/2811 Implement unicode support for text I/O (we've had this on the TODO lis

Re[3]: adding to GHC/win32 Handle operations support of Unicode filenames and files larger than 4 GB

2005-11-24 Thread Bulat Ziganshin
Hello Bulat, Thursday, November 24, 2005, 4:17:24 AM, you wrote: BZ> but i propose to make these middle-level functions after stage 2 or BZ> even 3 in this scheme - so that they will be fully in Haskell world, BZ> only work with file descriptors instead of Handles. for example: "it's better one

Re[2]: adding to GHC/win32 Handle operations support of Unicode filenames and files larger than 4 GB

2005-11-23 Thread Bulat Ziganshin
ould, if we're lucky, handle properly the case when some future version SM> of Windows removes the quirk, is not a good use of developer time. SM> Furthermore, Windows hardly ever changes APIs, they just add new ones. SM> So I don't see occasional use of #ifdef mingw32_HOST_OS as

RE: adding to GHC/win32 Handle operations support of Unicode filenames and files larger than 4 GB

2005-11-23 Thread Simon Marlow
rrently don't support operations with files with Unicode > filenames, nor it can tell/seek in files for positions larger than 4 > GB. it is because Unix-compatible functions open/fstat/tell/... that > is supported in Mingw32 works only with "char[]" for filenames and > off

Re[2]: adding to GHC/win32 Handle operations support of Unicode filenames and files larger than 4 GB

2005-11-22 Thread Bulat Ziganshin
Hello Sven, Tuesday, November 22, 2005, 8:53:55 PM, you wrote: >> #ifdef mingw32_HOST_OS >> type CFilePath = LPCTSTR >> type CFileOffset = Int64 SP> Whatever will be done, please use *feature-based* ifdefs, not those SP> platform-dependent ones above, which will be proven wrong either immediat

Re: adding to GHC/win32 Handle operations support of Unicode filenames and files larger than 4 GB

2005-11-22 Thread Sven Panne
Am Montag, 21. November 2005 13:01 schrieb Bulat Ziganshin: > [...] > #ifdef mingw32_HOST_OS > type CFilePath = LPCTSTR > type CFileOffset = Int64 > withCFilePath = withTString > peekCFilePath = peekTString > #else > type CFilePath = CString > type CFileOffset = COff > withCFilePath = withCString >

adding to GHC/win32 Handle operations support of Unicode filenames and files larger than 4 GB

2005-11-21 Thread Bulat Ziganshin
Hello glasgow-haskell-users, Simon, what you will say about the following plan? ghc/win32 currently don't support operations with files with Unicode filenames, nor it can tell/seek in files for positions larger than 4 GB. it is because Unix-compatible functions open/fstat/tell/... th

RE: Unicode in GHC 6.2.2 and 6.4.x (was: Re: [Haskell-cafe] Unicode.hs)

2005-07-18 Thread Simon Marlow
.4.1, but definitely in 6.6. >> > > I have put those files that work for me in GHC 6.2.2 (Unicode support) > for download. Please read the Wiki page: > > http://haskell.org/hawiki/GhcUnicode > > for instructions. > > Any feedback will be appreciated. I belie

Unicode in GHC 6.2.2 and 6.4.x (was: Re: [Haskell-cafe] Unicode.hs)

2005-07-16 Thread Dimitry Golubovsky
Dear List Subscribers, Simon Marlow wrote: On 30 June 2005 14:36, Dimitry Golubovsky wrote: It is in CVS now, and I believe will be in 6.4.1 Not planned for 6.4.1, but definitely in 6.6. I have put those files that work for me in GHC 6.2.2 (Unicode support) for download. Please read

Re[4]: Unicode source files

2005-05-18 Thread Bulat Ziganshin
Hello Simon, Tuesday, May 17, 2005, 5:30:06 PM, you wrote: >>> The question is what Alex should see for a unicode character: Alex >>> currently assumes that characters are in the range 0-255 (you need a >>> fixed range in order to generate the lexer tables). One po

RE: Re[2]: Unicode source files

2005-05-17 Thread Simon Marlow
On 13 May 2005 11:37, Bulat Ziganshin wrote: > Thursday, May 05, 2005, 1:56:12 PM, you wrote: > >>> it is true what to support unicode source files only StringBuffer >>> implementation must be changed? > >> It depends whether you want to support several diff

Fwd: Re[2]: Unicode source files

2005-05-14 Thread Bulat Ziganshin
Sorry, Simon, are you received this message? This is a forwarded message From: Bulat Ziganshin <[EMAIL PROTECTED]> To: "Simon Marlow" <[EMAIL PROTECTED]> Date: Thursday, May 05, 2005, 10:13:37 PM Subject: Unicode source files ===8<==Original message text==

RE: Unicode source files

2005-05-05 Thread Simon Marlow
On 04 May 2005 15:57, Bulat Ziganshin wrote: > it is true what to support unicode source files only StringBuffer > implementation must be changed? It depends whether you want to support several different encodings, or just UTF-8. If we only want to support UTF-8, then we can ke

Unicode source files

2005-05-05 Thread Bulat Ziganshin
Hello it is true what to support unicode source files only StringBuffer implementation must be changed? if so, then task can be simplified by converting any files read by hGetStringBuffer to UTF-32 (PackedString) representation and putting in memory array in this form. After this, we must change

Unicode Source / Keyboard Layout

2005-03-21 Thread Sven Moritz Hallberg
Greetings GHC and Haskell folk, please excuse the cross-post. This is a coordinational message. :) I've been longing for Unicode (UTF-8) input support in GHC for a long time. I am currently customizing a keyboard layout to include many mathematical operators and special characters which wou

RE: [Haskell] [ANNOUNCE] New version of unicode CWString library with extras

2005-01-19 Thread Simon Marlow
On 19 January 2005 05:31, John Meacham wrote: > A while ago I wrote a glibc specific implementation of the CWString > library. I have since made several improvements: > > * No longer glibc specific, should compile and work on any system with > iconv (which is unix standard) (but there are still

RE: Unicode in GHC: need more advice

2005-01-17 Thread Simon Marlow
On 14 January 2005 12:58, Dimitry Golubovsky wrote: > Now I need more advice on which "flavor" of Unicode support to > implement. In Haskell-cafe, there were 3 flavors summarized: I am > reposting the table here (its latest version). > > |Seb

Re: Unicode in GHC: need more advice

2005-01-14 Thread Dimitry Golubovsky
s are basically int -> int, it does not affect the result. The code I use is some draft code, based on what I submitted for Hugs (pure Unicode basically, even without extra space characters). Now I need more advice on which "flavor" of Unicode support to implement. In Haskell-

Re: Unicode in GHC: need some advice on building

2005-01-11 Thread Shawn Garbett
--- Dimitry Golubovsky <[EMAIL PROTECTED]> wrote: > Hi, > > Following up the discussion in Haskell-Cafe about > ways to bring better > Unicode support in GHC. A radical suggestion from an earlier discussion was to make String a typeclass. Have unicode, ascii, etc. all be

RE: Unicode in GHC: need some advice on building

2005-01-11 Thread Simon Marlow
On 11 January 2005 02:29, Dimitry Golubovsky wrote: > Bad thing is, LD_PRELOAD does not work on all systems. So I tried to > put the code directly into the runtime (where I believe it should be; > the Unicode properties table is packed, and won't eat much space). I > renamed

Unicode in GHC: need some advice on building

2005-01-10 Thread Dimitry Golubovsky
Hi, Following up the discussion in Haskell-Cafe about ways to bring better Unicode support in GHC. I may take care on putting this into the GHC runtime, but I need some advice as I am completely new to this. What needs to be done primarily, is to replace the FFI calls made from GHC.Unicode

Re: GHC and UNICODE...

2003-12-22 Thread Ross Paterson
On Fri, Dec 19, 2003 at 12:17:42PM -0800, John Meacham wrote: > 1. written the CWString library (now a part of the FFI) which lets you > call arbitrary C functions doing all the proper character set conversion > stuff. Do you plan to update this and merge it with the hierarchical libraries to comp

RE: GHC and UNICODE...

2003-12-22 Thread Simon Marlow
k this is the best way to go about it. Sure, you can run Alex over the UTF-8 source, but the grammar will be huge. A simpler way is to take advantage of the fact that Haskell only uses 5 classes of Unicode characters: uniSmall, uniLarge, uniWhite, uniSymbol, and uniDigit. Alex has a good inp
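The five character classes named here can be approximated with Data.Char. What follows is a sketch under assumptions, not the scheme Simon goes on to describe: the UniClass type and classify function are hypothetical, and the mapping from Data.Char predicates to the Report's classes is only a rough one (for instance, ASCII special cases such as '_' are not treated separately).

```haskell
import Data.Char

-- | Hypothetical type naming the five Unicode classes from the
-- Haskell Report's lexical syntax, plus a catch-all.
data UniClass
  = UniSmall | UniLarge | UniWhite | UniSymbol | UniDigit | UniOther
  deriving (Eq, Show)

-- | Rough classification of a Char; the choice of predicates is an
-- assumption made for illustration.
classify :: Char -> UniClass
classify c
  | isLower c                          = UniSmall   -- lowercase letters
  | isUpper c                          = UniLarge   -- uppercase or titlecase
  | isSpace c                          = UniWhite   -- whitespace
  | generalCategory c == DecimalNumber = UniDigit   -- decimal digits
  | isSymbol c || isPunctuation c      = UniSymbol  -- symbols, punctuation
  | otherwise                          = UniOther
```

With a function of this shape a lexer only has to distinguish a handful of classes rather than every code point, which is the appeal of the approach for a table-driven tool.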

Re: GHC and UNICODE...

2003-12-19 Thread John Meacham
On Fri, Dec 19, 2003 at 04:51:50PM +, MR K P SCHUPKE wrote: > Whilst I appreciate the topic of show is not directly related to GHC, > what I would like to know is how to handle UNICODE properly... If I assume > I have a good unicode terminal, so stdin and stdout are in unicode form

GHC and UNICODE...

2003-12-19 Thread MR K P SCHUPKE
Whilst I appreciate the topic of show is not directly related to GHC, what I would like to know is how to handle UNICODE properly... If I assume I have a good unicode terminal, so stdin and stdout are in unicode format, and all my text files are in unicode, how do I deal with this properly in GHC

Re: Unicode

2001-10-08 Thread Kent Karlsson
- Original Message - From: "Dylan Thurston" <[EMAIL PROTECTED]> To: "Andrew J Bromage" <[EMAIL PROTECTED]> Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Friday, October 05, 2001 6:00 PM Subject: Re: UniCode > On Fri, Oct 05, 2

Re: Unicode

2001-10-08 Thread Kent Karlsson
- Original Message - From: "Ketil Malde" <[EMAIL PROTECTED]> To: "Dylan Thurston" <[EMAIL PROTECTED]> Cc: "Andrew J Bromage" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Monday, October 08, 2001 9:02 A

Re: UniCode

2001-10-07 Thread Ketil Malde
Dylan Thurston <[EMAIL PROTECTED]> writes: > Right. In Unicode, the concept of a "character" is not really so > useful; After reading a bit about it, I'm certainly confused. Unicode/ISO-10646 contains a lot of things that aren't really one character, e.g. l

Re: UniCode

2001-10-06 Thread Andrew J Bromage
G'day all. On Fri, Oct 05, 2001 at 06:17:26PM +, Marcin 'Qrczak' Kowalczyk wrote: > This information is out of date. AFAIR about 4 of them is assigned. > Most for Chinese (current, not historic). I wasn't aware of this. Last time I looked was Unicode 3.0. Th

Re: UniCode

2001-10-05 Thread Marcin 'Qrczak' Kowalczyk
05 Oct 2001 14:35:17 +0200, Ketil Malde <[EMAIL PROTECTED]> pisze: > Does Haskell's support of "Unicode" mean UTF-32, or full UCS-4? It's not decided officially. GHC uses UTF-32. It's expected that UCS-4 will vanish and ISO-10646 will be reduced to the sa

Re: UniCode

2001-10-05 Thread Marcin 'Qrczak' Kowalczyk
Fri, 5 Oct 2001 23:23:50 +1000, Andrew J Bromage <[EMAIL PROTECTED]> pisze: > There is a set of one million (more correctly, 1M) Unicode characters > which are only accessible using surrogate pairs (i.e. two UTF-16 > codes). There are currently none of these codes assigned, This

Re: UniCode

2001-10-05 Thread Dylan Thurston
On Fri, Oct 05, 2001 at 11:23:50PM +1000, Andrew J Bromage wrote: > G'day all. > > On Fri, Oct 05, 2001 at 02:29:51AM -0700, Krasimir Angelov wrote: > > > Why Char is 32 bit. UniCode characters is 16 bit. > > It's not quite as simple as that. There is a se

Re: UniCode

2001-10-05 Thread Andrew J Bromage
G'day all. On Fri, Oct 05, 2001 at 02:29:51AM -0700, Krasimir Angelov wrote: > Why Char is 32 bit. UniCode characters is 16 bit. It's not quite as simple as that. There is a set of one million (more correctly, 1M) Unicode characters which are only accessible using surrogate pa
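The surrogate-pair mechanism mentioned here can be sketched in a few lines. The function name toUtf16 is made up for illustration; the arithmetic is the standard UTF-16 encoding, in which the 2^20 code points above U+FFFF are split into a high and a low 10-bit half.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word16)

-- | Encode one Char as UTF-16 code units (hypothetical helper name).
-- Code points below U+10000 take a single unit; higher ones take a
-- surrogate pair. Note that a Haskell Char may itself be a lone
-- surrogate (U+D800..U+DFFF); this sketch passes those through as-is.
toUtf16 :: Char -> [Word16]
toUtf16 c
  | n < 0x10000 = [fromIntegral n]
  | otherwise   = [hi, lo]
  where
    n  = ord c
    n' = n - 0x10000                              -- 20-bit offset
    hi = fromIntegral (0xD800 + (n' `shiftR` 10)) -- top 10 bits
    lo = fromIntegral (0xDC00 + (n' .&. 0x3FF))   -- bottom 10 bits
```

For example, toUtf16 'A' is the single unit 0x0041, while toUtf16 '\x1D11E' is the pair 0xD834, 0xDD1E.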

Re: UniCode

2001-10-05 Thread Ketil Malde
"Marcin 'Qrczak' Kowalczyk" <[EMAIL PROTECTED]> writes: > Fri, 5 Oct 2001 02:29:51 -0700 (PDT), Krasimir Angelov <[EMAIL PROTECTED]> pisze: > > > Why Char is 32 bit. UniCode characters is 16 bit. > No, Unicode characters have 21 bits (range U+

Re: UniCode

2001-10-05 Thread Marcin 'Qrczak' Kowalczyk
Fri, 5 Oct 2001 02:29:51 -0700 (PDT), Krasimir Angelov <[EMAIL PROTECTED]> pisze: > Why Char is 32 bit. UniCode characters is 16 bit. No, Unicode characters have 21 bits (range U+0000..10FFFF). They used to fit in 16 bits a long time ago, and they are sometimes encoded as UTF
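The 21-bit figure can be checked directly against the Char type in a current GHC. This is a small sketch; maxCodePoint and bitsNeeded are names invented for illustration.

```haskell
import Data.Char (ord)

-- | Largest code point a Char can hold (hypothetical name).
maxCodePoint :: Int
maxCodePoint = ord (maxBound :: Char)

-- | Number of bits needed to write a non-negative Int in binary.
bitsNeeded :: Int -> Int
bitsNeeded 0 = 0
bitsNeeded n = 1 + bitsNeeded (n `div` 2)
```

Here maxCodePoint is 0x10FFFF, and bitsNeeded maxCodePoint is 21, whereas bitsNeeded 0xFFFF is 16, the size that sufficed before the range was extended.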

UniCode

2001-10-05 Thread Krasimir Angelov
Why Char is 32 bit. UniCode characters is 16 bit.

Re: A question about Unicode support

2001-09-11 Thread Marcin 'Qrczak' Kowalczyk
Tue, 11 Sep 2001 13:19:54 -0300 (GMT), Pablo Pedemonte <[EMAIL PROTECTED]> pisze: > Ghc 5.00.2 provides (initial) Unicode support, so I thought the > chr function would do. But it seems it still rejects Int values > greater than 0xFF. It doesn't. -- __("<
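For reference, the conversion asked about looks like this with Data.Char in a current GHC; behaviour in 5.00.2 may have differed in detail. The wrapper safeChr is a hypothetical helper added here because chr itself raises an error for values outside the Char range.

```haskell
import Data.Char (chr, ord)

-- | Convert an Int code point to a Char, returning Nothing when the
-- value is outside the range of Char (hypothetical helper).
safeChr :: Int -> Maybe Char
safeChr n
  | n >= 0 && n <= ord maxBound = Just (chr n)
  | otherwise                   = Nothing
```

So safeChr 0x3BB gives Just 'λ' (a value well above 0xFF), while safeChr 0x110000 and safeChr (-1) give Nothing.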

A question about Unicode support

2001-09-11 Thread Pablo Pedemonte
Hi all! The question is really simple: how can I convert an Int into a Char? Ghc 5.00.2 provides (initial) Unicode support, so I thought the chr function would do. But it seems it still rejects Int values greater than 0xFF. So, what function should I use? Thanks in advance. Regards, Pablo

Re: Unicode

2000-05-17 Thread Frank Atanassow
on of a name is the official identifier, it is > rather bad form to write a person's name in Kana (the > phonetic alphabets). You're absolutely right. This fact slipped my mind. Still, probably 85% (just a guess) of Japanese names can be written with Jyouyou kanji, and the

Re: Unicode

2000-05-16 Thread Manuel M. T. Chakravarty
ore than > > > Int, to be able to use ord and chr safely. > > Er does it have to? The Java Virtual Machine implements Unicode with > > 16 bits. (OK, so I suppose that means it can't cope > > with Korean or Chinese.) > > Just to set the record straight: >

Re: Unicode

2000-05-16 Thread Marcin 'Qrczak' Kowalczyk
Tue, 16 May 2000 12:26:12 +0200 (MET DST), Frank Atanassow <[EMAIL PROTECTED]> pisze: > Of course, you can always come up with specialized schemes involving stateful > encodings and/or "block-swapping" (using the Unicode private-use areas, for > example), but then, tha

Re: Unicode

2000-05-16 Thread Marcin 'Qrczak' Kowalczyk
> > Er does it have to? The Java Virtual Machine implements Unicode with > 16 bits. (OK, so I suppose that means it can't cope with Korean or Chinese.) > So requiring Char to be >=30 bits would stop anyone implementing a > conformant Haskell on the JVM. OK, "allowed

Re: Unicode

2000-05-16 Thread Frank Atanassow
Er does it have to? The Java Virtual Machine implements Unicode with > 16 bits. (OK, so I suppose that means it can't cope with Korean or Chinese.) Just to set the record straight: Many CJK (Chinese-Japanese-Korean) characters are encodable in 16 bits. I am not so familiar with the Chinese

RE: Unicode

2000-05-16 Thread Simon Marlow
> > OTOH, it wouldn't be hard to change GHC's Char datatype to be a > > full 32-bit integral data type. > > Could we do it please? > > It will not break anything if done slowly. I imagine that > {read,write}CharOffAddr and _ccall_ will still use only 8 bits of > Char. But after Char is wide, lib

Re: Unicode

2000-05-16 Thread George Russell
Marcin 'Qrczak' Kowalczyk wrote: > As for the language standard: I hope that Char will be allowed or > required to have >=30 bits instead of current 16; but never more than > Int, to be able to use ord and chr safely. Er does it have to? The Java Virtual Machine implement

Re: Unicode

2000-05-15 Thread Marcin 'Qrczak' Kowalczyk
Mon, 15 May 2000 02:45:17 -0700, Simon Marlow <[EMAIL PROTECTED]> pisze: > OTOH, it wouldn't be hard to change GHC's Char datatype to be a > full 32-bit integral data type. Could we do it please? It will not break anything if done slowly. I imagine that {read,write}CharOffAddr and _ccall_ will

RE: Unicode

2000-05-15 Thread Simon Marlow
> How safe is representing Unicode characters as Chars unsafeCoerce#d > from large Ints? Seems to work in simple cases :-) er, "downright dangerous". There are lots of places where we assume that Chars have only 8 bits of data, even though the representation has room for 3

Unicode

2000-05-13 Thread Marcin 'Qrczak' Kowalczyk
How safe is representing Unicode characters as Chars unsafeCoerce#d from large Ints? Seems to work in simple cases :-) -- __("<Marcin Kowalczyk * [EMAIL PROTECTED] http://qrczak.ids.net.pl/ \__/ GCS/M d- s+:-- a23 C+++$ UL++>$ P+++ L++>$ E- ^^

Re: Unicode support

1998-04-24 Thread Frank A. Christoph
>> What is the status of the latest release (3.01) with respect to Unicode >> support? Is it possible to write source in Unicode? How wide are >> characters? Do the I/O library functions support it? etc. > >I don't believe that we've done anything much

Unicode support

1998-04-23 Thread Frank A. Christoph
What is the status of the latest release (3.01) with respect to Unicode support? Is it possible to write source in Unicode? How wide are characters? Do the I/O library functions support it? etc. --FC