Re: [Haskell-cafe] Unicode strings and runCommand / runProcess

2010-04-24 Thread Khudyakov Alexey
On 24 April 2010 06:14:55 you wrote:
> Khudyakov Alexey wrote:
>>> Actually, the behavior of openFile when given a String with
>>> characters > 0xFF is also completely undocumented.  I am not sure what
>>> it does with that.  It should probably be the same as runCommand,
>>> whatever it is.
>>
>> Under unices file names are just array of bytes. There is no notion of
>> encoding at all. It's just matter of interpretation of that array.
>
> Quite right.  One must be able to pass binary strings, which contain
> anything except \0 and '/' to openFile.  The same goes for runCommand.
> I am uncomfortable, for this reason, with saying that runCommand ought
> to re-encode in the system locale while openFile doesn't.  It is
> preferable to drop characters than to drop the ability to pass arbitrary
> binary data.
 
But truncation makes it impossible to pass non-ASCII strings portably. They
have to be encoded, and there is no easy way to do so.

Actually, the problem is the use of strings. A String is a sequence of
_characters_, while a program talks to the outside world using sequences of
bytes. I think the right (but impossible) way to solve this problem is to use
separate data types for file paths and command-line arguments.

Something along these lines:
 data FilePath = ...

 stringToFilePath :: String -> Maybe FilePath
 filePathToString :: FilePath -> Maybe String

Both functions are non-total, hence the presence of Maybes. But it would break
a LOT of code and violate the language definition.
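
A minimal sketch of what such a pair of partial conversions could look like.
Everything here is hypothetical: the names RawPath, stringToRawPath and
rawPathToString are made up for illustration, and the encoding is assumed to
be plain ASCII so that the failure cases are easy to see. A real version would
consult the locale.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical abstract path type: a path is just a sequence of bytes.
newtype RawPath = RawPath [Word8]
  deriving (Eq, Show)

-- Assumption for this sketch: the encoding is plain ASCII.
-- Fails on NUL and on any character that ASCII cannot represent.
stringToRawPath :: String -> Maybe RawPath
stringToRawPath s
  | all ok s  = Just (RawPath (map (fromIntegral . ord) s))
  | otherwise = Nothing
  where
    ok c = c /= '\0' && ord c < 0x80

-- Fails on any byte that is not valid ASCII.
rawPathToString :: RawPath -> Maybe String
rawPathToString (RawPath bs)
  | all (< 0x80) bs = Just (map (chr . fromIntegral) bs)
  | otherwise       = Nothing
```

Neither direction can be total: a String may hold characters the encoding
cannot express, and a path on disk may hold bytes that decode to nothing.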

I think there are two alternatives. One is to encode/decode strings using the
current locale and provide [Word8]-based variants. The main problem is that
seemingly innocent actions, like listing a directory's contents, could crash
the program (with an exception).

Another option is to provide functions to encode/decode strings. This is ugly,
mixes strings which hold characters with strings which hold bytes, and is
completely un-Haskellish, but it seems there is no good solution.



Also, truncation could have security implications. It makes it almost
impossible to escape dangerous characters robustly. Consider the following
code. This is more a matter of speculation than a real threat, but
nevertheless:

 import System.Process (runCommand)

 evil, maskedEvil :: String
 evil = "I am an evil script; date; echo I\\'m doing whatever I want"
 -- Shift every character past 0xFF; truncation to 8 bits undoes the shift
 maskedEvil = map (toEnum . (+256) . fromEnum) evil

 -- Should escape all dangerous chars
 escape :: String -> String
 escape = id

 oops :: IO ()
 oops = do
   runCommand ("echo " ++ maskedEvil ++ "")
   return ()
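
The effect can be checked without running any command. The sketch below
models the truncation as "keep the low 8 bits of each character", which is an
assumption about what the runtime does, and uses a hypothetical escaper that
only knows about the ASCII semicolon. The names truncate8, mask and
escapeSemicolons are made up for this illustration.

```haskell
import Data.Char (chr, ord)

-- Model of the truncation discussed in this thread (an assumption):
-- keep only the low 8 bits of each character.
truncate8 :: String -> String
truncate8 = map (chr . (`mod` 256) . ord)

-- Same masking as maskedEvil above: shift every character past 0xFF.
mask :: String -> String
mask = map (chr . (+ 256) . ord)

-- Hypothetical escaper that only recognises the ASCII semicolon.
escapeSemicolons :: String -> String
escapeSemicolons = concatMap (\c -> if c == ';' then "\\;" else [c])
```

The escaper sees no semicolon in a masked string, so it passes the string
through unchanged, and the truncation then restores the semicolon after the
escaping has already happened.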
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Unicode strings and runCommand / runProcess

2010-04-23 Thread John Goerzen

Here is a very interesting little problem.

ghci
GHCi, version 6.12.1: http://www.haskell.org/ghc/  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude> :m System.Process
Prelude System.Process> runCommand "echo привет"
?@825B

This is a minimal test case for a bug reported in HSH at 
http://github.com/jgoerzen/hsh/issues#issue/1


It is not entirely clear to me what the behavior here should be.  It 
seems inconsistent with the default behavior of System.IO to, 
apparently, just strip the bits higher than 0xFF.  On the other hand, 
when it's OS commands we're talking about, it's not entirely clear to me 
if the default should be to encode in UTF-8.  There should almost 
certainly be an *option* controlling this, and perhaps a version of 
runProcess that accepts ByteStrings.


Thoughts?

-- John


Re: [Haskell-cafe] Unicode strings and runCommand / runProcess

2010-04-23 Thread Khudyakov Alexey
On 23 April 2010 21:44:29 John Goerzen wrote:
> Here is a very interesting little problem.
>
> ghci
> GHCi, version 6.12.1: http://www.haskell.org/ghc/  :? for help
> Loading package ghc-prim ... linking ... done.
> Loading package integer-gmp ... linking ... done.
> Loading package base ... linking ... done.
> Prelude> :m System.Process
> Prelude System.Process> runCommand "echo привет"
> ?@825B
>
> This is a minimal test case for a bug reported in HSH at
> http://github.com/jgoerzen/hsh/issues#issue/1
>
> It is not entirely clear to me what the behavior here should be.  It
> seems inconsistent with the default behavior of System.IO to,
> apparently, just strip the bits higher than 0xFF.  On the other hand,
> when it's OS commands we're talking about, it's not entirely clear to me
> if the default should be to encode in UTF-8.  There should almost
> certainly be an *option* controlling this, and perhaps a version of
> runProcess that accepts ByteStrings.
 
It should just use the system locale for encoding, like System.IO does.
FYI, I just submitted a bug to the GHC trac:
http://hackage.haskell.org/trac/ghc/ticket/4006

P.S. Haskell libraries aren't very well designed with respect to Unicode and
company.


Re: [Haskell-cafe] Unicode strings and runCommand / runProcess

2010-04-23 Thread Ivan Lazar Miljenovic
John Goerzen jgoer...@complete.org writes:
> ghci
> GHCi, version 6.12.1: http://www.haskell.org/ghc/  :? for help
> Loading package ghc-prim ... linking ... done.
> Loading package integer-gmp ... linking ... done.
> Loading package base ... linking ... done.
> Prelude> :m System.Process
> Prelude System.Process> runCommand "echo привет"
> ?@825B

Are you arguing about IO-specific stuff like this, or for all non-ASCII
Strings?

-- 
Ivan Lazar Miljenovic
ivan.miljeno...@gmail.com
IvanMiljenovic.wordpress.com


Re: [Haskell-cafe] Unicode strings and runCommand / runProcess

2010-04-23 Thread John Goerzen

Ivan Lazar Miljenovic wrote:
> John Goerzen jgoer...@complete.org writes:
>> ghci
>> GHCi, version 6.12.1: http://www.haskell.org/ghc/  :? for help
>> Loading package ghc-prim ... linking ... done.
>> Loading package integer-gmp ... linking ... done.
>> Loading package base ... linking ... done.
>> Prelude> :m System.Process
>> Prelude System.Process> runCommand "echo привет"
>> ?@825B
>
> Are you arguing about IO-specific stuff like this, or for all non-ASCII
> Strings?

I'm not sure I understand the question.  I consider the behavior in 
System.IO to be well-documented.  The behavior in System.Process is not 
documented at all.  As I said, I'm not certain what the proper answer 
is, but not documenting what happens probably isn't it.


Actually, the behavior of openFile when given a String with characters > 0xFF
is also completely undocumented.  I am not sure what it does with that.  It
should probably be the same as runCommand, whatever it is.


-- John


Re: [Haskell-cafe] Unicode strings and runCommand / runProcess

2010-04-23 Thread Khudyakov Alexey
On 24 April 2010 03:50:54 John Goerzen wrote:
> Ivan Lazar Miljenovic wrote:
>> John Goerzen jgoer...@complete.org writes:
>>> ghci
>>> GHCi, version 6.12.1: http://www.haskell.org/ghc/  :? for help
>>> Loading package ghc-prim ... linking ... done.
>>> Loading package integer-gmp ... linking ... done.
>>> Loading package base ... linking ... done.
>>> Prelude> :m System.Process
>>> Prelude System.Process> runCommand "echo привет"
>>> ?@825B
>>
>> Are you arguing about IO-specific stuff like this, or for all non-ASCII
>> Strings?
>
> I'm not sure I understand the question.  I consider the behavior in
> System.IO to be well-documented.  The behavior in System.Process is not
> documented at all.  As I said, I'm not certain what the proper answer
> is, but not documenting what happens probably isn't it.
>
> Actually, the behavior of openFile when given a String with characters
> > 0xFF is also completely undocumented.  I am not sure what it does with
> that.  It should probably be the same as runCommand, whatever it is.
 
Under Unices, file names are just arrays of bytes. There is no notion of
encoding at all; it's just a matter of how that array is interpreted.

There is a problem with the FilePath data type. It is actually a String, but
it should be an abstract data type. There is a relevant bug [1] on the GHC
trac.

P.S. openFile truncates Chars.

[1] http://hackage.haskell.org/trac/ghc/ticket/3307


Re: [Haskell-cafe] Unicode strings and runCommand / runProcess

2010-04-23 Thread John Goerzen

Khudyakov Alexey wrote:
>> Actually, the behavior of openFile when given a String with
>> characters > 0xFF is also completely undocumented.  I am not sure what
>> it does with that.  It should probably be the same as runCommand,
>> whatever it is.
>
> Under unices file names are just array of bytes. There is no notion of
> encoding at all. It's just matter of interpretation of that array.


Quite right.  One must be able to pass binary strings, which contain 
anything except \0 and '/' to openFile.  The same goes for runCommand. 
I am uncomfortable, for this reason, with saying that runCommand ought 
to re-encode in the system locale while openFile doesn't.  It is 
preferable to drop characters than to drop the ability to pass arbitrary 
binary data.


So I am not sure I agree with your stance in 
http://hackage.haskell.org/trac/ghc/ticket/4006


-- John


[Haskell-cafe] Unicode strings

2006-11-05 Thread Pupeno
Hello,
I am trying to make a program that outputs some Unicode characters, but the
output doesn't match what I try to print.
Attached is a little test program. It tries to print the arrows ←↑→↓ but
instead it outputs \220\221\222\223 (that is, character number 220, then
221, then 222). I've also tried writing the Unicode code points (although GHC
6.6 should deal just fine with Unicode source code) and I get the same
result. In case anybody wants to try, this would be the
string: "\8592\8593\8594\8595".
I am also attaching the output file; you can see that the contents are not
right.
Any ideas what I am doing wrong here?
Thank you.
-- 
Pupeno [EMAIL PROTECTED] (http://pupeno.com)

import qualified System.IO as IO

main = do
  let str = "←↑→↓"
  putStrLn str
  h <- IO.openFile "test.output" IO.WriteMode
  IO.hPutStrLn h str






Re: [Haskell-cafe] Unicode strings

2006-11-05 Thread Spencer Janssen
The problem is that GHC's output functions only print the lowest 8  
bits of each code point.  To print these higher code points, you'll  
need to translate your [Char] into a byte encoding that your terminal  
will understand (most likely UTF-8).  I know there are several of  
these floating around in the wild, hopefully someone will chime in  
with a code snippet soon.  Also, I seem to remember that Bulat's  
Streams library supports some Unicode encodings, perhaps you can  
check there?
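
A minimal sketch of such a translation, written by hand rather than taken
from any of the libraries mentioned here. The names encodeChar and encodeUtf8
are hypothetical, and the sketch does not reject surrogate code points, which
a strict UTF-8 encoder would.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Encode one code point as 1 to 4 UTF-8 bytes.
-- Surrogate code points are not rejected in this sketch.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [lead 0xC0 6, cont 0]
  | n < 0x10000 = [lead 0xE0 12, cont 6, cont 0]
  | otherwise   = [lead 0xF0 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Leading byte: length tag plus the topmost payload bits.
    lead tag s = fromIntegral (tag .|. (n `shiftR` s))
    -- Continuation byte: 10xxxxxx carrying six payload bits.
    cont s = fromIntegral (0x80 .|. ((n `shiftR` s) .&. 0x3F))

-- Encode a whole String as a list of UTF-8 bytes.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeChar
```

With output functions that keep only the low 8 bits, one way to use this is
to turn each byte back into a Char below 256 before printing, for example
putStr (map (toEnum . fromIntegral) (encodeUtf8 str)).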



Cheers,
Spencer Janssen

On Nov 5, 2006, at 12:17 PM, Pupeno wrote:

> Hello,
> I am trying to make a program that outputs some Unicode characters but the
> output doesn't match what I try to print.
> Attached is a little test program. It tries to print the arrows ←↑→↓ but
> instead it outputs \220\221\222\223 (that is, character number 220, then
> 221, then 222). I've also tried writing the Unicode code points (although GHC
> 6.6 should deal just fine with Unicode source code) and I get the same
> result. In case anybody wants to try, this would be the
> string: "\8592\8593\8594\8595".
> I am also attaching the output file, you can see that the contents are not
> right.
> Any ideas what am I doing wrong here ?
> Thank you.
> --
> Pupeno [EMAIL PROTECTED] (http://pupeno.com)


Re: [Haskell-cafe] Unicode strings

2006-11-05 Thread Piotr Kalinowski
Hello,

http://repetae.net/repos/jhc/UTF8.hs has some nice functions for UTF-8 <->
Unicode conversions.

Regards,
-- 
Intelligence is like a river: the deeper it is, the less noise it makes


Re: [Haskell-cafe] Unicode strings

2006-11-05 Thread Luis F. Araujo
Pupeno wrote:
> Hello,
> I am trying to make a program that outputs some Unicode characters but the
> output doesn't match what I try to print.
> Attached is a little test program. It tries to print the arrows ←↑→↓ but
> instead it outputs \220\221\222\223 (that is, character number 220, then
> 221, then 222). I've also tried writing the Unicode code points (although GHC
> 6.6 should deal just fine with Unicode source code) and I get the same
> result. In case anybody wants to try, this would be the
> string: "\8592\8593\8594\8595".
> I am also attaching the output file, you can see that the contents are not
> right.
> Any ideas what am I doing wrong here ?
> Thank you.
>
> import qualified System.IO as IO
>
> main = do
>   let str = "←↑→↓"
>   putStrLn str
>   h <- IO.openFile "test.output" IO.WriteMode
>   IO.hPutStrLn h str

The problem is with the output of putStrLn. You need to encode the
string into UTF-8.

You can find several modules on the web; I have been using this one:

http://www.haskell.org/pipermail/haskell-i18n/2004-February/000127.html
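
For readers on a newer compiler: from GHC 6.12 on, System.IO handles carry a
text encoding, so an alternative to a separate encoding module is to set the
encoding on the handle. This is not available in GHC 6.6. A minimal sketch,
where writeUtf8 is a hypothetical helper name:

```haskell
import System.IO

-- Write a String to a file as UTF-8, regardless of the current locale.
-- writeUtf8 is a hypothetical helper name used only in this sketch.
writeUtf8 :: FilePath -> String -> IO ()
writeUtf8 path str =
  withFile path WriteMode $ \h -> do
    hSetEncoding h utf8   -- encode Chars as UTF-8 on this handle
    hPutStrLn h str
```

The same hSetEncoding call works on stdout, although whether the arrows then
display correctly still depends on the terminal.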