Re: [Haskell-cafe] Unicode strings and runCommand / runProcess
In a message dated 24 April 2010 06:14:55 you wrote:
> Khudyakov Alexey wrote:
> > > Actually, the behavior of openFile when given a String with
> > > characters > 0xFF is also completely undocumented. I am not sure
> > > what it does with that. It should probably be the same as
> > > runCommand, whatever it is.
> >
> > Under unices, file names are just arrays of bytes. There is no
> > notion of encoding at all. It's just a matter of interpretation of
> > that array.
>
> Quite right. One must be able to pass binary strings, which contain
> anything except \0 and '/' to openFile. The same goes for runCommand.
> I am uncomfortable, for this reason, with saying that runCommand ought
> to re-encode in the system locale while openFile doesn't. It is
> preferable to drop characters than to drop the ability to pass
> arbitrary binary data.

But truncation makes it impossible to pass non-ASCII strings portably.
They should be encoded, but there is no easy way to do so.

Actually, the problem is the use of strings. A String is a sequence of
_characters_, while a program talks to the outside world using sequences
of bytes. I think the right (but impossible) way to solve this problem
is to use separate data types for file paths and command line arguments.
Something along these lines:

    data FilePath = ...

    stringToFilePath :: String -> Maybe FilePath
    filePathToString :: FilePath -> Maybe String

Both functions are non-total, hence the presence of the Maybes. But this
would break a LOT of code and violate the language definition.

I think there are two alternatives. One is to encode/decode strings
using the current locale and provide [Word8]-based variants. The main
problem is that seemingly innocent actions, like getting directory
contents, could crash the program (with an exception).

Another option is to provide functions to encode/decode strings. This is
ugly, mixes strings which hold characters with strings which hold bytes,
and is completely un-Haskellish, but it seems there is no good solution.

Also, truncation could have security implications. It makes it almost
impossible to escape dangerous characters robustly. Consider the
following code. This is more a matter of speculation than a real threat,
but nevertheless:

    import System.Process (runCommand)

    evil, maskedEvil :: String
    evil = "I am an evil script; date; echo I\\'m doing whatever I want"
    -- Shift every character above 0xFF; truncation shifts it back.
    maskedEvil = map (toEnum . (+256) . fromEnum) evil

    -- Should escape all dangerous chars
    escape :: String -> String
    escape = id

    oops :: IO ()
    oops = do
      runCommand ("echo " ++ maskedEvil ++ "")
      return ()

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
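The concern above can be checked without running any shell command. What
follows is only a sketch under stated assumptions: truncation is modelled
as "keep the low 8 bits of each Char" (inferred from the behaviour
discussed in this thread, not taken from the library source), the names
mask and truncateString are hypothetical, and escape is a stand-in
escaper that backslash-escapes a handful of shell metacharacters. The
evil string here is a shortened stand-in for the one in the message.

```haskell
import Data.Char (chr, ord)

-- A shortened stand-in for the payload in the message above.
evil :: String
evil = "date; echo pwned"

-- Shift every character up by 256, as maskedEvil does.
mask :: String -> String
mask = map (chr . (+ 256) . ord)

-- Hypothetical escaper: backslash-escape a few shell metacharacters.
escape :: String -> String
escape = concatMap esc
  where
    esc c
      | c `elem` ";|&$`\\\"'" = ['\\', c]
      | otherwise             = [c]

-- Assumed model of the truncation: keep only the low 8 bits.
truncateString :: String -> String
truncateString = map (\c -> chr (ord c `mod` 256))
```

Under these assumptions, escape evil differs from evil (the ';' gets a
backslash), but escape (mask evil) is just mask evil, because none of
the shifted characters is a metacharacter, and truncateString then
turns it back into evil. The escaper never sees the dangerous
characters.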
[Haskell-cafe] Unicode strings and runCommand / runProcess
Here is a very interesting little problem.

    ghci
    GHCi, version 6.12.1: http://www.haskell.org/ghc/ :? for help
    Loading package ghc-prim ... linking ... done.
    Loading package integer-gmp ... linking ... done.
    Loading package base ... linking ... done.
    Prelude> :m System.Process
    Prelude System.Process> runCommand "echo привет"
    ?...@825b

This is a minimal test case for a bug reported in HSH at
http://github.com/jgoerzen/hsh/issues#issue/1

It is not entirely clear to me what the behavior here should be. It
seems inconsistent with the default behavior of System.IO to,
apparently, just strip the bits higher than 0xFF. On the other hand,
when it's OS commands we're talking about, it's not entirely clear to me
if the default should be to encode in UTF-8.

There should almost certainly be an *option* controlling this, and
perhaps a version of runProcess that accepts ByteStrings.

Thoughts?

-- John
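To make the "strip the bits higher than 0xFF" reading concrete, here is
a minimal sketch. It assumes that each Char is reduced to its low 8 bits
before being handed to the OS; that is inferred from the output in the
transcript, not from the System.Process source. truncateString is a
hypothetical helper, not a library function.

```haskell
import Data.Char (chr, ord)

-- Assumed model: keep only the low 8 bits of one code point.
truncateToByte :: Char -> Char
truncateToByte c = chr (ord c `mod` 256)

-- Apply the assumed truncation to a whole String.
truncateString :: String -> String
truncateString = map truncateToByte
```

The code points of "привет" are U+043F, U+0440, U+0438, U+0432, U+0435
and U+0442, so their low bytes are 0x3F, 0x40, 0x38, 0x32, 0x35 and
0x42. In ASCII that spells ?@825B, which looks consistent with the
(archive-mangled) output shown in the transcript.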
Re: [Haskell-cafe] Unicode strings and runCommand / runProcess
In a message dated 23 April 2010 21:44:29 John Goerzen wrote:
> Here is a very interesting little problem.
>
>     ghci
>     GHCi, version 6.12.1: http://www.haskell.org/ghc/ :? for help
>     Loading package ghc-prim ... linking ... done.
>     Loading package integer-gmp ... linking ... done.
>     Loading package base ... linking ... done.
>     Prelude> :m System.Process
>     Prelude System.Process> runCommand "echo привет"
>     ?...@825b
>
> This is a minimal test case for a bug reported in HSH at
> http://github.com/jgoerzen/hsh/issues#issue/1
>
> It is not entirely clear to me what the behavior here should be. It
> seems inconsistent with the default behavior of System.IO to,
> apparently, just strip the bits higher than 0xFF. On the other hand,
> when it's OS commands we're talking about, it's not entirely clear to
> me if the default should be to encode in UTF-8.
>
> There should almost certainly be an *option* controlling this, and
> perhaps a version of runProcess that accepts ByteStrings.

It should just use the system locale for encoding, as System.IO does.

FYI, I just submitted a bug to the GHC trac:
http://hackage.haskell.org/trac/ghc/ticket/4006

P.S. Haskell libraries aren't very well designed with respect to Unicode
and company.
Re: [Haskell-cafe] Unicode strings and runCommand / runProcess
John Goerzen jgoer...@complete.org writes:
>     ghci
>     GHCi, version 6.12.1: http://www.haskell.org/ghc/ :? for help
>     Loading package ghc-prim ... linking ... done.
>     Loading package integer-gmp ... linking ... done.
>     Loading package base ... linking ... done.
>     Prelude> :m System.Process
>     Prelude System.Process> runCommand "echo привет"
>     ?...@825b

Are you arguing about IO-specific stuff like this, or for all non-ASCII
Strings?

--
Ivan Lazar Miljenovic
ivan.miljeno...@gmail.com
IvanMiljenovic.wordpress.com
Re: [Haskell-cafe] Unicode strings and runCommand / runProcess
Ivan Lazar Miljenovic wrote:
> John Goerzen jgoer...@complete.org writes:
> >     ghci
> >     GHCi, version 6.12.1: http://www.haskell.org/ghc/ :? for help
> >     Loading package ghc-prim ... linking ... done.
> >     Loading package integer-gmp ... linking ... done.
> >     Loading package base ... linking ... done.
> >     Prelude> :m System.Process
> >     Prelude System.Process> runCommand "echo привет"
> >     ?...@825b
>
> Are you arguing about IO-specific stuff like this, or for all
> non-ASCII Strings?

I'm not sure I understand the question. I consider the behavior in
System.IO to be well-documented. The behavior in System.Process is not
documented at all. As I said, I'm not certain what the proper answer is,
but not documenting what happens probably isn't it.

Actually, the behavior of openFile when given a String with characters
> 0xFF is also completely undocumented. I am not sure what it does with
that. It should probably be the same as runCommand, whatever it is.

-- John
Re: [Haskell-cafe] Unicode strings and runCommand / runProcess
In a message dated 24 April 2010 03:50:54 John Goerzen wrote:
> Ivan Lazar Miljenovic wrote:
> > John Goerzen jgoer...@complete.org writes:
> > >     ghci
> > >     GHCi, version 6.12.1: http://www.haskell.org/ghc/ :? for help
> > >     Loading package ghc-prim ... linking ... done.
> > >     Loading package integer-gmp ... linking ... done.
> > >     Loading package base ... linking ... done.
> > >     Prelude> :m System.Process
> > >     Prelude System.Process> runCommand "echo привет"
> > >     ?...@825b
> >
> > Are you arguing about IO-specific stuff like this, or for all
> > non-ASCII Strings?
>
> I'm not sure I understand the question. I consider the behavior in
> System.IO to be well-documented. The behavior in System.Process is not
> documented at all. As I said, I'm not certain what the proper answer
> is, but not documenting what happens probably isn't it.
>
> Actually, the behavior of openFile when given a String with characters
> > 0xFF is also completely undocumented. I am not sure what it does
> with that. It should probably be the same as runCommand, whatever it
> is.

Under unices, file names are just arrays of bytes. There is no notion of
encoding at all. It's just a matter of interpretation of that array.

There is a problem with the FilePath data type. It is actually a String,
but it should be an abstract data type. There is a relevant bug [1] on
the GHC trac.

P.S. openFile truncates Chars.

[1] http://hackage.haskell.org/trac/ghc/ticket/3307
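As a rough illustration of what "FilePath as an abstract data type"
could mean, here is a sketch. Everything in it is hypothetical: RawPath,
fromBytes, toBytes, stringToRawPath and rawPathToString are names made
up for this example, and the policy (only plain ASCII converts to and
from String, everything else must go through bytes) is just one possible
design, not what ticket 3307 proposes.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical opaque path: a raw byte sequence, as on Unix.
newtype RawPath = RawPath [Word8]
  deriving (Eq, Show)

-- Any bytes are accepted except NUL, which cannot occur in a path.
fromBytes :: [Word8] -> Maybe RawPath
fromBytes bs
  | 0 `elem` bs = Nothing
  | otherwise   = Just (RawPath bs)

toBytes :: RawPath -> [Word8]
toBytes (RawPath bs) = bs

-- Partial: refuses anything outside plain ASCII instead of truncating.
stringToRawPath :: String -> Maybe RawPath
stringToRawPath s
  | all isPlain s = Just (RawPath (map (fromIntegral . ord) s))
  | otherwise     = Nothing
  where
    isPlain c = c /= '\0' && ord c < 0x80

-- Partial: bytes >= 0x80 have no meaning without choosing an encoding.
rawPathToString :: RawPath -> Maybe String
rawPathToString (RawPath bs)
  | all (< 0x80) bs = Just (map (chr . fromIntegral) bs)
  | otherwise       = Nothing
```

With a type like this, a caller who wants non-ASCII names has to pick an
encoding explicitly and go through fromBytes, so nothing is silently
truncated, and arbitrary binary names remain representable.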
Re: [Haskell-cafe] Unicode strings and runCommand / runProcess
Khudyakov Alexey wrote:
> > Actually, the behavior of openFile when given a String with
> > characters > 0xFF is also completely undocumented. I am not sure
> > what it does with that. It should probably be the same as
> > runCommand, whatever it is.
>
> Under unices, file names are just arrays of bytes. There is no notion
> of encoding at all. It's just a matter of interpretation of that
> array.

Quite right. One must be able to pass binary strings, which contain
anything except \0 and '/' to openFile. The same goes for runCommand. I
am uncomfortable, for this reason, with saying that runCommand ought to
re-encode in the system locale while openFile doesn't. It is preferable
to drop characters than to drop the ability to pass arbitrary binary
data.

So I am not sure I agree with your stance in
http://hackage.haskell.org/trac/ghc/ticket/4006

-- John
[Haskell-cafe] Unicode strings
Hello,

I am trying to make a program that outputs some Unicode characters, but
the output doesn't match what I try to print. Attached is a little test
program. It tries to print the arrows ←↑→↓ but instead it outputs
\220\221\222\223 (that is, character number 220, then 221, then 222).

I've also tried writing the Unicode code points (although GHC 6.6 should
deal just fine with Unicode source code) and I get the same result. In
case anybody wants to try, this would be the string:
"\8592\8593\8594\8595".

I am also attaching the output file; you can see that the contents are
not right.

Any ideas what I am doing wrong here?

Thank you.
--
Pupeno [EMAIL PROTECTED] (http://pupeno.com)

    import qualified System.IO as IO

    main = do
      let str = "←↑→↓"
      putStrLn str
      h <- IO.openFile "test.output" IO.WriteMode
      IO.hPutStrLn h str
Re: [Haskell-cafe] Unicode strings
The problem is that GHC's output functions only print the lowest 8 bits
of each code point. To print these higher code points, you'll need to
translate your [Char] into a byte encoding that your terminal will
understand (most likely UTF-8). I know there are several of these
floating around in the wild; hopefully someone will chime in with a code
snippet soon.

Also, I seem to remember that Bulat's Streams library supports some
Unicode encodings; perhaps you can check there?

Cheers,
Spencer Janssen

On Nov 5, 2006, at 12:17 PM, Pupeno wrote:
> Hello,
>
> I am trying to make a program that outputs some Unicode characters,
> but the output doesn't match what I try to print. Attached is a little
> test program. It tries to print the arrows ←↑→↓ but instead it outputs
> \220\221\222\223 (that is, character number 220, then 221, then 222).
>
> I've also tried writing the Unicode code points (although GHC 6.6
> should deal just fine with Unicode source code) and I get the same
> result. In case anybody wants to try, this would be the string:
> "\8592\8593\8594\8595".
>
> I am also attaching the output file; you can see that the contents are
> not right.
>
> Any ideas what I am doing wrong here?
>
> Thank you.
> --
> Pupeno [EMAIL PROTECTED] (http://pupeno.com)
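A minimal sketch of the translation described above, written against
base only. It is not one of the encoders mentioned in this thread;
encodeChar and encodeUtf8 are hypothetical names. It assumes the old
output behaviour discussed here, where each Char written to a handle is
cut down to its low 8 bits, so a String of byte-valued Chars goes out as
those bytes.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)

-- Encode one code point as UTF-8; each result is a byte value 0..255.
-- Surrogate code points are not rejected in this sketch.
encodeChar :: Char -> [Int]
encodeChar c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = [ 0xF0 .|. (n `shiftR` 18), cont (n `shiftR` 12)
                  , cont (n `shiftR` 6), cont n ]
  where
    n = ord c
    -- Continuation byte: 10xxxxxx carrying the low 6 bits of x.
    cont x = 0x80 .|. (x .&. 0x3F)

-- Encode a String as a String whose Chars are all byte-valued.
encodeUtf8 :: String -> String
encodeUtf8 = map chr . concatMap encodeChar
```

For the first arrow, U+2190, this yields the bytes 0xE2 0x86 0x90. On a
GHC that truncates on output, putStrLn (encodeUtf8 str) would then emit
UTF-8. On a GHC whose handles do their own encoding, the same call would
encode twice unless the handle is put into binary mode.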
Re: [Haskell-cafe] Unicode strings
Hello,

http://repetae.net/repos/jhc/UTF8.hs has some nice functions for
UTF-8 <-> Unicode conversions.

Regards,
--
Intelligence is like a river: the deeper it is, the less noise it makes
Re: [Haskell-cafe] Unicode strings
Pupeno wrote:
> Hello,
>
> I am trying to make a program that outputs some Unicode characters,
> but the output doesn't match what I try to print. Attached is a little
> test program. It tries to print the arrows ←↑→↓ but instead it outputs
> \220\221\222\223 (that is, character number 220, then 221, then 222).
>
> I've also tried writing the Unicode code points (although GHC 6.6
> should deal just fine with Unicode source code) and I get the same
> result. In case anybody wants to try, this would be the string:
> "\8592\8593\8594\8595".
>
> I am also attaching the output file; you can see that the contents are
> not right.
>
> Any ideas what I am doing wrong here?
>
> Thank you.
>
>     import qualified System.IO as IO
>
>     main = do
>       let str = "←↑→↓"
>       putStrLn str
>       h <- IO.openFile "test.output" IO.WriteMode
>       IO.hPutStrLn h str

The problem is with the output of putStrLn. You need to encode the
string into UTF-8. You can find several modules on the web; I have been
using this one:
http://www.haskell.org/pipermail/haskell-i18n/2004-February/000127.html
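For readers on a newer compiler, the same result can be had without a
separate encoding module. The sketch below assumes a GHC whose System.IO
provides hSetEncoding and utf8 (6.12 or later, so not the 6.6 used in
the question). writeUtf8File and readBytes are hypothetical helper
names; readBytes is only there to inspect what ended up on disk.

```haskell
import System.IO

-- Write a String to a file as UTF-8, whatever the locale says.
writeUtf8File :: FilePath -> String -> IO ()
writeUtf8File path str =
  withFile path WriteMode $ \h -> do
    hSetEncoding h utf8
    hPutStr h str

-- Read a file back as raw byte values, with no decoding at all.
readBytes :: FilePath -> IO [Int]
readBytes path =
  withBinaryFile path ReadMode $ \h -> do
    s <- hGetContents h
    -- Force the whole (lazy) contents before the handle is closed.
    length s `seq` return (map fromEnum s)
```

Writing "\8592\8593" this way and reading it back with readBytes should
give the six bytes 0xE2 0x86 0x90 0xE2 0x86 0x91, that is, the UTF-8
encodings of the first two arrows rather than their low 8 bits.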