On 10 November 2011 14:35, Simon Marlow marlo...@gmail.com wrote:
Agreed.
Committed.
I'm wondering if we should also have hSetLocaleEncoding,
hSetFileSystemEncoding :: TextEncoding -> IO () and change
localeEncoding, fileSystemEncoding :: IO TextEncoding.
hSetFileSystemEncoding in
On 10 November 2011 00:17, Ian Lynagh ig...@earth.li wrote:
On Wed, Nov 09, 2011 at 03:58:47PM +, Max Bolingbroke wrote:
(Note that the above outlined problems are problems in the current
implementation too
Then the proposal seems to me to be strictly better than the current
system.
On 9 November 2011 16:29, Simon Marlow marlo...@gmail.com wrote:
Ok, so since we need something like
makePrintable :: FilePath -> String
arguably we might as well make that do the locale decoding. That's
certainly a good point...
You could, but getArgs :: IO [String], not :: IO [FilePath].
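For illustration, here is a minimal sketch of what a makePrintable along these lines might look like. Only the name and type come from the message above. The isEscape helper, the choice of U+FFFD as the replacement character, and the two escape ranges (U+EF00-U+EFFF as discussed in this thread, U+DC80-U+DCFF as used by PEP 383) are assumptions, not GHC's actual behaviour.

```haskell
import Data.Char (ord)

-- Hypothetical helper: True for codepoints that stand in for an
-- undecodable byte. Both ranges are assumptions made for this sketch.
isEscape :: Char -> Bool
isEscape c = inRange 0xEF00 0xEFFF || inRange 0xDC80 0xDCFF
  where
    n = ord c
    inRange lo hi = n >= lo && n <= hi

-- Replace every escape codepoint with U+FFFD so the result is safe to
-- show to a user. This is lossy: the result must not be used as a path.
makePrintable :: FilePath -> String
makePrintable = map (\c -> if isEscape c then '\xFFFD' else c)
```

Because the function is lossy, it only makes sense at the point of display; the original FilePath still has to be kept around for any further filesystem calls.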
On 09/11/2011 16:42, John Millikin wrote:
On Wed, Nov 9, 2011 at 08:04, Simon Marlow marlo...@gmail.com wrote:
Ok, I spent most of today adding ByteString alternatives for all of the
functions in System.Posix that use FilePath or environment strings. The
Haddocks for my augmented unix package
On 10/11/2011 09:28, Max Bolingbroke wrote:
Is there any consensus about what to do here? My take is that we
should move back to lone surrogates. This:
1. Recovers the roundtrip property, which we appear to believe is essential
2. Removes all the weird problems I outlined earlier that can
On Thu, Nov 10, 2011 at 03:28, Simon Marlow marlo...@gmail.com wrote:
I've done a search/replace and called it RawFilePath. Ok?
Fantastic, thank you very much.
On 8 November 2011 11:43, Simon Marlow marlo...@gmail.com wrote:
Don't you mean 1 is what we have?
Yes, sorry!
Failing to roundtrip in some cases, and doing so silently, seems highly
suboptimal to me. I'm sorry I didn't pick up on this at the time (Unicode
is a swamp :).
I *can* change the
On 7 November 2011 17:32, John Millikin jmilli...@gmail.com wrote:
I am also not convinced that it is possible to correctly implement
either of these functions if their behavior is dependent on the user's
locale.
FWIW it's only dependent on the user's locale because whether glibc
iconv detects
On Wed, Nov 09, 2011 at 11:02:54AM +, Simon Marlow wrote:
I would be happy with the surrogate approach I think. Arguably if
you try to treat a string with lone surrogates as Unicode and it
fails, then that is a feature: the original string wasn't Unicode.
All you can do with an invalid
On 9 November 2011 13:11, Ian Lynagh ig...@earth.li wrote:
If we aren't going to guarantee that the encoded string is unicode, then
is there any benefit to encoding it in the first place?
(I think you mean decoded here - my understanding is that decode ::
ByteString -> String, encode :: String ->
On 9 November 2011 11:02, Simon Marlow marlo...@gmail.com wrote:
The performance overhead of all this worries me. withCString has taken a
huge performance hit, and I think there are people who want to know that
there aren't several complex encoding/decoding passes between their Haskell
code
On 08/11/2011 15:42, John Millikin wrote:
On Tue, Nov 8, 2011 at 03:04, Simon Marlow marlo...@gmail.com wrote:
I really think we should provide the native APIs. The problem is that the
System.Posix.Directory API is all in terms of FilePath (=String), and if we
gave that a different meaning
On 09/11/2011 13:11, Ian Lynagh wrote:
On Wed, Nov 09, 2011 at 11:02:54AM +, Simon Marlow wrote:
I would be happy with the surrogate approach I think. Arguably if
you try to treat a string with lone surrogates as Unicode and it
fails, then that is a feature: the original string wasn't
On Wed, Nov 9, 2011 at 08:04, Simon Marlow marlo...@gmail.com wrote:
Ok, I spent most of today adding ByteString alternatives for all of the
functions in System.Posix that use FilePath or environment strings. The
Haddocks for my augmented unix package are here:
On 09/11/2011 15:58, Max Bolingbroke wrote:
(Note that the above outlined problems are problems in the current
implementation too -- but the current implementation doesn't even
pretend to support U+EFxx characters. Its correctness is entirely
dependent on them never showing up, which is why we
On Wed, Nov 09, 2011 at 03:58:47PM +, Max Bolingbroke wrote:
(Note that the above outlined problems are problems in the current
implementation too
Then the proposal seems to me to be strictly better than the current
system. Under both systems the wrong thing happens when U+EFxx is entered
My primary concerns are (in order of priority - and I only speak for myself)
(a) consistency across platforms
(b) minimize (unrequired) performance overhead
I would prefer an api which is consistent for both win32, posix or other
os which only did as much as what the user (us) wanted
for
On 07/11/2011 17:57, Ian Lynagh wrote:
On Mon, Nov 07, 2011 at 05:02:32PM +, Simon Marlow wrote:
Basically, imagine a reversible transformation:
encode :: String -> [Word8]
decode :: [Word8] -> String
this transformation is applied in the appropriate direction by the
IO library to
On 07/11/2011 17:32, John Millikin wrote:
On Mon, Nov 7, 2011 at 09:02, Simon Marlow marlo...@gmail.com wrote:
I think you might be misunderstanding how the new API works. Basically,
imagine a reversible transformation:
encode :: String -> [Word8]
decode :: [Word8] -> String
this
On 02/11/2011 21:40, Max Bolingbroke wrote:
On 2 November 2011 20:16, Ian Lynagh ig...@earth.li wrote:
Are you saying there's a bug that should be fixed?
You can choose between two options:
1. Failing to roundtrip some strings (in our case, those containing
the 0xEFNN byte sequences)
2.
On Tue, Nov 8, 2011 at 03:04, Simon Marlow marlo...@gmail.com wrote:
As mentioned earlier in the thread, this behavior is breaking things.
Due to an implementation error, programs compiled with GHC 7.2 on
POSIX systems cannot open files unless their paths also happen to be
valid text according
On 11/8/11 6:04 AM, Simon Marlow wrote:
I really think we should provide the native APIs. The problem is that
the System.Posix.Directory API is all in terms of FilePath (=String),
and if we gave that a different meaning from the System.Directory
FilePaths then confusion would ensue. So perhaps
On 06/11/2011 16:56, John Millikin wrote:
2011/11/6 Max Bolingbroke batterseapo...@hotmail.com:
On 6 November 2011 04:14, John Millikin jmilli...@gmail.com wrote:
For what it's worth, on my Ubuntu system, Nautilus ignores the locale
and just treats all paths as either UTF8 or invalid.
To me,
On Mon, Nov 7, 2011 at 09:02, Simon Marlow marlo...@gmail.com wrote:
I think you might be misunderstanding how the new API works. Basically,
imagine a reversible transformation:
encode :: String -> [Word8]
decode :: [Word8] -> String
this transformation is applied in the appropriate
On Mon, Nov 07, 2011 at 05:02:32PM +, Simon Marlow wrote:
Basically, imagine a reversible transformation:
encode :: String -> [Word8]
decode :: [Word8] -> String
this transformation is applied in the appropriate direction by the
IO library to translate filesystem paths into
Simon Marlow wrote:
It would probably be better to have an abstract FilePath type and to keep
the original bytes, decoding on demand. But that is a big change to the API
and would break much more code. One day we'll do this properly; for now we
have this, which I think is a pretty reasonable
On Mon, Nov 7, 2011 at 15:39, Yitzchak Gale g...@sefer.org wrote:
The problem is that Haskell 98 specifies type FilePath = String.
In retrospect, we now know that this is too simplistic.
But that's what we have right now.
This is *a* problem, but not a particularly major one; the definition
of
On 6 November 2011 04:14, John Millikin jmilli...@gmail.com wrote:
For what it's worth, on my Ubuntu system, Nautilus ignores the locale
and just treats all paths as either UTF8 or invalid.
To me, this seems like the most reasonable option; the concept of
locale encoding is entirely vestigial,
2011/11/6 Max Bolingbroke batterseapo...@hotmail.com:
On 6 November 2011 04:14, John Millikin jmilli...@gmail.com wrote:
For what it's worth, on my Ubuntu system, Nautilus ignores the locale
and just treats all paths as either UTF8 or invalid.
To me, this seems like the most reasonable option;
Quoth John Millikin jmilli...@gmail.com,
...
One is to give low-level access, using abstractions as close to the
real API as possible. In this model, unix would provide functions
like [[ rename :: ByteString -> ByteString -> IO () ]], and I would
know that it's not going to do anything weird to
for what it is worth, I would like to see both System.IO and Directory
export internal functions where the filepath is a Raw Byte representation.
I have utilities that regularly scan 100,000s of files and hash the path
the details of which are irrelevant to this discussion, the point being
that
Can't we just have the usual .Internal module convention, where people who
want internals can get at them if they need to, and most people get a
simpler interface? It's amazingly frustrating when you have a library that
does 99% of what you need it to do, except for one tiny internal detail
that
FYI: I just released new versions of system-filepath and
system-fileio, which attempt to work around the changes in GHC 7.2.
On Wed, Nov 2, 2011 at 11:55, Max Bolingbroke
batterseapo...@hotmail.com wrote:
Maybe I'm misunderstanding, but it sounds like you're still trying to
treat posix file
On Thu, Nov 03, 2011 at 09:41:32AM +, Max Bolingbroke wrote:
On 2 November 2011 21:46, Ganesh Sittampalam gan...@earth.li wrote:
The workaround you propose seems a little complex and it might be a bit
problematic that 100% roundtripping can't be guaranteed even once your
fix is applied.
I
On 2 November 2011 21:46, Ganesh Sittampalam gan...@earth.li wrote:
The workaround you propose seems a little complex and it might be a bit
problematic that 100% roundtripping can't be guaranteed even once your
fix is applied.
I can understand this perspective, although the roundtripping as
On 1 November 2011 20:13, John Millikin jmilli...@gmail.com wrote:
$ ghci-7.2.1
GHC> import System.Directory
GHC> getDirectoryContents "path-test"
["\161\165","\61345\61349","..","."]
GHC> readFile "path-test/\161\165"
"world\n"
GHC> readFile "path-test/\61345\61349"
*** Exception: path-test/: openFile:
Hi,
On 01.11.2011, at 19:43, Max Bolingbroke wrote:
As I pointed out earlier in the thread you can recover the old
behaviour if you really want it by manually reencoding the strings, so
I would dispute the claim that it is impossible to fix within the
given API.
As far as I know, not all
On 2 November 2011 10:03, Jean-Marie Gaillourdet j...@gaillourdet.net wrote:
As far as I know, not all encodings are reversible. I.e. there are byte
sequences which are invalid utf-8. Therefore, decoding and re-encoding might
not return the exact same byte sequence.
The PEP 383 mechanism
On 2 November 2011 09:37, Max Bolingbroke batterseapo...@hotmail.com wrote:
On 1 November 2011 20:13, John Millikin jmilli...@gmail.com wrote:
$ ghci-7.2.1
GHC> import System.Directory
GHC> getDirectoryContents "path-test"
["\161\165","\61345\61349","..","."]
GHC> readFile "path-test/\161\165"
"world\n"
On 2 November 2011 13:53, Max Bolingbroke batterseapo...@hotmail.com wrote:
I think the only way to fix this last case in general is to fix iconv
itself, so I'm going to see if I can get a patch upstream. Fixing it
for people with UTF-8 locales should be enough for 99% of users,
though.
One
On Wed, Nov 02, 2011 at 01:29:16PM +, Max Bolingbroke wrote:
On 2 November 2011 10:03, Jean-Marie Gaillourdet j...@gaillourdet.net wrote:
As far as I know, not all encodings are reversible. I.e. there are byte
sequences which are invalid utf-8. Therefore, decoding and re-encoding
On Wed, Nov 2, 2011 at 06:53, Max Bolingbroke
batterseapo...@hotmail.com wrote:
I've got a patch that will work around the issue in most situations by
avoiding the iconv code path. With the patch everything will work OK
as long as the system locale is one that we have a native-Haskell
decoder
On 2 November 2011 17:15, John Millikin jmilli...@gmail.com wrote:
What package does this patch -- unix, directory, something else?
The base package. The problem lay in the implementation of
GHC.IO.Encoding.fileSystemEncoding on non-Windows OSes.
Maybe I'm misunderstanding, but it sounds like
On 2 November 2011 16:29, Ian Lynagh ig...@earth.li wrote:
If I understand correctly, you use U+EF00-U+EFFF to encode the
characters 0-255 when they are not a valid part of the UTF8 stream.
Yes.
So why not encode U+EF00 (which in UTF8 is 0xEE 0xBC 0x80) as
U+EFEE U+EFBC U+EF80, and so on?
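The byte-escaping scheme being discussed can be sketched as follows. This is only an illustration, not the code in base: it swaps the real UTF-8 decoder for a toy ASCII-only codec so that every byte of 0x80 or above counts as undecodable, and the names escapeBase, decodeEscaping, encodeEscaping and roundTrips are hypothetical.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical constant: start of the private-use escape range U+EF00..U+EFFF.
escapeBase :: Int
escapeBase = 0xEF00

-- Decode under a toy ASCII-only codec (an assumption of this sketch):
-- bytes below 0x80 become the matching character, and any other byte b
-- is escaped as the codepoint U+EF00 + b.
decodeEscaping :: [Word8] -> String
decodeEscaping = map step
  where
    step b
      | b < 0x80  = chr (fromIntegral b)
      | otherwise = chr (escapeBase + fromIntegral b)

-- Inverse direction: ASCII passes through, escape codepoints turn back
-- into their raw byte, and anything else is rejected by this toy codec.
encodeEscaping :: String -> Maybe [Word8]
encodeEscaping = mapM step
  where
    step c
      | n < 0x80 = Just (fromIntegral n)
      | n >= escapeBase + 0x80 && n <= escapeBase + 0xFF =
          Just (fromIntegral (n - escapeBase))
      | otherwise = Nothing
      where
        n = ord c

-- The roundtrip property the thread cares about: bytes in, same bytes out.
roundTrips :: [Word8] -> Bool
roundTrips bs = encodeEscaping (decodeEscaping bs) == Just bs
```

Under this toy codec the bytes 0xEE 0xBC 0x80 decode to U+EFEE U+EFBC U+EF80, which is the shape of the escaping suggested above, and every byte sequence roundtrips. With a real UTF-8 decoder those three bytes are valid and decode to a genuine U+EF00, which is where the ambiguity arises.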
On Wed, Nov 02, 2011 at 07:02:09PM +, Max Bolingbroke wrote:
[snip some stuff I didn't understand. I think I made the mistake of
entering a Unicode discussion]
This is why the unmodified PEP383 approach is kind of nice - it uses
lone surrogate (rather than private use) codepoints to do the
On 2 November 2011 19:13, Ian Lynagh ig...@earth.li wrote:
[snip some stuff I didn't understand. I think I made the mistake of
entering a Unicode discussion]
Sorry, perhaps that was too opaque! The problem is that if we commit
to support occurrences of the private-use codepoint 0xEF80 then what
On Wed, Nov 02, 2011 at 07:59:21PM +, Max Bolingbroke wrote:
On 2 November 2011 19:13, Ian Lynagh ig...@earth.li wrote:
They are allowed to occur in Linux/ext2 filenames, anyway, and I think
we ought to be able to handle them correctly if they do.
In Python, if a filename is decoded
On 2 November 2011 20:16, Ian Lynagh ig...@earth.li wrote:
Are you saying there's a bug that should be fixed?
You can choose between two options:
1. Failing to roundtrip some strings (in our case, those containing
the 0xEFNN byte sequences)
2. Having GHC's decoding functions return strings
Hi Max,
On 01/11/2011 10:23, Max Bolingbroke wrote:
This is my implementation of Python's PEP 383 [1] for Haskell.
IMHO this behaviour is much closer to what users expect. For example,
getDirectoryContents "." >>= print shows Unicode filenames properly.
As a result of this change we were able
Hi,
I'm just investigating what we can do about a problem with darcs'
handling of non-ASCII filenames on GHC 7.2.
The issue is apparently that as of GHC 7.2, getDirectoryContents now
tries to decode filenames in the current locale, rather than converting
a stream of bytes into characters:
Hi Ganesh,
On 1 November 2011 07:16, Ganesh Sittampalam gan...@earth.li wrote:
Can anyone point me at the rationale and details of the change and/or
suggest workarounds?
This is my implementation of Python's PEP 383 [1] for Haskell.
IMHO this behaviour is much closer to what users expect. For
On Tue, Nov 1, 2011 at 5:16 AM, Ganesh Sittampalam gan...@earth.li wrote:
I'm just investigating what we can do about a problem with darcs'
handling of non-ASCII filenames on GHC 7.2.
The issue is apparently that as of GHC 7.2, getDirectoryContents now
tries to decode filenames in the current
You're right -- many parts of system-fileio (the parts based on
directory) are broken due to this. I'll need to update it to call
the posix/win32 functions directly.
IMO, the GHC behavior in <=7.0 is ugly, but the behavior in 7.2 is
fundamentally wrong.
Different OSes have different definitions
Hi John,
On 1 November 2011 17:14, John Millikin jmilli...@gmail.com wrote:
GHC 7.2 assumes Linux/BSD paths are text, which 1) silently breaks all
existing code and 2) makes it impossible to fix within the given API.
Please can you give an example of code that is broken with the new
behaviour?
On Tue, Nov 1, 2011 at 11:43, Max Bolingbroke
batterseapo...@hotmail.com wrote:
Hi John,
On 1 November 2011 17:14, John Millikin jmilli...@gmail.com wrote:
GHC 7.2 assumes Linux/BSD paths are text, which 1) silently breaks all
existing code and 2) makes it impossible to fix within the given