Re: concurrency guarentees clarification
John Meacham [EMAIL PROTECTED] writes: * Foreign concurrent calls, handle IO, and all IO actions that directly interact with the world outside the current process all must be yield-points. (in addition to any yield-points implied by the progress guarentee) If an IO call includes a long period of waiting, we don't only want it to yield before or after it. We want it to allow other threads to proceed during the whole wait. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-prime mailing list Haskell-prime@haskell.org http://haskell.org/mailman/listinfo/haskell-prime
Re: unsafePerformIO and cooperative concurrency
Ashley Yakeley [EMAIL PROTECTED] writes: Is there a ticket for this? I would prefer that unsafePerformIO and friends not be part of the standard. I would prefer otherwise. Every implementation supports it, which proves that it's useful. And it's no less unsafe than FFI. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-prime mailing list Haskell-prime@haskell.org http://haskell.org/mailman/listinfo/haskell-prime
Re: FFI, safe vs unsafe
John Meacham [EMAIL PROTECTED] writes: Checking thread local state for _every_ foregin call is definitly not an option either. (but for specificially annotated ones it is fine.) BTW, does Haskell support foreign code calling Haskell in a thread which the Haskell runtime has not seen before? Does it work in GHC? If so, does it show the same ThreadId from that point until OS thread's death (like in Kogut), or a new ThreadId for each callback (like in Python)? -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-prime mailing list Haskell-prime@haskell.org http://haskell.org/mailman/listinfo/haskell-prime
Re: FFI, safe vs unsafe
John Meacham [EMAIL PROTECTED] writes: I object to the idea that concurrent calls are 'safer'. getting it wrong either way is a bug. it should fail in the most obvious way rather than the way that can remain hidden for a long time. I wouldn't consider it a bug of an implementation if it makes a call behave like concurrent when it's specified as non-concurrent. If a library wants to make it a critical section, it should use a mutex (MVar). Or there should be another kind of foreign call which requires serialization of calls. But of which calls? it's rarely the case that it needs to be serialized with other calls to the same function only, and also rare that it must be serialized with everything else, so the granularity of the mutex must be explicit. It's fine to code the mutex explicitly if there is a kosher way to make it global. Non-concurrent calls which really blocks other thread should be treated only as an efficiency trick, as in implementations where the runtime is non-reentrant and dispatches threads running Haskell code internally, making such call without ensuring that other Haskell threads have other OS threads to run them is faster. OTOH in implementations which run Haskell threads truly in parallel, the natural default is to let C code behave concurrently. Ensuring that it is serialized would require extra work which is counter-productive. For functions like sqrt() the programmer wants to say that there is no need to make it concurrent, without also saying that it requires calls to be serialized. Which is why I'd prefer some term involving 'blocking' because that is the issue. blocking calls are exactly those you need to make concurrent in order to ensure the progress guarentee. What about getaddrinfo()? It doesn't synchronize with the rest of the program, it will eventually complete no matter whether other threads make progress, so making it concurrent is not necessary for correctness. It should be made concurrent nevertheless because it might take a long time. It does block; if it didn't block but needed the same time for an internal computation which doesn't go back to Haskell, it would still benefit from making the call concurrent. It is true that concurrent calls often coincide with blocking. It's simply the most common reason for a single non-calling-back function to take a long time, and one which can often be predicted statically (operations of extremely long integers might take a long time too, but it would be hard to differentiate them from the most which don't). The name 'concurrent' would be fine with me if the default is 'not necessarily concurrent'. If concurrent calls are the default, the name 'nonconcurrent' is not so good, because it would seem to imply some serialization which is not mandatory. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-prime mailing list Haskell-prime@haskell.org http://haskell.org/mailman/listinfo/haskell-prime
Re: Signals + minimal proposal
Simon Marlow [EMAIL PROTECTED] writes: How does cancelling a thread differ from sending it an exception? It doesn't. By cancelling I mean just sending a particular async exception. Can a thread be GC'd without being sent an exception first? Yes, but I'm now changing this. Unfortunately it doesn't seem possible to guarantee proper stack unwinding in all cases: POSIX threads evaporate after fork() in the child process. This means that foreign code performing callbacks doesn't exist there, except in the thread doing the fork; the C stacks are inaccessible. So in Kogut in this case bound threads become unbound, and they only run up to the end of the nearest callback from foreign code: then they are killed immediately. Another technical limitation: in a build where OS threads are not used, callbacks returning in a non-LIFO order must wait for the callback using the top of the OS stack to return. It's impossible to cause such thread to continue immediately even if it gets an exception. There are also limitations caused by principles I've adopted myself. I have scoped mutex locking and scoped unlocking. In particular waiting for a condition unlocks the mutex and always relocks it before returning. When a thread is waiting to relock a mutex when exiting a scope (rather than when entering a scope), it must absolutely lock it before it can continue, in order to guarantee consistent state of the mutex in regions of code. So I'm going to send threads about to be GC'd a signal rather than an exception; it will be handled only if the thread has signals unblocked. There is another case similar to GC'ing a thread: when the runtime discovers that there are no threads to be run, to wait for I/O, to wait for a timeout, and the thread handling system signals doesn't seem to be intentionally waiting for signals (it's not blocked on a Kogut construct similar to POSIX sigwait), the runtime attempts to wake up the thread handling system signals with a Deadlock signal, so the program can react to a total deadlock. Of course a deadlock of only a subset of threads won't be detected if the threads are not GC'd. When waiting for the rest of threads at program exit, it might happen that some threads won't want to return after being cancelled, e.g. they have signals blocked or they lock up during cleanup. Such case would normally be a deadlock (the main thread is waiting until they finish, and they are waiting for something else), but the above mechanism causes the main thread to be woken up and continue even though some threads have not finished. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-prime mailing list Haskell-prime@haskell.org http://haskell.org/mailman/listinfo/haskell-prime
Re: Signals + minimal proposal
John Meacham [EMAIL PROTECTED] writes: forkIO action = forkIO' action' where action' = do myThreadId = onExit . throwTo PleaseExit action This would be a memory leak: even after the thread finishes, its onExit handler would remain registered. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-prime mailing list Haskell-prime@haskell.org http://haskell.org/mailman/listinfo/haskell-prime
Re: asynchronous exceptions
Simon Marlow [EMAIL PROTECTED] writes: BTW, I just realised a better way to express block. If block is supposed to count nesting, then we have a problem that you can still unblock exceptions even within a block by using sufficient number of unblocks, so the right way is to give block this type: block :: ((IO b - IO b) - IO a) - IO a Or perhaps: block :: ((forall b. IO b - IO b) - IO a) - IO a It doesn't fit classic mutexes and conditions with my semantics of implicit blocking, because condition wait should restore the blocking state to the one before taking the mutex - but the condition wait is written without explicit passing of any value from the point of taking the mutex. I'm not sure how it carries over to Haskell patterns though. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-prime mailing list Haskell-prime@haskell.org http://haskell.org/mailman/listinfo/haskell-prime
Re: asynchronous exceptions
Simon Marlow [EMAIL PROTECTED] writes: I think it's unnecessary to treat signals in the way you do - you're assuming that a signal interrupts the current thread and runs a new computation (the signal handler) on the same stack, completely blocking the interrupted thread until the signal handler completes. This is the wrong way to view signal handlers, IMO: they should run in completely separate threads (perhaps a higher priority thread, if possible). This can be emulated in my model: by designating a thread for system signals, possibly even letting it spawn a new thread for each signal. Most Unix signals are supposed to abort the process however, and thus a mechanism for aborting one thread from another is needed anyway. I think async exceptions are not that much easier than async signals. Async signals include the ability to pause threads in safe points, which is needed for SIGSTOP / SIGTSTP and for my fork() wrapper. This is not archievable with signals spawning threads + async exceptions. + you don't have to block signals just because you happen to be holding a mutex. Synchronisation with a signal handler is just synchronisation with another thread. It's still very probable that taking a mutex coincides with the need to block async exceptions: an async exception in the middle of a critical section implies a danger of leaving data in an inconsistent state. Reasons for automatic blocking of async signals carry over to async exceptions. I agree with your assessment of the problems with interruptible operations in GHC: that it is impossible to completely block async exceptions across a computation. We could certainly add a way to do this. Is that the substance of your objection to GHC's async exception mechanism? Regarding interruptible operations, this and one more thing that I haven't written there: The fact that some function uses an interruptible operation internally is a visible aspect of its behavior, both in GHC design and in mine. This means that choosing which operations are interruptible should be done carefully: even if some operation blocks the thread, it might be a bad choice for an interruption point, because usage of some blocking operations should better not have to be exposed. In my case such blocking but uninterruptible operations include waiting for a mutex, and waiting for a lazy variable, among others. But Concurrent Haskell uses a single construct of MVars as mutexes, semaphores, or communication channels. The runtime can't recognize the pattern of usage to distinguish these cases. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-prime mailing list Haskell-prime@haskell.org http://haskell.org/mailman/listinfo/haskell-prime
Re: important news: refocusing discussion
Simon Marlow [EMAIL PROTECTED] writes: I think it would be a mistake to relegate concurrency to an addendum; it is a central feature of the language, and in fact is one area where Haskell (strictly speaking GHC) is really beginning to demonstrate significant advantages over other languages. We should make the most of it. I agree. Concurrency is needed for finalizers (except those which only call system functions, without mutating other objects). -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-prime mailing list Haskell-prime@haskell.org http://haskell.org/mailman/listinfo/haskell-prime
Re: asynchronous exceptions
Simon Marlow [EMAIL PROTECTED] writes: I'm not sure whether asynchronous exceptions should be in Haskell'. I don't feel entirely comfortable about the interruptible operations facet of the design, I designed that differently for my language. There is a distinct synchronous mode where asynchronous exceptions are handled by certain operations only, similarly to POSIX deferred mode of thread cancellation. This allows to use blocking operations without being interrupted. http://www.cs.ioc.ee/tfp-icfp-gpce05/tfp-proc/06num.pdf Actually I support asynchronous signals, not just exceptions: the reaction to a signal can be something other than throwing an exception. For example Linux SIGWINCH should be handled by resizing and repainting the screen at an appropriate moment, not by aborting any computation in progress. The fact that throwTo can interrupt a takeMVar, but can't interrupt a foreign call, even a concurrent one, is a bit strange. When entering foreign mode in C code embedded in Kogut (which corresponds to concurrent foreign imports in Haskell but is less convenient to use), it's possible to specify how that thread wishes to be interrupted in case someone else sends it an asynchronous signal during the foreign trip. The only implemented application of this mechanism is sending a Unix signal. This is enough to interrupt blocking syscalls like waitpid. If waitpid fails and errno == EINTR, pending signals for this thread are processed and waiting continues (unless some signal handler has thrown an exception). Implementing this without race conditions requires a sigsafe library or something equivalent. John Meacham [EMAIL PROTECTED] writes: * do we require the thrower to 'block' until the signal is recieved? (only relevant to pre-emptive implementations) My language doesn't do it, and I'm not convinced that Haskell should block. It's more efficient to make this non-blocking, and I think usually this is what is needed. * what happens if mutilple thrown exceptions pile up before the catcher gets to them? In my language each thread has a queue of pending asynchronous signals, and they are processed in order. Handling an asynchronous signal, or throwing an exception until it is handled, blocks further signals automatically, so more signals are processed only after the previous signal was handled. An exception handler is not in a tail position wrt. the catching construct, for two reasons: the state of asynchronous signals is restored after handling the exception, and a stack trace shown when the exception is propagated to the toplevel without being handled includes code in unfinished exception handlers. There is a separate exception handling syntax when the exception should be considered already handled, for cases when the exception handler should be in a tail context. * what happns to exceptions that fall off the end of threads, or the main thread? (should be answered anyway) In my case a thread body ends with a value or with an exception, and this can be examined when joining a thread, or by default the exception is propagated in the joiner. This has a disadvantage that errors in threads nobody waits for might be left undetected, unless they use an explicit wrapper. For the main thread there is a settable handler of exceptions reaching the toplevel, which by default handles some exceptions specially (Unix signals, and a request of program termination), and others are printed along with a stack trace. * promtness? how much work is the target allowed to do before it sees the exception? pthreads allows an implementation to delay processing an exception to a cancellation point do we want the same thing in haskell? Perhaps. My design includes that. David Roundy [EMAIL PROTECTED] writes: It would also be nice to address signal behavior, and by default state that signals should be converted to asynchronous exceptions. This is not enough for SIGWINCH, or for SIGHUP used to trigger reloading configuration files. OTOH purity of Haskell's functional subsystem has some nice consequences for asynchronous exceptions which don't have to carry over to asynchronous signals which don't necessarily abort the computation. If the signal is guaranteed to abort some part of IO code, then it makes sense to revert thunks under evaluation. If the signal only causes to execute some handler, then reverting them might be wasteful, as they will soon be needed again. The only downside I can see of this as default behavior would be that in cooperative systems the response to a sigTERM might be very slow. Right, it's a pity, and I agree that benefits outweigh this problem. In my implementation the thread which handles system signals (settable, defaults to the main thread) needs to be chosen by the scheduler in order to process the signal. It might take some time if there is a lot of threads. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^
Re: How to find out the C type signature corresponding to a Haskell function type in FFI?
Brian Hulley [EMAIL PROTECTED] writes: I've got a Haskell module with the following ffi import: foreign import ccall duma_init :: Int - IO Int However my problem is that I've got no idea what the type signature for the corresponding C function should be, HsInt duma_init(HsInt arg); Or use int on the C side and CInt on the Haskell side. fromIntegral can be used for converting integers in Haskell. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Export lists in modules
Simon Marlow [EMAIL PROTECTED] writes: The main difference is that I'm doing away with parentheses, commas, and export specifiers, and using layout and full declaration syntax instead. (I don't really want to discuss this very rough idea any more though, it's just a distraction, and I'm not sure I like it anyway). I like this general idea, I was thinking about something similar a long time ago. But in case of a large datatype, e.g. an AST, we certainly don't want to duplicate it in whole. It should be sufficient in the export section. So perhaps what is really needed is the ability to split a module into a public part and a private part, and allowing to duplicate certain definitions in a partial form, e.g. have both 'data Foo' and 'data Foo = Con1 | Con2' in the same module. The details are not obvious though because the syntax has not been designed with this in mind. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-prime mailing list Haskell-prime@haskell.org http://haskell.org/mailman/listinfo/haskell-prime
Re: [Haskell-cafe] Re: standard poll/select interface
Simon Marlow [EMAIL PROTECTED] writes: I think the reason we set O_NONBLOCK is so that we don't have to test with select() before reading, we can just call read(). If you don't use O_NONBLOCK, you need two system calls to read/write instead of one. This probably isn't a big deal, given that we're buffering anyway. I've heard that for Linux sockets select/poll/epoll might say that data is available where it in fact is not (it may be triggered by socket activity which doesn't result in new data). Select/poll/epoll are designed to work primarily with non-blocking I/O. In my implementation of my language pthreads are optionally used in the way very similar to your paper Extending the Haskell Foreign Function Interface with Concurrency. This means that I have a choice of using blocking or non-blocking I/O for a given descriptor, both work similarly, but blocking I/O takes up an OS thread. Each file has a blocking flag kept in its data. A non-blocking I/O is done in the same thread. The timer signal is kept active, so if another process has switched the file to blocking, it will be woken up by the timer signal and won't block the whole process. The thread performing the I/O will only waste its timeslices. A blocking I/O temporarily releases access to the the runtime, setting up a worker OS thread for other threads if needed etc. As an optimization, if there are no other threads to be run by the scheduler (no running threads, nor waiting for I/O, nor waiting for a timeout, and we are the thread which handles system signals), then runtime is not physically released (no worker OS threads, no unlinking of the thread structure), only the signal mask is changed so the visible semantics is maintained. This is common to other such potentially blocking system calls. I don't know if GHC does something similar. (I recently made it working even if a thread that my runtime has not seen before wants to access the runtime. If the optimization of not physically releasing the runtime was in place, the new thread performs the actions on behalf of the previous thread.) In either case EAGAIN causes the thread to block, asking the scheduler to wake it up when I/O is ready. This means that even if some other process has switched the file to non-blocking, the process will only do unnecessary context switches. It's important to make this working when the blocking flag is out of sync. The Unix blocking flag is not even associated with the descriptor but with an open file, i.e. it's shared with descriptors created by dup(), so it might be hard to predict without asking the OS. If pthreads are available, stdin, stdout and stderr are kept blocking, because they are often shared with other processes, and making them blocking works well. Without pthreads they are non-blocking, because I felt it was more important to not waste timeslices of the thread performing I/O than to be nice to other processes. In both cases pipes and sockets are non-blocking, while named files are blocking. The programmer can change the blocking state explicitly, but this is probably useful only when setting up redirections before exec*(). -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: standard poll/select interface
Simon Marlow [EMAIL PROTECTED] writes: I agree that a generic select/poll interface would be nice. We must be aware that epoll (and I think kqueue too) registers event sources in advance, separately from waiting, which is its primary advantage over poll. The interface should use this model because it's easy to implement it in terms of select/poll without losing efficiency, but the converse would lose the benefit of epoll. (My runtime has a generic interface on the C level only, for hooking another implementation to be used by the scheduler.) -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Why is $ right associative instead of leftassociative?
Brian Hulley [EMAIL PROTECTED] writes: My final suggestion if anyone is interested is as follows: 1) Use : for types 2) Use , instead of ; in the block syntax so that all brace blocks can be replaced by layout if desired (including record blocks) 3) Use ; for list cons. ; is already used for forming lists in natural language, and has the added advantage that (on my keyboard at least) you don't even need to press the shift key! ;-) My language uses \ for cons, and ? for lambda. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re[2]: Streams: the extensible I/O library
Bulat Ziganshin [EMAIL PROTECTED] writes: i reported only the speed of the buffering transformers. this don't include speed of char encoding that should be very low at this time. Recoding will be slow if it's done on top of buffering and if encoding itself has heavy startup. Buffering should be on the very top, so it amortizes the cost of starting the recoder. It should be possible to use iconv for recoding. Iconv works on blocks and it should not be applied to one character at a time. Byte streams and character streams should be distinguished in types, preferably by class-constrained parametric polymorphism. In particular byte buffers and char buffers should be reperesented differently, so block copying between byte streams moves whole blocks of memory. I have designed and implemented these issues for my language Kogut, and now I'm trying to port them to Haskell. Static typing gets in the way in various places (hiding the type behind an existentially qualified type, passing optional named arguments), but it looks doable. The design uses buffers internally. A buffer is a queue of characters or bytes, with amortized O(1) cost of operating at an end, and fast block operations. A buffer itself is a stream too (reading eats its contents from the beginning, writing appends to the end). -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re[2]: Streams: the extensible I/O library
Bulat Ziganshin [EMAIL PROTECTED] writes: MQK It should be possible to use iconv for recoding. Iconv works on MQK blocks and it should not be applied to one character at a time. recoding don't need any startup. Calling iconv (or other similar routine) does need startup. And you really don't want to reimplement all encoders/decoders by hand in Haskell. Processing a stateful encoding needs the time to pick up the state and convert the materialized state into a form used during recoding Dispatching to the encoding function (usually not known statically) takes time. When we generically convert an encoder which fails for invalid data, to an encoder which replaces invalid data with U+FFFD or question marks, setting up exception handlers takes time. These are all little times, but they can be avoided. Converting newlines takes time, and it's very similar to character recoding. It should be done transparently; network protocols often use CR-LF newlines, and it's painful to remember to output a '\r' before every newline by hand. It should be done on top of character recoding; consider UTF-16, where newline conversion works in terms of characters rather than bytes. Some conversions can be implemented with tight loops which keep data in machine registers. The tightness matters when there are many iterations; loop startup is amortized by buffering. Buffering can provide arbitrarily far lookahead, arbitrarily long putback, and checking for end of stream while logically not moving the current position. But this works only if buffering is the last stage which changes stream contents. MQK Byte streams and character streams should be distinguished in types, MQK preferably by class-constrained parametric polymorphism. In particular so that vGetBuf, vGetChar, and getWord32 can't be used at the same stream? You can get bytes from a given byte stream, and get bytes from a character stream put on top of that byte stream. Buf if the protocol mixes bytes with characters and is specified in terms of bytes, it's probably better to work in terms of bytes, and convert byte strings to character strings after determining where they end. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re[2]: Streams: the extensible I/O library
Bulat Ziganshin [EMAIL PROTECTED] writes: recoding don't need any startup. each vGetChar or vPutChar just executes one or more vGetByte/vPutByte calls, according to encoding rules. this should be fast enough Hmm, your interface for the encoder (String - [Word8]) doesn't support stateful encodings at all, like ISO-2022-JP. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: UTF-8 decoding
John Meacham [EMAIL PROTECTED] writes: Another possibility is quasi-utf8 encoding. where it passes through any invalid utf8 sequences as latin1 characters. in practice, this works very well as interpreting both latin1 and utf8 transparently but is more than somewhat hacky. It would not be reliable. I'm strongly against that: it gives an illusion that Latin1 works, but it breaks in very rare cases. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: FilePath as ADT
Axel Simon [EMAIL PROTECTED] writes: The solution of representing a file name abstractly is also used by the Java libraries. I think it is not. Besides using Java UTF-16 strings for filenames, there is the File class, but it also uses Java strings. The documentation of listFiles() says that each resulting File is made using the File(File, String) constructor. The GNU Java implementation uses a single Java string inside it. On Windows the OS uses UTF-16 strings natively rather than byte sequences. UTF-16 and Unicode is almost interconvertible (modulo illegal sequences of surrogates), while converting between UTF-16 and byte sequences is messy. This means that unconditionally using Word8 as the representation of filenames would be bad. I don't know a good solution. * * * Encouraged by Mono, for my language Kogut I adopted a hack that Unicode people hate: the possibility to use a modified UTF-8 variant where byte sequences which are illegal in UTF-8 are decoded into U+ followed by another character. This encoding is used as the default encoding instead of the true UTF-8 if the locale says that UTF-8 should be used and a particular environment variable is set (KO_UTF8_ESCAPED_BYTES=1). The encoding has the following properties: - Any byte sequence is decodable to a character sequence, which encodes back to the original byte sequence. - Different character sequences encode to different byte sequences (the U+ escape is valid only when it would be necessary). - It coincides with UTF-8 for valid UTF-8 byte sequences not containing 0x00, and character sequences not containing U+. It's a hack, and doesn't address other encodings than UTF-8, but it was good enough for me; it allows to maintain the illusion that OS strings are character strings. Alternatives were: * Use byte strings and character strings in different places, sometimes using a different type depending on the OS (Windows filenames would be character strings). Disadvantages: It's hard to write a filename to a text file. The API is more complex. The programmer must too often care about the kind of a string. * Fail when encountering byte strings which can't be decoded. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-prime mailing list Haskell-prime@haskell.org http://haskell.org/mailman/listinfo/haskell-prime
Re: give equal rights to types and classes! :)
Bulat Ziganshin [EMAIL PROTECTED] writes: if my idea was incorporated in Haskell, this change don't require even changing signatures of most functions working with arrays - just Array type become Array interface, what a much difference? What would 'Eq - Eq - Ord - Bool' mean? '(Eq a, Eq b, Ord c) = a - b - c - Bool'? '(Eq a, Ord b) = a - a - b - Bool'? '(Eq a, Ord a) = a - a - a - Bool'? -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-prime mailing list Haskell-prime@haskell.org http://haskell.org/mailman/listinfo/haskell-prime
Re: Test performance impact
Simon Marlow [EMAIL PROTECTED] writes: Summary: 2 programs failed to compile due to type errors (anna, gg). One program did 19% more allocation, a few other programs increased allocation very slightly (2%). I wonder how many programs would fail to compile if local identifier bindings without type signatures would be treated as pattern bindings (monomorphic, no matter whether overloaded or not) and global ones as polymorphic. This is the solution I would make. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-prime mailing list Haskell-prime@haskell.org http://haskell.org/mailman/listinfo/haskell-prime
Re: [Haskell] IO == ST RealWorld
Axel Simon [EMAIL PROTECTED] writes: One function that particularly annoyed me is in Control.Exception handle :: (Exception - IO a) - IO a - IO a should be handle :: MonadIO m = (Exception - m a) - m a - m a I think it would be unimplementable. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell mailing list Haskell@haskell.org http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell] Strictness question
Ben Lippmeier [EMAIL PROTECTED] writes: To gloss over details: it'll reduce x far enough so it knows that it's an Integer, but it won't nessesarally compute that integers value. No, Integers don't contain any lazy components. It statically knows that it's an integer. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell mailing list Haskell@haskell.org http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell-cafe] Space questions about intern and sets
Gracjan Polak [EMAIL PROTECTED] writes: intern :: Ord a = a - a intern x = unsafePerformIO $ internIO x iorefset :: Ord a = IORef(Map.Map a a) iorefset = unsafePerformIO $ do newIORef $ Map.empty It will not work because you can't put values of different types as keys of the same dictionary, as you can't compare them. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: Are new sequences really O(1)?
Simon Marlow [EMAIL PROTECTED] writes: Nice trick. Unfortunately the same assumptions don't hold for GHC's garbage collector - objects are aged in the youngest generation, so it is usually at least two GCs before an object is promoted. We could still do the same trick, but instead we'd have to find maximum stack depth at which all objects below are in an older generation. I don't know how to do it under these assumptions, i.e. how to find that depth. Incedentally, how do you mark the stack depth? Overwrite a return address with a special one, and keep the real one around in a known place? This is the first thing I tried :-) and even implemented it before I realized that it doesn't work under my assumptions. The problem is that I pass the return address in a virtual register, and it's saved on the stack by the callee only if it performs some non-tail calls. The return address is saved at the end of the stack frame. This means that a tail call followed by a non-tail call might read the special return address from the stack, but without jumping through it immediately. This return address is put again on the stack by a different function, with a possibly different stack frame size, so it lands in a different position on the stack, and thus it can't be found when GC wants to restore it. While changing the stack frame layout could perhaps make this workable, I found a much simpler solution. Since forever each stack frame contained a pointer used only by the GC and by stack trace printing. It points to a static structure containing the frame size and return address - source location mapping. This pointer is now marked in the lowest bit by the GC. Pushing a fresh stack frame always puts an even pointer. BTW, a few days earlier I fixed a similar behavior for extensible arrays. Most mutable objects use a write barrier which stores changed locations, but arrays store references to changed objects because their payload is malloced and may be reallocated without a GC. This meant that code which repeatedly appends to a given array has O(N^2) complexity for huge arrays, as each GC scans the whole array. So I now store the size of the initial part of the array which is known to be unchanged since last GC. This is updated manually by particular operations. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Are new sequences really O(1)?
[moved from libraries to glasgow-haskell-users] Ross Paterson [EMAIL PROTECTED] writes: GCs that happen during this process will be more expensive, as they have to scan the stack. I suspect that GC costs are swamping everything else for large n. I just tweaked the implementation GC in my compiler of my language, so that minor collection doesn't scan the whole stack, but only the part up to the deepest point where the stack pointer has been since the previous collection. Deeper regions contain only old pointers so they don't need to be scanned (I have only two generations). A program which builds a list of length 270,000 non-tail-recursively, which in a strict language leads to proportional usage of the stack, and performs on average 4kB of temporary allocations for each element, so there are 9,000 GCs in total with the young heap of 128 kB, runs 10 times faster after the change. GC takes 5% of the time instead of 88%. The implementation relies on the fact that only the topmost stack frame can be mutated, so it's enough to look only one frame deeper than the stack pointer reached. Each frame contained a pointer which is used only for GC and for stack trace printing, and thus it can be marked in the lowest bit without impacting normal operations. I prefer to make non-tail recursion efficient than to have to rewrite algorithms to use a large heap instead of the stack. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: [Haskell] Beyond ASCII only editors for Haskell
Benjamin Franksen [EMAIL PROTECTED] writes: Please forgive me for taking this as an opportunity to rant about the single misfeature of Haskell's layout syntax, which is how if/then/else must be layed out. For me it's worse that I can't write like this: foo x = do y - foo x let z = some long line which must be split into two return (y, z) -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell mailing list Haskell@haskell.org http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell] Re: Existing Haskell IPv6 Code
Peter Simons [EMAIL PROTECTED] writes: [URIs might be the answer] But what URI should represent e.g. unix datagram sockets? I don't think it's worth even trying to hide both stream- and packet-oriented services behind the same API. These are completely different things, treated them differently is fine, IMHO. But they don't differ in addressing. In BSD sockets the difference between streams and packets lies in socket type, while addresses are split into address families which bijectively correspond to protocol families. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell mailing list Haskell@haskell.org http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell] Re: Existing Haskell IPv6 Code
Tony Finch [EMAIL PROTECTED] writes: But they don't differ in addressing. In BSD sockets the difference between streams and packets lies in socket type, while addresses are split into address families which bijectively correspond to protocol families. I believe there are some obscure protocol families that have more than one address family, which is why the two concepts exist in the API. Single Unix Specification v3 specifies only AF_* constants, used for both. Linux man socket says: NOTE The manifest constants used under BSD 4.* for protocol families are PF_UNIX, PF_INET, etc., while AF_UNIX etc. are used for address fami- lies. However, already the BSD man page promises: The protocol family generally is the same as the address family, and subsequent standards use AF_* everywhere. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell mailing list Haskell@haskell.org http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell-cafe] Specify array or list size?
Antti-Juhani Kaijanaho [EMAIL PROTECTED] writes: As far as I know, the last programming language that included arrays' sizes in their types was Standard Pascal, There have been many such languages since Standard Pascal. For example C, C++, C#, Java, Ada, VHDL, and NU-Prolog. C, C++ and Java do not belong to that list. I can't speak about the others, not being very familiar with them. Java and C# don't belong to that list, but C and C++ do. I don't know about others. In C and C++, the declaration int n[50]; introduces an array variable with size 50 having the type array of int. The size is *not* part of the type. No, it introduces a variable of type array of 50 ints, which can be converted to pointer to int. It matters when you make a pointer of such arrays, an array of such arrays, or sizeof such array. In C++ the size can be matched by template parameter, and you can have separate overloadings for separate array sizes. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Specify array or list size?
Sebastian Sylvan [EMAIL PROTECTED] writes: A list is, for me, more of a logical entity (as opposed to structural). It's a sequence of stuff not a particular way to store it (singly-linked, doubly-linked, arraylists etc.). I call it sequence. A list is usually a concrete type in a given language. Exceptions: Python calls its arrays lists, and Perl calls its sequences lists. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Compiling with NHC98
Daniel Carrera [EMAIL PROTECTED] writes: $ nhc98 prng.hs -o prng I/O error (user-defined), call to function `userError': In file ./RC4.hi: 1:1-1:6 Found _module_ but expected a interface GHC and NHC confuse each other with prng.hi files they produce and examine, in incompatible formats. You can delete them; they are needed to compile other modules which use that module. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Specify array or list size?
Thomas Davie [EMAIL PROTECTED] writes: I'm not familiar with your C++ example (not being familiar with C++), but I think that it's a bit of a stretch of the imagination to say that C introduces a variable of type array of 50 ints, the fact that this is now an array of 50 integers is never checked at any point in the compilation or run, and I'm not sure it can be even if KR had wanted to. The size is taken into account when such array type is an element of another array, and by sizeof. int (*p)[50]; /* p may legally point only to arrays of 50 ints each */ ++p; /* p is assumed to point into an array, and is moved by one element, i.e. by 50 ints */ As an example: int bobsArray[5]; bobsArray[6] = 23; is not badly typed - it is merely a badly broken program. Because the array size is not taken into account by indexing. But it's a part of the type. These issues are independent, for example in C# both are the opposite. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Specify array or list size?
David Roundy [EMAIL PROTECTED] writes: No, int (*p)[50] is a multidimensional array, one of the most useless concepts in C, and is equivalent to int p[50][] (or is it p[][50]... I always get my matrix subscripts messed up). No, it's not equivalent to either. Array type are not the same as pointer type. (Except that in function parameters an array type really means a pointer.) int p[50][] is an error because array element type must be a complete type, and an array without a dimension is an incomplete type. int p[][50] declares p as an array with unknown size, of arrays of size 50. Such declaration makes sense: 1. as a declaration of an array defined elsewhere (usually with extern); it must have a size specified in some place; 2. with an initializer in braces, whose size determines the array size; 3. as function parameter, where it means a pointer. int (*p)[50] declares p as a pointer to an array of size 50. Such declaration can be used anywhere without restrictions - this is a complete type. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Specify array or list size?
Hamilton Richards [EMAIL PROTECTED] writes: That's not the case in C, C++, Java, or Ada. In C and C++, for example, given two arrays int X[50]; int Y[100]; and a function declared as void P( int a[] ) then these calls P( X ) P( Y ) are both valid, because the sizes of X and Y are not part of their type. But here you don't pass the array but a pointer to its first element. You can even call P(x) where x is an int, not an array at all. Consider this: int X[50]; int Y[100]; void P(int (a)[50]) {} int main() { P(X); // valid P(Y); // invalid } In C and C++, there's not even any way for a function to discover the size of an array argument. templateint N int size(int (a)[N]) { return N; } -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Haskell vs OCaml
Donn Cave [EMAIL PROTECTED] writes: I have been able to build ocaml everywhere I have wanted it, including the native code compiler. And it builds itself much faster than GHC. (I couldn't measure how much, because GHC didn't build at all, failing to find HsBaseConfig.h.in.) -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Haskell vs OCaml
John Goerzen [EMAIL PROTECTED] writes: I'd say that there are probably no features OCaml has that Haskell lacks that are worth mentioning. Its type system has some interesting features: polymorphic variants, parametric modules, labeled and optional arguments, objects, variance annotations of type parameters used for explicit subtyping. It has more convenient exceptions: the exn type can be extended with new cases which look like variants of algebraic types. There is camlp4 for extending the syntax or changing it completely. OTOH Haskell provides type classes, better integrated arbitrary precision integer type, type variables with kinds other than *, polymorphic recursion, much better FFI, and with GHC extensions: universal and existential quantifiers in function types (OTOH OCaml recently got universal quantifiers in record fields), GADTs, implicit parameters, template Haskell. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Haskell vs OCaml
Michael Vanier [EMAIL PROTECTED] writes: I also learned ocaml before learning haskell, and the biggest single difference I found is that haskell is a lazy, purely functional language and ocaml is a strict, mostly functional language. Indeed. In contrast to this one, my differences were not inherent in the languages - I think most of them could be ported to the other language without conceptual difficulties and without changing its core properties (they don't depend on purity/impurity nor laziness/strictness). Another big difference between ocaml and haskell is that haskell has type classes and ocaml does not. OCaml people recognize this but said that it's too big piece of design, it would complicate OCaml type system too much. Especially as sometimes modules or objects can be used for the same purpose, it would increase the overlap of OCaml features. There is some experimental design of overloading, called generics there (someone has said that languages use the term generics for the kind of polymorphism we didn't have). I didn't like it, or perhaps I didn't understand it enough; it was less expressive than type classes, you couldn't extend a given function in several modules independently and then combine all extensions. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell] Eternal Compatibility In Theory
robert dockins [EMAIL PROTECTED] writes: Is there a way to reliably and automatically check if two versions of a haskell module are interface compatible? No, because it would have to check whether the semantics of functions is the same, even if they are written differently. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell mailing list Haskell@haskell.org http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell-cafe] Instances of constrained datatypes
Arjun Guha [EMAIL PROTECTED] writes: data (Eq v) = EqList v = EqList [v] I'd like to make it an instance of Functor. However, fmap takes an arbitrary function of type a - b. I need an Eq constraint on a and b. Is there any way to do this without creating my own `EqFunctor' class with explicitly-kinded quantification: No. Why not to remove the Eq v constraint in the type? -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell] URLs in haskell module namespace
Here is what I designed and implemented for my language Kogut: There is a file format of compilation parameters (compiler options, source file encoding, directories to look for interface files for imported modules, directories to look for libraries, C libraries to link, directories to look for packages, packages used). Parameters are gathered from several places: - From the command line (in a slightly different format than a file). - From the file name derived from the source file, with a changed extension. - From common.kop in the directory of the source file. - From common.kop in parent directories of the source, up to the current directory. - From used packages. A package actually corresponds to such parameter file, nothing else. Packages are included in the global dependency order, with duplicates removed. - From the default file in the compiler installation. Some parameters are accumulated (e.g. directory lists or libraries; with duplicates removed; the order is important for static libraries) while others are overridden (e.g. the C compiler to use). So if a package is needed to bring one module used in one place, it can be specified near the file which needs it (and there is less chance that the dependency will stick forgotten when no longer needed). OTOH a package used all over the place will be given in the common.kop in the root of the directory tree. A package usually has two parameter files: one common.kop used during its compilation, and another named after the package which is used by its clients. Since compilation options can be put in parameter files corresponding to source files, it's not necessary to invent a way to specify them per-file in Makefiles. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell mailing list Haskell@haskell.org http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell] Unicode Source / Keyboard Layout
Sven Moritz Hallberg [EMAIL PROTECTED] writes: 1. In addition to the backslash, accept mathematical * small lamda (U+1D6CC, U+1D706, U+1D740, U+1D77A, and U+1D7B4) for lambda abstractions. Leave greek small letter lamda as a regular letter, so the Greeks can write their native language. I think making a lone Greek lambda a keyword would be better. It would be available as a letter but not as a whole word. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell mailing list Haskell@haskell.org http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell-cafe] invalid character encoding
Wolfgang Thaller [EMAIL PROTECTED] writes: Also, IIRC, Java strings are supposed to be unicode, too - how do they deal with the problem? Java (Sun) -- Filenames are assumed to be in the locale encoding. a) Interpreting. Bytes which cannot be converted are replaced by U+FFFD. b) Creating. Characters which cannot be converted are replaced by ?. Command line arguments and standard I/O are treated in the same way. Java (GNU) -- Filenames are assumed to be in Java-modified UTF-8. a) Interpreting. If a filename cannot be converted, a directory listing contains a null instead of a string object. b) Creating. All Java characters are representable in Java-modified UTF-8. Obviously not all potential filenames can be represented. Command line arguments are interpreted according to the locale. Bytes which cannot be converted are skipped. Standard I/O works in ISO-8859-1 by default. Obviously all input is accepted. On output characters above U+00FF are replaced by ?. C# (mono) - Filenames use the list of encodings from the MONO_EXTERNAL_ENCODINGS environment variable, with UTF-8 implicitly added at the end. These encodings are tried in order. a) Interpreting. If a filename cannot be converted, it's skipped in a directory listing. The documentation says that if a filename, a command line argument etc. looks like valid UTF-8, it is treated as such first, and MONO_EXTERNAL_ENCODINGS is consulted only in remaining cases. The reality seems to not match this (mono-1.0.5). b) Creating. If UTF-8 is used, U+ throws an exception (System.ArgumentException: Path contains invalid chars), paired surrogates are treated correctly, and an isolated surrogate causes an internal error: ** ERROR **: file strenc.c: line 161 (mono_unicode_to_external): assertion failed: (utf8!=NULL) aborting... Command line arguments are treated in the same way, except that if an argument cannot be converted, the program dies at start: [Invalid UTF-8] Cannot determine the text encoding for argument 1 (xxx\xb1\xe6\xea). Please add the correct encoding to MONO_EXTERNAL_ENCODINGS and try again. Console.WriteLine emits UTF-8. Paired surrogates are treated correctly, unpaired surrogates are converted to pseudo-UTF-8. Console.ReadLine interprets text as UTF-8. Bytes which cannot be converted are skipped. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] invalid character encoding
Glynn Clements [EMAIL PROTECTED] writes: If you provide wrapper functions which take String arguments, either they should have an encoding argument or the encoding should be a mutable per-terminal setting. There is already a mutable setting. It's called locale. It isn't a per-terminal setting. A separate setting would force users to configure an encoding just for the purposes of Haskell programs, as if the configuration wasn't already too fragmented. It's unwise to propose a new standard when an existing standard works well enough. It is possible for curses to be used with a terminal which doesn't use the locale's encoding. No, it will break under the new wide character curses API, Or expose the fact that the WC API is broken, depending upon your POV. It's the only curses API which allows to write full-screen programs in UTF-8 mode. Also, it's quite common to use non-standard encodings with terminals (e.g. codepage 437, which has graphic characters beyond the ACS_* set which terminfo understands). curses don't support that. Sure it does. You pass the appropriate bytes to waddstr() etc and they get sent to the terminal as-is. It doesn't support that and it will switch the terminal mode to user encoding (which is usually ISO-8859-x) on a first occasion, e.g. after an ACS_* macro was used, or maybe even at initialization. curses support two families of encodings: the current locale encoding and ACS. The locale encoding may be UTF-8 (works only with wide character API). For compatibility the default locale is C, but new programs which are prepared for I18N should do setlocale(LC_CTYPE, ) and setlocale(LC_MESSAGES, ). In practice, you end up continuously calling setlocale(LC_CTYPE, ) and setlocale(LC_CTYPE, C), depending upon whether the text is meant to be human-readable (locale-dependent) or a machine-readable format (locale-independent, i.e. C locale). I wrote LC_TYPE, not LC_ALL. LC_TYPE doesn't affect %f formatting, it only affects the encoding of texts emitted by gettext (including strerror) and the meaning of isalpha, toupper etc. The LC_* environment variables are the parameters for the encoding. But they are only really parameters at the exec() level. This is usually the right place to specify it. It's rare that they are even set separately for the given program - usually they are per-system or per-user. Once the program starts, the locale settings become global mutable state. I would have thought that, more than anyone else, the readership of this list would understand what's bad about that concept. You can treat it as immutable. Just don't call setlocale with different arguments again. Another problem with having a single locale: if a program isn't working, and you need to communicate with its developers, you will often have to run the program in an English locale just so that you will get error messages which the developers understand. You don't need to change LC_CTYPE for that. Just set LC_MESSAGES. Then how would a Haskell program know what encoding to use for stdout messages? It doesn't necessarily need to. If you are using message catalogues, you just read bytes from the catalogue and write them to stdout. gettext uses the locale to choose the encoding. Messages are internally stored as UTF-8 but emitted in the locale encoding. You are using the semantics I'm advocating without knowing that... How would it know how to interpret filenames for graphical display? An option menu on the file selector is one option; heuristics are another. Heuristics won't distinguish various ISO-8859-x from each other. An option menu on the file selector is user-unfriendly because users don't want to configure it for each program separately. They want to set it in one place and expect it to work everywhere. Currently there are two such places: the locale, and G_FILENAME_ENCODING (or older G_BROKEN_FILENAMES) for glib. It's unwise to introduce yet another convention, and it would be a horrible idea to make it per-program. At least Gtk-1 would attempt to display the filename; you would get the odd question mark but at least you could select the file; Gtk+2 also attempts to display the filename. It can be opened even though the filename has inconvertible characters escaped. The current locale mechanism is just a way of avoiding the issues as much as possible when you can't get away with avoiding them altogether. It's a way to communicate the encoding of the terminal, filenames, strerror, gettext etc. Unicode has been described (accurately, IMHO) as Esperanto for computers. Both use the same approach to try to solve essentially the same problem. And both will be about as successful in the long run. Unicode has no viable competition. Esperanto had English. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___
Re: [Haskell-cafe] invalid character encoding
Glynn Clements [EMAIL PROTECTED] writes: The (non-wchar) curses API functions take byte strings (char*), so the Haskell bindings should take CString or [Word8] arguments. Programmers will not want to use such interface. When they want to display a string, it will be in Haskell String type. And it prevents having a single Haskell interface which uses either the narrow or wide version of curses interface, depending on what is available. If you provide wrapper functions which take String arguments, either they should have an encoding argument or the encoding should be a mutable per-terminal setting. There is already a mutable setting. It's called locale. I don't know enough about the wchar version of curses to comment on that. It uses wcsrtombs or eqiuvalents to display characters. And the reverse to interpret keystrokes. It is possible for curses to be used with a terminal which doesn't use the locale's encoding. No, it will break under the new wide character curses API, and it will confuse programs which use the old narrow character API. The user (or the administrator) is responsible for matching the locale encoding with the terminal encoding. Also, it's quite common to use non-standard encodings with terminals (e.g. codepage 437, which has graphic characters beyond the ACS_* set which terminfo understands). curses don't support that. The locale encoding is the right encoding to use for conversion of the result of strerror, gai_strerror, msg member of gzip compressor state etc. When an I/O error occurs and the error code is translated to a Haskell exception and then shown to the user, why would the application need to specify the encoding and how? Because the application may be using multiple locales/encodings. But strerror always returns messages in the locale encoding. Just like Gtk+2 always accepts texts in UTF-8. For compatibility the default locale is C, but new programs which are prepared for I18N should do setlocale(LC_CTYPE, ) and setlocale(LC_MESSAGES, ). There are places where the encoding is settable independently, or stored explicitly. For them Haskell should have withCString / peekCString / etc. with an explicit encoding. And there are places which use the locale encoding instead of having a separate switch. [The most common example is printf(%f). You need to use the C locale (decimal point) for machine-readable text but the user's locale (locale-specific decimal separator) for human-readable text. This is a different thing, and it is what IMHO C did wrong. This isn't directly related to encodings per se, but a good example of why parameters are preferable to state.] The LC_* environment variables are the parameters for the encoding. There is no other convention to pass the encoding to be used for textual output to stdout for example. C libraries which use the locale do so as a last resort. No, they do it by default. The only reason that the C locale mechanism isn't a major nuisance is that you can largely ignore it altogether. Then how would a Haskell program know what encoding to use for stdout messages? How would it know how to interpret filenames for graphical display? Do you want to invent a separate mechanism for communicating that, so that an administrator has to set up a dozen of environment variables and teach each program separately about the encoding it should assume by default? We had this mess 10 years ago, and parts of it are still alive until today - you must sometimes configure xterm or Emacs separately, but it's being more common that programs know to use the system-supplied setting and don't have to be configured separately. Code which requires real I18N can use other mechanisms, and code which doesn't require any I18N can just pass byte strings around and leave encoding issues to code which actually has enough context to handle them correctly. Haskell can't just pass byte strings around without turning the Unicode support into a joke (which it is now). -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] invalid character encoding
Glynn Clements [EMAIL PROTECTED] writes: E.g. Gtk-2.x uses UTF-8 almost exclusively, although you can force the use of the locale's encoding for filenames (if you have filenames in multiple encodings, you lose; filenames using the wrong encoding simply don't appear in file selectors). Actually they do appear, even though you can't type their names from the keyboard. The name shown in the GUI used to be escaped in different ways by different programs or even different places in one program (question marks, %hex escapes \oct escapes), but recently they added some functions to glib to make the behavior uniform. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] invalid character encoding
Duncan Coutts [EMAIL PROTECTED] writes: It doesn't affect functions added by the hierarchical libraries, i.e. those functions are safe only with the ASCII subset. (There is a vague plan to make Foreign.C.String conform to the FFI spec, which mandates locale-based encoding, and thus would change all those, but it's still up in the air.) Hmm. I'm not convinced that automatically converting to the current locale is the ideal behaviour (it'd certianly break all my programs!). Certainly a function for converting into the encoding of the current locale would be useful for may users but it's important to be able to know the encoding with certainty. It should only be the default, not the only option. It should be possible to specify the encoding explicitly. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] invalid character encoding
Glynn Clements [EMAIL PROTECTED] writes: It should be possible to specify the encoding explicitly. Conversely, it shouldn't be possible to avoid specifying the encoding explicitly. What encoding should a binding to readline or curses use? Curses in C comes in two flavors: the traditional byte version and a wide character version. The second version is easy if we can assume that wchar_t is Unicode, but it's not always available and until recently in ncurses it was buggy. Let's assume we are using the byte version. How to encode strings? A terminal uses an ASCII-compatible encoding. Wide character version of curses convert characters to the locale encoding, and byte version passes bytes unchanged. This means that if a Haskell binding to the wide character version does the obvious thing and passes Unicode directly, then an equivalent behavior can be obtained from the byte version (only limited to 256-character encodings) by using the locale encoding. The locale encoding is the right encoding to use for conversion of the result of strerror, gai_strerror, msg member of gzip compressor state etc. When an I/O error occurs and the error code is translated to a Haskell exception and then shown to the user, why would the application need to specify the encoding and how? If application code doesn't want to use the locale's encoding, it shouldn't be shoe-horned into doing so because a library developer decided to duck the encoding issues by grabbing whatever encoding was readily to hand (i.e. the locale's encoding). If a C library is written with the assumption that texts are in the locale encoding, a Haskell binding to such library should respect that assumption. Only some libraries allow to work with different, explicitly specified encodings. Many libraries don't, especially if the texts are not the core of the library functionality but error messages. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] invalid character encoding
John Meacham [EMAIL PROTECTED] writes: In any case, we need tools to be able to conform to the common cases of ascii-only (withCAStrirg) and current locale (withCString). withUTF8String would be a nice addition, but is much less important to come standard as it can easily be written by end users, unlike locale specific versions which are necessarily system dependent. IMHO the encoding should be a parameter of an extended variant of withCString (and peekCString etc.). We need a framework for implementing encoders/decoders first. A problem with designing the framework is that it should support both pure Haskell conversions and C functions like iconv which work on arrays. We must also provide a way to signal errors. A bonus is a way to handle errors coming from another recoder without causing it to fail completely. That way one could add a fallback for unrepresentable characters, e.g. HTML entities or approximations with stripped accents. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Bound threads
Wolfgang Thaller [EMAIL PROTECTED] writes: Indeed, my brain is melting, but I did it :-) Congratulations. How about we found a Bound-thread-induced brain melt victims' support group? The melt was entertaining :-) Besides simplicity, one of the main reasons for moving our select() call from the run-time system to the libraries was to avoid the performance hit of having to call select() every time through the scheduler loop rather than only once per IO operation. I use epoll when available. It's Linux-specific and allows to register and unregister descriptors separately from waiting. This not only saves process time to set up the array, but also kernel time scanning the array and hooking to files. I've heard BSD kqueue mechanism has similar properties. I unregister descriptors from epoll lazily: when epoll returned that data is available but no thread was in fact waiting for it. This saves repeated registration when a thread alternates between I/O and computation. When the scheduler determines that it has no thread to wake up immediately, it performs a GC before going to wait if the program did roughly at least half of work until the next normal GC. Imagine having one or more (unbound) threads that spend most of their time waiting for IO, and a bunch of (also unbound) threads that do some computation. If select() is part of the scheduler loop, you will get a select() call whenever a thread-switch between the computation threads happens. Actually once the next thread in the running and I/O queue is an I/O thread, not in every scheduler iteration. Or more precisely a consecutive span of I/O threads in this queue. epoll_wait takes 0.2 us here, poll takes 1 us, select takes 0.6 us (1 descriptor in each case). I wonder why poll is slower than select. I was thinking about integration with gtk/glib event loop. There are two ways: either we ask glib to poll using a function supplied by us, or we perform polling using glib functions instead of raw epoll / poll / select. The first choice seems better because otherwise callbacks registered at glib were started from the scheduler and this will not work, it's even not clear on behalf of which thread they should run. In this case we must provide a function with an interface of poll(). Without additional support in the runtime (other than making file objects which don't close their underlying file, but this is easy), the function can be implemented by starting a thread for each descriptor, collecting the results, and cancelling threads when some descriptor is ready or when the timeout expires. Let's assume that real poll is used by our scheduler and that no other thread does I/O at the moment, and see what really happens: - the threads are created at the end of the run queue - other threads in the program execute their time slices - each of the newly created threads is marked as waiting for I/O - other threads in the program execute again (ugh) - the scheduler looks at the first I/O thread and makes poll() - all threads whose I/O is ready are woken up - the next running thread is chosen (perhaps one of threads woken up in the previous step) - it notifies the manager thread that glib-poll-emulation is ready - when execution reaches the manager, it kills other threads and reports the result to glib It seems that other than a bunch of context switches there is not much work besides the required minimum. (It gets worse wich epoll, which is suitable for a mostly unchanging set of watched descriptors.) With GHC implementation I think each thread which adds a descriptor will wake up the service thread through a pipe, and later they will wake it up again to unregister files when they become cancelled. All threads except the thread performing the fork become unbound. [...] What happens when fork is called from an unbound thread? Does it become bound in the child process? No. But in ForkProcess and ForkProcessKillThreads this thread plays the role of the main thread: it receives Unix signals; it receives internal asynchronous signals like heap overflow and deadlock; when it terminates, the process terminates; if it terminates with an unhandled exception, a handler which normally prints the stack trace is called. AtExit handlers are not run though. I don't know what should be done, this got quite hairy. Actually the above termination semantics currently applies only when fork is called with an I/O action as an argument. It can also be called without, like C fork(), and in this case the behavior is ugly: if the new main thread in the child process was not the main thread before, its termination is not special and there will be a deadlock. This should be changed somehow. In any case, when the previous main thread terminates (as it's cancelled cleanly by ForkProcess), it checks whether it's still the main thread. If not, it disappears like normal threads, except that its OS thread will wait on a condition variable forever. -- __(
Re: [Haskell-cafe] Re: Bound threads
Simon Marlow [EMAIL PROTECTED] writes: I've now implemented a threaded runtime in my language Kogut, based on the design of Haskell. The main thread is bound. The thread which holds the capability performs I/O multiplexing itself, without a separate service thread. We found that doing this was excessively complex (well, I thought so anyway). Indeed, my brain is melting, but I did it :-) I think our approaches are incomparable in terms of additional overhead, it depends on the program. I have added some optimizations: If a thread which wants to perform a safe C call sees that there are no other threads running, waiting for I/O, or waiting for timeout, and that we are the thread which handles Unix signals, then it doesn't notify or start another thread to enter the scheduler. When a C call returns, it doesn't have to wake up the scheduler in this case. Even if other threads are running, if there is currently no scheduler doing epoll/poll/select, then a returning C call doesn't wake up the scheduler. It only links itself to a list which will be examined by the scheduler. * * * There are interesting complications with fork. POSIX only provides fork which causes other pthreads in the child process to evaporate. This is exactly what is wanted if the fork is soon followed by exec, but can be disastrous if the program tries to use other threads in the meantime. Depending on the system pthread_join on a thread which has existed before the fork either says that it has returned, or hangs, or fails with ESRCH or EINVAL. And there is no way to fork while keeping other threads running (there has been some proposal for forkall but it has been rejected). This means that a fork in an unfortunate state, e.g. while some thread was holding a mutex, will left the mutex permanently locked; pthread_atfork is supposed to protect against that. It also means that if our language tries to continue running its threads after the fork, then there is no way to do this if they are bound to other OS threads. And the worker pool is useless, it should better be emptied before the fork to reduce resource leak. There is no semantics of fork wrt. threads which would be correct in all cases. Shortly before implementing bound threads I've designed and implemented a semantics for three variants of fork, which were easy when I have full control over what happens with my threads in the child process (well, the third was a challenge to implement): - ForkProcessCloneThreads - easiest to describe, but the least useful. Threads continue to run in both processes. - ForkProcessKillThreads - other threads are atomically killed in the child process, similarly to raw POSIX. This is used before exec. If the program attempts to wait for the threads, the behavior is defined: they look as if they failed with ThreadKilled exception, even though they were killed without a chance to recover (this is a different exception than the one used for cancellation which signifies that it could recover). - ForkProcess - the safest default: all threads are sent signals (in the sense of asynchronous communication in my language) which cause them to be suspended when they have signal handling unblocked (this roughly corresponds to Haskell's blocking of asynchronous exceptions, but e.g. a thread holding a mutex has signals blocked by default). This includes chasing newly created threads. When all threads are suspended, we do ForkProcessCloneThreads. Then in the parent process threads are resumed, and in the child they are cancelled in a polite way so they can release resources. Bound threads introduced problems. They can partially be solved, e.g. the worker pool, the wakeup pipe, epoll descriptor are correctly recreated. But there is simply no way to return from callbacks because the corresponding C contexts no longer exist. So I made them as follows: All threads except the thread performing the fork become unbound. They have a chance to handle the thread cancellation exception until they return from their innermost callbacks. At this time they become killed. If ForkProcessAllThreads is done while some threads were executing non-blocking foreign code, they become killed as well. Besides this, there are at fork handlers, similar to pthread_atfork but scoped over the forking action. * * * I measured the speed of some syscalls on my system, to see what is worth optimizing: - pthread_mutex_lock + unlock (NPTL) 0.1 us - pthread_sigmask 0.3 us - setitimer0.3 us - read + write through a pipe 2.5 us - gettimeofday 1.9 us A producer/consumer test in my language (which uses mutexes and condition variables) needs 1.4 us for one iteration if both threads are unbound. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing
Re: [Haskell-cafe] Re: Bound threads
Marcin 'Qrczak' Kowalczyk [EMAIL PROTECTED] writes: Why is the main thread bound? I can answer myself: if the main thread is unbound, the end of the program can be reached in a different OS thread, which may be a problem if we want to return cleanly to the calling code. I've now implemented a threaded runtime in my language Kogut, based on the design of Haskell. The main thread is bound. The thread which holds the capability performs I/O multiplexing itself, without a separate service thread. Producer/consumer ping-pong is 15 times slower between threads running on different OS threads than on two unbound threads. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Bound threads
Benjamin Franksen [EMAIL PROTECTED] writes: Producer/consumer ping-pong is 15 times slower between threads running on different OS threads than on two unbound threads. Which OS? Linux/NPTL. A context switch which changes OS threads involves: setitimer pthread_sigmask pthread_mutex_lock pthread_cond_signal pthread_cond_wait (starting) and in the other thread: pthread_cond_wait (returning) pthread_mutex_unlock pthread_sigmask setitimer setitimer is necessary because I tested that it is installed per thread rather than per process, even though SUSv3 says it should be per process. pthread_sigmask makes the thread holding the capability handle signals. I've heard that the interaction of signals and threads is broken in pre-NPTL Linux threads, I will have to check how it behaves and what should be used in this case (perhaps having signals unblocked in all threads). -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Bound threads
Simon Marlow [EMAIL PROTECTED] writes: Is it important which thread executes Haskell code (I bet no) and unsafe foreign calls (I don't know)? If not, couldn't the same OS thread execute code of both threads until a safe foreign call is made? Actually in a bound thread, *all* foreign calls must be made using the correct OS thread, not just the safe ones. So your scheme would involve a context switch at every foreign call, which would end up being rather expensive. Ok. As I understand this (and as I'm trying to implement a similar scheme for my language), when an unbound thread performs a safe C call, the current OS thread transforms from a worker to a bound thread, and another worker is spawn if needed for remaining Haskell threads. So I have another idea: Why is the main thread bound? Wouldn't it be sufficient that, in cases where it's important to have the main Haskell thread bound to the main OS thread, the programmer wraps the main computation in a function which calls C and then calls back Haskell? Such function, if executed before spawning other threads which perform safe C calls, would in effect bind the threads together. That way there would be no OS thread synchronization needed when the main Haskell thread synchronizes with unbound threads. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Bound threads
Wolfgang Thaller [EMAIL PROTECTED] writes: Since the main thread is bound, and unbound threads are never executed on an OS thread which has some Haskell thread bound, this would imply that when the main thread spawns a Haskell thread and they synchronize a lot with each other using MVars, the synchronization needs OS-thread synchronization - the threads will not execute on a the same OS thread. Correct. Is it important which thread executes Haskell code (I bet no) and unsafe foreign calls (I don't know)? If not, couldn't the same OS thread execute code of both threads until a safe foreign call is made? -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Bound threads
I'm trying to understand the semantics and implementation of bound threads basing on the conc-ffi paper and others. Since the main thread is bound, and unbound threads are never executed on an OS thread which has some Haskell thread bound, this would imply that when the main thread spawns a Haskell thread and they synchronize a lot with each other using MVars, the synchronization needs OS-thread synchronization - the threads will not execute on a the same OS thread. If I understand this correctly, doesn't it impose a significant overhead compared to synchronizing two unbound threads? If not, what am I missing? -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] What is MonadPlus good for?
Josef Svenningsson [EMAIL PROTECTED] writes: You claimed that monad transformers break the mzero-is-right-identity-for-bind law because they can be applied to IO. I say, it's not the monad transformers fault. They cannot possibly be expected to repair the law if they are given a faulty monad. I agree. They as well could be said to break the core monad laws. It's not their fault. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] The Nature of Char and String
Ketil Malde [EMAIL PROTECTED] writes: The Haskell functions accept or return Strings but interface to OS functions which (at least on Unix) deal with arrays of bytes (char*), and the encoding issues are essentially ignored. If you pass strings containing anything other than ISO-8859-1, you lose. I'm not sure it's as bad as all that. You lose the correct Unicode code points (i.e. chars will have the wrong values, and strings may be the wrong lenght), but I think you will be able to get the same bytes out as you read in. So in that sense, Char-based IO is somewhat encoding neutral. So one can have Unicode both in IO and internally, it's just that you don't get both at the same time :-) That's the problem. Perl is similar: it uses the same strings for byte arrays and for Unicode strings whose characters happen to be Latin1. The interpretation sometimes depends on the function / library used, and sometimes on other libraries loaded. When I made an interface between Perl and my language Kogut (which uses Unicode internally and converts texts exchanged with the OS, even though conversion may fail e.g. for files not encoded using the locale encoding - I don't have a better design yet), I had trouble with converting Perl strings which have no characters above 0xFF. If I treat them as Unicode, then a filename passed between the two languages is interpreted differently. If I treat them as the locale encoding, then it's inconsistent and passing strings in both directions doesn't round-trip. So I'm currently treating them as Unicode. Perl's handling of Unicode is inconsistent with itself (e.g. for filenames containing characters above 0xFF), I don't think I made it more broken than it already is... -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: UTF-8 BOM, really!?
Graham Klyne [EMAIL PROTECTED] writes: How can it make sense to have a BOM in UTF-8? UTF-8 is a sequence of octets (bytes); what ordering is there here that can sensibly be varied? The *name* BOM doesn't make sense when applied to UTF-8, but some software uses UTF-8 encoded U+FEFF it as a marker that the file is encoded in UTF-8 rather than some other encoding. And Unicode seems to support this usage, even if it doesn't recommend it. I know only of Microsoft Notepad, and suspect other Microsoft tools (Notepad assumes UTF-8 with the marker and the current Windows codepage without). The HTML at http://www.microsoft.com/ begins with a BOM, but other pages linked from there do not. I think XML used to be silent about this, but later got amended to explicitly say that optional U+FEFF at the beginning is allowed and not treated as a part of document contents. OTOH various other sofrware, in particular generic Unix tools, don't treat UTF-8 BOM specially, and de facto implement the non-standard UTF-8 without a BOM. Technically in UTF-16/32 the BOM is handled in the translation between encoding form (sequence of 16- or 32-bit code units) and encoding scheme (these words serialized into bytes). I think it's supposed to be the same in UTF-8, i.e. the analogous translation is *almost* trivial - it translates bytes to the same bytes - except that initial BOM must be stripped on decoding, and it must be added on encoding when the first character of the contents is U+FEFF (and optionally in other cases). I mean that it is supposed to happen on decoding UTF-8 on the level of bytes, not after decoding on the level of code points. Anyway, on Unix it just doesn't happen at all, except in software which explicitly handles it. iconv() doesn't handle UTF-8 BOM. If I could decide about it, I would ban UTF-8 BOM at all. But perhaps Unicode Consortium can be at least persuaded to recognize that some software doesn't accept BOM in UTF-8, and could be conforming to the variant of UTF-8 without the BOM rather than non-conforming at all. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: File path programme
Peter Simons [EMAIL PROTECTED] writes: http://cryp.to/pathspec/PathSpec.hs There also is a function which changes a path specification into its canonic form, meaning that all redundant segments are stripped. It's incorrect: canon (read x/y/.. :: RelPath Posix) gives x, yet on Unix they aren't equivalent when y is a non-local symlink or doesn't exist. Also, x/. is not equivalent to x: rmdir can be used with x but not with x/.. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: File path programme
Stefan Monnier [EMAIL PROTECTED] writes: The various UTF encodings do not have this particular problem; if a UTF string is valid, then it is a unique representation of a unicode string. However, decoding is still a partial function and can fail. And while it is partly true, it is qualified by the problems relative to canonicalization (an in Unicode can both be represented as or as two chars (an e and an accent) and they should (ideally) compare equal). In what sense equal? They are supposed to be equivalent as far as the semantics of the text is concerned, but representations are clearly different and most programs distinguish them. In particular they are different filenames on both Unix and Windows. AFAIK MacOS normalizes filenames, but using a slightly different algorithm than Unicode (perhaps just an older version). IMHO it makes no sense to pretend that they are exactly the same when strings consist of code points or lower level units (and I don't believe another choice for the default string type would be practical). -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] File path programme
Glynn Clements [EMAIL PROTECTED] writes: Then of course there's the issue that Win32 edge labels are Unicode, while Posix edge labels are [Word8]. Hmm. Strictly speaking, they're [CChar], but I doubt that anyone will ever implement Haskell on a platform where a byte isn't 8 bits wide. On POSIX it's the same. It's not the same when only the C standard is concerned. There existed platforms with 9 bit chars (Unisys 1100). I don't know whether they are still in use. I doubt that any Haskell compiler will run on such a system. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] File path programme
David Roundy [EMAIL PROTECTED] writes: No, it's not Unix-specific, it's portable. If you want to write portable C code, you have to use the standard library, which means that file names are represented as Ptr CChar. I disagree. We are talking about portable Haskell, not portable C. The native Windows filename encoding (in WinNT) is UTF-16 (without validation that surrogates are correctly paired). It provides API which works with such filenames. It also provides a compatibility layer on top of that, which tries to translate filenames to some 8-bit encoding, but it should only be used when porting C programs which can't be adapted to use Windows filenames through their native type. Especially as Haskell already uses Unicode - it's pointless to convert it to an 8-bit encoding and back. I don't know what these 8-bit WinAPI functions do when they encounter filenames unrepresentable in the current 8-bit encoding. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] File path programme
Glynn Clements [EMAIL PROTECTED] writes: And it isn't a theoretical issue. E.g. in an environment where EUC-JP is used, filenames may begin with ESC$)B (designate JISX0208 to G1), or they may not (because G1 is assumed to contain JISX0208 initally). I think such encodings are never used as default encodings of a Unix locale. The various UTF encodings do not have this particular problem; if a UTF string is valid, then it is a unique representation of a unicode string. BOM is a problem. Unfortunately Unicode mandates that FEFF at the start of a UTF-8 text stream is a mark which doesn't belong to the text. It provides variants of UTF-16/32 with and without a BOM, but UTF-8 only has the variant with a BOM. This makes UTF-8 a stateful encoding. Unix ignores this, it doesn't use BOM in UTF-8 except individual applications for individual file formats. iconv() on Linux and in libiconv don't process a BOM in UTF-8 (although in libiconv this is because it's old, basing on and old RFC with 31-bit code points which didn't include a BOM). -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: File path programme
Aaron Denney [EMAIL PROTECTED] writes: Better yet would be to have the standard never allow the BOM. If I could decide, I would ban the BOM in UTF-8 altogetger, but I'm afraid the Unicode Consortium doesn't want to do this. Miscosoft Notepad puts a BOM in UTF-8 encoded files. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: mathematical notation and functional programming
Stefan Monnier [EMAIL PROTECTED] writes: OTOH I like the abc shorthand because it's both obvious and unambiguous (as long as the return value of can't be passed as an argument to , which is typically the case when the return value is boolean and there's no ordering defined on booleans). It's unambiguous even if the return value of can be passed as an argument to . Operators are usually left-associative, right-associative or non-associative. A non-associative operator can have an additional semantics defined when it's used multiple times, just like a,b,c in OCaml is neither a,(b,c) nor (a,b),c, or even a*b*c as a type. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Looking for these libraries...
John Goerzen [EMAIL PROTECTED] writes: I'm looking for libraries / interfaces to these systems from Haskell: LDAP ncurses zlib (the one in darcs doesn't suit my needs) bz2lib I once wrapped ncurses (incomplete), zlib and bz2lib. http://sourceforge.net/projects/qforeign/ It's quite old and will not compile with recent compilers without modifications. FFI libraries were being designed at that time, it should be possible to replace QForeign with Foreign, modulo some changes since that time. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] File path programme
Robert Dockins [EMAIL PROTECTED] writes: More than you would think, if you follow the conventions of modern unix shells. eg, foo/.. is always equal to ., For the OS it's not the same if foo is a non-local symlink. Shells tend to resolve symlinks themselves on cd, and cd .. means to remove the last component of the unexpanded current directory, which may be different from the directory listed by ls ... (rather than doing a chdir on the .. hardlink, which does strange things in the presence of symlinks). The operation is sufficently useful that I think it should be included. It lets us know, for example, that /bar/../foo/tmp and /foo/tmp refer to the same file, without resorting to any IO operations. I disagree. The point is they are *not* the same file. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] File path programme
John Meacham [EMAIL PROTECTED] writes: too bad we can't do things like #if exists(module System.Path) import System.Path #else ... #endif I still find it perplexing that there isn't a decent standard haskell preprocessor For my language Kogut I designed a syntax ifDefined condition { something } otherCondition { something else } ... where a condition is an identifier whose definedness is tested, or module ModuleName for existence of a module, or _ which is always true, or their combination using conjunctions, alternatives and negations. The construct can be an expression (no condition is true = Null, which makes sense with dynamic typing), a definition (no condition is true = nothing is defined) or a pattern (no condition is true = a pattern which never matches). It does not subsume Common Lisp's #+ and #- nor vice versa. In Lisp it is done at read time, not compile time, which has some advantages, but is incompatible with temporal separation of compilation phases. Even though definitions may in general be mutually recursive, similarly as in Haskell, ifDefined depends only on definitions above it. This is hard to avoid, because its expansion may influence the set of definitions which will be present. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: what is inverse of mzero and return?
Ashley Yakeley [EMAIL PROTECTED] writes: But only some instances (such as []) satisfy this: (mplus a b) = c = mplus (a = c) (b = c) Other instances (IO, Maybe) satisfy this: mplus (return a) b = return a I think mplus should be separated into two functions. This would prevent using mplus in a single parser which - depending on the underlying monad used - backtracks or not. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: File path programme
Aaron Denney [EMAIL PROTECTED] writes: What about splitFileExt foo.bar.? (foo, bar.) or (foo.bar., )? The latter makes more sense to me, as an extension of the first case you give and splitting foo.tar.gz to (foo.tar, gz). It's not that obvious: both choices are compatible with these. The former is produced by rules: - split the filename before the last dot *which is not the last character of the filename*, or at the end if there is no such dot - remove the first character of the extension if it's non-empty (the character must have been a dot) The latter is produced by rules: - split the filename before the last dot, or at the end if there is no dot at all - *if the extension is a sole dot, append a dot to the basename* - remove the first character of the extension if it's non-empty (the character must have been a dot) Special filenames of . and .. are treated separately, before these rules apply. Both choices are inverted by the same joinFileExt, which inserts a dot between the name and extension unless the extension is empty. These rules agree on foo, foo. and foo.tar.gz, yet disagree on foo.bar.; I don't know which is more natural. The difference influences the behavior of changeFileExt. These cases are the same with both choices: changeFileExt foo.bar = foo changeFileExt foo.tar.gz = foo.tar changeFileExt foo. = foo. changeFileExt foo. baz = foo..baz but these differ - first choice: changeFileExt foo.bar. = foo changeFileExt foo.bar. baz = foo.baz or the second: changeFileExt foo.bar. = foo.bar. changeFileExt foo.bar. baz = foo.bar..baz ? -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Can't do basic time operations with System.Time
Peter Simons [EMAIL PROTECTED] writes: I was wondering: Does this calculation account for leap years? Does it have to? C itself leaves unspecified the question whether its time calculations take leap seconds into account. All other systems I know of ignore leap seconds: POSIX C, Common Lisp, Java, .NET. The NTP protocol also ignores leap seconds, i.e. its time is supposed to slow down near the transition. While supporting leap seconds is more correct, it makes interoperability harder: Haskell would have a different view than the rest of the system about either the current calendar time or seconds since the Epoch. Another problem is that leap seconds are not known further in advance than about half of a year, so programs compiled using different versions of Haskell would have slightly different views about the current time. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] I/O interface
Ben Rudiak-Gould [EMAIL PROTECTED] writes: Yes, this is a problem. In my original proposal InputStream and OutputStream were types, but I enthusiastically embraced Simon M's idea of turning them into classes. As you say, it's not without its disadvantages. This is my greatest single complaint about Haskell: that it doesn't support embedding either OO-style abstract supertypes, or dynamnic typing with the ability to use polymorphic operations on objects that we don't know the exact type. The Dynamic type doesn't count for the latter because you must guess the concrete type before using the object. You can't say it should be something implementing class Foo, I don't care what, and I only want to use Foo's methods with it. Haskell provides only: - algebraic types (must specify all subtypes in one place), - classes (requires foralls which limits applicability: no heterogeneous lists, I guess no implicit parameters), - classes wrapped in existentials, or records of functions (these two approaches don't support controlled downcasting, i.e. if this is a regular file, do something, otherwise do something else). The problem manifests itself more when we add more kinds of streams: transparent compression/decompression, character recoding, newline conversion, buffering, userspace /dev/null, concatenation of several input streams, making a copy of data as it's passed, automatic flushing of a related output stream when an input stream is read, etc. A case similar to streams which would benefit from this is DB interface. Should it use separate types for separate backends? Awkward to write code which works with multiple backends. Should it use a record of functions? Then we must decide at the beginning the complete set of supported operations, and if one backend provides something that another doesn't, it's impossible to write code which requires the first backend and uses the capability (unless we decide at the beginning about all possible extensions and make stubs which throw exceptions in cases it's not supported). I would like to mix these two approaches: if some code uses only operations supported by all backends, then it's fully polymorphic, and when it starts using specific operations, it becomes limited. Without two completely different designs for these cases. I don't know how to fit it into Haskell's type system. This has led me to exploring dynamic typing. Again, to try to avoid confusion, what you call a seekable file the library calls a file, and what you call a file I would call a Posix filehandle. So the incompleteness problem can be rephrased: the interface doesn't provide the functionality of open() with returns an arbitrary POSIX filehandle. By the same token, stdin is never a file, but the data which appears through stdin may ultimately be coming from a file, and it's sometimes useful, in that case, to bypass stdin and access the file directly. The way to handle this is to have a separate stdinFile :: Maybe File. And a third stdin, as POSIX filehandle, to be used e.g. for I/O redirection for a process. As for openFile: in the context of a certain filesystem at a certain time, a certain pathname may refer to * Nothing * A directory * A file (in the library sense); this might include things like /dev/hda and /dev/kmem * Both ends of a (named) pipe * A data source and a data sink which are related in some qualitative way (for example, keyboard and screen, or stdin and stdout) * A data source only * A data sink only * ... How to provide an interface to this zoo? In such cases I tend to just expose the OS interface, without trying to be smart. This way I can be sure I don't make anything worse than it already is. Yes, it probably makes portability harder. Suitability of this approach depends on our goals: either we want to provide a nice and portable abstraction over the basic functionality of all systems, or we want to make everything implementable in C also implementable in Haskell, including a Unix shell. Perhaps Haskell is in the first group. Maybe its goal is to invent an ideal interface to the computer's world, even if this means doing things differently than everyone else. It's hard to predict beforehand how far in being different we can go without alienating users. For my language I'm trying to do the second thing. I currently concentrate on Unix because there are enough Windows-inspired interfaces in .NET, while only Perl and Python seem to care about providing a rich access to Unix API from a different language than C. I try to separate interfaces which should be portable from interfaces to Unix-specific things. Unfortunately I have never programmed for Windows and I can make mistakes about which things are common to various systems and which are not. Time will tell and will fix this. Obviously I'm not copying the Unix interface literally. A file is distinguished from an integer, and an integer is distinguished from a Unix signal,
Re: [Haskell-cafe] Re: Hugs vs GHC (again)was: Re: Somerandomnewbiequestions
Glynn Clements [EMAIL PROTECTED] writes: They're similar, but not identical. Traditionally, Unix non-blocking I/O (along with asynchronous I/O, select() and poll()) were designed for slow streams such as pipes, terminals, sockets etc. Regular files and block devices are assumed to return the data immediately. Indeed. Reading from a slow block device is also not interruptible by a signal; a signal usually causes reading from a pipe/socket/terminal to fail with EINTR. There is no non-blocking interface to various functions like readdir, mkdir, stat etc. OTOH close() is interruptible. It seems that the only way to parallelize them is to use a separate OS thread. gethostbyname, gethostbyaddr, getservbyname and getservbyport are mostly superseded by getaddrinfo and getnameinfo. They are all blocking and non-interruptible by signals (they restart their loops on receiving EINTR from low-level calls). Glibc provides getaddrinfo_a which is non-blocking (implemented using pthreads). Contrary to documentation it's not interruptible by a signal (its implementation expects pthread_cond_wait to fail with EINTR which is not possible) and it's not cancellable in a useful way (the interface allows for cancellation, which may nevertheless answer that it cannot be cancelled, and the glibc implementation is able to cancel a request only if it hasn't yet started being processed by the thread pool). There is no non-blocking counterpart of getnameinfo. Since asynchronous name resolution is quite important, implementation of my language uses pthreads and getaddrinfo / getnameinfo, if pthreads are available. For simplicity I just make one thread per request. A tricky API to parallelize is waitpid. Pthreads are supposed to be able to wait for child processes started by any thread, but according to man pages this was broken in Linux before version 2.4. Fortunately it's easy to avoid blocking other threads indefinitely without OS threads if we agree to waste CPU time (not CPU cycles), such that a thread waiting for a process takes as much time as if it was doing some useful work. Because waitpid *is* interruptible by signals. So it will either finish, or the timer signal will interrupt it and control can be passed to other threads. Leaving the timer signal interrupting syscalls can break libraries which don't expect EINTR. For example the Python runtime doesn't handle EINTR specially and it is translated to a Python exception. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Hugsvs GHC (again)was: Re: Somerandomnewbiequestions
Glynn Clements [EMAIL PROTECTED] writes: We do use a thread pool. But you still need as many OS threads as there are blocked read() calls, unless you have a single thread doing select() as I described. How does the select() help? AFAIK, select() on a regular file or block device will always indicate that it is readable, even if a subsequent read() would have to read the data from disk. It doesn't help if we don't want I/O requests to delay one another, and not only avoiding delay of execution of pure Haskell code. BTW, poll is generally preferred to select. The maximum fd supported by select may be lower than the maximum fd supported by the system. And the interface of poll allows the cost to be proportional to the number of descriptors rather than to the highest descriptor. The timeout is specified in microseconds for select and in milliseconds for poll, but on Linux the actual resolution is the clock tick in both cases anyway (usually 1ms or 10ms). It's probably yet better to use epoll than poll. The difference is that with epoll you register fds using separate calls, and you don't have to provide them each time you wait (and the kernel doesn't have to scan the array each time). So it scales better to a large number of threads which perform I/O. It's available in Linux 2.6. Caveat: before Linux 2.6.8 epoll had a memory leak in the kernel because of a reference counting bug (0.5kB per epoll_create call, which means 0.5kB of physical memory lost per starting a program which waits for I/O using epoll). poll is in Single Unix Spec, epoll is Linux-specific. poll and epoll both take the timeout in the same format, but they interpret it differently: poll sleeps at least the given time (unless a fd is ready or a signal arrives), while epoll rounds it up to a whole number of clock ticks and then sleeps between this time and one tick shorter. I was told that this is intentional because it allows to sleep until the next clock tick by specifying the timeout of 1ms (a timeout of 0ms means to not sleep at all). Accurate sleeping requires to measure the time by which poll/epoll can make the timeout longer (it's 1 tick for epoll and 2 ticks for poll), subtract this time from the timeout passed to them, add 1ms, and sleep the remaining time by busy waiting calling gettimeofday interspersed with poll/epoll with no timeout. gettimeofday() is accurate to microseconds, it asks some clock chip instead of relying on the timer interrupt only. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: [Haskell] Re: Why is getArgs in the IO monad?
Keean Schupke [EMAIL PROTECTED] writes: Surely both requirements can be satisfied if the programs arguments are made parameters of main: main :: [String] - IO () From info '(libc)Error Messages', about program_invocation_name and program_invocation_short_name: *Portability Note:* These two variables are GNU extensions. If you want your program to work with non-GNU libraries, you must save the value of `argv[0]' in `main', and then strip off the directory names yourself. We added these extensions to make it possible to write self-contained error-reporting subroutines that require no explicit cooperation from `main'. This suggests that making global parameters available only as arguments of main would be a bad idea. But they should be settable, to account for the rare case of wanting to substitute something else to a library which reads them itself. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] I/O interface (was: Re: Hugs vs GHC)
John Meacham [EMAIL PROTECTED] writes: I was thinking of it as a better implementation of a stream interface (when available). I'm not convinced that the stream interface (http://www.haskell.org/~simonmar/io/System.IO.html) works at all, i.e. whether it's complete, implementable and convenient. Convenience. I'm worried that it uses separate types for various kinds of streams: files, pipes, arrays (private memory), and sockets. Haskell is statically typed and lacks subsumption. This means that even though streams are unified by using a class, code which uses a stream of an unknown kind must be either polymorphic or use existential quantification. Completeness. Unless File{Input,Output}Stream uses {read,write}() rather than file{Read,Write}, openFile provides only a subset of the functionality of open(): it works only with seekable files, e.g. not with /dev/tty. What is the type of stdin/stdout? They may be devices or pipes (not seekable), regular files (seekable), sockets... Note that even when they are regular files, emulating stream I/O in terms of either pread/pwrite or mmap does not yield the correct semantics of sharing the file pointer between processes. If we have a shell script which runs Haskell programs which write to stdout, it should be possible to redirect the output of the script as a whole. Exploiting the advantages of mapped files for stream I/O http://www.cs.toronto.edu/pub/reports/csrg/267/267.ps The advantage of reducing copying between buffers is lost in Haskell: file{Read,Write} use a buffer provided by the caller instead of giving a buffer for the caller to examine or fill. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: FFI and C99 complex
Aaron Denney [EMAIL PROTECTED] writes: So, I have heard claims that C 99 specifies that a variable v of type complex t is stored as if declared as t v[2], with v[0] the real part and v[1] the imaginary part. I don't have a copy of the spec at hand. Could someone who does verify this? This is true (in C99 draft; I don't think this has changed in the final standard). -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Re: Hugs vs GHC (again)was: Re: Somerandomnewbiequestions
Pete Chown [EMAIL PROTECTED] writes: of course, [mmap] can only be done on a limited type of file on some architectures, so it should be an optimization under the hood rather than an exposed interface. In particular, you have to be careful not to run out of address space on 32-bit architectures. If you try to do file access by mapping the whole file into virtual memory, you won't be able to handle files larger than 2G or so. You also have to be careful not to map a number of file chunks, which in total exceed the address space available. Which means that it *should* be explicit, rather than done under the hood. It's a tradeoff, the programmer should be aware when to apply it, the runtime should not impose a policy which would be hard to reverse. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Linear shuffle
Henning Thielemann [EMAIL PROTECTED] writes: I did some shuffling based on mergesort, that is a list is randomly split (unzipped) into two lists and the parts are concatenated afterwards. You must repeat this some times. It even works for infinite lists. I think it doesn't guarantee equal probabilities of all permutations. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: I/O interface
Ketil Malde [EMAIL PROTECTED] writes: It seemed to me, though, that streams are related to channels, I'm not sure what exactly do you mean by streams (because they are only being designed), but differences are: - A stream is either an input stream or an output stream, while a single channel supports reading from one end and writing to the other end. - A stream passes around bytes, which are usually grouped in blocks for efficiency. A channel is polymorphic wrt. the element type and elements are always processed one by one. - A stream may be backed by an OS file, pipe, socket etc., while a channel exists purely in Haskell. - A channel is never closed. Reading more data than have been put blocks until someone puts more data. A stream can reach its end, which is a condition a reader can detect. A stream backed by a pipe is similar to a channel of bytes in that the reader blocks until someone puts more data, but it can be closed too, which causes the reader to observe end of file. A writer to a stream can block too when the internal buffer in the kernel is full. - A stream can be inherited by child processes, and it generally continues to work by being linked to the same data sink or source as before. A channel is inherited as a whole: there is no communication between the two versions of the channel in the two processes. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: I/O interface
Ben Rudiak-Gould [EMAIL PROTECTED] writes: First of all, I don't think any OS shares file pointers between processes. Unix does. It's because shared files are usually stdin/stdout/stderr (I mean that they are visible as stdin/stdout/stderr, rather than about their nature as terminals - they may be regular files), which are usually accessed sequentially without seeking. Quite often they are not seekable at all (terminals or pipes), which means that they behave as if the file pointer was always positioned at the end (for writing) or beginning (for reading) and shared. If they are seekable, the position is shared so you can redirect I/O to a process running subprograms. Otherwise it would be practically impossible to safely use an inherited filehandle via any API. Pipes are not seekable and always behave as if the position is shared. It doesn't make them impossible to safely inherit. They are inherited on fork because they are anonymous objects, so it's the only way to connect them; after fork most programs close the reading end in one of the processes and the writing end in the other. It's rare that two processes read from the same pipe or write to the same pipe. If they do, one of them is usually a helper program started by the other, and the other waits for the helper to finish. If you want two processes to access the same file indepenently, pass a file name and open it twice. The file interface in this library is only used for files, which are always seekable (by definition). What do you mean by files? What you get from open() is not always seekable, because it's not always a regular file. The name may refer to a device (whether it's seekable depends on the particular device; block devices are seekable but most character devices are not, e.g. /dev/ttyS0, /dev/lp0 - serial and parallel ports) or to a named pipe (not seekable). -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: I/O interface
Ben Rudiak-Gould [EMAIL PROTECTED] writes: is there *any* way to get, without an exploitable race condition, two filehandles to the same file which don't share a file pointer? AFAIK it's not possible if the only thing you know is one of the descriptors. Of course independent open() calls which refer to the same file have separate file pointers (I mean the true filename, not /proc/*/fd/*). On Linux the current file position is stored in struct file in the kernel. struct file includes void *private_data whose internals depend on the nature of the file, in particular they can be reference counted. Among polymorphic operations on files in struct file_operations there is nothing which clones the struct file. This means that a device driver would have no means to specify how private_data of its files should be duplicated (e.g. by bumping the reference count). If my understanding is correct, it implies that the kernel has no way to clone an arbitrary struct file. Just don't use the current position of seekable files if you don't like it: use pread/pwrite. Is there any way to pass a filehandle as stdin to an untrusted/ uncooperative child process in such a way that the child can't interfere with your attempts to (say) append to the same file? You can set O_APPEND flag to force each write to happen at the end of file. It doesn't prevent the process from clearing the flag. If it's untrusted, how do you know that it won't truncate the file or just write garbage to it where you would have written something? If the file is seekable, you can use pread/pwrite. If it's not seekable, the concept of concurrent but non-interfering reads or writes is meaningless. I think we just need more kinds of streams. With regard to file-backed streams, there are three cases: 1. We open a file and use it in-process. 2. We open a file and share it with child processes. 3. We get a handle at process startup which happens to be a file. I disagree. IMHO the only distinction is whether we want to perform I/O at the current position (shared between processes) or explicitly specified position (possible only in case of seekable files). Neither can be emulated in terms of the other. In case 2 we could avoid OS problems by creating a pipe and managing our end in-process. It's not transparent: it translates only read and write, but not sendto/recvfrom, setsockopt, ioctl, lseek etc., and obviously it will stop working when our process finishes but the other does not. A pipe can be created when the program really wants this, but it should not be created autimatically whenever we redirect stdin/stdout/stderr of another program to a file we have opened. Case 3 is the most interesting. In an ideal world I would argue for treating stdin/out/err simply as streams, but that's not practical. Failing that, if we have pread and pwrite, we should provide two versions of stdin/out/err, one of type InputStream/OutputStream and the other of type Maybe File. We can safely layer other streams on top of these files (if they exist) without interfering with the stream operation. I'm not sure what do you mean. Haskell should not use pread/pwrite for functions like putStr, even if stdout is seekable. The current file position *should* be shared between processes by default, otherwise redirection of stdout to a file will break if the program delegates some work with corresponding output to other programs it runs. Indeed, file positions are exactly as evil as indices into shared memory arrays, which is to say not evil at all. But suppose each shared memory array came with a shared current index, and there was no way to create additional ones. Bad analogy: if you open() the file independently, the position is not shared. The position is not tied to a file with its shared contents but to the given *open* file structure. And there is pread/pwrite (on some OSes at least). It's not suitable as the basic API of all reads and writes though. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Hugs vs GHC (again) was: Re: Some randomnewbiequestions
Ben Rudiak-Gould [EMAIL PROTECTED] writes: http://www.haskell.org/~simonmar/io/System.IO.html fileRead :: File - FileOffset - Integer - Buffer - IO () This is unimplementable safely if the descriptor is read concurrently by different processes. The current position is shared. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Character predicates (was: Re: [Haskell-cafe] Hugs vs GHC (again))
Dimitry Golubovsky [EMAIL PROTECTED] writes: |Sebastien's| Marcin's | Hugs ---+---+--+-- alnum | L* N* | L* N*| L*, M*, N* 1 alpha | L*| L* | L* 1 cntrl | Cc| Cc Zl Zp | Cc digit | N*| Nd | '0'..'9' lower | Ll| Ll | Ll 1 punct | P*| P* | P* upper | Lu| Lt Lu| Lu Lt 1 blank | Z* \t\n\r | Z*(except| ' ' \t\n\r\f\v U+00A0 U+00A0 U+2007 U+202F) \t\n\v\f\r U+0085 1: for characters outside Latin1 range. For Latin1 characters (0 to 255), there is a lookup table defined as unsigned char charTable[NUM_LAT1_CHARS]; If the table coincides with Unicode character category, then it's just an implementation detail. I changed c ' ' || c = '\DEL' c = '\x9f' to Cc for Hugs because it's the same. So there might be a bunch of (perhaps autogenerated, from localedef files) modules for each locale/encoding, like ISO8859_1 or KOI_8. I disagree. Char is supposed to mean Unicode only, and data is converted to Unicode on boundaries with those parts of the world which use different encodings. With Unicode in mind it still makes sense to talk about digits as '0'..'9' only; most programming languages specify numeric literals as constisting of these digits only. Other contexts may require a wider set, including today's Arabic digits etc. This is not because of the encoding but because of the intended set of characters. One reason why the predicates are not obvious is that when the features encodable as text become more sophisticated, old algorithms for handling text become limited. For example if an identifier is specified as a letter followed by a sequence of letters or numbers, then combining marks are not allowed in identifiers, even though the corresponding precomposed characters are allowed. I guess this is why Hugs includes M* in isAlphaNum. This is a lie which allows old code to work better. These characters are not alphanumeric; it's the definition of identifiers which is no longer appropriate. (Unicode recommends a particular definition of identifiers in programming languages which want to permit with non-ASCII identifiers; it has various exceptions because it's intended to be somehow compatible with older versions of itself.) Another case when old interfaces are not sufficient is toUpper toLower. They should be defined on strings, not characters. Besides '' there are other characters which uppercase or lowercase to several code points: ligatures, precomposed characters which lack precomposed variants in the other case but can be decomposed, Greek iota below which is specified to uppercase to a separate iota after the letter (some people believe this is wrong but it's how it's currently specified in Unicode) and some cases with accents over I and i. Case mapping is also context-dependent for sigma. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
I/O interface (was: Re: [Haskell-cafe] Re: Hugs vs GHC (again))
Ben Rudiak-Gould [EMAIL PROTECTED] writes: fileRead can be implemented in terms of OS primitives, Only if they already support reading from a fixed offset (like pread). I'm not sure if we can rely on something like this being always available, or whether it should be emulated using lseek which is safe only as long as we are the only process using the given open file. pread requires that the file is seekable, which means that it can't be used for all file handles: not for pipes, sockets, terminals nor various other devices. and it's easy enough to implement a thread-safe seek/read interface on top of it. Not if it must cooperate with other processes, and you *do* want to set a file position before running another program with redirected standard I/O. In this case it's not enough that you set a private Haskell variable holding its logical file position - you must perform the lseek syscall. Doing something differently than everybody else has a risk of limited interoperability, even if the new way is better, and thus must be carefully evaluated to check whether all lost functionality is unimportant enough to lose. BTW, on Unix sockets and files are the same, but probably not on Windows. I don't know details about WinAPI. I know that basic file I/O uses HANDLEs, winsock uses ints which emulate Unix descriptors; what I don't know is whether you can also use HANDLEs for sockets (perhaps each winsock fd has an associated HANDLE with easy translation in both directions? or is there another API for sockets on HANDLEs?) and how do you perform redirection of standard I/O, in terms of HANDLEs or what - in particular I don't know whether you can redirect standard I/O to a socket. How should Haskell view this? I mean that on Unix it should somehow make files and sockets interchangeable, in order to support I/O redirection for programs being run; but it's not easy if you use completely different interfaces for files and sockets, as the streams proposal seems to do. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Hugs vs GHC (again)was: Re: Somerandomnewbiequestions
Aaron Denney [EMAIL PROTECTED] writes: Does open(/dev/fd/n) or (/proc/self/fd/n) act as dup() or a fresh open() to underlying file?) As a dup(), with a side effect of resetting the file pointer to the beginning. It would not help anyway: if it's a terminal or pipe, it *has* to act as a dup() (in this case the file pointer is obviously not reset), it's not seekable, and thus pread/pwrite fail. wc -c tries to measure the size in bytes by seeking, and proceeds to actually read the contents only if the file is not seekable (or if it's also told to count words or lines, obviously). This applies to stdin too. In this case it must remember to subtract the starting position from the size (and to bump the result to 0 if it would be negative), in order to give consistent results whether it manages to skip reading the contents or not. I actually don't see the problem with interacting with other processes that we've forked. File positions are not evil. They allow to treat files and devices in a uniform way. If you run a program which - writes some opening words to stdout - forks a subprocess, which executes another program, which writes something to stdout - the parent waits for the subprocess to finish - writes closing words to stdout then all output will appear in order: the opening words, then the part from the other program, and finally the closing words. This applies *also* when stdout has been redirected to a file. In this case it's essential that the file position is shared between all processes! So even if stdout is a regular file, output should be done using write(), not pwrite(). If processes don't synchronize their writes and write at the same time, then output will be intermixed. They can choose to synchronize themselves; in particular one process can avoid writing anything while the other process is running. So it's not true that allowing them to share a file position will necessarily be unsafe and will risk mangled data. Unix doesn't guarantee that programs will behave well, but it allows well-behaving programs to cooperate. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Hugs vs GHC (again) was: Re: Some random newbiequestions
Jrmy Bobbio [EMAIL PROTECTED] writes: Once this is agreed, it would be easy to make scripts which generate C code from UnicodeData.txt tables from Unicode. I think table-driven predicates and toUpper/toLower should better be implemented in C; Haskell is not good at static constant tables with numbers. Sebastien Carlier already wrote this for hOp, see : http://etudiants.insia.org/~jbobbio/hOp/Gen_wctype.hs And I've done a similar thing for my language Kogut some time ago: http://cvs.sourceforge.net/viewcvs.py/kokogut/kokogut/runtime/make-char-tables.in?view=markup http://cvs.sourceforge.net/viewcvs.py/kokogut/kokogut/lib/Core/Kokogut/Characters.ko?view=markup Let's see how these separately developed interpretations of predicates differ (mine also have different names and there are a few more): |Sebastien's| mine ---+---+-- alnum | L* N* | L* N* alpha | L*| L* cntrl | Cc| Cc Zl Zp digit | N*| Nd lower | Ll| Ll punct | P*| P* upper | Lu| Lt Lu blank | Z* \t\n\r | Z*(except U+00A0 U+2007 U+202F) \t\n\v\f\r U+0085 Note that the interpretation of digit differs from both C and Haskell 98 which specify it to be ASCII-only. Actually I have ASCII-only variants of IsDigit parametrized by the number base. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Hugs vs GHC (again) was: Re: Some random newbiequestions
Simon Marlow [EMAIL PROTECTED] writes: - Do the character class functions (isUpper, isAlpha etc.) work correctly on the full range of Unicode characters? This is true in Hugs. It's true with GHC on some systems (basically we were lazy and used the underlying C library's support here, which is patchy). It's not obvious what the predicates should really mean, e.g. should isDigit and isHexDigit include non-ASCII digits or should isSpace include non-breaking space characters. Haskell 98 report gives some guidelines which don't necessarily coincide with the C practice nor with expectations from Unicode people. Once this is agreed, it would be easy to make scripts which generate C code from UnicodeData.txt tables from Unicode. I think table-driven predicates and toUpper/toLower should better be implemented in C; Haskell is not good at static constant tables with numbers. Another issue is that the set of predicates provided by Haskell 98 library report is not enough to implement e.g. a Haskell 98 lexer, which needs any Unicode symbol or punctuation. Case mapping would be better done string - string rather than character - character; this breaks a long established Haskell interface. Case mapping is locale-sensitive (in very minor ways). Haskell doesn't provide algorithms like normalization or collation. In general the Haskell 98 interface is not enough for complex Unicode processing. - Can you do String I/O in some encoding of Unicode? No Haskell compiler has support for this yet, and there are design decisions to be made. The problem with designing an API of recoders is that depending on whether the recoder is implemented in Haskell or interfaced from C, it needs different data representation. Pure Haskell recoders prefer lazy lists of characters or bytes (except that a desire to detect source errors or characters unavailable in the target encoding breaks this), while high performance C prefers pointers to buffers with chunks of text. Transparent recoding makes some behavior hard to express. Imagine parsing HTTP headers followed by \r\n\r\n and a binary file. If you read headers line by line and decoding is performed in blocks, then once you determine where the headers end it's too late to find the start of the binary file: a part of it has already been decoded into text. You have to determine the end of the headers while working with bytes, not characters, and only convert the first part. Not performing the recoding in blocks is tricky if the decoder is implemented in C. Giving 1-byte buffers for lots of iconv() calls is not nice. Or imagine parsing a HTML file with the encoding specified inside it in a meta element. Switching the encoding in the middle is incompatible with buffering. Maybe the best option is to parse the beginning in ISO-8859-1 just to determine the encoding, and then reparse everything again once the encoding is known. If characters are recoded automatically on I/O, one is tempted to extend the framework for other conversions like compression, line ending convention, HTML character escaping etc. - What about Unicode FilePaths? This was discussed a few months ago on the haskell(-cafe) list, no support yet in any compiler. Nobody knows what the semantics should be. I've once written elsewhere a short report about handling filename encodings in various languages and environments which use Unicode as their string representation. Here it is (I've been later corrected that Unicode non-characters are valid in UTF-x): I describe here languages which exclusively use Unicode strings. Some languages have both byte strings and Unicode strings (e.g. Python) and then byte strings are generally used for strings exchanged with the OS, the programmer is responsible for the conversion if he wishes to use Unicode. I consider situations when the encoding is implicit. For I/O of file contents it's always possible to set the encoding explicitly somehow. Corrections are welcome. This is mostly based on experimentation. Java (Sun) -- Strings are UTF-16. Filenames are assumed to be in the locale encoding. a) Interpreting. Bytes which cannot be converted are replaced by U+FFFD. b) Creating. Characters which cannot be converted are replaced by ?. Command line arguments and standard I/O are treated in the same way. Java (GNU) -- Strings are UTF-16. Filenames are assumed to be in Java-modified UTF-8. a) Interpreting. If a filename cannot be converted, a directory listing contains a null instead of a string object. b) Creating. All Java characters are representable in Java-modified UTF-8. Obviously not all potential filenames can be represented. Command line arguments are interpreted according to the locale. Bytes which cannot be converted are skipped. Standard I/O works in ISO-8859-1 by default. Obviously all input is accepted. On output characters above U+00FF are replaced by ?. C# (mono) - Strings are UTF-16. Filenames use
Re: [Haskell-cafe] Some random newbie questions
Jorge Adriano Aires [EMAIL PROTECTED] writes: Naive use of foldl. I tend to think the default foldl should be strict (ie. replaced by foldl') -- are there important cases where it needs to be lazy? Hi, One simple example would be, reverse = foldl (flip (:)) [] No, it would work with strict foldl too. In fact in the absence of optimization it would work better (uses less time and space). The optimization required is inlining and strictness analysis. A function which requires lazy foldl for correctness would have to be sometimes lazy in its first argument, and at the same time some partial results would have to be undefined. By function I mean the first argument of foldl, treated as a binary function. Here this doesn't apply because flip (:) x y is always defined. And another common case for foldl, sum, doesn't apply because (+) is usually strict on both arguments (although in principle it does not have to be true because of overloading, which implies that a compiler can only optimize particular specializations of sum, not generic sum). I don't know of any real-life example. Perhaps there are cases where evaluating partial results is correct but inefficient. I don't know such case either. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Some random newbie questions
Jorge Adriano Aires [EMAIL PROTECTED] writes: No, it would work with strict foldl too. In fact in the absence of optimization it would work better (uses less time and space). The optimization required is inlining and strictness analysis. Is this also true if your just going to use the first few elements after reversing it? Yes. A strict fold would evaluate cons cells of the result while they are constructed, not list elements. They are all defined (flip (:) x y is always defined), so a strict foldl is correct. Making a cons cell should be not more expensive than making a thunk which will make a cons cell when evaluated. Well, unless the implementation doesn't inline flip and thus making these thunks is actually faster than running them. In this case a lazy foldl is more efficient than a strict foldl, as long as a sufficiently small part of the result is used; it's always less efficient if the whole result is examined. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Problem with fundeps.
[EMAIL PROTECTED] writes: This is what I want. For a given set of vectors, the associated scalars are unique, otherwise I would have problems with norm. In the instance Vspace a a the compiler doesn't know that a is supposed to be a scalar only. It matches vector types (functions) too. And adding a context won't help. An instance of the form instance Ctx a = Cls (T a) means T a can always be used as an instance of Cls, and such usage will yield a further requirement of Ctx a which must be fulfilled rather than T a can be used as an instance of Cls as long as Ctx a holds. In particular it will overlap with any other instance whose head can be unified with Cls (T a). Instance overlapping doesn't take instance contexts into account, only instance heads. The problem can be solved by enumerating concrete scalar types instead of using a generic instance Vspace a a. I'm afraid Haskell classes are not expressive enough for a generic instance in this case. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] safe code to run in finalizers: ACIO revisited
Robert Dockins [EMAIL PROTECTED] writes: So, to be safe, the action of a finalizer must commute with every other finalizer (they must be central). What does should mean? There are useful finalizers which don't have this property. E.g. a finalizer can remove an entry from a weak dictionary, call a C function which will free some foreign object, or send a message over a network that a particular object is no longer needed. Even if they have these properties in an abstract sense, they are not true technically. So a separate ACIO monad for affine central IO actions would be the appropriate context for finalizers. It would not be enough without unsafeIOtoACIO. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] The difference between ($) and application
Andres Loeh [EMAIL PROTECTED] writes: The function ($) is the identity function, restricted to functions. Almost. With the standard definition of f $ x = f x it happens that ($) undefined `seq` () = () id undefined `seq` () = undefined -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell] Force evaluation
Ben Rudiak-Gould [EMAIL PROTECTED] writes: And here's a slight variation in which force has the type a - a, eliminating the need for the helper function eval. I'm not sure which version is better. The version with () does less redundant forcing, although the compiler could perhaps optimize them statically. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell-cafe] Mutable data design question
GoldPython [EMAIL PROTECTED] writes: In the case of writing something like a text editor where the data involved is by its very nature mutable, what sort of design paradigm would you use in a functional language? This is specific to text editors: 1. Use a traditional mutable data structure, e.g. IOArray of IOUArrays (lines), each coupled with first filled and last filled indices and resized when needed. Implement undo by the list of changes. Easy conceptually. Has good performance in common cases and poor performance for unusual data (e.g. very long lines). Easy to make bugs in undo handling. 2. Use a persistent data structure with logarithmic cost of most operations: a balanced tree of text fragments, called a rope (Hans Boehm has made one for C). Undo can be made by simply keeping old versions. Hard to implement the core data structure. If done right, the rest is easy, in particular undo handling is very robust. There are some overheads for all operations, but the cost of operations scales to extreme cases. The second way is more interesting, but I don't know how to implement a rope in details. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Mutable data design question
Keean Schupke [EMAIL PROTECTED] writes: What happens with this method when the display needs refreshing, does the current state have to be recomputed every time ... No. The new state is constructed from bits of old state and the changed data. Applying a change on average requires logarithmic time memory wrt. the whole size. It does not invalidate the old state - state is immutable. http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/vol25/issue12/spe986.pdf -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: not possible with monad transformers ?
Jules Bean [EMAIL PROTECTED] writes: When writing a compiler, it makes sense to collect errors as by the writer monad, and not abort anything - producing dummy values instead (except perhaps some fatal errors when it's inconvenient). Or you could use the monad: data Perhaps a = Success a | Failure a [Error] This is just a special case of the writer monad, I think. It's the same as the writer monad, with redundant encoding of the case when there are no errors. -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ACIO versus Execution Contexts
Keean Schupke [EMAIL PROTECTED] writes: I disagee - Allowing unique state is a mistake in my opinion. I want Haskell to be the operating system - how can I do this if I cannot create new process contexts. Why does it matter if you can't compile code for new programs at runtime, to become a part of the same process? -- __( Marcin Kowalczyk \__/ [EMAIL PROTECTED] ^^ http://qrnik.knm.org.pl/~qrczak/ ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe