Re: concurrency guarantees clarification

2006-04-25 Thread Marcin 'Qrczak' Kowalczyk
John Meacham [EMAIL PROTECTED] writes:

 * Foreign concurrent calls, handle IO, and all IO actions that directly
   interact with the world outside the current process all must be
   yield-points. (in addition to any yield-points implied by the progress
   guarantee)

If an IO call includes a long period of waiting, we don't only want
it to yield before or after it. We want it to allow other threads
to proceed during the whole wait.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://haskell.org/mailman/listinfo/haskell-prime


Re: unsafePerformIO and cooperative concurrency

2006-04-24 Thread Marcin 'Qrczak' Kowalczyk
Ashley Yakeley [EMAIL PROTECTED] writes:

 Is there a ticket for this? I would prefer that unsafePerformIO and
 friends not be part of the standard.

I would prefer otherwise. Every implementation supports it, which
proves that it's useful. And it's no less unsafe than FFI.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://haskell.org/mailman/listinfo/haskell-prime


Re: FFI, safe vs unsafe

2006-04-13 Thread Marcin 'Qrczak' Kowalczyk
John Meacham [EMAIL PROTECTED] writes:

 Checking thread-local state for _every_ foreign call is definitely
 not an option either. (But for specifically annotated ones it is
 fine.)

BTW, does Haskell support foreign code calling Haskell in a thread
which the Haskell runtime has not seen before? Does it work in GHC?

If so, does it show the same ThreadId from that point until OS
thread's death (like in Kogut), or a new ThreadId for each callback
(like in Python)?

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://haskell.org/mailman/listinfo/haskell-prime


Re: FFI, safe vs unsafe

2006-04-12 Thread Marcin 'Qrczak' Kowalczyk
John Meacham [EMAIL PROTECTED] writes:

 I object to the idea that concurrent calls are 'safer'. Getting it
 wrong either way is a bug. It should fail in the most obvious way
 rather than the way that can remain hidden for a long time.

I wouldn't consider it a bug of an implementation if it makes a call
behave like concurrent when it's specified as non-concurrent. If a
library wants to make it a critical section, it should use a mutex
(MVar).

Or there should be another kind of foreign call which requires
serialization of calls. But of which calls? it's rarely the case that
it needs to be serialized with other calls to the same function only,
and also rare that it must be serialized with everything else, so the
granularity of the mutex must be explicit. It's fine to code the mutex
explicitly if there is a kosher way to make it global.
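
For concreteness, a minimal sketch of that last pattern in Haskell,
with the global mutex made via the usual unsafePerformIO/NOINLINE
idiom (some_init is a made-up stand-in for a non-reentrant C function):

{-# LANGUAGE ForeignFunctionInterface #-}
import Control.Concurrent.MVar
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical non-reentrant C function.
foreign import ccall unsafe "some_init"
  c_someInit :: IO Int

-- Global lock; NOINLINE keeps the top-level MVar from being inlined
-- and duplicated.
{-# NOINLINE someInitLock #-}
someInitLock :: MVar ()
someInitLock = unsafePerformIO (newMVar ())

-- All Haskell callers are serialized through the mutex.
someInit :: IO Int
someInit = withMVar someInitLock (\_ -> c_someInit)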

Non-concurrent calls which really block other threads should be
treated only as an efficiency trick: in implementations where the
runtime is non-reentrant and dispatches threads running Haskell code
internally, making such a call without ensuring that other Haskell
threads have other OS threads to run them is faster.

OTOH in implementations which run Haskell threads truly in parallel,
the natural default is to let C code behave concurrently. Ensuring
that it is serialized would require extra work which is counter-productive.
For functions like sqrt() the programmer wants to say that there is no
need to make it concurrent, without also saying that it requires calls
to be serialized.

 Which is why I'd prefer some term involving 'blocking' because that
 is the issue. Blocking calls are exactly those you need to make
 concurrent in order to ensure the progress guarantee.

What about getaddrinfo()? It doesn't synchronize with the rest of the
program, it will eventually complete no matter whether other threads
make progress, so making it concurrent is not necessary for correctness.
It should be made concurrent nevertheless because it might take a long
time. It does block; if it didn't block but needed the same time for
an internal computation which doesn't go back to Haskell, it would
still benefit from making the call concurrent.
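
In FFI terms the distinction is just an annotation on the import;
a sketch (types simplified, struct addrinfo treated as opaque):

{-# LANGUAGE ForeignFunctionInterface, EmptyDataDecls #-}
import Foreign.C.String (CString)
import Foreign.C.Types (CInt)
import Foreign.Ptr (Ptr)

data AddrInfo  -- opaque stand-in for struct addrinfo

-- Imported as 'safe' (concurrent), so other Haskell threads keep
-- running during a lookup that may take arbitrarily long.
foreign import ccall safe "getaddrinfo"
  c_getaddrinfo :: CString -> CString
                -> Ptr AddrInfo -> Ptr (Ptr AddrInfo) -> IO CInt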

It is true that concurrent calls often coincide with blocking. It's
simply the most common reason for a single non-calling-back function
to take a long time, and one which can often be predicted statically
(operations on extremely long integers might take a long time too,
but it would be hard to differentiate them from the majority which don't).

The name 'concurrent' would be fine with me if the default is 'not
necessarily concurrent'. If concurrent calls are the default, the name
'nonconcurrent' is not so good, because it would seem to imply some
serialization which is not mandatory.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://haskell.org/mailman/listinfo/haskell-prime


Re: Signals + minimal proposal

2006-04-11 Thread Marcin 'Qrczak' Kowalczyk
Simon Marlow [EMAIL PROTECTED] writes:

 How does cancelling a thread differ from sending it an exception?

It doesn't. By cancelling I mean just sending a particular async
exception.

 Can a thread be GC'd without being sent an exception first?

Yes, but I'm now changing this.

Unfortunately it doesn't seem possible to guarantee proper stack
unwinding in all cases:

POSIX threads evaporate after fork() in the child process. This means
that foreign code performing callbacks doesn't exist there, except in
the thread doing the fork; the C stacks are inaccessible. So in Kogut
in this case bound threads become unbound, and they only run up to the
end of the nearest callback from foreign code: then they are killed
immediately.

Another technical limitation: in a build where OS threads are not used,
callbacks returning in a non-LIFO order must wait for the callback
using the top of the OS stack to return. It's impossible to cause such
a thread to continue immediately even if it gets an exception.

There are also limitations caused by principles I've adopted myself.
I have scoped mutex locking and scoped unlocking. In particular
waiting for a condition unlocks the mutex and always relocks it before
returning. When a thread is waiting to relock a mutex when exiting a
scope (rather than when entering a scope), it must absolutely lock it
before it can continue, in order to guarantee consistent state of the
mutex in regions of code.

So I'm going to send threads about to be GC'd a signal rather than an
exception; it will be handled only if the thread has signals unblocked.

There is another case similar to GC'ing a thread: when the runtime
discovers that there are no threads to be run, to wait for I/O, to
wait for a timeout, and the thread handling system signals doesn't
seem to be intentionally waiting for signals (it's not blocked on a
Kogut construct similar to POSIX sigwait), the runtime attempts to
wake up the thread handling system signals with a Deadlock signal,
so the program can react to a total deadlock. Of course a deadlock
of only a subset of threads won't be detected if the threads are
not GC'd.

When waiting for the rest of threads at program exit, it might happen
that some threads won't want to return after being cancelled, e.g.
they have signals blocked or they lock up during cleanup. Such a case
would normally be a deadlock (the main thread is waiting until they
finish, and they are waiting for something else), but the above
mechanism causes the main thread to be woken up and continue even
though some threads have not finished.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://haskell.org/mailman/listinfo/haskell-prime


Re: Signals + minimal proposal

2006-04-11 Thread Marcin 'Qrczak' Kowalczyk
John Meacham [EMAIL PROTECTED] writes:

 forkIO action = forkIO' action' where
   action' = do
     t <- myThreadId
     onExit (throwTo t PleaseExit)
     action

This would be a memory leak: even after the thread finishes,
its onExit handler would remain registered.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://haskell.org/mailman/listinfo/haskell-prime


Re: asynchronous exceptions

2006-04-07 Thread Marcin 'Qrczak' Kowalczyk
Simon Marlow [EMAIL PROTECTED] writes:

 BTW, I just realised a better way to express block.  If block is
 supposed to count nesting, then we have a problem that you can still
 unblock exceptions even within a block by using sufficient number of
 unblocks, so the right way is to give block this type:

   block :: ((IO b -> IO b) -> IO a) -> IO a

Or perhaps:
block :: ((forall b. IO b -> IO b) -> IO a) -> IO a

It doesn't fit classic mutexes and conditions with my semantics of
implicit blocking, because condition wait should restore the blocking
state to the one before taking the mutex - but the condition wait is
written without explicit passing of any value from the point of taking
the mutex.

I'm not sure how it carries over to Haskell patterns though.
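
To illustrate what the extra quantifier buys, a sketch in Haskell
(the body of 'block' here is just a placeholder; a real implementation
would mask asynchronous exceptions around 'body'):

{-# LANGUAGE RankNTypes #-}
import Control.Concurrent.MVar

-- 'body' receives a restore function usable at any result type,
-- which is what the 'forall b.' provides.
block :: ((forall b. IO b -> IO b) -> IO a) -> IO a
block body = body id   -- placeholder implementation

consume :: MVar Int -> IO Int
consume m = block $ \unblock -> do
  x <- unblock (takeMVar m)  -- interruptible while blocked on the MVar
  putMVar m (x + 1)          -- async exceptions still blocked here
  return x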

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://haskell.org/mailman/listinfo/haskell-prime


Re: asynchronous exceptions

2006-04-06 Thread Marcin 'Qrczak' Kowalczyk
Simon Marlow [EMAIL PROTECTED] writes:

 I think it's unnecessary to treat signals in the way you do - you're
 assuming that a signal interrupts the current thread and runs a new
 computation (the signal handler) on the same stack, completely blocking
 the interrupted thread until the signal handler completes.  This is the
 wrong way to view signal handlers, IMO: they should run in completely
 separate threads (perhaps a higher priority thread, if possible).

This can be emulated in my model: by designating a thread for system
signals, possibly even letting it spawn a new thread for each signal.

However, most Unix signals are supposed to abort the process, and thus
a mechanism for aborting one thread from another is needed anyway.
I think async exceptions are not that much easier than
async signals.

Async signals include the ability to pause threads at safe points,
which is needed for SIGSTOP / SIGTSTP and for my fork() wrapper. This
is not achievable with signals spawning threads + async exceptions.

   + you don't have to block signals just because you happen to
 be holding a mutex.  Synchronisation with a signal handler
 is just synchronisation with another thread.

It's still very probable that taking a mutex coincides with the need
to block async exceptions: an async exception in the middle of a
critical section implies a danger of leaving data in an inconsistent
state. Reasons for automatic blocking of async signals carry over to
async exceptions.

 I agree with your assessment of the problems with interruptible
 operations in GHC: that it is impossible to completely block async
 exceptions across a computation.  We could certainly add a way to
 do this.  Is that the substance of your objection to GHC's async
 exception mechanism?

Regarding interruptible operations, this and one more thing that I
haven't written there:

The fact that some function uses an interruptible operation internally
is a visible aspect of its behavior, both in GHC design and in mine.
This means that choosing which operations are interruptible should be
done carefully: even if some operation blocks the thread, it might be
a bad choice for an interruption point, because the use of some blocking
operations had better not be exposed. In my case such
blocking but uninterruptible operations include waiting for a mutex,
and waiting for a lazy variable, among others.

But Concurrent Haskell uses a single construct of MVars as mutexes,
semaphores, or communication channels. The runtime can't recognize the
pattern of usage to distinguish these cases.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://haskell.org/mailman/listinfo/haskell-prime


Re: important news: refocusing discussion

2006-04-05 Thread Marcin 'Qrczak' Kowalczyk
Simon Marlow [EMAIL PROTECTED] writes:

 I think it would be a mistake to relegate concurrency to an addendum;
 it is a central feature of the language, and in fact is one area where
 Haskell (strictly speaking GHC) is really beginning to demonstrate
 significant advantages over other languages.  We should make the most
 of it.

I agree.

Concurrency is needed for finalizers (except those which only call
system functions, without mutating other objects).
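
A sketch of why: a finalizer that touches shared state (here a
hypothetical live-object counter) must synchronize with ordinary
threads, so it effectively runs as one:

import Control.Concurrent.MVar
import Data.IORef (IORef, newIORef)
import System.Mem.Weak (addFinalizer)

-- The finalizer mutates the shared counter, so it must take the MVar
-- like any other thread would - it cannot be a mere system call.
newTracked :: MVar Int -> IO (IORef ())
newTracked live = do
  modifyMVar_ live (return . (+ 1))
  r <- newIORef ()
  addFinalizer r (modifyMVar_ live (return . subtract 1))
  return r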

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://haskell.org/mailman/listinfo/haskell-prime


Re: asynchronous exceptions

2006-04-05 Thread Marcin 'Qrczak' Kowalczyk
Simon Marlow [EMAIL PROTECTED] writes:

 I'm not sure whether asynchronous exceptions should be in Haskell'.
 I don't feel entirely comfortable about the interruptible operations
 facet of the design,

I designed that differently for my language. There is a distinct
synchronous mode where asynchronous exceptions are handled by
certain operations only, similar to the POSIX deferred mode of thread
cancellation. This allows blocking operations to be used without being
interrupted.

http://www.cs.ioc.ee/tfp-icfp-gpce05/tfp-proc/06num.pdf

Actually I support asynchronous signals, not just exceptions:
the reaction to a signal can be something other than throwing an
exception. For example Linux SIGWINCH should be handled by resizing
and repainting the screen at an appropriate moment, not by aborting
any computation in progress.

 The fact that throwTo can interrupt a takeMVar, but can't interrupt
 a foreign call, even a concurrent one, is a bit strange.

When entering foreign mode in C code embedded in Kogut (which
corresponds to concurrent foreign imports in Haskell but is less
convenient to use), it's possible to specify how that thread wishes
to be interrupted in case someone else sends it an asynchronous signal
during the foreign trip.

The only implemented application of this mechanism is sending a Unix
signal. This is enough to interrupt blocking syscalls like waitpid.
If waitpid fails and errno == EINTR, pending signals for this thread
are processed and waiting continues (unless some signal handler has
thrown an exception).

Implementing this without race conditions requires a sigsafe library
or something equivalent.


John Meacham [EMAIL PROTECTED] writes:

  * do we require the thrower to 'block' until the signal is received?
   (only relevant to pre-emptive implementations)

My language doesn't do it, and I'm not convinced that Haskell should
block. It's more efficient to make this non-blocking, and I think
usually this is what is needed.

  * what happens if multiple thrown exceptions pile up before the
catcher gets to them?

In my language each thread has a queue of pending asynchronous
signals, and they are processed in order.

Handling an asynchronous signal, or throwing an exception until it is
handled, blocks further signals automatically, so more signals are
processed only after the previous signal was handled.

An exception handler is not in a tail position wrt. the catching
construct, for two reasons: the state of asynchronous signals is
restored after handling the exception, and a stack trace shown when
the exception is propagated to the toplevel without being handled
includes code in unfinished exception handlers.

There is a separate exception handling syntax when the exception
should be considered already handled, for cases when the exception
handler should be in a tail context.

  * what happens to exceptions that fall off the end of threads, or the
main thread? (should be answered anyway)

In my case a thread body ends with a value or with an exception,
and this can be examined when joining a thread, or by default the
exception is propagated in the joiner. This has a disadvantage
that errors in threads nobody waits for might be left undetected,
unless they use an explicit wrapper.

For the main thread there is a settable handler of exceptions
reaching the toplevel, which by default handles some exceptions
specially (Unix signals, and a request of program termination),
and others are printed along with a stack trace.

  * promptness? how much work is the target allowed to do before it sees
the exception? pthreads allows an implementation to delay processing
an exception to a cancellation point; do we want the same thing in
haskell?

Perhaps. My design includes that.


David Roundy [EMAIL PROTECTED] writes:

 It would also be nice to address signal behavior, and by default state that
 signals should be converted to asynchronous exceptions.

This is not enough for SIGWINCH, or for SIGHUP used to trigger
reloading configuration files.

OTOH purity of Haskell's functional subsystem has some nice
consequences for asynchronous exceptions which don't have to carry
over to asynchronous signals which don't necessarily abort the
computation. If the signal is guaranteed to abort some part of
IO code, then it makes sense to revert thunks under evaluation.
If the signal only causes some handler to execute, then reverting
them might be wasteful, as they will soon be needed again.

 The only downside I can see of this as default behavior would be
 that in cooperative systems the response to a sigTERM might be
 very slow.

Right, it's a pity, and I agree that benefits outweigh this problem.

In my implementation the thread which handles system signals
(settable, defaults to the main thread) needs to be chosen by the
scheduler in order to process the signal. It might take some time
if there are a lot of threads.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/

Re: How to find out the C type signature corresponding to a Haskell function type in FFI?

2006-03-08 Thread Marcin 'Qrczak' Kowalczyk
Brian Hulley [EMAIL PROTECTED] writes:

 I've got a Haskell module with the following ffi import:

 foreign import ccall duma_init :: Int -> IO Int

 However my problem is that I've got no idea what the type signature
 for the corresponding C function should be,

HsInt duma_init(HsInt arg);

Or use int on the C side and CInt on the Haskell side.

fromIntegral can be used for converting integers in Haskell.
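
A sketch of the CInt variant (duma_init as in the question; the C side
would then be 'int duma_init(int);'):

{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CInt)

foreign import ccall "duma_init"
  c_duma_init :: CInt -> IO CInt

-- Convert at the boundary with fromIntegral.
dumaInit :: Int -> IO Int
dumaInit n = fmap fromIntegral (c_duma_init (fromIntegral n))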

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Export lists in modules

2006-02-23 Thread Marcin 'Qrczak' Kowalczyk
Simon Marlow [EMAIL PROTECTED] writes:

 The main difference is that I'm doing away with parentheses, commas, and
 export specifiers, and using layout and full declaration syntax instead.
 (I don't really want to discuss this very rough idea any more though,
 it's just a distraction, and I'm not sure I like it anyway).

I like this general idea, I was thinking about something similar
a long time ago.

But in the case of a large datatype, e.g. an AST, we certainly don't
want to duplicate it in whole. A partial form should be sufficient
in the export section.

So perhaps what is really needed is the ability to split a module
into a public part and a private part, allowing certain definitions
to be duplicated in a partial form, e.g. having both 'data Foo'
and 'data Foo = Con1 | Con2' in the same module. The details are
not obvious though because the syntax has not been designed with
this in mind.
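
In a hypothetical syntax (not valid Haskell, purely to fix the idea),
the split might look like:

module Ast where

public
  data Expr            -- exported abstractly: constructors hidden
  eval :: Expr -> Int  -- exported with its full signature

private
  data Expr = Lit Int | Add Expr Expr
  eval (Lit n)   = n
  eval (Add a b) = eval a + eval b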

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://haskell.org/mailman/listinfo/haskell-prime


Re: [Haskell-cafe] Re: standard poll/select interface

2006-02-23 Thread Marcin 'Qrczak' Kowalczyk
Simon Marlow [EMAIL PROTECTED] writes:

 I think the reason we set O_NONBLOCK is so that we don't have to test
 with select() before reading, we can just call read().  If you don't
 use O_NONBLOCK, you need two system calls to read/write instead of
 one. This probably isn't a big deal, given that we're buffering anyway.

I've heard that for Linux sockets select/poll/epoll might say that
data is available when it in fact is not (it may be triggered by
socket activity which doesn't result in new data). Select/poll/epoll
are designed to work primarily with non-blocking I/O.

In the implementation of my language, pthreads are optionally used in
a way very similar to your paper Extending the Haskell Foreign
Function Interface with Concurrency. This means that I have a choice
of using blocking or non-blocking I/O for a given descriptor, both
work similarly, but blocking I/O takes up an OS thread. Each file
has a blocking flag kept in its data.

A non-blocking I/O is done in the same thread. The timer signal is
kept active, so if another process has switched the file to blocking,
it will be woken up by the timer signal and won't block the whole
process. The thread performing the I/O will only waste its timeslices.

A blocking I/O temporarily releases access to the runtime, setting
up a worker OS thread for other threads if needed etc. As an
optimization, if there are no other threads to be run by the scheduler
(no running threads, nor waiting for I/O, nor waiting for a timeout,
and we are the thread which handles system signals), then runtime is
not physically released (no worker OS threads, no unlinking of the
thread structure), only the signal mask is changed so the visible
semantics is maintained. This is common to other such potentially
blocking system calls. I don't know if GHC does something similar.

(I recently made it work even if a thread that my runtime has not
seen before wants to access the runtime. If the optimization of not
physically releasing the runtime was in place, the new thread performs
the actions on behalf of the previous thread.)

In either case EAGAIN causes the thread to block, asking the scheduler
to wake it up when I/O is ready. This means that even if some other
process has switched the file to non-blocking, the process will only
do unnecessary context switches.
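
For comparison, a minimal sketch of that EAGAIN pattern in Haskell,
using the retry helpers from base (the general idiom, not Kogut's or
GHC's actual code):

{-# LANGUAGE ForeignFunctionInterface #-}
import Control.Concurrent (threadWaitRead)
import Foreign.C.Error (throwErrnoIfMinus1RetryMayBlock)
import Foreign.C.Types (CChar, CSize)
import Foreign.Ptr (Ptr)
import System.Posix.Types (CSsize, Fd)

foreign import ccall unsafe "read"
  c_read :: Fd -> Ptr CChar -> CSize -> IO CSsize

-- Retry on EINTR; on EAGAIN/EWOULDBLOCK, block only this thread until
-- the descriptor is readable, then try again.
readNonBlocking :: Fd -> Ptr CChar -> CSize -> IO CSsize
readNonBlocking fd buf len =
  throwErrnoIfMinus1RetryMayBlock "readNonBlocking"
    (c_read fd buf len)
    (threadWaitRead fd)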

It's important to make this work when the blocking flag is out
of sync. The Unix blocking flag is not even associated with the
descriptor but with an open file, i.e. it's shared with descriptors
created by dup(), so it might be hard to predict without asking the
OS.

If pthreads are available, stdin, stdout and stderr are kept blocking,
because they are often shared with other processes, and making them
blocking works well. Without pthreads they are non-blocking, because
I felt it was more important to not waste timeslices of the thread
performing I/O than to be nice to other processes. In both cases pipes
and sockets are non-blocking, while named files are blocking. The
programmer can change the blocking state explicitly, but this is
probably useful only when setting up redirections before exec*().

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: standard poll/select interface

2006-02-23 Thread Marcin 'Qrczak' Kowalczyk
Simon Marlow [EMAIL PROTECTED] writes:

 I agree that a generic select/poll interface would be nice.

We must be aware that epoll (and I think kqueue too) registers event
sources in advance, separately from waiting, which is its primary
advantage over poll.

The interface should use this model because it's easy to implement it
in terms of select/poll without losing efficiency, but the converse
would lose the benefit of epoll.
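
A sketch of what a registration-style interface could look like (all
names hypothetical):

{-# LANGUAGE EmptyDataDecls #-}
import System.Posix.Types (Fd)

data Event = Readable | Writable

data Watcher  -- opaque registration handle

-- Sources are registered up front (epoll/kqueue support this directly;
-- a select/poll backend just keeps a table and rebuilds its fd sets),
-- and waiting dispatches callbacks for whatever became ready.
class EventSource s where
  register   :: s -> Fd -> Event -> (Event -> IO ()) -> IO Watcher
  unregister :: s -> Watcher -> IO ()
  waitEvents :: s -> Maybe Int -> IO ()  -- timeout in microseconds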

(My runtime has a generic interface on the C level only, for hooking
another implementation to be used by the scheduler.)

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Why is $ right associative instead of leftassociative?

2006-02-12 Thread Marcin 'Qrczak' Kowalczyk
Brian Hulley [EMAIL PROTECTED] writes:

 My final suggestion if anyone is interested is as follows:

 1) Use : for types
 2) Use , instead of ; in the block syntax so that all brace blocks
 can be replaced by layout if desired (including record blocks)
 3) Use ; for list cons. ; is already used for forming lists in
 natural language, and has the added advantage that (on my keyboard at
 least) you don't even need to press the shift key! ;-)

My language uses \ for cons, and ? for lambda.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re[2]: Streams: the extensible I/O library

2006-02-12 Thread Marcin 'Qrczak' Kowalczyk
Bulat Ziganshin [EMAIL PROTECTED] writes:

 I reported only the speed of the buffering transformers. This doesn't
 include the speed of char encoding, which should be very low at this time.

Recoding will be slow if it's done on top of buffering and if encoding
itself has heavy startup. Buffering should be on the very top, so it
amortizes the cost of starting the recoder.

It should be possible to use iconv for recoding. Iconv works on blocks
and it should not be applied to one character at a time.

Byte streams and character streams should be distinguished in types,
preferably by class-constrained parametric polymorphism. In particular
byte buffers and char buffers should be reperesented differently,
so block copying between byte streams moves whole blocks of memory.

I have designed and implemented these issues for my language Kogut,
and now I'm trying to port them to Haskell.

Static typing gets in the way in various places (hiding the type
behind an existentially quantified type, passing optional named
arguments), but it looks doable.

The design uses buffers internally. A buffer is a queue of characters
or bytes, with amortized O(1) cost of operating at an end, and fast
block operations. A buffer itself is a stream too (reading eats its
contents from the beginning, writing appends to the end).
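
A sketch of the class-based distinction (names modeled on the library
under discussion, but hypothetical here):

import Data.Word (Word8)

-- Separate classes for byte and character streams: block copying
-- between two byte streams can move raw memory, and vGetChar simply
-- does not exist for a plain byte stream.
class ByteStream s where
  vGetByte :: s -> IO Word8
  vPutByte :: s -> Word8 -> IO ()

class CharStream s where
  vGetChar :: s -> IO Char
  vPutChar :: s -> Char -> IO ()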

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re[2]: Streams: the extensible I/O library

2006-02-12 Thread Marcin 'Qrczak' Kowalczyk
Bulat Ziganshin [EMAIL PROTECTED] writes:

 MQK It should be possible to use iconv for recoding. Iconv works on
 MQK blocks and it should not be applied to one character at a time.

 recoding doesn't need any startup.

Calling iconv (or other similar routine) does need startup. And you
really don't want to reimplement all encoders/decoders by hand in
Haskell.

Processing a stateful encoding needs time to pick up the state
and convert the materialized state into a form used during recoding.
Dispatching to the encoding function (usually not known statically)
takes time. When we generically convert an encoder which fails for
invalid data, to an encoder which replaces invalid data with U+FFFD
or question marks, setting up exception handlers takes time. These
are all little times, but they can be avoided.

Converting newlines takes time, and it's very similar to character
recoding. It should be done transparently; network protocols often use
CR-LF newlines, and it's painful to remember to output a '\r' before
every newline by hand. It should be done on top of character recoding;
consider UTF-16, where newline conversion works in terms of characters
rather than bytes.

Some conversions can be implemented with tight loops which keep data
in machine registers. The tightness matters when there are many
iterations; loop startup is amortized by buffering.

Buffering can provide arbitrarily far lookahead, arbitrarily long
putback, and checking for end of stream while logically not moving
the current position. But this works only if buffering is the last
stage which changes stream contents.

 MQK Byte streams and character streams should be distinguished in types,
 MQK preferably by class-constrained parametric polymorphism. In particular

 so that vGetBuf, vGetChar, and getWord32 can't be used on the same
 stream?

You can get bytes from a given byte stream, and get characters from a
character stream put on top of that byte stream. But if the protocol
mixes bytes with characters and is specified in terms of bytes, it's
probably better to work in terms of bytes, and convert byte strings
to character strings after determining where they end.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re[2]: Streams: the extensible I/O library

2006-02-12 Thread Marcin 'Qrczak' Kowalczyk
Bulat Ziganshin [EMAIL PROTECTED] writes:

 recoding doesn't need any startup. Each vGetChar or vPutChar
 just executes one or more vGetByte/vPutByte calls, according to
 encoding rules. This should be fast enough.

Hmm, your interface for the encoder (String -> [Word8]) doesn't
support stateful encodings at all, like ISO-2022-JP.
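
A stateful encoding needs the encoder to thread state between calls
and to flush it at the end; a sketch of an interface that could
express ISO-2022-JP (all names hypothetical):

import Data.Word (Word8)

-- The state (e.g. the currently selected charset in ISO-2022-JP) is
-- carried across chunks; encFlush emits any closing escape sequence
-- needed to return to the initial state.
data Encoder st = Encoder
  { encInit  :: st
  , encChunk :: st -> String -> ([Word8], st)
  , encFlush :: st -> [Word8]
  }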

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: UTF-8 decoding

2006-02-11 Thread Marcin 'Qrczak' Kowalczyk
John Meacham [EMAIL PROTECTED] writes:

 Another possibility is quasi-utf8 encoding. where it passes through any
 invalid utf8 sequences as latin1 characters. in practice, this works
 very well as interpreting both latin1 and utf8 transparently but is
 more than somewhat hacky.

It would not be reliable. I'm strongly against that: it gives an
illusion that Latin1 works, but it breaks in very rare cases.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: FilePath as ADT

2006-02-04 Thread Marcin 'Qrczak' Kowalczyk
Axel Simon [EMAIL PROTECTED] writes:

 The solution of representing a file name abstractly is also used by
 the Java libraries.

I think it is not. Besides using Java UTF-16 strings for filenames,
there is the File class, but it also uses Java strings. The
documentation of listFiles() says that each resulting File is made
using the File(File, String) constructor. The GNU Java implementation
uses a single Java string inside it.

On Windows the OS uses UTF-16 strings natively rather than byte
sequences. UTF-16 and Unicode are almost interconvertible (modulo
illegal sequences of surrogates), while converting between UTF-16
and byte sequences is messy. This means that unconditionally using
Word8 as the representation of filenames would be bad.

I don't know a good solution.

  *   *   *

Encouraged by Mono, for my language Kogut I adopted a hack that
Unicode people hate: the possibility of using a modified UTF-8 variant
where byte sequences which are illegal in UTF-8 are decoded into
U+0000 followed by another character. This encoding is used as the
default encoding instead of the true UTF-8 if the locale says that
UTF-8 should be used and a particular environment variable is set
(KO_UTF8_ESCAPED_BYTES=1).

The encoding has the following properties:

- Any byte sequence is decodable to a character sequence, which
  encodes back to the original byte sequence.

- Different character sequences encode to different byte sequences
  (the U+0000 escape is valid only when it would be necessary).

- It coincides with UTF-8 for valid UTF-8 byte sequences not
  containing 0x00, and character sequences not containing U+0000.
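
A sketch of the escape itself, with one assumed detail (that the
second character carries the byte's value; Kogut's exact rules may
differ):

import Data.Char (chr, ord)
import Data.Word (Word8)

-- Decoding an undecodable byte; encoding reverses this, so every
-- byte sequence round-trips.
escapeByte :: Word8 -> [Char]
escapeByte b = ['\0', chr (fromIntegral b)]

unescape :: [Char] -> Maybe Word8
unescape ['\0', c] | ord c < 256 = Just (fromIntegral (ord c))
unescape _                       = Nothing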

It's a hack, and doesn't address encodings other than UTF-8, but it
was good enough for me; it allows me to maintain the illusion that OS
strings are character strings. Alternatives were:

* Use byte strings and character strings in different places,
  sometimes using a different type depending on the OS (Windows
  filenames would be character strings).

  Disadvantages: It's hard to write a filename to a text file.
  The API is more complex. The programmer must too often care
  about the kind of string.

* Fail when encountering byte strings which can't be decoded.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://haskell.org/mailman/listinfo/haskell-prime


Re: give equal rights to types and classes! :)

2006-02-03 Thread Marcin 'Qrczak' Kowalczyk
Bulat Ziganshin [EMAIL PROTECTED] writes:

 if my idea was incorporated in Haskell, this change wouldn't require
 even changing signatures of most functions working with arrays -
 the Array type just becomes an Array interface; how much of a
 difference is that?

What would 'Eq -> Eq -> Ord -> Bool' mean?
  '(Eq a, Eq b, Ord c) => a -> b -> c -> Bool'?
  '(Eq a, Ord b) => a -> a -> b -> Bool'?
  '(Eq a, Ord a) => a -> a -> a -> Bool'?

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://haskell.org/mailman/listinfo/haskell-prime


Re: Test performance impact

2006-02-01 Thread Marcin 'Qrczak' Kowalczyk
Simon Marlow [EMAIL PROTECTED] writes:

 Summary: 2 programs failed to compile due to type errors (anna, gg).
 One program did 19% more allocation, a few other programs increased
 allocation very slightly (2%).

I wonder how many programs would fail to compile if local identifier
bindings without type signatures were treated as pattern bindings
(monomorphic, no matter whether overloaded or not) and global ones as
polymorphic. This is the solution I would choose.
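
For example, under that rule this program would stop compiling unless
'g' were given a signature (a sketch of the kind of breakage such a
count would measure):

-- 'g' has no type signature; treated as a pattern binding it gets one
-- monomorphic type, so using it at both Char and Bool is an error.
pair :: (Char, Bool)
pair = let g = id in (g 'a', g True)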

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://haskell.org/mailman/listinfo/haskell-prime


Re: [Haskell] IO == ST RealWorld

2006-01-30 Thread Marcin 'Qrczak' Kowalczyk
Axel Simon [EMAIL PROTECTED] writes:

 One function that particularly annoyed me is in Control.Exception

 handle :: (Exception -> IO a) -> IO a -> IO a

 should be

 handle :: MonadIO m => (Exception -> m a) -> m a -> m a

I think it would be unimplementable.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell] Strictness question

2005-06-07 Thread Marcin 'Qrczak' Kowalczyk
Ben Lippmeier [EMAIL PROTECTED] writes:

 To gloss over details: it'll reduce x far enough so it knows that it's
 an Integer, but it won't necessarily compute that integer's value.

No, Integers don't contain any lazy components.
The compiler statically knows that it's an Integer.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell-cafe] Space questions about intern and sets

2005-06-01 Thread Marcin 'Qrczak' Kowalczyk
Gracjan Polak [EMAIL PROTECTED] writes:

 intern :: Ord a => a -> a
 intern x = unsafePerformIO $ internIO x

 iorefset :: Ord a => IORef (Map.Map a a)
 iorefset = unsafePerformIO $ do
  newIORef $ Map.empty

It will not work because you can't put values of different types as
keys of the same dictionary, as you can't compare them.
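
A monomorphic variant of the trick does work, since all keys then
share one type; a sketch (not thread-safe - a careful version would
use atomicModifyIORef):

import qualified Data.Map as Map
import Data.IORef
import System.IO.Unsafe (unsafePerformIO)

{-# NOINLINE stringPool #-}
stringPool :: IORef (Map.Map String String)
stringPool = unsafePerformIO (newIORef Map.empty)

-- One pool per concrete key type: all keys can be compared.
internString :: String -> String
internString x = unsafePerformIO $ do
  m <- readIORef stringPool
  case Map.lookup x m of
    Just y  -> return y
    Nothing -> do writeIORef stringPool (Map.insert x x m)
                  return x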

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: Are new sequences really O(1)?

2005-05-31 Thread Marcin 'Qrczak' Kowalczyk
Simon Marlow [EMAIL PROTECTED] writes:

 Nice trick. Unfortunately the same assumptions don't hold for GHC's
 garbage collector - objects are aged in the youngest generation, so
 it is usually at least two GCs before an object is promoted. We
 could still do the same trick, but instead we'd have to find maximum
 stack depth at which all objects below are in an older generation.

I don't know how to do it under these assumptions, i.e. how to find
that depth.

 Incidentally, how do you mark the stack depth? Overwrite a return
 address with a special one, and keep the real one around in a known
 place?

This is the first thing I tried :-) and even implemented it before I
realized that it doesn't work under my assumptions. The problem is
that I pass the return address in a virtual register, and it's saved
on the stack by the callee only if it performs some non-tail calls.
The return address is saved at the end of the stack frame.

This means that a tail call followed by a non-tail call might read the
special return address from the stack, but without jumping through it
immediately. This return address is put again on the stack by a
different function, with a possibly different stack frame size, so it
lands in a different position on the stack, and thus it can't be found
when GC wants to restore it.

While changing the stack frame layout could perhaps make this
workable, I found a much simpler solution. Each stack frame has always
contained a pointer used only by the GC and by stack trace printing.
It points to a static structure containing the frame size and the
return-address-to-source-location mapping. This pointer is now
marked in the lowest bit by the GC. Pushing a fresh stack frame always
puts an even pointer.

BTW, a few days earlier I fixed a similar behavior for extensible
arrays. Most mutable objects use a write barrier which stores changed
locations, but arrays store references to changed objects because
their payload is malloced and may be reallocated without a GC. This
meant that code which repeatedly appends to a given array had O(N^2)
complexity for huge arrays, as each GC scanned the whole array. So I now
store the size of the initial part of the array which is known to be
unchanged since last GC. This is updated manually by particular
operations.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Are new sequences really O(1)?

2005-05-27 Thread Marcin 'Qrczak' Kowalczyk
[moved from libraries to glasgow-haskell-users]

Ross Paterson [EMAIL PROTECTED] writes:

 GCs that happen during this process will be more expensive, as they
 have to scan the stack. I suspect that GC costs are swamping
 everything else for large n.

I just tweaked the GC implementation in the compiler of my language,
so that a minor collection doesn't scan the whole stack, but only the
part up to the deepest point where the stack pointer has been since
the previous collection. Deeper regions contain only old pointers
so they don't need to be scanned (I have only two generations).

A program which builds a list of length 270,000 non-tail-recursively
(which in a strict language leads to proportional usage of the stack)
and performs on average 4kB of temporary allocations for each element,
so that there are 9,000 GCs in total with a young heap of 128 kB, runs
10 times faster after the change. GC takes 5% of the time instead of 88%.

The implementation relies on the fact that only the topmost stack
frame can be mutated, so it's enough to look only one frame deeper
than the stack pointer reached. Each frame contained a pointer which
is used only for GC and for stack trace printing, and thus it can be
marked in the lowest bit without impacting normal operations.

I prefer to make non-tail recursion efficient rather than have to
rewrite algorithms to use a large heap instead of the stack.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: [Haskell] Beyond ASCII only editors for Haskell

2005-05-24 Thread Marcin 'Qrczak' Kowalczyk
Benjamin Franksen [EMAIL PROTECTED] writes:

 Please forgive me for taking this as an opportunity to rant about the 
 single misfeature of Haskell's layout syntax, which is how if/then/else 
 must be laid out.

For me it's worse that I can't write like this:

foo x = do
   y <- foo x
   let z = some long line which must be
  split into two
   return (y, z)

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell] Re: Existing Haskell IPv6 Code

2005-05-12 Thread Marcin 'Qrczak' Kowalczyk
Peter Simons [EMAIL PROTECTED] writes:

   [URIs might be the answer]

   But what URI should represent e.g. unix datagram sockets?

 I don't think it's worth even trying to hide both stream-
 and packet-oriented services behind the same API. These are
 completely different things, treating them differently is
 fine, IMHO.

But they don't differ in addressing. In BSD sockets the difference
between streams and packets lies in socket type, while addresses
are split into address families which bijectively correspond to
protocol families.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell] Re: Existing Haskell IPv6 Code

2005-05-12 Thread Marcin 'Qrczak' Kowalczyk
Tony Finch [EMAIL PROTECTED] writes:

 But they don't differ in addressing. In BSD sockets the difference
 between streams and packets lies in socket type, while addresses
 are split into address families which bijectively correspond to
 protocol families.

 I believe there are some obscure protocol families that have more than
 one address family, which is why the two concepts exist in the API.

Single Unix Specification v3 specifies only AF_* constants, used for
both.

Linux man socket says:

NOTE
   The manifest constants used under BSD 4.* for protocol families are
   PF_UNIX, PF_INET, etc., while AF_UNIX etc. are used for address
   families. However, already the BSD man page promises: The protocol
   family generally is the same as the address family, and subsequent
   standards use AF_* everywhere.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell-cafe] Specify array or list size?

2005-05-07 Thread Marcin 'Qrczak' Kowalczyk
Antti-Juhani Kaijanaho [EMAIL PROTECTED] writes:

  As far as I know, the last programming language that included
  arrays' sizes in their types was Standard Pascal,
 
 There have been many such languages since Standard Pascal.  For
 example C, C++, C#, Java, Ada, VHDL, and NU-Prolog.

 C, C++ and Java do not belong to that list.  I can't speak about the
 others, not being very familiar with them.

Java and C# don't belong to that list, but C and C++ do.
I don't know about others.

 In C and C++, the declaration int n[50]; introduces an array variable
 with size 50 having the type array of int.  The size is *not* part of
 the type.

No, it introduces a variable of type array of 50 ints, which can be
converted to pointer to int.

It matters when you make a pointer to such an array, an array of such
arrays, or apply sizeof to such an array. In C++ the size can be
matched by a template parameter, and you can have separate
overloadings for separate array sizes.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Specify array or list size?

2005-05-07 Thread Marcin 'Qrczak' Kowalczyk
Sebastian Sylvan [EMAIL PROTECTED] writes:

 A list is, for me, more of a logical entity (as opposed to
 structural). It's a sequence of stuff not a particular way to
 store it (singly-linked, doubly-linked, arraylists etc.).

I call it a sequence.
A list is usually a concrete type in a given language.

Exceptions: Python calls its arrays lists, and Perl calls its
sequences lists.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Compiling with NHC98

2005-05-07 Thread Marcin 'Qrczak' Kowalczyk
Daniel Carrera [EMAIL PROTECTED] writes:

 $ nhc98 prng.hs -o prng
 I/O error (user-defined), call to function `userError':
In file ./RC4.hi:
 1:1-1:6 Found _module_ but expected a interface

GHC and NHC confuse each other with the .hi files they produce and
examine, which are in incompatible formats.

You can delete them; they are only needed to compile other modules
which use that module.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Specify array or list size?

2005-05-07 Thread Marcin 'Qrczak' Kowalczyk
Thomas Davie [EMAIL PROTECTED] writes:

 I'm not familiar with your C++ example (not being familiar with C++),
 but I think that it's a bit of a stretch of the imagination to say
 that C introduces a variable of type array of 50 ints, the fact
 that this is now an array of 50 integers is never checked at any
 point in the compilation or run, and I'm not sure it can be even if
 K&R had wanted to.

The size is taken into account when such array type is an element of
another array, and by sizeof.

int (*p)[50]; /* p may legally point only to arrays of 50 ints each */
++p; /* p is assumed to point into an array, and is moved by one
        element, i.e. by 50 ints */

 As an example:

 int bobsArray[5];
 bobsArray[6] = 23;

 is not badly typed - it is merely a badly broken program.

Because the array size is not taken into account by indexing. But it's
a part of the type. These issues are independent, for example in C#
both are the opposite.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Specify array or list size?

2005-05-07 Thread Marcin 'Qrczak' Kowalczyk
David Roundy [EMAIL PROTECTED] writes:

 No, int (*p)[50] is a multidimensional array, one of the most useless
 concepts in C, and is equivalent to int p[50][] (or is it p[][50]...
 I always get my matrix subscripts messed up).

No, it's not equivalent to either. Array types are not the same as
pointer types.

(Except that in function parameters an array type really means a pointer.)

int p[50][] is an error because array element type must be a complete type,
and an array without a dimension is an incomplete type.

int p[][50] declares p as an array with unknown size, of arrays of size 50.
Such a declaration makes sense:
1. as a declaration of an array defined elsewhere (usually with extern);
   it must have a size specified in some place;
2. with an initializer in braces, whose size determines the array size;
3. as function parameter, where it means a pointer.

int (*p)[50] declares p as a pointer to an array of size 50. Such a
declaration can be used anywhere without restrictions - this is a complete type.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Specify array or list size?

2005-05-07 Thread Marcin 'Qrczak' Kowalczyk
Hamilton Richards [EMAIL PROTECTED] writes:

 That's not the case in C, C++, Java, or Ada. In C and C++, for
 example, given two arrays

   int X[50];
   int Y[100];

 and a function declared as

   void P( int a[] )

 then these calls

   P( X )
   P( Y )

 are both valid, because the sizes of X and Y are not part of their type.

But here you don't pass the array but a pointer to its first element.
You can even call P(x) where x is an int, not an array at all.

Consider this:

int X[50];
int Y[100];
void P(int (&a)[50]) {}
int main() {
   P(X); // valid
   P(Y); // invalid
}

 In C and C++, there's not even any way for a function to discover
 the size of an array argument.

template<int N>
int size(int (&a)[N]) {
    return N;
}

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Haskell vs OCaml

2005-05-04 Thread Marcin 'Qrczak' Kowalczyk
Donn Cave [EMAIL PROTECTED] writes:

 I have been able to build ocaml everywhere I have wanted it, including
 the native code compiler.

And it builds itself much faster than GHC.

(I couldn't measure how much, because GHC didn't build at all, failing
to find HsBaseConfig.h.in.)

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Haskell vs OCaml

2005-05-03 Thread Marcin 'Qrczak' Kowalczyk
John Goerzen [EMAIL PROTECTED] writes:

 I'd say that there are probably no features OCaml has that Haskell
 lacks that are worth mentioning.

Its type system has some interesting features: polymorphic variants,
parametric modules, labeled and optional arguments, objects, variance
annotations of type parameters used for explicit subtyping.

It has more convenient exceptions: the exn type can be extended with
new cases which look like variants of algebraic types.

There is camlp4 for extending the syntax or changing it completely.

OTOH Haskell provides type classes, better integrated arbitrary
precision integer type, type variables with kinds other than *,
polymorphic recursion, much better FFI, and with GHC extensions:
universal and existential quantifiers in function types (OTOH OCaml
recently got universal quantifiers in record fields), GADTs, implicit
parameters, template Haskell.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Haskell vs OCaml

2005-05-03 Thread Marcin 'Qrczak' Kowalczyk
Michael Vanier [EMAIL PROTECTED] writes:

 I also learned ocaml before learning haskell, and the biggest single
 difference I found is that haskell is a lazy, purely functional language
 and ocaml is a strict, mostly functional language.

Indeed.

In contrast to this one, my differences were not inherent in the
languages - I think most of them could be ported to the other language
without conceptual difficulties and without changing its core properties
(they don't depend on purity/impurity nor laziness/strictness).

 Another big difference between ocaml and haskell is that haskell has type
 classes and ocaml does not.

OCaml people recognize this but have said that it's too big a piece of
design: it would complicate the OCaml type system too much. Especially
as modules or objects can sometimes be used for the same purpose,
it would increase the overlap of OCaml features.

There is an experimental design for overloading, called generics
there (someone has said that languages use the term generics for
the kind of polymorphism we didn't have). I didn't like it, or
perhaps I didn't understand it well enough; it was less expressive than
type classes: you couldn't extend a given function in several modules
independently and then combine all the extensions.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell] Eternal Compatibility In Theory

2005-05-02 Thread Marcin 'Qrczak' Kowalczyk
robert dockins [EMAIL PROTECTED] writes:

 Is there a way to reliably and automatically check if two versions of
 a haskell module are interface compatible?

No, because it would have to check whether the semantics of functions
is the same, even if they are written differently.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell-cafe] Instances of constrained datatypes

2005-04-06 Thread Marcin 'Qrczak' Kowalczyk
Arjun Guha [EMAIL PROTECTED] writes:

   data (Eq v) => EqList v = EqList [v]

 I'd like to make it an instance of Functor.  However, fmap takes an
 arbitrary function  of type a - b.  I need an Eq constraint on a and
 b.  Is there any way to do this without creating my own `EqFunctor'
 class with explicitly-kinded quantification:

No.

Why not remove the Eq v constraint from the type?
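
With the constraint dropped, the ordinary instance works; a sketch:

data EqList v = EqList [v]

instance Functor EqList where
  fmap f (EqList xs) = EqList (map f xs)

Functions that actually need equality can then carry the Eq
constraint themselves.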

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell] URLs in haskell module namespace

2005-03-23 Thread Marcin 'Qrczak' Kowalczyk
Here is what I designed and implemented for my language Kogut:

There is a file format of compilation parameters (compiler options,
source file encoding, directories to look for interface files for
imported modules, directories to look for libraries, C libraries
to link, directories to look for packages, packages used).

Parameters are gathered from several places:
- From the command line (in a slightly different format than a file).
- From the file name derived from the source file, with a changed
  extension.
- From common.kop in the directory of the source file.
- From common.kop in parent directories of the source, up to the
  current directory.
- From used packages. A package actually corresponds to such parameter
  file, nothing else. Packages are included in the global dependency
  order, with duplicates removed.
- From the default file in the compiler installation.

Some parameters are accumulated (e.g. directory lists or libraries;
with duplicates removed; the order is important for static libraries)
while others are overridden (e.g. the C compiler to use).

So if a package is needed to bring in one module used in one place,
it can be specified near the file which needs it (and there is
less chance that the dependency will stick forgotten when no longer
needed). OTOH a package used all over the place will be given in the
common.kop in the root of the directory tree.

A package usually has two parameter files: one common.kop used during
its compilation, and another named after the package which is used by
its clients.

Since compilation options can be put in parameter files corresponding
to source files, it's not necessary to invent a way to specify them
per-file in Makefiles.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell] Unicode Source / Keyboard Layout

2005-03-21 Thread Marcin 'Qrczak' Kowalczyk
Sven Moritz Hallberg [EMAIL PROTECTED] writes:

   1. In addition to the backslash, accept mathematical * small
 lamda (U+1D6CC, U+1D706, U+1D740, U+1D77A, and U+1D7B4) for lambda
 abstractions. Leave greek small letter lamda as a regular letter,
 so the Greeks can write their native language.

I think making a lone Greek lambda a keyword would be better. It would
be available as a letter but not as a whole word.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell-cafe] invalid character encoding

2005-03-19 Thread Marcin 'Qrczak' Kowalczyk
Wolfgang Thaller [EMAIL PROTECTED] writes:

 Also, IIRC, Java strings are supposed to be unicode, too -
 how do they deal with the problem?

Java (Sun)
--

Filenames are assumed to be in the locale encoding.

a) Interpreting. Bytes which cannot be converted are replaced by U+FFFD.

b) Creating. Characters which cannot be converted are replaced by ?.

Command line arguments and standard I/O are treated in the same way.


Java (GNU)
--

Filenames are assumed to be in Java-modified UTF-8.

a) Interpreting. If a filename cannot be converted, a directory listing
   contains a null instead of a string object.

b) Creating. All Java characters are representable in Java-modified UTF-8.
   Obviously not all potential filenames can be represented.

Command line arguments are interpreted according to the locale.
Bytes which cannot be converted are skipped.

Standard I/O works in ISO-8859-1 by default. Obviously all input is
accepted. On output characters above U+00FF are replaced by ?.


C# (mono)
-

Filenames use the list of encodings from the MONO_EXTERNAL_ENCODINGS
environment variable, with UTF-8 implicitly added at the end. These
encodings are tried in order.

a) Interpreting. If a filename cannot be converted, it's skipped in
   a directory listing.

   The documentation says that if a filename, a command line argument
   etc. looks like valid UTF-8, it is treated as such first, and
   MONO_EXTERNAL_ENCODINGS is consulted only in remaining cases.
   The reality seems to not match this (mono-1.0.5).

b) Creating. If UTF-8 is used, U+ throws an exception
   (System.ArgumentException: Path contains invalid chars), paired
   surrogates are treated correctly, and an isolated surrogate causes
   an internal error:
** ERROR **: file strenc.c: line 161 (mono_unicode_to_external): assertion 
failed: (utf8!=NULL)
aborting...

Command line arguments are treated in the same way, except that if an
argument cannot be converted, the program dies at start:
[Invalid UTF-8]
Cannot determine the text encoding for argument 1 (xxx\xb1\xe6\xea).
Please add the correct encoding to MONO_EXTERNAL_ENCODINGS and try again.

Console.WriteLine emits UTF-8. Paired surrogates are treated
correctly, unpaired surrogates are converted to pseudo-UTF-8.

Console.ReadLine interprets text as UTF-8. Bytes which cannot be
converted are skipped.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] invalid character encoding

2005-03-18 Thread Marcin 'Qrczak' Kowalczyk
Glynn Clements [EMAIL PROTECTED] writes:

  If you provide wrapper functions which take String arguments,
  either they should have an encoding argument or the encoding should
  be a mutable per-terminal setting.
 
 There is already a mutable setting. It's called locale.

 It isn't a per-terminal setting.

A separate setting would force users to configure an encoding just
for the purposes of Haskell programs, as if the configuration wasn't
already too fragmented. It's unwise to propose a new standard when an
existing standard works well enough.

  It is possible for curses to be used with a terminal which doesn't
  use the locale's encoding.
 
 No, it will break under the new wide character curses API,

 Or expose the fact that the WC API is broken, depending upon your POV.

It's the only curses API which allows writing full-screen programs in
UTF-8 mode.

  Also, it's quite common to use non-standard encodings with terminals
  (e.g. codepage 437, which has graphic characters beyond the ACS_* set
  which terminfo understands).
 
 curses doesn't support that.

 Sure it does. You pass the appropriate bytes to waddstr() etc and they
 get sent to the terminal as-is.

It doesn't support that, and it will switch the terminal mode to the
user encoding (which is usually ISO-8859-x) at the first opportunity,
e.g. after an ACS_* macro was used, or maybe even at initialization.

curses supports two families of encodings: the current locale encoding
and ACS. The locale encoding may be UTF-8 (works only with the wide
character API).

 For compatibility the default locale is "C", but new programs
 which are prepared for I18N should do setlocale(LC_CTYPE, "")
 and setlocale(LC_MESSAGES, "").

 In practice, you end up continuously calling setlocale(LC_CTYPE, "")
 and setlocale(LC_CTYPE, "C"), depending upon whether the text is meant
 to be human-readable (locale-dependent) or a machine-readable format
 (locale-independent, i.e. C locale).

I wrote LC_CTYPE, not LC_ALL. LC_CTYPE doesn't affect %f formatting,
it only affects the encoding of texts emitted by gettext (including
strerror) and the meaning of isalpha, toupper etc.

 The LC_* environment variables are the parameters for the encoding.

 But they are only really parameters at the exec() level.

This is usually the right place to specify it. It's rare that they
are even set separately for the given program - usually they are
per-system or per-user.

 Once the program starts, the locale settings become global mutable
 state. I would have thought that, more than anyone else, the
 readership of this list would understand what's bad about that
 concept.

You can treat it as immutable. Just don't call setlocale with
different arguments again.

 Another problem with having a single locale: if a program isn't
 working, and you need to communicate with its developers, you will
 often have to run the program in an English locale just so that you
 will get error messages which the developers understand.

You don't need to change LC_CTYPE for that. Just set LC_MESSAGES.

 Then how would a Haskell program know what encoding to use for
 stdout messages?

 It doesn't necessarily need to. If you are using message catalogues,
 you just read bytes from the catalogue and write them to stdout.

gettext uses the locale to choose the encoding. Messages are
internally stored as UTF-8 but emitted in the locale encoding.

You are using the semantics I'm advocating without knowing that...

 How would it know how to interpret filenames for graphical
 display?

 An option menu on the file selector is one option; heuristics are
 another.

Heuristics won't distinguish various ISO-8859-x from each other.

An option menu on the file selector is user-unfriendly because users
don't want to configure it for each program separately. They want to
set it in one place and expect it to work everywhere.

Currently there are two such places: the locale, and
G_FILENAME_ENCODING (or older G_BROKEN_FILENAMES) for glib. It's
unwise to introduce yet another convention, and it would be a horrible
idea to make it per-program.

 At least Gtk-1 would attempt to display the filename; you would get
 the odd question mark but at least you could select the file;

Gtk+2 also attempts to display the filename. It can be opened
even though the filename has inconvertible characters escaped.

 The current locale mechanism is just a way of avoiding the issues
 as much as possible when you can't get away with avoiding them
 altogether.

It's a way to communicate the encoding of the terminal, filenames,
strerror, gettext etc.

 Unicode has been described (accurately, IMHO) as Esperanto for
 computers. Both use the same approach to try to solve essentially the
 same problem. And both will be about as successful in the long run.

Unicode has no viable competition.
Esperanto had English.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] invalid character encoding

2005-03-17 Thread Marcin 'Qrczak' Kowalczyk
Glynn Clements [EMAIL PROTECTED] writes:

 The (non-wchar) curses API functions take byte strings (char*),
 so the Haskell bindings should take CString or [Word8] arguments.

Programmers will not want to use such an interface. When they want to
display a string, it will be in Haskell String type.

And it prevents having a single Haskell interface which uses either
the narrow or wide version of the curses interface, depending on what is
available.

 If you provide wrapper functions which take String arguments,
 either they should have an encoding argument or the encoding should
 be a mutable per-terminal setting.

There is already a mutable setting. It's called locale.

 I don't know enough about the wchar version of curses to comment on
 that.

It uses wcsrtombs or equivalents to display characters. And the
reverse to interpret keystrokes.

 It is possible for curses to be used with a terminal which doesn't
 use the locale's encoding.

No, it will break under the new wide character curses API, and it will
confuse programs which use the old narrow character API.

The user (or the administrator) is responsible for matching the locale
encoding with the terminal encoding.

 Also, it's quite common to use non-standard encodings with terminals
 (e.g. codepage 437, which has graphic characters beyond the ACS_* set
 which terminfo understands).

curses doesn't support that.

 The locale encoding is the right encoding to use for conversion of the
 result of strerror, gai_strerror, msg member of gzip compressor state
 etc. When an I/O error occurs and the error code is translated to a
 Haskell exception and then shown to the user, why would the application
 need to specify the encoding and how?

 Because the application may be using multiple locales/encodings.

But strerror always returns messages in the locale encoding.
Just like Gtk+2 always accepts texts in UTF-8.

For compatibility the default locale is "C", but new programs
which are prepared for I18N should do setlocale(LC_CTYPE, "")
and setlocale(LC_MESSAGES, "").

There are places where the encoding is settable independently,
or stored explicitly. For them Haskell should have withCString /
peekCString / etc. with an explicit encoding. And there are
places which use the locale encoding instead of having a separate
switch.

 [The most common example is printf(%f). You need to use the C
 locale (decimal point) for machine-readable text but the user's
 locale (locale-specific decimal separator) for human-readable text.

This is a different thing, and it is what IMHO C did wrong.

 This isn't directly related to encodings per se, but a good example
 of why parameters are preferable to state.]

The LC_* environment variables are the parameters for the encoding.
There is no other convention to pass the encoding to be used for
textual output to stdout for example.

 C libraries which use the locale do so as a last resort.

No, they do it by default.

 The only reason that the C locale mechanism isn't a major nuisance
 is that you can largely ignore it altogether.

Then how would a Haskell program know what encoding to use for stdout
messages? How would it know how to interpret filenames for graphical
display?

Do you want to invent a separate mechanism for communicating that, so
that an administrator has to set up a dozen environment variables
and teach each program separately about the encoding it should assume
by default? We had this mess 10 years ago, and parts of it are still
alive today - you must sometimes configure xterm or Emacs
separately, but it's becoming more common that programs know to use the
system-supplied setting and don't have to be configured separately.

 Code which requires real I18N can use other mechanisms, and code
 which doesn't require any I18N can just pass byte strings around and
 leave encoding issues to code which actually has enough context to
 handle them correctly.

Haskell can't just pass byte strings around without turning the
Unicode support into a joke (which it is now).

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] invalid character encoding

2005-03-17 Thread Marcin 'Qrczak' Kowalczyk
Glynn Clements [EMAIL PROTECTED] writes:

 E.g. Gtk-2.x uses UTF-8 almost exclusively, although you can force the
 use of the locale's encoding for filenames (if you have filenames in
 multiple encodings, you lose; filenames using the wrong encoding
 simply don't appear in file selectors).

Actually they do appear, even though you can't type their names
from the keyboard. The name shown in the GUI used to be escaped in
different ways by different programs or even different places in one
program (question marks, %hex escapes, \oct escapes), but recently
they added some functions to glib to make the behavior uniform.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] invalid character encoding

2005-03-16 Thread Marcin 'Qrczak' Kowalczyk
Duncan Coutts [EMAIL PROTECTED] writes:

 It doesn't affect functions added by the hierarchical libraries,
 i.e. those functions are safe only with the ASCII subset. (There is
 a vague plan to make Foreign.C.String conform to the FFI spec,
 which mandates locale-based encoding, and thus would change all
 those, but it's still up in the air.)

 Hmm. I'm not convinced that automatically converting to the current
 locale is the ideal behaviour (it'd certianly break all my programs!).
 Certainly a function for converting into the encoding of the current
 locale would be useful for many users but it's important to be able to
 know the encoding with certainty.

It should only be the default, not the only option. It should be
possible to specify the encoding explicitly.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] invalid character encoding

2005-03-16 Thread Marcin 'Qrczak' Kowalczyk
Glynn Clements [EMAIL PROTECTED] writes:

 It should be possible to specify the encoding explicitly.

 Conversely, it shouldn't be possible to avoid specifying the
 encoding explicitly.

What encoding should a binding to readline or curses use?

Curses in C comes in two flavors: the traditional byte version and a
wide character version. The second version is easy if we can assume
that wchar_t is Unicode, but it's not always available and until
recently in ncurses it was buggy. Let's assume we are using the byte
version. How to encode strings?

A terminal uses an ASCII-compatible encoding. The wide character version
of curses converts characters to the locale encoding, and the byte version
passes bytes unchanged.
wide character version does the obvious thing and passes Unicode
directly, then an equivalent behavior can be obtained from the byte
version (only limited to 256-character encodings) by using the locale
encoding.

The locale encoding is the right encoding to use for conversion of the
result of strerror, gai_strerror, msg member of gzip compressor state
etc. When an I/O error occurs and the error code is translated to a
Haskell exception and then shown to the user, why would the application
need to specify the encoding and how?
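
For illustration, a minimal sketch of the kind of binding I mean,
assuming a peekCString which follows the FFI spec's locale-based
encoding (which, as noted elsewhere in this thread, the hierarchical
libraries don't implement yet):

    {-# LANGUAGE ForeignFunctionInterface #-}
    import Foreign.C.String (CString, peekCString)
    import Foreign.C.Types (CInt(..))

    foreign import ccall unsafe "string.h strerror"
      c_strerror :: CInt -> IO CString

    -- Decode the message in the locale encoding; an FFI-conforming
    -- peekCString is assumed to do exactly that.
    errnoMessage :: Int -> IO String
    errnoMessage n = c_strerror (fromIntegral n) >>= peekCString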

 If application code doesn't want to use the locale's encoding, it
 shouldn't be shoe-horned into doing so because a library developer
 decided to duck the encoding issues by grabbing whatever encoding
 was readily to hand (i.e. the locale's encoding).

If a C library is written with the assumption that texts are in the
locale encoding, a Haskell binding to such a library should respect that
assumption.

Only some libraries allow working with different, explicitly specified
encodings. Many libraries don't, especially if the texts are not the
core of the library functionality but error messages.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] invalid character encoding

2005-03-16 Thread Marcin 'Qrczak' Kowalczyk
John Meacham [EMAIL PROTECTED] writes:

 In any case, we need tools to be able to conform to the common cases
 of ascii-only (withCAString) and current locale (withCString).

 withUTF8String would be a nice addition, but is much less important to
 come standard as it can easily be written by end users, unlike locale
 specific versions which are necessarily system dependent.

IMHO the encoding should be a parameter of an extended variant of
withCString (and peekCString etc.).

We need a framework for implementing encoders/decoders first.
A problem with designing the framework is that it should support
both pure Haskell conversions and C functions like iconv which work
on arrays. We must also provide a way to signal errors.

A bonus is a way to handle errors coming from another recoder without
causing it to fail completely. That way one could add a fallback for
unrepresentable characters, e.g. HTML entities or approximations with
stripped accents.
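
To sketch the shape I have in mind (Encoder, withCStringEnc and latin1
are illustrative names only, and the error handling here simply fails
instead of invoking a fallback):

    import Data.Char (ord)
    import Data.Word (Word8)
    import Foreign.Marshal.Array (withArray0)
    import Foreign.Ptr (Ptr)

    -- An encoder may fail on the first unrepresentable character.
    type Encoder = String -> Either Char [Word8]

    latin1 :: Encoder
    latin1 = mapM enc
      where enc c | ord c < 0x100 = Right (fromIntegral (ord c))
                  | otherwise     = Left c

    -- withCString with the encoding as an explicit parameter.
    withCStringEnc :: Encoder -> String -> (Ptr Word8 -> IO a) -> IO a
    withCStringEnc enc s act = case enc s of
      Left c   -> ioError (userError ("cannot encode " ++ show c))
      Right bs -> withArray0 0 bs act  -- NUL-terminated, like a C string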

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Bound threads

2005-03-03 Thread Marcin 'Qrczak' Kowalczyk
Wolfgang Thaller [EMAIL PROTECTED] writes:

 Indeed, my brain is melting, but I did it :-)

 Congratulations. How about we found a Bound-thread-induced brain
 melt victims' support group?

The melt was entertaining :-)

 Besides simplicity, one of the main reasons for moving our select()
 call from the run-time system to the libraries was to avoid the
 performance hit of having to call select() every time through the
 scheduler loop rather than only once per IO operation.

I use epoll when available. It's Linux-specific and allows registering
and unregistering descriptors separately from waiting. This not only
saves the process time needed to set up the array, but also the kernel
time spent scanning the array and hooking into files.

I've heard BSD kqueue mechanism has similar properties.

I unregister descriptors from epoll lazily: when epoll reports that
data is available but no thread is in fact waiting for it. This
saves repeated registration when a thread alternates between I/O and
computation.

When the scheduler determines that it has no thread to wake up
immediately, it performs a GC before going to wait, provided the program
has done roughly at least half of the work until the next normal GC.

 Imagine having one or more (unbound) threads that spend most of their
 time waiting for IO, and a bunch of (also unbound) threads that do
 some computation. If select() is part of the scheduler loop, you will
 get a select() call whenever a thread-switch between the computation
 threads happens.

Actually only when the next thread in the running and I/O queue is an
I/O thread, not in every scheduler iteration. Or more precisely, once
per consecutive span of I/O threads in this queue.

epoll_wait takes 0.2 us here, poll takes 1 us, select takes 0.6 us
(1 descriptor in each case). I wonder why poll is slower than select.

I was thinking about integration with the gtk/glib event loop. There are
two ways: either we ask glib to poll using a function supplied by us,
or we perform polling using glib functions instead of raw epoll / poll
/ select. The first choice seems better because otherwise callbacks
registered with glib would be started from the scheduler, and this will
not work; it's not even clear on behalf of which thread they should run.
In this case we must provide a function with the interface of poll().

Without additional support in the runtime (other than making file
objects which don't close their underlying file, but this is easy),
the function can be implemented by starting a thread for each
descriptor, collecting the results, and cancelling threads when some
descriptor is ready or when the timeout expires. Let's assume that
real poll is used by our scheduler and that no other thread does I/O
at the moment, and see what really happens:
- the threads are created at the end of the run queue
- other threads in the program execute their time slices
- each of the newly created threads is marked as waiting for I/O
- other threads in the program execute again (ugh)
- the scheduler looks at the first I/O thread and makes poll()
- all threads whose I/O is ready are woken up
- the next running thread is chosen (perhaps one of threads woken up
  in the previous step)
- it notifies the manager thread that glib-poll-emulation is ready
- when execution reaches the manager, it kills other threads and
  reports the result to glib

It seems that other than a bunch of context switches there is not much
work besides the required minimum. (It gets worse wich epoll, which is
suitable for a mostly unchanging set of watched descriptors.)

With the GHC implementation I think each thread which adds a descriptor
will wake up the service thread through a pipe, and later wake it up
again to unregister files when the threads are cancelled.

 All threads except the thread performing the fork become unbound.
 [...]

 What happens when fork is called from an unbound thread? Does it
 become bound in the child process?

No. But in ForkProcess and ForkProcessKillThreads this thread plays
the role of the main thread: it receives Unix signals; it receives
internal asynchronous signals like heap overflow and deadlock; when it
terminates, the process terminates; if it terminates with an unhandled
exception, a handler which normally prints the stack trace is called.
AtExit handlers are not run though.

I don't know what should be done; this got quite hairy. Actually the
above termination semantics currently applies only when fork is called
with an I/O action as an argument. It can also be called without,
like C fork(), and in this case the behavior is ugly: if the new main
thread in the child process was not the main thread before, its
termination is not special and there will be a deadlock. This should
be changed somehow.

In any case, when the previous main thread terminates (as it's
cancelled cleanly by ForkProcess), it checks whether it's still the
main thread. If not, it disappears like normal threads, except that
its OS thread will wait on a condition variable forever.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/

Re: [Haskell-cafe] Re: Bound threads

2005-03-02 Thread Marcin 'Qrczak' Kowalczyk
Simon Marlow [EMAIL PROTECTED] writes:

 I've now implemented a threaded runtime in my language Kogut, based
 on the design of Haskell. The main thread is bound. The thread which
 holds the capability performs I/O multiplexing itself, without a
 separate service thread.

 We found that doing this was excessively complex (well, I thought so
 anyway).

Indeed, my brain is melting, but I did it :-)

I think our approaches are incomparable in terms of additional
overhead, it depends on the program. I have added some optimizations:

If a thread which wants to perform a safe C call sees that there are
no other threads running, waiting for I/O, or waiting for timeout,
and that we are the thread which handles Unix signals, then it doesn't
notify or start another thread to enter the scheduler. When a C call
returns, it doesn't have to wake up the scheduler in this case.

Even if other threads are running, if there is currently no scheduler
doing epoll/poll/select, then a returning C call doesn't wake up the
scheduler. It only links itself to a list which will be examined by
the scheduler.

* * *

There are interesting complications with fork. POSIX only provides
fork which causes other pthreads in the child process to evaporate.
This is exactly what is wanted if the fork is soon followed by exec,
but can be disastrous if the program tries to use other threads in
the meantime.

Depending on the system pthread_join on a thread which has existed
before the fork either says that it has returned, or hangs, or fails
with ESRCH or EINVAL. And there is no way to fork while keeping other
threads running (there has been some proposal for forkall but it has
been rejected).

This means that a fork in an unfortunate state, e.g. while some
thread was holding a mutex, will leave the mutex permanently locked;
pthread_atfork is supposed to protect against that. It also means that
if our language tries to continue running its threads after the fork,
then there is no way to do this if they are bound to other OS threads.
And the worker pool is useless; it should be emptied before the
fork to reduce the resource leak.

There is no semantics of fork wrt. threads which would be correct in
all cases.

Shortly before implementing bound threads I've designed and
implemented a semantics for three variants of fork, which were easy
when I have full control over what happens with my threads in the
child process (well, the third was a challenge to implement):

- ForkProcessCloneThreads - easiest to describe, but the least useful.
  Threads continue to run in both processes.

- ForkProcessKillThreads - other threads are atomically killed in the
  child process, similarly to raw POSIX. This is used before exec.
  If the program attempts to wait for the threads, the behavior is
  defined: they look as if they failed with ThreadKilled exception,
  even though they were killed without a chance to recover (this is
  a different exception than the one used for cancellation which
  signifies that it could recover).

- ForkProcess - the safest default: all threads are sent signals (in
  the sense of asynchronous communication in my language) which cause
  them to be suspended when they have signal handling unblocked (this
  roughly corresponds to Haskell's blocking of asynchronous exceptions,
  but e.g. a thread holding a mutex has signals blocked by default).
  This includes chasing newly created threads. When all threads are
  suspended, we do ForkProcessCloneThreads. Then in the parent process
  threads are resumed, and in the child they are cancelled in a polite
  way so they can release resources.

Bound threads introduced problems. They can partially be solved,
e.g. the worker pool, the wakeup pipe, epoll descriptor are correctly
recreated. But there is simply no way to return from callbacks because
the corresponding C contexts no longer exist. So I made them as
follows:

All threads except the thread performing the fork become unbound.
They have a chance to handle the thread cancellation exception until
they return from their innermost callbacks. At this time they are
killed. If ForkProcessAllThreads is done while some threads were
executing non-blocking foreign code, they are killed as well.

Besides this, there are at-fork handlers, similar to pthread_atfork
but scoped over the forking action.

* * *

I measured the speed of some syscalls on my system, to see what is
worth optimizing:

- pthread_mutex_lock + unlock (NPTL)   0.1 us
- pthread_sigmask  0.3 us
- setitimer0.3 us
- read + write through a pipe  2.5 us
- gettimeofday 1.9 us

A producer/consumer test in my language (which uses mutexes and
condition variables) needs 1.4 us for one iteration if both threads
are unbound.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: Bound threads

2005-03-01 Thread Marcin 'Qrczak' Kowalczyk
Marcin 'Qrczak' Kowalczyk [EMAIL PROTECTED] writes:

 Why is the main thread bound?

I can answer myself: if the main thread is unbound, the end of the
program can be reached in a different OS thread, which may be a
problem if we want to return cleanly to the calling code.

I've now implemented a threaded runtime in my language Kogut, based
on the design of Haskell. The main thread is bound. The thread which
holds the capability performs I/O multiplexing itself, without a
separate service thread.

Producer/consumer ping-pong is 15 times slower between threads running
on different OS threads than on two unbound threads.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Bound threads

2005-03-01 Thread Marcin 'Qrczak' Kowalczyk
Benjamin Franksen [EMAIL PROTECTED] writes:

 Producer/consumer ping-pong is 15 times slower between threads
 running on different OS threads than on two unbound threads.

 Which OS?

Linux/NPTL.

A context switch which changes OS threads involves:
   setitimer
   pthread_sigmask
   pthread_mutex_lock
   pthread_cond_signal
   pthread_cond_wait (starting)
and in the other thread:
   pthread_cond_wait (returning)
   pthread_mutex_unlock
   pthread_sigmask
   setitimer

setitimer is necessary because I tested that it is installed per
thread rather than per process, even though SUSv3 says it should be
per process.

pthread_sigmask makes the thread holding the capability handle signals.
I've heard that the interaction of signals and threads is broken in
pre-NPTL Linux threads; I will have to check how it behaves and what
should be used in this case (perhaps having signals unblocked in all
threads).

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Bound threads

2005-02-28 Thread Marcin 'Qrczak' Kowalczyk
Simon Marlow [EMAIL PROTECTED] writes:

 Is it important which thread executes Haskell code (I bet no) and
 unsafe foreign calls (I don't know)? If not, couldn't the same OS
 thread execute code of both threads until a safe foreign call is made?

 Actually in a bound thread, *all* foreign calls must be made using the
 correct OS thread, not just the safe ones.  So your scheme would involve
 a context switch at every foreign call, which would end up being rather
 expensive.

Ok.

As I understand this (and as I'm trying to implement a similar scheme
for my language), when an unbound thread performs a safe C call, the
current OS thread transforms from a worker to a bound thread, and
another worker is spawned if needed for remaining Haskell threads.

So I have another idea:

Why is the main thread bound? Wouldn't it be sufficient if, in cases
where it's important to have the main Haskell thread bound to the main
OS thread, the programmer wrapped the main computation in a function
which calls C and then calls back into Haskell? Such a function, if
executed before spawning other threads which perform safe C calls,
would in effect bind the threads together. That way there would be no
OS thread synchronization needed when the main Haskell thread
synchronizes with unbound threads.
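
A sketch of what I mean; call_back here is an assumed one-line C helper,
not an existing API:

    {-# LANGUAGE ForeignFunctionInterface #-}
    import Foreign.Ptr (FunPtr)

    foreign import ccall "wrapper"
      mkCallback :: IO () -> IO (FunPtr (IO ()))

    -- Assumed C helper: void call_back(void (*f)(void)) { f(); }
    foreign import ccall safe "call_back"
      callBack :: FunPtr (IO ()) -> IO ()

    realMain :: IO ()
    realMain = putStrLn "main logic runs inside the callback"

    main :: IO ()
    main = do
      cb <- mkCallback realMain
      callBack cb  -- the safe call + callback bind the threads together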

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Bound threads

2005-02-26 Thread Marcin 'Qrczak' Kowalczyk
Wolfgang Thaller [EMAIL PROTECTED] writes:

 Since the main thread is bound, and unbound threads are never executed
 on an OS thread which has some Haskell thread bound, this would imply
 that when the main thread spawns a Haskell thread and they synchronize
 a lot with each other using MVars, the synchronization needs OS-thread
 synchronization - the threads will not execute on the same OS thread.

 Correct.

Is it important which thread executes Haskell code (I bet no) and
unsafe foreign calls (I don't know)? If not, couldn't the same OS
thread execute code of both threads until a safe foreign call is made?

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Bound threads

2005-02-25 Thread Marcin 'Qrczak' Kowalczyk
I'm trying to understand the semantics and implementation of bound threads
basing on the conc-ffi paper and others.

Since the main thread is bound, and unbound threads are never executed
on an OS thread which has some Haskell thread bound, this would imply
that when the main thread spawns a Haskell thread and they synchronize
a lot with each other using MVars, the synchronization needs OS-thread
synchronization - the threads will not execute on the same OS thread.

If I understand this correctly, doesn't it impose a significant
overhead compared to synchronizing two unbound threads? If not,
what am I missing?

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] What is MonadPlus good for?

2005-02-14 Thread Marcin 'Qrczak' Kowalczyk
Josef Svenningsson [EMAIL PROTECTED] writes:

 You claimed that monad transformers break the
 mzero-is-right-identity-for-bind law because they can be applied to
 IO. I say, it's not the monad transformers fault. They cannot possibly
 be expected to repair the law if they are given a faulty monad.

I agree. They could just as well be said to break the core monad laws.
It's not their fault.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] The Nature of Char and String

2005-02-02 Thread Marcin 'Qrczak' Kowalczyk
Ketil Malde [EMAIL PROTECTED] writes:

 The Haskell functions accept or return Strings but interface to OS
 functions which (at least on Unix) deal with arrays of bytes (char*),
 and the encoding issues are essentially ignored. If you pass strings
 containing anything other than ISO-8859-1, you lose.

 I'm not sure it's as bad as all that. You lose the correct Unicode
 code points (i.e. chars will have the wrong values, and strings may be
 the wrong length), but I think you will be able to get the same bytes
 out as you read in.  So in that sense, Char-based IO is somewhat
 encoding neutral.

 So one can have Unicode both in IO and internally, it's just that you
 don't get both at the same time :-)

That's the problem. Perl is similar: it uses the same strings for byte
arrays and for Unicode strings whose characters happen to be Latin1.
The interpretation sometimes depends on the function / library used,
and sometimes on other libraries loaded.

When I made an interface between Perl and my language Kogut (which
uses Unicode internally and converts texts exchanged with the OS,
even though conversion may fail e.g. for files not encoded using the
locale encoding - I don't have a better design yet), I had trouble
with converting Perl strings which have no characters above 0xFF.
If I treat them as Unicode, then a filename passed between the two
languages is interpreted differently. If I treat them as the locale
encoding, then it's inconsistent and passing strings in both
directions doesn't round-trip.

So I'm currently treating them as Unicode. Perl's handling of Unicode
is inconsistent with itself (e.g. for filenames containing characters
above 0xFF), I don't think I made it more broken than it already is...
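
For reference, the byte-as-code-point identification discussed above
looks like this in Haskell terms; it round-trips bytes, at the price of
wrong code points for non-Latin1 text:

    import Data.Char (chr, ord)
    import Data.Word (Word8)

    -- Identify byte n with code point n.
    bytesToString :: [Word8] -> String
    bytesToString = map (chr . fromIntegral)

    stringToBytes :: String -> [Word8]
    stringToBytes = map (fromIntegral . ord)  -- only safe below U+0100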

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: UTF-8 BOM, really!?

2005-01-31 Thread Marcin 'Qrczak' Kowalczyk
Graham Klyne [EMAIL PROTECTED] writes:

 How can it make sense to have a BOM in UTF-8?  UTF-8 is a sequence of
 octets (bytes);  what ordering is there here that can sensibly be
 varied?

The *name* BOM doesn't make sense when applied to UTF-8, but some
software uses a UTF-8 encoded U+FEFF as a marker that the file is
encoded in UTF-8 rather than some other encoding. And Unicode seems
to support this usage, even if it doesn't recommend it.

I know only of Microsoft Notepad, and suspect other Microsoft tools
(Notepad assumes UTF-8 with the marker and the current Windows
codepage without). The HTML at http://www.microsoft.com/ begins with
a BOM, but other pages linked from there do not.

I think XML used to be silent about this, but later got amended to
explicitly say that optional U+FEFF at the beginning is allowed and
not treated as a part of document contents.

OTOH much other software, in particular generic Unix tools, doesn't
treat the UTF-8 BOM specially, and de facto implements the non-standard
variant of UTF-8 without a BOM.

Technically in UTF-16/32 the BOM is handled in the translation between
encoding form (sequence of 16- or 32-bit code units) and encoding
scheme (these words serialized into bytes). I think it's supposed
to be the same in UTF-8, i.e. the analogous translation is *almost*
trivial - it translates bytes to the same bytes - except that initial
BOM must be stripped on decoding, and it must be added on encoding
when the first character of the contents is U+FEFF (and optionally in
other cases). I mean that it is supposed to happen on decoding UTF-8
on the level of bytes, not after decoding on the level of code points.
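
In other words, at the byte level the rule amounts to something like
this (the UTF-8 encoding of U+FEFF being EF BB BF):

    import Data.Word (Word8)

    -- Strip an initial UTF-8 encoded BOM before any decoding.
    stripUtf8Bom :: [Word8] -> [Word8]
    stripUtf8Bom (0xEF : 0xBB : 0xBF : rest) = rest
    stripUtf8Bom bytes                       = bytes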

Anyway, on Unix it just doesn't happen at all, except in software
which explicitly handles it. iconv() doesn't handle UTF-8 BOM.

If I could decide about it, I would ban the UTF-8 BOM altogether. But
perhaps the Unicode Consortium can at least be persuaded to recognize
that some software doesn't accept a BOM in UTF-8, and could be regarded
as conforming to the variant of UTF-8 without the BOM rather than not
conforming at all.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: File path programme

2005-01-31 Thread Marcin 'Qrczak' Kowalczyk
Peter Simons [EMAIL PROTECTED] writes:

   http://cryp.to/pathspec/PathSpec.hs

 There also is a function which changes a path specification
 into its canonic form, meaning that all redundant segments
 are stripped.

It's incorrect: canon (read "x/y/.." :: RelPath Posix) gives "x",
yet on Unix they aren't equivalent when y is a non-local symlink
or doesn't exist.

Also, "x/." is not equivalent to "x": rmdir can be used with "x"
but not with "x/.".

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: File path programme

2005-01-30 Thread Marcin 'Qrczak' Kowalczyk
Stefan Monnier [EMAIL PROTECTED] writes:

 The various UTF encodings do not have this particular problem; if a UTF
 string is valid, then it is a unique representation of a unicode string.
 However, decoding is still a partial function and can fail.

 And while it is partly true, it is qualified by the problems relative to
 canonicalization (an é in Unicode can both be represented as é or as two
 chars (an e and an accent) and they should (ideally) compare equal).

In what sense equal? They are supposed to be equivalent as far
as the semantics of the text is concerned, but representations are
clearly different and most programs distinguish them. In particular
they are different filenames on both Unix and Windows. AFAIK MacOS
normalizes filenames, but using a slightly different algorithm than
Unicode (perhaps just an older version).

IMHO it makes no sense to pretend that they are exactly the same when
strings consist of code points or lower level units (and I don't
believe another choice for the default string type would be practical).

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] File path programme

2005-01-30 Thread Marcin 'Qrczak' Kowalczyk
Glynn Clements [EMAIL PROTECTED] writes:

 Then of course there's the issue that Win32 edge 
 labels are Unicode, while Posix edge labels are [Word8]. Hmm.

 Strictly speaking, they're [CChar], but I doubt that anyone will ever
 implement Haskell on a platform where a byte isn't 8 bits wide.

On POSIX it's the same.

It's not the same when only the C standard is concerned. There existed
platforms with 9 bit chars (Unisys 1100). I don't know whether they
are still in use. I doubt that any Haskell compiler will run on such
a system.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] File path programme

2005-01-30 Thread Marcin 'Qrczak' Kowalczyk
David Roundy [EMAIL PROTECTED] writes:

 No, it's not Unix-specific, it's portable. If you want to write
 portable C code, you have to use the standard library, which means
 that file names are represented as Ptr CChar.

I disagree. We are talking about portable Haskell, not portable C.

The native Windows filename encoding (in WinNT) is UTF-16 (without
validation that surrogates are correctly paired). It provides an API
which works with such filenames.

It also provides a compatibility layer on top of that, which tries
to translate filenames to some 8-bit encoding, but it should only be
used when porting C programs which can't be adapted to use Windows
filenames through their native type. Especially as Haskell already
uses Unicode - it's pointless to convert it to an 8-bit encoding and
back.

I don't know what these 8-bit WinAPI functions do when they encounter
filenames unrepresentable in the current 8-bit encoding.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] File path programme

2005-01-30 Thread Marcin 'Qrczak' Kowalczyk
Glynn Clements [EMAIL PROTECTED] writes:

 And it isn't a theoretical issue. E.g. in an environment where EUC-JP
 is used, filenames may begin with ESC$)B (designate JISX0208 to G1),
 or they may not (because G1 is assumed to contain JISX0208 initially).

I think such encodings are never used as default encodings of a Unix
locale.

 The various UTF encodings do not have this particular problem; if a UTF 
 string is valid, then it is a unique representation of a unicode string.

BOM is a problem. Unfortunately Unicode mandates that FEFF at the
start of a UTF-8 text stream is a mark which doesn't belong to the
text. It provides variants of UTF-16/32 with and without a BOM, but
UTF-8 only has the variant with a BOM. This makes UTF-8 a stateful
encoding.

Unix ignores this, it doesn't use a BOM in UTF-8 except in individual
applications for individual file formats. iconv() on Linux and
in libiconv doesn't process a BOM in UTF-8 (although in libiconv this
is because it's old, being based on an old RFC with 31-bit code points
which didn't include a BOM).

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: File path programme

2005-01-30 Thread Marcin 'Qrczak' Kowalczyk
Aaron Denney [EMAIL PROTECTED] writes:

 Better yet would be to have the standard never allow the BOM.

If I could decide, I would ban the BOM in UTF-8 altogether, but I'm
afraid the Unicode Consortium doesn't want to do this.

Microsoft Notepad puts a BOM in UTF-8 encoded files.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: mathematical notation and functional programming

2005-01-29 Thread Marcin 'Qrczak' Kowalczyk
Stefan Monnier [EMAIL PROTECTED] writes:

 OTOH I like the a<b<c shorthand because it's both obvious and
 unambiguous (as long as the return value of < can't be passed as an
 argument to <, which is typically the case when the return value is
 boolean and there's no ordering defined on booleans).

It's unambiguous even if the return value of < can be passed as an
argument to <. Operators are usually left-associative, right-associative
or non-associative. A non-associative operator can have an additional
semantics defined when it's used multiple times, just like a,b,c in
OCaml is neither a,(b,c) nor (a,b),c, or even a*b*c as a type.
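
In Haskell terms, the "additional semantics for repeated use" could be
simulated with an ordinary function rather than new syntax, a sketch:

    -- chainedLess [a, b, c] plays the role of a < b < c.
    chainedLess :: Ord a => [a] -> Bool
    chainedLess xs = and (zipWith (<) xs (drop 1 xs))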

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Looking for these libraries...

2005-01-27 Thread Marcin 'Qrczak' Kowalczyk
John Goerzen [EMAIL PROTECTED] writes:

 I'm looking for libraries / interfaces to these systems from Haskell:

 LDAP
 ncurses
 zlib  (the one in darcs doesn't suit my needs)
 bz2lib

I once wrapped ncurses (incomplete), zlib and bz2lib.
http://sourceforge.net/projects/qforeign/

It's quite old and will not compile with recent compilers without
modifications. The FFI libraries were being designed at that time;
it should be possible to replace QForeign with Foreign, modulo some
changes since then.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] File path programme

2005-01-27 Thread Marcin 'Qrczak' Kowalczyk
Robert Dockins [EMAIL PROTECTED] writes:

 More than you would think, if you follow the conventions of modern
 unix shells. eg, "foo/.." is always equal to ".",

For the OS it's not the same if foo is a non-local symlink.

Shells tend to resolve symlinks themselves on cd, and "cd .." means
to remove the last component of the unexpanded current directory,
which may be different from the directory listed by "ls ..".

 (rather than doing a chdir on the .. hardlink, which does strange
 things in the presence of symlinks). The operation is sufficently
 useful that I think it should be included. It lets us know, for
 example, that /bar/../foo/tmp and /foo/tmp refer to the same
 file, without resorting to any IO operations.

I disagree. The point is they are *not* the same file.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] File path programme

2005-01-27 Thread Marcin 'Qrczak' Kowalczyk
John Meacham [EMAIL PROTECTED] writes:

 too bad we can't do things like

 #if exists(module System.Path) 
 import System.Path
 #else
 ...
 #endif

 I still find it perplexing that there isn't a decent standard haskell
 preprocessor 

For my language Kogut I designed a syntax

   ifDefined condition {
  something
   }
   otherCondition {
  something else
   }
   ...

where a condition is an identifier whose definedness is tested, or
module ModuleName for existence of a module, or _ which is always
true, or their combination using conjunctions, alternatives and
negations.

The construct can be an expression (no condition is true = Null,
which makes sense with dynamic typing), a definition (no condition
is true = nothing is defined) or a pattern (no condition is true =
a pattern which never matches).

It does not subsume Common Lisp's #+ and #- nor vice versa. In Lisp
it is done at read time, not compile time, which has some advantages,
but is incompatible with temporal separation of compilation phases.

Even though definitions may in general be mutually recursive,
just as in Haskell, ifDefined depends only on definitions above
it. This is hard to avoid, because its expansion may influence the set
of definitions which will be present.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: what is inverse of mzero and return?

2005-01-23 Thread Marcin 'Qrczak' Kowalczyk
Ashley Yakeley [EMAIL PROTECTED] writes:

 But only some instances (such as []) satisfy this:

   (mplus a b) >>= c = mplus (a >>= c) (b >>= c)

 Other instances (IO, Maybe) satisfy this:

   mplus (return a) b = return a

 I think mplus should be separated into two functions.

This would prevent using mplus in a single parser which - depending on
the underlying monad used - backtracks or not.
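
To make the difference concrete, a small self-contained check (all
three definitions evaluate to True):

    import Control.Monad (mplus)

    -- [] satisfies left distribution:
    distList :: Bool
    distList = ((xs `mplus` ys) >>= f) == ((xs >>= f) `mplus` (ys >>= f))
      where xs = [1, 2 :: Int]; ys = [3]; f x = [x, x * 10]

    -- Maybe satisfies left catch instead:
    catchMaybe :: Bool
    catchMaybe = (return 1 `mplus` Just (2 :: Int)) == return 1

    -- ... and fails left distribution:
    noDistMaybe :: Bool
    noDistMaybe = ((a `mplus` b) >>= f) /= ((a >>= f) `mplus` (b >>= f))
      where a = Just (1 :: Int); b = Just 2
            f x = if x == 1 then Nothing else Just x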

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: File path programme

2005-01-23 Thread Marcin 'Qrczak' Kowalczyk
Aaron Denney [EMAIL PROTECTED] writes:

 What about splitFileExt "foo.bar."? ("foo", "bar.") or ("foo.bar.", "")?

 The latter makes more sense to me, as an extension of the first case
 you give and splitting "foo.tar.gz" to ("foo.tar", "gz").

It's not that obvious: both choices are compatible with these.

The former is produced by rules:
- split the filename before the last dot *which is not the last
  character of the filename*, or at the end if there is no such dot
- remove the first character of the extension if it's non-empty
  (the character must have been a dot)

The latter is produced by rules:
- split the filename before the last dot, or at the end if there is
  no dot at all
- *if the extension is a sole dot, append a dot to the basename*
- remove the first character of the extension if it's non-empty
  (the character must have been a dot)

Special filenames of . and .. are treated separately, before these
rules apply.

Both choices are inverted by the same joinFileExt, which inserts a dot
between the name and extension unless the extension is empty.

These rules agree on "foo", "foo." and "foo.tar.gz", yet disagree on
"foo.bar."; I don't know which is more natural.

The difference influences the behavior of changeFileExt. These cases
are the same with both choices:
   changeFileExt "foo.bar" "" = "foo"
   changeFileExt "foo.tar.gz" "" = "foo.tar"
   changeFileExt "foo." "" = "foo."
   changeFileExt "foo." "baz" = "foo..baz"
but these differ - first choice:
   changeFileExt "foo.bar." "" = "foo"
   changeFileExt "foo.bar." "baz" = "foo.baz"
or the second:
   changeFileExt "foo.bar." "" = "foo.bar."
   changeFileExt "foo.bar." "baz" = "foo.bar..baz"
?
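
For concreteness, a sketch of the first choice in Haskell (illustrative
helpers, not a proposed library API):

    splitFileExt :: String -> (String, String)
    splitFileExt name
      | name == "." || name == ".." = (name, "")
      | null dots                   = (name, "")
      | otherwise                   = (take i name, drop (i + 1) name)
      where
        -- positions of dots other than one which is the last character
        dots = [ j | (j, '.') <- zip [0 ..] name, j /= length name - 1 ]
        i    = last dots

    joinFileExt :: String -> String -> String
    joinFileExt base ""  = base
    joinFileExt base ext = base ++ "." ++ ext

    changeFileExt :: String -> String -> String
    changeFileExt name ext = joinFileExt (fst (splitFileExt name)) ext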

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Can't do basic time operations with System.Time

2005-01-21 Thread Marcin 'Qrczak' Kowalczyk
Peter Simons [EMAIL PROTECTED] writes:

 I was wondering: Does this calculation account for leap
 years? Does it have to?

C itself leaves unspecified the question whether its time calculations
take leap seconds into account. All other systems I know of ignore
leap seconds: POSIX C, Common Lisp, Java, .NET.

The NTP protocol also ignores leap seconds, i.e. its time is supposed
to slow down near the transition.

While supporting leap seconds is more correct, it makes interoperability
harder: Haskell would have a different view than the rest of the system
about either the current calendar time or seconds since the Epoch.

Another problem is that leap seconds are not known further in advance
than about half of a year, so programs compiled using different
versions of Haskell would have slightly different views about the
current time.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] I/O interface

2005-01-19 Thread Marcin 'Qrczak' Kowalczyk
Ben Rudiak-Gould [EMAIL PROTECTED] writes:

 Yes, this is a problem. In my original proposal InputStream and
 OutputStream were types, but I enthusiastically embraced Simon M's
 idea of turning them into classes. As you say, it's not without its
 disadvantages.

This is my greatest single complaint about Haskell: that it doesn't
support embedding either OO-style abstract supertypes, or dynamic
typing with the ability to use polymorphic operations on objects
whose exact type we don't know.

The Dynamic type doesn't count for the latter because you must guess
the concrete type before using the object. You can't say "it should be
something implementing class Foo, I don't care what, and I only want
to use Foo's methods with it".

Haskell provides only:
- algebraic types (must specify all subtypes in one place),
- classes (requires foralls which limits applicability:
  no heterogeneous lists, I guess no implicit parameters),
- classes wrapped in existentials, or records of functions
  (these two approaches don't support controlled downcasting,
  i.e. "if this is a regular file, do something, otherwise do
  something else"; see the sketch below).
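
A sketch of the existential option, with made-up stream classes, showing
both what it gives and what it lacks:

    {-# LANGUAGE ExistentialQuantification #-}

    class InputStream s where
      readByte :: s -> IO (Maybe Char)

    -- The existential wrapper makes heterogeneous lists possible...
    data AnyInput = forall s. InputStream s => AnyInput s

    readAll :: [AnyInput] -> IO ()
    readAll = mapM_ useOne
      where useOne (AnyInput s) = readByte s >>= print

    -- ...but there is no controlled downcast from AnyInput back to
    -- a concrete stream type.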

The problem manifests itself more when we add more kinds of streams:
transparent compression/decompression, character recoding, newline
conversion, buffering, userspace /dev/null, concatenation of several
input streams, making a copy of data as it's passed, automatic
flushing of a related output stream when an input stream is read, etc.

A case similar to streams which would benefit from this is a DB
interface. Should it use separate types for separate backends? Awkward
to write code which works with multiple backends. Should it use a
record of functions? Then we must decide at the beginning the complete
set of supported operations, and if one backend provides something
that another doesn't, it's impossible to write code which requires
the first backend and uses the capability (unless we decide at the
beginning about all possible extensions and make stubs which throw
exceptions where something isn't supported). I would like to mix these
two approaches: if some code uses only operations supported by all
backends, then it's fully polymorphic, and when it starts using
specific operations, it becomes limited. Without two completely
different designs for these cases. I don't know how to fit it into
Haskell's type system. This has led me to explore dynamic typing.

 Again, to try to avoid confusion, what you call a "seekable file" the
 library calls a "file", and what you call a "file" I would call a
 "Posix filehandle".

So the incompleteness problem can be rephrased: the interface doesn't
provide the functionality of open(), which returns an arbitrary POSIX
filehandle.

 By the same token, stdin is never a file, but the data which appears
 through stdin may ultimately be coming from a file, and it's sometimes
 useful, in that case, to bypass stdin and access the file directly.
 The way to handle this is to have a separate stdinFile :: Maybe File.

And a third stdin, as a POSIX filehandle, to be used e.g. for I/O
redirection for a process.

 As for openFile: in the context of a certain filesystem at a certain
 time, a certain pathname may refer to

   * Nothing
   * A directory
   * A file (in the library sense); this might include things like
 /dev/hda and /dev/kmem
   * Both ends of a (named) pipe
   * A data source and a data sink which are related in some
 qualitative way (for example, keyboard and screen, or stdin and stdout)
   * A data source only
   * A data sink only
   * ...

 How to provide an interface to this zoo?

In such cases I tend to just expose the OS interface, without trying
to be smart. This way I can be sure I don't make anything worse than
it already is.

Yes, it probably makes portability harder. Suitability of this
approach depends on our goals: either we want to provide a nice and
portable abstraction over the basic functionality of all systems,
or we want to make everything implementable in C also implementable
in Haskell, including a Unix shell.

Perhaps Haskell is in the first group. Maybe its goal is to invent
an ideal interface to the computer's world, even if this means doing
things differently than everyone else. It's hard to predict beforehand
how far in being different we can go without alienating users.

For my language I'm trying to do the second thing. I currently
concentrate on Unix because there are enough Windows-inspired
interfaces in .NET, while only Perl and Python seem to care about
providing rich access to the Unix API from a language other than C.

I try to separate interfaces which should be portable from interfaces
to Unix-specific things. Unfortunately I have never programmed for
Windows and I can make mistakes about which things are common to
various systems and which are not. Time will tell and will fix this.

Obviously I'm not copying the Unix interface literally. A file is
distinguished from an integer, and an integer is distinguished from a
Unix signal, 

Re: [Haskell-cafe] Re: Hugs vs GHC (again)was: Re: Somerandomnewbiequestions

2005-01-19 Thread Marcin 'Qrczak' Kowalczyk
Glynn Clements [EMAIL PROTECTED] writes:

 They're similar, but not identical. Traditionally, Unix non-blocking
 I/O (along with asynchronous I/O, select() and poll()) were designed
 for slow streams such as pipes, terminals, sockets etc. Regular
 files and block devices are assumed to return the data immediately.

Indeed. Reading from a slow block device is also not interruptible by
a signal; a signal usually causes reading from a pipe/socket/terminal
to fail with EINTR.

There is no non-blocking interface to various functions like readdir,
mkdir, stat etc.

OTOH close() is interruptible.

It seems that the only way to parallelize them is to use a separate
OS thread.

gethostbyname, gethostbyaddr, getservbyname and getservbyport are
mostly superseded by getaddrinfo and getnameinfo. They are all
blocking and non-interruptible by signals (they restart their loops
on receiving EINTR from low-level calls).

Glibc provides getaddrinfo_a which is non-blocking (implemented using
pthreads). Contrary to documentation it's not interruptible by a
signal (its implementation expects pthread_cond_wait to fail with
EINTR which is not possible) and it's not cancellable in a useful way
(the interface allows for cancellation, which may nevertheless answer
that it cannot be cancelled, and the glibc implementation is able to
cancel a request only if it hasn't yet started being processed by the
thread pool). There is no non-blocking counterpart of getnameinfo.

Since asynchronous name resolution is quite important, implementation
of my language uses pthreads and getaddrinfo / getnameinfo, if
pthreads are available. For simplicity I just make one thread per
request.
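
In Haskell the same one-thread-per-request pattern might look like
this (a sketch using the network package's getAddrInfo; with GHC's
threaded runtime the blocking foreign call occupies only its own OS
thread):

    import Control.Concurrent
    import Control.Exception (IOException, try)
    import Network.Socket (AddrInfo, getAddrInfo)

    -- One thread per lookup; the result (or the failure) lands in an MVar.
    asyncResolve :: String -> IO (MVar (Either IOException [AddrInfo]))
    asyncResolve host = do
        box <- newEmptyMVar
        _ <- forkIO $ do
            r <- try (getAddrInfo Nothing (Just host) Nothing)
            putMVar box r
        return box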

A tricky API to parallelize is waitpid. Pthreads are supposed to be
able to wait for child processes started by any thread, but according
to man pages this was broken in Linux before version 2.4. Fortunately
it's easy to avoid blocking other threads indefinitely without OS
threads if we agree to waste CPU time (not CPU cycles), such that a
thread waiting for a process takes as much time as if it was doing
some useful work. Because waitpid *is* interruptible by signals. So it
will either finish, or the timer signal will interrupt it and control
can be passed to other threads.
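
A Haskell approximation that avoids the signal machinery altogether is
a non-blocking waitpid in a polling loop (a sketch using the unix
package; the 10ms delay is an arbitrary choice):

    import Control.Concurrent (threadDelay)
    import System.Posix.Process (ProcessStatus, getProcessStatus)
    import System.Posix.Types (ProcessID)

    -- Poll-based wait: never blocks the other threads, at the cost of
    -- waking up periodically rather than relying on EINTR.
    waitNonBlocking :: ProcessID -> IO ProcessStatus
    waitNonBlocking pid = do
        st <- getProcessStatus False False pid   -- False: don't block
        case st of
            Just s  -> return s
            Nothing -> threadDelay 10000 >> waitNonBlocking pid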

Leaving the timer signal interrupting syscalls can break libraries
which don't expect EINTR. For example the Python runtime doesn't
handle EINTR specially and it is translated to a Python exception.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Hugs vs GHC (again) was: Re: Some random newbie questions

2005-01-19 Thread Marcin 'Qrczak' Kowalczyk
Glynn Clements [EMAIL PROTECTED] writes:

 We do use a thread pool.  But you still need as many OS threads as there
 are blocked read() calls, unless you have a single thread doing select()
 as I described.

 How does the select() help? AFAIK, select() on a regular file or block
 device will always indicate that it is readable, even if a subsequent
 read() would have to read the data from disk.

It doesn't help if we don't want I/O requests to delay one another,
and not only avoiding delay of execution of pure Haskell code.

BTW, poll is generally preferred to select. The maximum fd supported
by select may be lower than the maximum fd supported by the system.
And the interface of poll allows the cost to be proportional to the
number of descriptors rather than to the highest descriptor.

The timeout is specified in microseconds for select and in milliseconds
for poll, but on Linux the actual resolution is the clock tick in both
cases anyway (usually 1ms or 10ms).

It's probably yet better to use epoll than poll. The difference is
that with epoll you register fds using separate calls, and you don't
have to provide them each time you wait (and the kernel doesn't have
to scan the array each time). So it scales better to a large number of
threads which perform I/O. It's available in Linux 2.6.

Caveat: before Linux 2.6.8 epoll had a memory leak in the kernel
because of a reference counting bug (0.5kB per epoll_create call,
which means 0.5kB of physical memory lost per starting a program
which waits for I/O using epoll).

poll is in Single Unix Spec, epoll is Linux-specific.

poll and epoll both take the timeout in the same format, but they
interpret it differently: poll sleeps at least the given time (unless
a fd is ready or a signal arrives), while epoll rounds it up to a
whole number of clock ticks and then sleeps between this time and one
tick shorter. I was told that this is intentional because it allows
to sleep until the next clock tick by specifying the timeout of 1ms
(a timeout of 0ms means to not sleep at all).

Accurate sleeping requires measuring the time by which poll/epoll can
make the timeout longer (it's 1 tick for epoll and 2 ticks for poll),
subtracting this time from the timeout passed to them, adding 1ms, and
sleeping the remaining time by busy waiting: calling gettimeofday
interspersed with poll/epoll with no timeout. gettimeofday() is accurate to
microseconds, it asks some clock chip instead of relying on the timer
interrupt only.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: [Haskell] Re: Why is getArgs in the IO monad?

2005-01-18 Thread Marcin 'Qrczak' Kowalczyk
Keean Schupke [EMAIL PROTECTED] writes:

 Surely both requirements can be satisfied if the program's arguments are made
 parameters of main:

 main :: [String] -> IO ()

From info '(libc)Error Messages', about program_invocation_name
and program_invocation_short_name:

   *Portability Note:* These two variables are GNU extensions.  If you
want your program to work with non-GNU libraries, you must save the
value of `argv[0]' in `main', and then strip off the directory names
yourself.  We added these extensions to make it possible to write
self-contained error-reporting subroutines that require no explicit
cooperation from `main'.

This suggests that making global parameters available only as
arguments of main would be a bad idea. But they should be settable,
to account for the rare case of wanting to supply something else
to a library which reads them itself.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] I/O interface (was: Re: Hugs vs GHC)

2005-01-16 Thread Marcin 'Qrczak' Kowalczyk
John Meacham [EMAIL PROTECTED] writes:

 I was thinking of it as a better implementation of a stream interface
 (when available).

I'm not convinced that the stream interface
(http://www.haskell.org/~simonmar/io/System.IO.html) works at all,
i.e. whether it's complete, implementable and convenient.

Convenience. I'm worried that it uses separate types for various
kinds of streams: files, pipes, arrays (private memory), and sockets.
Haskell is statically typed and lacks subsumption. This means that
even though streams are unified by using a class, code which uses
a stream of an unknown kind must be either polymorphic or use
existential quantification.

Completeness. Unless File{Input,Output}Stream uses {read,write}()
rather than file{Read,Write}, openFile provides only a subset of
the functionality of open(): it works only with seekable files,
e.g. not with /dev/tty.

What is the type of stdin/stdout? They may be devices or pipes
(not seekable), regular files (seekable), sockets...

Note that even when they are regular files, emulating stream I/O
in terms of either pread/pwrite or mmap does not yield the correct
semantics of sharing the file pointer between processes. If we have
a shell script which runs Haskell programs which write to stdout,
it should be possible to redirect the output of the script as a whole.

 Exploiting the advantages of mapped files for stream I/O
  http://www.cs.toronto.edu/pub/reports/csrg/267/267.ps

The advantage of reducing copying between buffers is lost in Haskell:
file{Read,Write} use a buffer provided by the caller instead of giving
a buffer for the caller to examine or fill.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: FFI and C99 complex

2005-01-16 Thread Marcin 'Qrczak' Kowalczyk
Aaron Denney [EMAIL PROTECTED] writes:

 So, I have heard claims that C 99 specifies that a variable v
 of type complex t is stored as if declared as t v[2], with
 v[0] the real part and v[1] the imaginary part.

 I don't have a copy of the spec at hand.  Could someone who does verify
 this?

This is true (in the C99 draft; I don't think this has changed in the
final standard).

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Re: Hugs vs GHC (again) was: Re: Some random newbie questions

2005-01-15 Thread Marcin 'Qrczak' Kowalczyk
Pete Chown [EMAIL PROTECTED] writes:

 of course, [mmap] can only be done on a limited type of file on some
 architectures, so it should be an optimization under the hood rather
 than an exposed interface.

 In particular, you have to be careful not to run out of address space on
 32-bit architectures.  If you try to do file access by mapping the whole
 file into virtual memory, you won't be able to handle files larger than 2G
 or so.  You also have to be careful not to map a number of file chunks,
 which in total exceed the address space available.

Which means that it *should* be explicit, rather than done under the
hood. It's a tradeoff: the programmer should be aware of when to apply
it, and the runtime should not impose a policy which would be hard to
reverse.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Linear shuffle

2005-01-14 Thread Marcin 'Qrczak' Kowalczyk
Henning Thielemann [EMAIL PROTECTED] writes:

 I did some shuffling based on mergesort, that is a list is randomly split
 (unzipped) into two lists and the parts are concatenated afterwards. You
 must repeat this some times. It even works for infinite lists.

I think it doesn't guarantee equal probabilities of all permutations.
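
For comparison, the standard linear shuffle (Fisher-Yates) does
guarantee that all permutations are equally likely, assuming a uniform
random source; a minimal sketch:

    import Control.Monad (forM_)
    import Data.Array.IO (IOArray, newListArray, readArray, writeArray, getElems)
    import System.Random (randomRIO)

    -- Fisher-Yates: O(n) swaps, each permutation equally probable.
    shuffle :: [a] -> IO [a]
    shuffle xs = do
        arr <- mkArray (length xs) xs
        forM_ [length xs, length xs - 1 .. 2] $ \i -> do
            j <- randomRIO (1, i)            -- uniform in [1, i]
            vi <- readArray arr i
            vj <- readArray arr j
            writeArray arr i vj
            writeArray arr j vi
        getElems arr
      where
        mkArray :: Int -> [a] -> IO (IOArray Int a)
        mkArray n = newListArray (1, n)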

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: I/O interface

2005-01-13 Thread Marcin 'Qrczak' Kowalczyk
Ketil Malde [EMAIL PROTECTED] writes:

 It seemed to me, though, that streams are related to channels,

I'm not sure what exactly you mean by streams (because they are
only being designed), but here are the differences (a small Chan
sketch follows the list):

- A stream is either an input stream or an output stream, while a
  single channel supports reading from one end and writing to the
  other end.

- A stream passes around bytes, which are usually grouped in blocks
  for efficiency. A channel is polymorphic wrt. the element type and
  elements are always processed one by one.

- A stream may be backed by an OS file, pipe, socket etc., while
  a channel exists purely in Haskell.

- A channel is never closed. Reading more data than have been put
  blocks until someone puts more data. A stream can reach its end,
  which is a condition a reader can detect. A stream backed by a pipe
  is similar to a channel of bytes in that the reader blocks until
  someone puts more data, but it can be closed too, which causes the
  reader to observe end of file. A writer to a stream can block too
  when the internal buffer in the kernel is full.

- A stream can be inherited by child processes, and it generally
  continues to work by being linked to the same data sink or source as
  before. A channel is inherited as a whole: there is no communication
  between the two versions of the channel in the two processes.
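
To make the channel side concrete, a minimal Chan example (it exists
purely in Haskell, with no OS object behind it):

    import Control.Concurrent (forkIO)
    import Control.Concurrent.Chan

    main :: IO ()
    main = do
        ch <- newChan
        _  <- forkIO (mapM_ (writeChan ch) [1 .. 10 :: Int])
        xs <- sequence (replicate 10 (readChan ch))
        print xs    -- [1,2,3,4,5,6,7,8,9,10]; reads block until data arrives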

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: I/O interface

2005-01-12 Thread Marcin 'Qrczak' Kowalczyk
Ben Rudiak-Gould [EMAIL PROTECTED] writes:

 First of all, I don't think any OS shares file pointers between
 processes.

Unix does.

It's because shared files are usually stdin/stdout/stderr (I mean
that they are visible as stdin/stdout/stderr, not that they are
necessarily terminals - they may be regular files), which are usually
accessed sequentially without seeking. Quite often they are not
seekable at all (terminals or pipes), which means that they behave as
if the file pointer was always positioned at the end (for writing) or
beginning (for reading) and shared. If they are seekable, the position
is shared so you can redirect I/O to a process running subprograms.

 Otherwise it would be practically impossible to safely use an
 inherited filehandle via any API.

Pipes are not seekable and always behave as if the position is shared.
It doesn't make them impossible to safely inherit.

They are inherited on fork because they are anonymous objects, so
it's the only way to connect them; after fork most programs close the
reading end in one of the processes and the writing end in the other.

It's rare that two processes read from the same pipe or write to the
same pipe. If they do, one of them is usually a helper program started
by the other, and the other waits for the helper to finish.

If you want two processes to access the same file indepenently, pass a
file name and open it twice.

 The file interface in this library is only used for files, which are
 always seekable (by definition).

What do you mean by files? What you get from open() is not always
seekable, because it's not always a regular file. The name may refer
to a device (whether it's seekable depends on the particular device;
block devices are seekable but most character devices are not, e.g.
/dev/ttyS0, /dev/lp0 - serial and parallel ports) or to a named pipe
(not seekable).

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: I/O interface

2005-01-12 Thread Marcin 'Qrczak' Kowalczyk
Ben Rudiak-Gould [EMAIL PROTECTED] writes:

 is there *any* way to get, without an exploitable race condition,
 two filehandles to the same file which don't share a file pointer?

AFAIK it's not possible if the only thing you know is one of the
descriptors. Of course independent open() calls which refer to the
same file have separate file pointers (I mean the true filename,
not /proc/*/fd/*).

On Linux the current file position is stored in struct file in the
kernel. struct file includes void *private_data whose internals
depend on the nature of the file, in particular they can be reference
counted. Among polymorphic operations on files in struct file_operations
there is nothing which clones the struct file. This means that a
device driver would have no means to specify how private_data of
its files should be duplicated (e.g. by bumping the reference count).
If my understanding is correct, it implies that the kernel has no way
to clone an arbitrary struct file.

Just don't use the current position of seekable files if you don't
like it: use pread/pwrite.
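
For example, pread can be reached from Haskell through the FFI; a
sketch with minimal error handling (the wrapper name is mine):

    {-# LANGUAGE ForeignFunctionInterface #-}
    import Foreign (Word8, Ptr, allocaBytes, peekArray)
    import Foreign.C (CInt, CSize, throwErrnoIfMinus1)
    import System.Posix.Types (COff, CSsize, Fd(..))

    foreign import ccall "pread"
        c_pread :: CInt -> Ptr Word8 -> CSize -> COff -> IO CSsize

    -- Read up to n bytes at the given offset; the descriptor's shared
    -- file position is left untouched.
    preadFd :: Fd -> Int -> Integer -> IO [Word8]
    preadFd (Fd fd) n off =
        allocaBytes n $ \buf -> do
            r <- throwErrnoIfMinus1 "pread" $
                     c_pread fd buf (fromIntegral n) (fromIntegral off)
            peekArray (fromIntegral r) buf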

 Is there any way to pass a filehandle as stdin to an untrusted/
 uncooperative child process in such a way that the child can't
 interfere with your attempts to (say) append to the same file?

You can set the O_APPEND flag to force each write to happen at the end
of file. It doesn't prevent the process from clearing the flag.

If it's untrusted, how do you know that it won't truncate the file
or just write garbage to it where you would have written something?

If the file is seekable, you can use pread/pwrite. If it's not
seekable, the concept of concurrent but non-interfering reads or
writes is meaningless.

 I think we just need more kinds of streams. With regard to file-backed
 streams, there are three cases:

   1. We open a file and use it in-process.
   2. We open a file and share it with child processes.
   3. We get a handle at process startup which happens to be a file.

I disagree. IMHO the only distinction is whether we want to perform
I/O at the current position (shared between processes) or explicitly
specified position (possible only in case of seekable files). Neither
can be emulated in terms of the other.

 In case 2 we could avoid OS problems by creating a pipe and managing
 our end in-process.

It's not transparent: it translates only read and write, but not
sendto/recvfrom, setsockopt, ioctl, lseek etc., and obviously it will
stop working when our process finishes but the other does not.

A pipe can be created when the program really wants this, but it should
not be created automatically whenever we redirect stdin/stdout/stderr
of another program to a file we have opened.

 Case 3 is the most interesting. In an ideal world I would argue for
 treating stdin/out/err simply as streams, but that's not practical.
 Failing that, if we have pread and pwrite, we should provide two
 versions of stdin/out/err, one of type InputStream/OutputStream and
 the other of type Maybe File. We can safely layer other streams on top
 of these files (if they exist) without interfering with the stream
 operation.

I'm not sure what you mean. Haskell should not use pread/pwrite for
functions like putStr, even if stdout is seekable. The current file
position *should* be shared between processes by default, otherwise
redirection of stdout to a file will break if the program delegates
some work with corresponding output to other programs it runs.

 Indeed, file positions are exactly as evil as indices into shared
 memory arrays, which is to say not evil at all. But suppose each
 shared memory array came with a shared current index, and there
 was no way to create additional ones.

Bad analogy: if you open() the file independently, the position is not
shared. The position is not tied to a file with its shared contents
but to the given *open* file structure.

And there is pread/pwrite (on some OSes at least). It's not suitable
as the basic API of all reads and writes though.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Hugs vs GHC (again) was: Re: Some random newbie questions

2005-01-11 Thread Marcin 'Qrczak' Kowalczyk
Ben Rudiak-Gould [EMAIL PROTECTED] writes:

 http://www.haskell.org/~simonmar/io/System.IO.html

fileRead :: File -> FileOffset -> Integer -> Buffer -> IO ()

This is unimplementable safely if the descriptor is read concurrently
by different processes. The current position is shared.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Character predicates (was: Re: [Haskell-cafe] Hugs vs GHC (again))

2005-01-11 Thread Marcin 'Qrczak' Kowalczyk
Dimitry Golubovsky [EMAIL PROTECTED] writes:

        | Sebastien's | Marcin's                 | Hugs
 -------+--------------+--------------------------+------------------------
  alnum | L* N*        | L* N*                    | L*, M*, N* (1)
  alpha | L*           | L*                       | L* (1)
  cntrl | Cc           | Cc Zl Zp                 | Cc
  digit | N*           | Nd                       | '0'..'9'
  lower | Ll           | Ll                       | Ll (1)
  punct | P*           | P*                       | P*
  upper | Lu           | Lt Lu                    | Lu Lt (1)
  blank | Z* \t\n\r    | Z* (except U+00A0        | ' ' \t\n\r\f\v U+00A0
        |              | U+2007 U+202F)           |
        |              | \t\n\v\f\r U+0085        |

 (1): for characters outside Latin1 range. For Latin1 characters
 (0 to 255), there is a lookup table defined as
 unsigned char   charTable[NUM_LAT1_CHARS];

If the table coincides with Unicode character category, then it's just
an implementation detail.

I changed
   c < ' ' || c >= '\DEL' && c <= '\x9f'
to Cc for Hugs because it's the same.

 So there might be a bunch of (perhaps autogenerated, from localedef
 files) modules for each locale/encoding, like ISO8859_1 or KOI_8.

I disagree. Char is supposed to mean Unicode only, and data is
converted to Unicode on boundaries with those parts of the world which
use different encodings.

With Unicode in mind it still makes sense to talk about digits as
'0'..'9' only; most programming languages specify numeric literals as
consisting of these digits only. Other contexts may require a wider
set, including today's Arabic digits etc. This is not because of the
encoding but because of the intended set of characters.

One reason why the predicates are not obvious is that when the
features encodable as text become more sophisticated, old algorithms
for handling text become limited. For example if an identifier is
specified as a letter followed by a sequence of letters or numbers,
then combining marks are not allowed in identifiers, even though the
corresponding precomposed characters are allowed. I guess this is why
Hugs includes M* in isAlphaNum. This is a lie which allows old code
to work better. These characters are not alphanumeric; it's the
definition of identifiers which is no longer appropriate. (Unicode
recommends a particular definition of identifiers in programming
languages which want to permit non-ASCII identifiers; it has
various exceptions because it's intended to be somehow compatible
with older versions of itself.)

Another case when old interfaces are not sufficient is toUpper and
toLower. They should be defined on strings, not characters. Besides
'ß' there are other characters which uppercase or lowercase to several
code points: ligatures, precomposed characters which lack precomposed
variants in the other case but can be decomposed, Greek iota below
which is specified to uppercase to a separate iota after the letter
(some people believe this is wrong but it's how it's currently
specified in Unicode) and some cases with accents over I and i.
Case mapping is also context-dependent for sigma.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


I/O interface (was: Re: [Haskell-cafe] Re: Hugs vs GHC (again))

2005-01-11 Thread Marcin 'Qrczak' Kowalczyk
Ben Rudiak-Gould [EMAIL PROTECTED] writes:

 fileRead can be implemented in terms of OS primitives,

Only if they already support reading from a fixed offset (like pread).
I'm not sure if we can rely on something like this being always
available, or whether it should be emulated using lseek which is safe
only as long as we are the only process using the given open file.

pread requires that the file is seekable, which means that it can't
be used for all file handles: not for pipes, sockets, terminals nor
various other devices.

 and it's easy enough to implement a thread-safe seek/read interface
 on top of it.

Not if it must cooperate with other processes, and you *do* want to
set a file position before running another program with redirected
standard I/O. In this case it's not enough that you set a private
Haskell variable holding its logical file position - you must perform
the lseek syscall.

Doing something differently than everybody else has a risk of limited
interoperability, even if the new way is better, and thus must be
carefully evaluated to check whether all lost functionality is
unimportant enough to lose.

BTW, on Unix sockets and files are the same, but probably not on
Windows. I don't know details about WinAPI. I know that basic file I/O
uses HANDLEs, winsock uses ints which emulate Unix descriptors; what I
don't know is whether you can also use HANDLEs for sockets (perhaps
each winsock fd has an associated HANDLE with easy translation in both
directions? or is there another API for sockets on HANDLEs?) and how
do you perform redirection of standard I/O, in terms of HANDLEs or
what - in particular I don't know whether you can redirect standard
I/O to a socket. How should Haskell view this? I mean that on Unix it
should somehow make files and sockets interchangeable, in order to
support I/O redirection for programs being run; but it's not easy if
you use completely different interfaces for files and sockets, as the
streams proposal seems to do.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Hugs vs GHC (again) was: Re: Some random newbie questions

2005-01-11 Thread Marcin 'Qrczak' Kowalczyk
Aaron Denney [EMAIL PROTECTED] writes:

 Does open(/dev/fd/n) or (/proc/self/fd/n) act as dup() or a fresh
 open() to underlying file?)

As a dup(), with a side effect of resetting the file pointer to the
beginning.

It would not help anyway: if it's a terminal or pipe, it *has* to act
as a dup() (in this case the file pointer is obviously not reset),
it's not seekable, and thus pread/pwrite fail.

wc -c tries to measure the size in bytes by seeking, and proceeds
to actually read the contents only if the file is not seekable (or if
it's also told to count words or lines, obviously). This applies to
stdin too. In this case it must remember to subtract the starting
position from the size (and to bump the result to 0 if it would be
negative), in order to give consistent results whether it manages to
skip reading the contents or not.

 I actually don't see the problem with interacting with other
 processes that we've forked.

File positions are not evil. They allow to treat files and devices
in a uniform way.

If you run a program which
- writes some opening words to stdout
- forks a subprocess, which executes another program, which writes
  something to stdout
- the parent waits for the subprocess to finish
- writes closing words to stdout
then all output will appear in order: the opening words, then the part
from the other program, and finally the closing words. This applies
*also* when stdout has been redirected to a file. In this case it's
essential that the file position is shared between all processes!
So even if stdout is a regular file, output should be done using
write(), not pwrite().

If processes don't synchronize their writes and write at the same
time, then output will be intermixed. They can choose to synchronize
themselves; in particular one process can avoid writing anything while
the other process is running. So it's not true that allowing them to
share a file position will necessarily be unsafe and will risk mangled
data. Unix doesn't guarantee that programs will behave well, but it
allows well-behaving programs to cooperate.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Hugs vs GHC (again) was: Re: Some random newbie questions

2005-01-10 Thread Marcin 'Qrczak' Kowalczyk
Jérémy Bobbio [EMAIL PROTECTED] writes:

 Once this is agreed, it would be easy to make scripts which generate
 C code from UnicodeData.txt tables from Unicode. I think table-driven
 predicates and toUpper/toLower should better be implemented in C;
 Haskell is not good at static constant tables with numbers.

 Sebastien Carlier already wrote this for hOp, see :
 http://etudiants.insia.org/~jbobbio/hOp/Gen_wctype.hs

And I've done a similar thing for my language Kogut some time ago:
http://cvs.sourceforge.net/viewcvs.py/kokogut/kokogut/runtime/make-char-tables.in?view=markup
http://cvs.sourceforge.net/viewcvs.py/kokogut/kokogut/lib/Core/Kokogut/Characters.ko?view=markup

Let's see how these separately developed interpretations of predicates differ
(mine also have different names and there are a few more):

        | Sebastien's | mine
 -------+--------------+----------------------------------------------------
  alnum | L* N*        | L* N*
  alpha | L*           | L*
  cntrl | Cc           | Cc Zl Zp
  digit | N*           | Nd
  lower | Ll           | Ll
  punct | P*           | P*
  upper | Lu           | Lt Lu
  blank | Z* \t\n\r    | Z* (except U+00A0 U+2007 U+202F) \t\n\v\f\r U+0085

Note that the interpretation of digit differs from both C and
Haskell 98 which specify it to be ASCII-only. Actually I have
ASCII-only variants of IsDigit parametrized by the number base.
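
Such a variant might look like this (a sketch; the name is mine, not
the actual Kogut binding):

    import Data.Char (ord)

    -- ASCII-only digit test for a given base (2 <= base <= 36).
    isDigitInBase :: Int -> Char -> Bool
    isDigitInBase base c = d >= 0 && d < base
      where
        d | c >= '0' && c <= '9' = ord c - ord '0'
          | c >= 'a' && c <= 'z' = ord c - ord 'a' + 10
          | c >= 'A' && c <= 'Z' = ord c - ord 'A' + 10
          | otherwise            = -1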

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Hugs vs GHC (again) was: Re: Some random newbie questions

2005-01-09 Thread Marcin 'Qrczak' Kowalczyk
Simon Marlow [EMAIL PROTECTED] writes:

  - Do the character class functions (isUpper, isAlpha etc.) work
correctly on the full range of Unicode characters?  This is true in
Hugs.  It's true with GHC on some systems (basically we were lazy
and used the underlying C library's support here, which is patchy).

It's not obvious what the predicates should really mean, e.g. should
isDigit and isHexDigit include non-ASCII digits or should isSpace
include non-breaking space characters. Haskell 98 report gives some
guidelines which don't necessarily coincide with the C practice nor
with expectations from Unicode people.

Once this is agreed, it would be easy to make scripts which generate
C code from UnicodeData.txt tables from Unicode. I think table-driven
predicates and toUpper/toLower should better be implemented in C;
Haskell is not good at static constant tables with numbers.

Another issue is that the set of predicates provided by the Haskell 98
library report is not enough to implement e.g. a Haskell 98 lexer,
which needs any Unicode symbol or punctuation. Case mapping would
be better done string -> string rather than character -> character;
this breaks a long established Haskell interface. Case mapping is
locale-sensitive (in very minor ways). Haskell doesn't provide
algorithms like normalization or collation. In general the Haskell 98
interface is not enough for complex Unicode processing.

  - Can you do String I/O in some encoding of Unicode?  No Haskell
compiler has support for this yet, and there are design decisions
to be made.

The problem with designing an API of recoders is that depending on
whether the recoder is implemented in Haskell or interfaced from C, it
needs different data representation. Pure Haskell recoders prefer lazy
lists of characters or bytes (except that a desire to detect source
errors or characters unavailable in the target encoding breaks this),
while high performance C prefers pointers to buffers with chunks of
text.

Transparent recoding makes some behavior hard to express. Imagine
parsing HTTP headers followed by \r\n\r\n and a binary file. If you
read headers line by line and decoding is performed in blocks, then
once you determine where the headers end it's too late to find the
start of the binary file: a part of it has already been decoded into
text. You have to determine the end of the headers while working with
bytes, not characters, and only convert the first part. Not performing
the recoding in blocks is tricky if the decoder is implemented in C.
Giving 1-byte buffers for lots of iconv() calls is not nice.

Or imagine parsing a HTML file with the encoding specified inside
it in a meta element. Switching the encoding in the middle is
incompatible with buffering. Maybe the best option is to parse the
beginning in ISO-8859-1 just to determine the encoding, and then
reparse everything again once the encoding is known.

If characters are recoded automatically on I/O, one is tempted to
extend the framework for other conversions like compression, line
ending convention, HTML character escaping etc.

  - What about Unicode FilePaths?  This was discussed a few months ago
on the haskell(-cafe) list, no support yet in any compiler.

Nobody knows what the semantics should be.

I've once written elsewhere a short report about handling filename
encodings in various languages and environments which use Unicode as
their string representation. Here it is (I've been later corrected
that Unicode non-characters are valid in UTF-x):

I describe here languages which exclusively use Unicode strings.
Some languages have both byte strings and Unicode strings (e.g. Python)
and then byte strings are generally used for strings exchanged with
the OS, the programmer is responsible for the conversion if he wishes
to use Unicode.

I consider situations when the encoding is implicit. For I/O of file
contents it's always possible to set the encoding explicitly somehow.

Corrections are welcome. This is mostly based on experimentation.


Java (Sun)
--

Strings are UTF-16.

Filenames are assumed to be in the locale encoding.

a) Interpreting. Bytes which cannot be converted are replaced by U+FFFD.

b) Creating. Characters which cannot be converted are replaced by ?.

Command line arguments and standard I/O are treated in the same way.


Java (GNU)
--

Strings are UTF-16.

Filenames are assumed to be in Java-modified UTF-8.

a) Interpreting. If a filename cannot be converted, a directory listing
   contains a null instead of a string object.

b) Creating. All Java characters are representable in Java-modified UTF-8.
   Obviously not all potential filenames can be represented.

Command line arguments are interpreted according to the locale.
Bytes which cannot be converted are skipped.

Standard I/O works in ISO-8859-1 by default. Obviously all input is
accepted. On output characters above U+00FF are replaced by ?.


C# (mono)
-

Strings are UTF-16.

Filenames use 

Re: [Haskell-cafe] Some random newbie questions

2005-01-09 Thread Marcin 'Qrczak' Kowalczyk
Jorge Adriano Aires [EMAIL PROTECTED] writes:

 Naive use of foldl. I tend to think the default foldl should be
 strict (ie. replaced by foldl') -- are there important cases where it
 needs to be lazy?

 Hi, 
 One simple example would be,
 reverse = foldl (flip (:)) []

No, it would work with strict foldl too. In fact in the absence
of optimization it would work better (uses less time and space).
The optimization required is inlining and strictness analysis.

A function which requires lazy foldl for correctness would have
to be sometimes lazy in its first argument, and at the same time
some partial results would have to be undefined. By function
I mean the first argument of foldl, treated as a binary function.

Here this doesn't apply because flip (:) x y is always defined. And
another common case for foldl, sum, doesn't apply because (+) is
usually strict on both arguments (although in principle it does not
have to be true because of overloading, which implies that a compiler
can only optimize particular specializations of sum, not generic sum).

I don't know of any real-life example.

Perhaps there are cases where evaluating partial results is correct
but inefficient. I don't know of such a case either.
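
For concreteness, here is a contrived function with both properties
(not a real-life case):

    import Data.List (foldl')

    -- Lazy in its accumulator when the element is 0, and some partial
    -- results are undefined (here, on the element 99).
    f :: Int -> Int -> Int
    f _ 0  = 0
    f _ 99 = error "boom"
    f a x  = a + x

    -- foldl  f 0 [1, 99, 0]  ==  0      (the "boom" is never forced)
    -- foldl' f 0 [1, 99, 0]  raises "boom": it forces f 1 99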

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Some random newbie questions

2005-01-09 Thread Marcin 'Qrczak' Kowalczyk
Jorge Adriano Aires [EMAIL PROTECTED] writes:

 No, it would work with strict foldl too. In fact in the absence
 of optimization it would work better (uses less time and space).
 The optimization required is inlining and strictness analysis.

 Is this also true if you're just going to use the first few elements after 
 reversing it?

Yes. A strict fold would evaluate cons cells of the result while they
are constructed, not list elements. They are all defined (flip (:) x y
is always defined), so a strict foldl is correct.

Making a cons cell should be not more expensive than making a thunk
which will make a cons cell when evaluated. Well, unless the
implementation doesn't inline flip and thus making these thunks
is actually faster than running them. In this case a lazy foldl is
more efficient than a strict foldl, as long as a sufficiently small
part of the result is used; it's always less efficient if the whole
result is examined.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Problem with fundeps.

2005-01-02 Thread Marcin 'Qrczak' Kowalczyk
[EMAIL PROTECTED] writes:

 This is what I want. For a given set of vectors, the associated
 scalars are unique, otherwise I would have problems with norm.

In the instance Vspace a a the compiler doesn't know that a is
supposed to be a scalar only. It matches vector types (functions) too.

And adding a context won't help. An instance of the form
   instance Ctx a => Cls (T a)
means "T a can always be used as an instance of Cls, and such usage
will yield a further requirement of Ctx a which must be fulfilled",
rather than "T a can be used as an instance of Cls as long as Ctx a
holds". In particular it will overlap with any other instance whose
head can be unified with Cls (T a). Instance overlapping doesn't take
instance contexts into account, only instance heads.

The problem can be solved by enumerating concrete scalar types
instead of using a generic instance Vspace a a. I'm afraid Haskell
classes are not expressive enough for a generic instance in this case.
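
Spelled out, the enumeration could look like this (a hedged sketch:
the class shape is reconstructed from the thread, the member names
are mine, and GHC needs the listed extensions):

    {-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies,
                 FlexibleInstances, UndecidableInstances #-}

    class Vspace s v | v -> s where
        (<+>) :: v -> v -> v
        scale :: s -> v -> v

    -- Enumerated concrete scalars instead of "instance Vspace a a":
    instance Vspace Double Double where
        (<+>) = (+)
        scale = (*)

    instance Vspace Float Float where
        (<+>) = (+)
        scale = (*)

    -- Functions into a vector space form a vector space over the same
    -- scalars, and do not overlap with the instances above.
    instance Vspace s v => Vspace s (a -> v) where
        f <+> g   = \x -> f x <+> g x
        scale c f = \x -> scale c (f x)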

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] safe code to run in finalizers: ACIO revisited

2004-12-18 Thread Marcin 'Qrczak' Kowalczyk
Robert Dockins [EMAIL PROTECTED] writes:

 So, to be safe, the action of a finalizer must commute with every
 other finalizer (they must be central).

What does "should" mean? There are useful finalizers which don't have
this property. E.g. a finalizer can remove an entry from a weak
dictionary, call a C function which will free some foreign object,
or send a message over a network that a particular object is no longer
needed. Even if they have these properties in an abstract sense, they
are not true technically.

 So a separate ACIO monad for affine central IO actions would be the
 appropriate context for finalizers.

It would not be enough without unsafeIOtoACIO.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] The difference between ($) and application

2004-12-14 Thread Marcin 'Qrczak' Kowalczyk
Andres Loeh [EMAIL PROTECTED] writes:

 The function ($) is the identity function, restricted to functions.

Almost. With the standard definition of
   f $ x = f x
it happens that
   ($) undefined `seq` () = ()
   id  undefined `seq` () = undefined

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell] Force evaluation

2004-12-06 Thread Marcin 'Qrczak' Kowalczyk
Ben Rudiak-Gould [EMAIL PROTECTED] writes:

 And here's a slight variation in which force has the type a - a,
 eliminating the need for the helper function eval. I'm not sure which
 version is better.

The version with () does less redundant forcing, although the compiler
could perhaps optimize them statically.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell-cafe] Mutable data design question

2004-12-03 Thread Marcin 'Qrczak' Kowalczyk
GoldPython [EMAIL PROTECTED] writes:

 In the case of writing something like a text editor where the data
 involved is by its very nature mutable, what sort of design paradigm
 would you use in a functional language?

This is specific to text editors:

1. Use a traditional mutable data structure, e.g. IOArray of IOUArrays
   (lines), each coupled with first filled and last filled indices
   and resized when needed. Implement undo by the list of changes.

   Easy conceptually. Has good performance in common cases and poor
   performance for unusual data (e.g. very long lines). Easy to make
   bugs in undo handling.

2. Use a persistent data structure with logarithmic cost of most
   operations: a balanced tree of text fragments, called a rope
   (Hans Boehm has made one for C). Undo can be made by simply
   keeping old versions.

   Hard to implement the core data structure. If done right, the rest
   is easy, in particular undo handling is very robust. There are
   some overheads for all operations, but the cost of operations
   scales to extreme cases.

The second way is more interesting, but I don't know how to implement
a rope in detail.
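
For reference, the core idea (a tree of chunks with cached lengths)
fits in a few lines; this sketch omits the rebalancing that real
ropes need:

    -- A rope: a tree of string chunks with cached total lengths,
    -- giving cheap concatenation and O(depth) indexing.
    data Rope = Leaf String
              | Node Int Rope Rope     -- the Int caches the total length

    len :: Rope -> Int
    len (Leaf s)     = length s
    len (Node n _ _) = n

    cat :: Rope -> Rope -> Rope
    cat l r = Node (len l + len r) l r

    index :: Rope -> Int -> Char
    index (Leaf s) i = s !! i
    index (Node _ l r) i
        | i < len l = index l i
        | otherwise = index r (i - len l)

    -- Rebuilds only the path from the root; old versions stay valid,
    -- which is what makes keeping them for undo so robust.
    insertAt :: Int -> String -> Rope -> Rope
    insertAt i s (Leaf t)     = let (a, b) = splitAt i t
                                in cat (Leaf (a ++ s)) (Leaf b)
    insertAt i s (Node _ l r)
        | i <= len l = cat (insertAt i s l) r
        | otherwise  = cat l (insertAt (i - len l) s r)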

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Mutable data design question

2004-12-03 Thread Marcin 'Qrczak' Kowalczyk
Keean Schupke [EMAIL PROTECTED] writes:

 What happens with this method when the display needs refreshing, does
 the current state have to be recomputed every time ...

No. The new state is constructed from bits of old state and the
changed data. Applying a change on average requires logarithmic time
and memory wrt. the whole size. It does not invalidate the old state -
state is immutable.

http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/vol25/issue12/spe986.pdf

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: not possible with monad transformers ?

2004-12-01 Thread Marcin 'Qrczak' Kowalczyk
Jules Bean [EMAIL PROTECTED] writes:

 When writing a compiler, it makes sense to collect errors as by the
 writer monad, and not abort anything - producing dummy values instead
 (except perhaps some fatal errors when it's inconvenient).

 Or you could use the monad:

 data Perhaps a = Success a | Failure a [Error]

 This is just a special case of the writer monad, I think.

It's the same as the writer monad, with redundant encoding of the case
when there are no errors.
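
Spelled out (a sketch; Error is assumed to be whatever the compiler
uses, and the Functor/Applicative instances are just modern-GHC
boilerplate):

    import Control.Monad (ap, liftM)

    type Error = String          -- assumption: any error type would do

    data Perhaps a = Success a | Failure a [Error]

    instance Functor Perhaps     where fmap = liftM
    instance Applicative Perhaps where { pure = Success; (<*>) = ap }

    -- Errors accumulate exactly like a writer monad's output, while a
    -- (possibly dummy) value is always carried along.
    instance Monad Perhaps where
        Success x    >>= f = f x
        Failure x es >>= f = case f x of
            Success y     -> Failure y es
            Failure y es' -> Failure y (es ++ es')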

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ACIO versus Execution Contexts

2004-11-30 Thread Marcin 'Qrczak' Kowalczyk
Keean Schupke [EMAIL PROTECTED] writes:

 I disagee - Allowing unique state is a mistake in my opinion.
 I want Haskell to be the operating system - how can I do
 this if I cannot create new process contexts.

Why does it matter if you can't compile code for new programs at
runtime, to become a part of the same process?

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe

