RE: Re[2]: new i/o library

2006-02-03 Thread Simon Marlow
On 03 February 2006 08:34, Bulat Ziganshin wrote:

 moreover - we can implement locking as a special converter type
 that can be applied to any mutable object - stream, collection,
 counter. that allows us to simplify implementations and add locking
 only to those Streams where we really need it. like this:
 
 h <- openFD "test"
      >>= addUsingOfSelect
      >>= addBuffering 65536
      >>= addCharEncoding utf8
      >>= attachUserData dictionary
      >>= addLocking
 
 This is really nice - exactly what I'd like to see in the I/O
 library. The trick is making it perform well, though...  but I'm
 sure that's your main focus too.
 
 basically the idea is very simple - every stream implements the Stream
 interface, which is a clone of the Handle interface. Stream transformers
 are just types that have Stream parameters and in turn implement the
 same interface. all Stream operations are translated into calls on the
 inner Stream. typical example:
 
 data WithLocking h = WithLocking h (MVar ())

There's a choice here; I did it with existentials:

   data ThreadSafeStream = forall h . Stream h => TSStream h !(MVar ())
   instance Stream ThreadSafeStream where ...

What are the tradeoffs?  Well, existentials aren't standard for one
thing, but what about performance?  Every stream operation on the outer
stream translates to a dynamic call through the dictionary stored in the
stream.  Lots of layers means lots of dynamic calls, which probably
won't be efficient.
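
To make that concrete, here is a minimal sketch of the existential version,
using a made-up single-method Stream class just to show where the captured
dictionary gets used (all names here are illustrative):

  {-# LANGUAGE ExistentialQuantification #-}
  import Control.Concurrent.MVar
  import Data.Word (Word8)

  -- Hypothetical single-method class standing in for the full Stream interface.
  class Stream h where
    sPutByte :: h -> Word8 -> IO ()

  -- The wrapper captures the inner Stream dictionary once, at construction time.
  data ThreadSafeStream = forall h. Stream h => TSStream h !(MVar ())

  instance Stream ThreadSafeStream where
    -- Each operation is one dynamic call through the stored dictionary.
    sPutByte (TSStream h lock) b = withMVar lock $ \_ -> sPutByte h b

  mkThreadSafe :: Stream h => h -> IO ThreadSafeStream
  mkThreadSafe h = do
    lock <- newMVar ()
    return (TSStream h lock)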

What about compared to your version:

 instance (Stream IO h) => Stream IO (WithLocking h) where

so a stream might have a type like this:

  WithLocking (WithCharEncoding (WithBuffer FileStream))

and calling any overloaded stream operation will have to build a *new*
dictionary as deep as the layering.  GHC might be able to share these
dictionaries across multiple calls within a function, but I bet you'll
end up building dictionaries a lot.  Compare this with the existential
version, which builds the dictionary once per stream.

On the other hand, you can also use {-# SPECIALISE #-} with your version
to optimise common combinations of layers.  I don't know if there's a
way to get specialisation with the existential version; it seems like a
high priority, though, at least for the layers up to the buffering layer.
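
For comparison, here is a minimal sketch of the class-parameterised layering,
again with a made-up single-method class, an invented FDStream base layer, and
the kind of SPECIALISE pragma meant above:

  {-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}
  import Control.Concurrent.MVar
  import Data.Char (chr)
  import Data.Word (Word8)
  import System.IO (Handle, hPutChar)

  -- Hypothetical single-method Stream class, parameterised over the monad.
  class Stream m h where
    sPutByte :: h -> Word8 -> m ()

  -- A concrete bottom layer and one wrapper layer.
  newtype FDStream = FDStream Handle
  data WithLocking h = WithLocking h !(MVar ())

  instance Stream IO FDStream where
    sPutByte (FDStream h) b = hPutChar h (chr (fromIntegral b))

  -- The inner dictionary sits in the instance context, so a stacked type
  -- such as WithLocking (WithBuffer FDStream) needs a dictionary as deep
  -- as the stack at every overloaded call site.
  instance Stream IO h => Stream IO (WithLocking h) where
    sPutByte (WithLocking h lock) b = withMVar lock $ \_ -> sPutByte h b

  -- An overloaded operation that would otherwise go through those dictionaries.
  putByteTwice :: Stream IO h => h -> Word8 -> IO ()
  putByteTwice h b = sPutByte h b >> sPutByte h b

  -- Pin down one common combination of layers so GHC compiles a
  -- specialised, dictionary-free copy for it.
  {-# SPECIALISE putByteTwice :: WithLocking FDStream -> Word8 -> IO () #-}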

Also, your version is abstracted over the monad, which is another layer
to optimise away (good luck :-).

 Still, I'm not sure that putting both input and output streams in the
 same type is the right thing; I seem to remember a lot of things
 being simpler with them separate.
 
 i'm interested to hear which things were simpler? in my design it seems
 to be the other way around - i would need to write far more code to
 separate read, write and read-write FDs, for example. maybe the
 difference is that i have one large Stream class that implements all
 the functions, while your design had a lot of classes with a few
 functions in each

Not dealing with the read/write case makes things a lot easier.
Read/write files are very rare; I don't think there's any problem with
requiring the file to be opened twice in this case.  Read/write sockets
are very common, of course, but they are exactly the same as separate
read and write streams, because they don't share any state (no file
pointer).

Having separate input/output streams means you have to do less checking,
so performance will be better, and there are fewer error cases.  Each
class has fewer methods, again better for performance.  The types are
more informative, and hence more useful.  Also, you can do cool stuff
like:

-- | Takes an output stream, and returns an input stream that will yield
-- all the data that is written to the output stream.
streamOutputToInput :: (OutputStream s) => s -> IO StreamInputStream

-- | Takes an output stream and an input stream, and pipes all the
-- data from the former into the latter.
streamConnect :: (OutputStream o, InputStream i) => o -> i -> IO ()

Sure you can do these with one Stream class, but the types aren't nearly
as nice.
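
As an illustration, here is a rough sketch of streamConnect, assuming
hypothetical block-oriented methods; the method names and the use of
ByteString chunks are invented for the sketch, not part of the library:

  import qualified Data.ByteString as B

  -- Hypothetical minimal interfaces; the real classes would have more methods.
  class InputStream i where
    -- Read at most n bytes; an empty result means end of stream.
    readBlock :: i -> Int -> IO B.ByteString

  class OutputStream o where
    writeBlock :: o -> B.ByteString -> IO ()

  -- Pipes all the data from the input stream into the output stream.
  streamConnect :: (OutputStream o, InputStream i) => o -> i -> IO ()
  streamConnect o i = loop
    where
      loop = do
        block <- readBlock i 65536
        if B.null block
          then return ()
          else writeBlock o block >> loop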

Oh, one more thing: if you have a way to turn a ForeignPtr into a
Stream, then this can be used both for mmap'd files and for turning
(say) a PackedString into a Stream.
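
As a sketch of that idea, here is a minimal read-only stream over a ForeignPtr
region. The names are invented, and a real version would implement the
input-stream class rather than exposing a bare record:

  import Data.IORef
  import Data.Word (Word8)
  import Foreign.ForeignPtr (ForeignPtr, withForeignPtr)
  import Foreign.Storable (peekByteOff)

  -- A read-only stream over a region of memory: base pointer, length, and a
  -- mutable read position.  The same shape serves an mmap'd file or a packed
  -- string, since both are just a ForeignPtr plus a length.
  data MemoryInputStream = MemoryInputStream
    { misBuf :: !(ForeignPtr Word8)
    , misLen :: !Int
    , misPos :: !(IORef Int)
    }

  newMemoryInputStream :: ForeignPtr Word8 -> Int -> IO MemoryInputStream
  newMemoryInputStream fp len = do
    pos <- newIORef 0
    return (MemoryInputStream fp len pos)

  -- Read one byte, or Nothing at the end of the region.
  misGetByte :: MemoryInputStream -> IO (Maybe Word8)
  misGetByte (MemoryInputStream fp len posRef) = do
    pos <- readIORef posRef
    if pos >= len
      then return Nothing
      else do
        b <- withForeignPtr fp $ \p -> peekByteOff p pos
        writeIORef posRef (pos + 1)
        return (Just b)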

Cheers,
Simon


Re: new i/o library

2006-02-03 Thread Simon Marlow

Simon Marlow wrote:


-- | Takes an output stream and an input stream, and pipes all the
-- data from the former into the latter.
streamConnect :: (OutputStream o, InputStream i) => o -> i -> IO ()


That's the wrong way around, of course :-)  It pipes everything from the 
input stream to the output stream.


Cheers,
Simon


Re[2]: new i/o library

2006-01-28 Thread Bulat Ziganshin
Hello Simon,

Friday, January 27, 2006, 7:25:44 PM, you wrote:

 i'm now writing some sort of new i/o library. one area where it
 currently falls short of the existing Handles implementation in GHC is
 the asynchronous i/o operations. can you please briefly describe how
 this is done in GHC and, in particular, why multiple buffers are used?

SM Multiple buffers were introduced to cope with the semantics we wanted 
SM for hPutStr.

thank you. i had read the hPutStr comments, but didn't understand
that this problem was the only reason for introducing multiple buffers

SM The problem is that you don't want hPutStr to hold a lock 
SM on the Handle while it evaluates its argument list, because that could 
SM take arbitrary time.  Furthermore, things like this:

SM    putStr (trace "foo" "bar")

SM used to cause deadlocks, because putStr holds the lock, evaluates its 
SM argument list, which causes trace to also attempt to acquire the lock on 
SM stdout, leading to deadlock.

SM So, putStr first grabs a buffer from the Handle, then unlocks the Handle 
SM while it fills up the buffer, then it takes the lock again to write the 
SM buffer.  Since another thread might try to putStr while the lock is 
SM released, we need multiple buffers.

i don't understand the last sentence. you were talking about problems
with performing I/O inside the computation of putStr's argument, not
about another thread?

i understand that locks are basically needed because multiple threads
can try to do i/o on the same Handle simultaneously

SM For async IO on Unix, we use non-blocking read() calls, and if read() 
SM indicates that we need to block, we send a request to the IO Manager 
SM thread (see GHC.Conc) which calls select() on behalf of all the threads 
SM waiting for I/O.  For async IO on Windows, we either use the threaded 
SM RTS's blocking foreign call mechanism to invoke read(), or the 
SM non-threaded RTS has a similar mechanism internally.

so, async I/O in GHC has nothing in common with zero-wait operation
in a single-threaded environment, and can only help to overlap i/o in
one thread with the execution of other threads?

SM We ought to be using the various alternatives to select(), but we 
SM haven't got around to that yet.

yes, i read those threads and even remember a Trac ticket about this.
btw, in the typeclass-based i/o library this facility can be added as
an additional middle layer, in the same way as buffering and Char
encoding. i even think it could be done as a third-party library,
without any changes to the main library itself

 moreover, i have an idea how to implement async i/o without complex
 bureaucracy: use mmapped files, maybe together with multiple buffers.

SM I don't think we should restrict the implementation to mmap'd files, for 
SM all the reasons that Einar gave.  Lots of things aren't mmapable, mainly.

i'm interested because mmap can be used to speed up i/o-bound
programs. but it seems that m/m files can't be used to overlap i/o in
multi-threaded applications. anyway, i use a class-based design, so at
least we can provide m/m files as one of the Stream instances

SM My vision for an I/O library is this:

SM   - a single class supporting binary input (resp. output) that is
SM     implemented by various transports: files, sockets, mmap'd files,
SM     memory and arrays.  Windowed mmap is an option here too.

i don't consider fully-mapped files as a separate instance, because
they can be simulated by using window-mapped files with a large window

SM   - layers of binary filters on top of this: you could add buffering,
SM     and compression/decompression.

SM   - a layer of text translation at the top.

SM This is more or less how the Stream-based I/O library that I was working 
SM on is structured.

SM The binary I/O library would talk to a binary transport, perhaps with a 
SM layer of buffering, whereas text-based applications talk to the text layer.

that's more or less close to what i do. no wonder - i was
substantially influenced by the design of your new i/o library. the
only difference is that i use one Stream class for all streams


-- 
Best regards,
 Bulat  mailto:[EMAIL PROTECTED]





Re[4]: new i/o library

2006-01-28 Thread Bulat Ziganshin
Hello Duncan,

Saturday, January 28, 2006, 3:08:04 PM, you wrote:

 yes, i want to save exactly this bit of performance - now that i have
 optimized all the other costs on the text i/o path

DC There is a trade off, using mmap gives you zero-copy access to the page
DC cache however there is a not-insignificant performance overhead in
DC setting up and tearing down memory mappings. This is true on unix and
DC win32. So for small writes (eg 4k blocks) it is likely to be cheaper to
DC just use read()/write() on page aligned buffers rather than use mmap.

DC You would need to do benchmarks on each platform to see which method is
DC quicker. Given the code complexity that other people have mentioned I do
DC not think it would be worth it.

i use 64k buffers and tried mmapped files last night. it's not easy to
implement this properly and then ensure good speed. at least, windows
flushes buffers that were filled using mmap very lazily. when i wrote
a 1 gb file in this mode, windows tried to swap out all programs and
even itself, but delayed writing the already unmapped data!

DC Using page aligned and sized buffers can help read()/write() performance
DC on some OSes like some of the BSDs.

i will try to cut out an aligned 64k buffer inside a 128k block and
will publish the code here so anyone can test it on their OS
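
for reference, cutting an aligned buffer out of a larger block can look like
this sketch (alignment and sizes as above; the raw pointer is returned as
well, since that is the one which would eventually have to be freed):

  import Data.Word (Word8)
  import Foreign.Marshal.Alloc (mallocBytes)
  import Foreign.Ptr (Ptr, alignPtr)

  alignment, bufferSize :: Int
  alignment  = 64 * 1024   -- align the buffer start to a 64k boundary
  bufferSize = 64 * 1024

  -- Allocate a 128k block so that an aligned 64k window is guaranteed to
  -- fit inside it, then round the start address up to the next boundary.
  allocAlignedBuffer :: IO (Ptr Word8, Ptr Word8)   -- (raw block, aligned buffer)
  allocAlignedBuffer = do
    raw <- mallocBytes (bufferSize + alignment)
    return (raw, alignPtr raw alignment)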

 in other words, i am interested in having zero-wait operation both for
 reading and writing,

DC As I said that is not possible with either read() or mmaped read.
DC Conversely it works automatically with write() and mmaped writes.

DC Zero-copy and zero-wait are not the same thing.

i mean that mmap guarantees us zero-copy operation, and i wish to use
mmap in such a way that zero-wait operation can be ensured

DC An important factor for optimising IO performance is using sufficiently
DC large block sizes to avoid making frequent kernel calls. That includes
DC read()/write() calls and mmap()/unmap() calls.

that's true and easy to implement

DC Perhaps it is possible to move the complexity needed for the lazy
DC hPutStr case into the hPutStr implementation rather than the Handle
DC implementation. For example perhaps it'd be possible for the Handle to
DC just have one buffer but to have a method for writing out an external
DC buffer that is passed to it. Then hPutStr would allocate its own
DC buffer, evaluate the string, copying it into the buffer. Then it would
DC call on the Handle to write out the buffer. The Handle would flush its
DC existing internal buffer and write out the extra buffer.

1) lazy hPutStr is not some rare case. we can't distinguish strict
and lazy strings with current GHC, so in any hPutStr invocation we
should assume that evaluating its argument can have side effects.
that is the whole problem - we want to optimize hPutStr for fast
operation on strict strings, but need to ensure that it still works
correctly even with slow lazy strings that may have side effects

2) the scheme above can be implemented using hPutBuf to write this
additional buffer, as sketched below. it's just less efficient
(although not by much - memcpy works 10 times faster than traversing
a [Char])
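
here is a sketch of that scheme, assuming nothing beyond hPutBuf from
System.IO: the string is evaluated into a private buffer with no Handle lock
held, and only the filled buffer is handed to the Handle. the chunk size and
the 8-bit truncation are simplifications:

  import Data.Char (ord)
  import Data.Word (Word8)
  import Foreign.Marshal.Alloc (allocaBytes)
  import Foreign.Ptr (Ptr)
  import Foreign.Storable (pokeByteOff)
  import System.IO (Handle, hPutBuf)

  chunkSize :: Int
  chunkSize = 8192

  hPutStrViaBuffer :: Handle -> String -> IO ()
  hPutStrViaBuffer h str = allocaBytes chunkSize (\buf -> go buf str)
    where
      go _   [] = return ()
      go buf cs = do
        (n, rest) <- fill buf 0 cs        -- evaluate the list, no lock held
        hPutBuf h buf n                   -- one locked call per filled chunk
        go buf rest

      fill :: Ptr Word8 -> Int -> String -> IO (Int, String)
      fill _   n [] = return (n, [])
      fill buf n cs@(c:rest)
        | n == chunkSize = return (n, cs)
        | otherwise = do
            -- truncates to 8 bits; a real version would go through the
            -- stream's character-encoding layer instead
            pokeByteOff buf n (fromIntegral (ord c) :: Word8)
            fill buf (n + 1) rest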

on the other hand, Simon didn't take into account that locking itself
is rather slow, and using two locks instead of one makes his scheme
somewhat slower, especially on small strings

DC Perhaps a better solution for your single-threaded operation case is to
DC have a handle type that is a bit specialised and does not have to deal
DC with the general case. If we're going to get an I/O system that supports
DC various layers and implementations then perhaps you could have one
DC that implements only the minimal possible I/O class. That could not use
DC any thread locks (ie it'd not work predictably for multiple Haskell
DC threads)

moreover - we can implement locking as a special converter type that
can be applied to any mutable object - stream, collection, counter.
that allows us to simplify implementations and add locking only to
those Streams where we really need it. like this:

h <- openFD "test"
     >>= addUsingOfSelect
     >>= addBuffering 65536
     >>= addCharEncoding utf8
     >>= attachUserData dictionary
     >>= addLocking
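
to show why such converters compose with >>= like that, two of them can be as
small as this (only the wrapper types and constructors are sketched; their
Stream instances, which forward every operation to the inner stream h, are
omitted):

  import Control.Concurrent.MVar

  -- Wrapper layers in the style of the chain above.
  data WithLocking h    = WithLocking h !(MVar ())
  data WithUserData u h = WithUserData h u

  addLocking :: h -> IO (WithLocking h)
  addLocking h = do
    lock <- newMVar ()
    return (WithLocking h lock)

  attachUserData :: u -> h -> IO (WithUserData u h)
  attachUserData userData h = return (WithUserData h userData)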

DC and use mmap on the entire file. So you wouldn't get the normal
DC feature that a file extends at the end as it's written to, it'd need a
DC method for determining the size at the beginning or extending it in
DC large chunks. On the other hand it would not need to manage any buffers
DC since reads and writes would just be reads to/from memory.

yes, i have done it. but simple MapViewOfFile/UnmapViewOfFile doesn't
work well enough, at least for writing. windows is in no hurry to
flush these buffers, even after unmap, and using FlushViewOfFile
results in synchronous flushing of the buffer to the cache. so i need
to try doing FlushViewOfFile in a separate thread, like GHC does for
its i/o

DC So it'd depend on what the API of the low level layers of the new I/O
DC system are like as to whether such a simple and limited implementation
DC 

new i/o library

2006-01-27 Thread Bulat Ziganshin
Hello Simon

i'm now writing some sort of new i/o library. one area where it
currently falls short of the existing Handles implementation in GHC is
the asynchronous i/o operations. can you please briefly describe how
this is done in GHC and, in particular, why multiple buffers are used?

i currently use just one buffer, which can contain read or write
data, but not both - the buffer is simply flushed before switching the
mode of operation. am i losing anything due to this simplified
algorithm?
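
roughly, the discipline is this (a sketch with invented names, just to pin
down what switching the mode means):

  import Data.IORef

  data Direction = Reading | Writing deriving Eq

  data SingleBuffer = SingleBuffer
    { bufDir   :: IORef Direction
    , bufUsed  :: IORef Int    -- bytes currently held in the buffer
    , bufFlush :: IO ()        -- write out pending data / drop read-ahead
    }

  -- Called before every read or write: if the buffer currently holds data
  -- for the other direction, flush it and flip the direction.
  ensureDirection :: SingleBuffer -> Direction -> IO ()
  ensureDirection buf wanted = do
    dir <- readIORef (bufDir buf)
    if dir == wanted
      then return ()
      else do
        bufFlush buf
        writeIORef (bufUsed buf) 0
        writeIORef (bufDir buf) wanted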

moreover, i have an idea how to implement async i/o without complex
bureaucracy: use mmapped files, maybe together with multiple buffers.
for example, we can allocate four 16kb buffers. when one buffer is
filled with written data, the program unmaps it and switches to the
next buffer. i haven't tested it, but the OS can guess that the
unmapped buffer should now be asynchronously written to disk. the same
for reading - when we have completely read one buffer, we can unmap
it, switch to the second buffer and map the third, so that the OS can
asynchronously fill the third buffer while we are reading the second.
should this work, at least on the main desktop OSes?

at least, mmap/VirtualAlloc are available, afaik, on all
ghc-supported platforms, so this should work anywhere. of course, this
scheme omits async i/o on sockets on Windows

  

-- 
Best regards,
 Bulat  mailto:[EMAIL PROTECTED]





Re: new i/o library

2006-01-27 Thread Einar Karttunen
On 27.01 13:10, Bulat Ziganshin wrote:
 i'm now writing some sort of new i/o library. one area where it
 currently falls short of the existing Handles implementation in GHC is
 the asynchronous i/o operations. can you please briefly describe how
 this is done in GHC and, in particular, why multiple buffers are used?

One simple optimization is that you can omit all buffering with
unbuffered operation. Then simply add the buffer (which is ok
because Handles are mutable) if the user ever calls hLookAhead.
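
A sketch of that trick, using a one-byte pushback slot, which is all
hLookAhead itself needs (names are invented; a real implementation could
attach a full buffer at this point instead):

  import Data.IORef
  import Data.Word (Word8)

  -- An unbuffered stream plus an optional pushed-back byte that is only
  -- ever filled by a lookahead.
  data Unbuffered = Unbuffered
    { rawGetByte :: IO Word8            -- the underlying unbuffered read
    , pushback   :: IORef (Maybe Word8)
    }

  getByte :: Unbuffered -> IO Word8
  getByte s = do
    pb <- readIORef (pushback s)
    case pb of
      Just b  -> writeIORef (pushback s) Nothing >> return b
      Nothing -> rawGetByte s

  lookAheadByte :: Unbuffered -> IO Word8
  lookAheadByte s = do
    b <- getByte s
    writeIORef (pushback s) (Just b)    -- put it back for the next read
    return b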

 moreover, i have an idea how to implement async i/o without complex
 bureaucracy: use mmapped files, maybe together with multiple buffers.
 for example, we can allocate four 16kb buffers. when one buffer is
 filled with written data, the program unmaps it and switches to the
 next buffer. i haven't tested it, but the OS can guess that the
 unmapped buffer should now be asynchronously written to disk. the same
 for reading - when we have completely read one buffer, we can unmap
 it, switch to the second buffer and map the third, so that the OS can
 asynchronously fill the third buffer while we are reading the second.
 should this work, at least on the main desktop OSes?

Please no. There are multiple reasons to avoid mmapped files.
1) They make very few performance guarantees for reading
   (i.e. a Haskell thread touches memory which has not yet
   been read from the file, causing I/O, and all the other
   Haskell threads are blocked too)
2) The time of writes is unpredictable, making implementing
   hFlush harder? (not sure about this)
3) Not all file descriptors will support it - i.e. we will
   need the read/write path in any case.
4) Mmap cannot be used for random access for arbitrary files
   since they may be larger than the address space. This means
   some kind of window needs to be implemented - and this is
   easily done with read/write.

- Einar Karttunen


Re[2]: new i/o library

2006-01-27 Thread Bulat Ziganshin
Hello Einar,

Friday, January 27, 2006, 4:19:55 PM, you wrote:

EK One simple optimization is that you can omit all buffering with
EK unbuffered operation. Then simply add the buffer (which is ok
EK because Handles are mutable) if the user ever calls hLookAhead.

yes, i do it

 moreover, i have an idea how to implement async i/o without complex
 bureaucracy: use mmapped files, maybe together with multiple buffers.
 for example, we can allocate four 16kb buffers. when one buffer is
 filled with written data, the program unmaps it and switches to the
 next buffer. i haven't tested it, but the OS can guess that the
 unmapped buffer should now be asynchronously written to disk. the same
 for reading - when we have completely read one buffer, we can unmap
 it, switch to the second buffer and map the third, so that the OS can
 asynchronously fill the third buffer while we are reading the second.
 should this work, at least on the main desktop OSes?

EK Please no. There are multiple reasons to avoid mmapped files.
EK 1) They make very few performance guarantees for reading
EK    (i.e. a Haskell thread touches memory which has not yet
EK    been read from the file, causing I/O, and all the other
EK    Haskell threads are blocked too)

yes, it seems that using a mmapped file may slow down such a program

EK 2) The time of writes is unpredictable, making implementing
EK    hFlush harder? (not sure about this)

i can only speak for Windows - there FlushViewOfFile() does it

EK 3) Not all file descriptors will support it - i.e. we will
EK    need the read/write path in any case.

i don't understand what you mean, can you please explain further?

EK 4) Mmap cannot be used for random access for arbitrary files
EK    since they may be larger than the address space. This means
EK    some kind of window needs to be implemented - and this is
EK    easily done with read/write.

that's not true, at least for Windows - see MapViewOfFile()



-- 
Best regards,
 Bulat  mailto:[EMAIL PROTECTED]





Re[2]: new i/o library

2006-01-27 Thread Bulat Ziganshin
Hello Duncan,

Friday, January 27, 2006, 4:00:28 PM, you wrote:

 moreover, i have an idea how to implement async i/o without complex
 bureaucracy: use mmapped files, maybe together with multiple buffers.
 for example, we can allocate four 16kb buffers. when one buffer is
 filled with written data, the program unmaps it and switches to the
 next buffer. i haven't tested it, but the OS can guess that the
 unmapped buffer should now be asynchronously written to disk. the same
 for reading - when we have completely read one buffer, we can unmap
 it, switch to the second buffer and map the third, so that the OS can
 asynchronously fill the third buffer while we are reading the second.
 should this work, at least on the main desktop OSes?

DC On Linux and probably other unix-like OSes I don't think this would be
DC any different from using read/write.

DC On Linux, read and mmap use the same underlying mechanism - the page
DC cache. The only difference is that with mmap you get zero-copy access to
DC the page cache. However frequent mapping and unmapping may eliminate
DC that advantage. Either way there is no difference in how asynchronous
DC the operations are.

yes, i want to save exactly this bit of performance - now that i have
optimized all the other costs on the text i/o path

in other words, i am interested in having zero-wait operation both for
reading and writing, i.e. in a sequence of getChar or putChar actions
there should be no waits on any action - provided, of course, that the
disk is fast enough. in other words, the speed of such i/o programs
should be the same as if we just wrote the data to memory

the current GHC Handle implementation uses rather complex machinery for
async reading and writing, and i would even say that most of the
Handle implementation's complexity is due to this async machinery. so
i want to know: what exactly does this machinery accomplish, and can
we implement async operation much more easily by using mmap?

the word "async" is overloaded here - i'm most interested in having
zero overhead in single-threaded operation, while GHC's optimization,
afair, is more about overlapping I/O in one thread with computation
in another. so i'm searching for the fastest and easiest-to-implement
scheme. what would you propose?


-- 
Best regards,
 Bulat  mailto:[EMAIL PROTECTED]





Re: new i/o library

2006-01-27 Thread Simon Marlow

Bulat Ziganshin wrote:


i'm now writing some sort of new i/o library. one area where it
currently falls short of the existing Handles implementation in GHC is
the asynchronous i/o operations. can you please briefly describe how
this is done in GHC and, in particular, why multiple buffers are used?


Multiple buffers were introduced to cope with the semantics we wanted 
for hPutStr.  The problem is that you don't want hPutStr to hold a lock 
on the Handle while it evaluates its argument list, because that could 
take arbitrary time.  Furthermore, things like this:


  putStr (trace "foo" "bar")

used to cause deadlocks, because putStr holds the lock, evaluates its 
argument list, which causes trace to also attempt to acquire the lock on 
stdout, leading to deadlock.


So, putStr first grabs a buffer from the Handle, then unlocks the Handle 
while it fills up the buffer, then it takes the lock again to write the 
buffer.  Since another thread might try to putStr while the lock is 
released, we need multiple buffers.
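
Schematically, the protocol looks like the following sketch, where the Handle
is reduced to a lock plus a pool of spare buffers, filling and writing a
buffer are left abstract, and the real code's per-chunk loop is omitted:

  import Control.Concurrent.MVar

  data Buffer = Buffer          -- stands in for the real mutable byte buffer

  data SimpleHandle = SimpleHandle
    { hLock   :: MVar ()        -- the per-Handle lock
    , hSpare  :: MVar [Buffer]  -- spare buffers, one per concurrent writer
    , hFill   :: Buffer -> String -> IO ()  -- evaluate/copy the string
    , hCommit :: Buffer -> IO ()            -- write the buffer to the device
    }

  putStrSketch :: SimpleHandle -> String -> IO ()
  putStrSketch h s = do
    -- 1. hold the lock only long enough to grab (or allocate) a spare buffer
    buf <- withMVar (hLock h) $ \_ ->
             modifyMVar (hSpare h) $ \bufs -> case bufs of
               (b:rest) -> return (rest, b)
               []       -> return ([], Buffer)   -- allocate a fresh one
    -- 2. evaluate the argument and fill the buffer with NO lock held, so a
    --    side effect like 'trace' can safely take the Handle lock itself
    hFill h buf s
    -- 3. retake the lock, write the buffer out, and return it to the pool
    withMVar (hLock h) $ \_ -> do
      hCommit h buf
      modifyMVar_ (hSpare h) (return . (buf :))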


For async IO on Unix, we use non-blocking read() calls, and if read() 
indicates that we need to block, we send a request to the IO Manager 
thread (see GHC.Conc) which calls select() on behalf of all the threads 
waiting for I/O.  For async IO on Windows, we either use the threaded 
RTS's blocking foreign call mechanism to invoke read(), or the 
non-threaded RTS has a similar mechanism internally.
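
In miniature, the Unix path looks like this sketch: a non-blocking read()
that, on EAGAIN, asks the IO manager (via threadWaitRead) to wake the thread
when the descriptor becomes readable. The fd is assumed to already be in
O_NONBLOCK mode, and error handling is minimal:

  import Foreign.C.Error (eAGAIN, eWOULDBLOCK, getErrno, throwErrno)
  import Foreign.C.Types (CChar, CInt, CSize)
  import Foreign.Ptr (Ptr)
  import GHC.Conc (threadWaitRead)
  import System.Posix.Types (CSsize, Fd (..))

  foreign import ccall unsafe "read"
    c_read :: CInt -> Ptr CChar -> CSize -> IO CSsize

  -- Retry a non-blocking read(); when the kernel says it would block, let
  -- the IO manager's select() loop wake us up instead of blocking the RTS.
  readNonBlocking :: Fd -> Ptr CChar -> Int -> IO Int
  readNonBlocking fd@(Fd cfd) buf len = loop
    where
      loop = do
        n <- c_read cfd buf (fromIntegral len)
        if n >= 0
          then return (fromIntegral n)
          else do
            errno <- getErrno
            if errno == eAGAIN || errno == eWOULDBLOCK
              then threadWaitRead fd >> loop
              else throwErrno "readNonBlocking"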


We ought to be using the various alternatives to select(), but we 
haven't got around to that yet.



moreover, i have an idea how to implement async i/o without complex
bureaucracy: use mmapped files, maybe together with multiple buffers.


I don't think we should restrict the implementation to mmap'd files, for 
all the reasons that Einar gave.  Lots of things aren't mmapable, mainly.


My vision for an I/O library is this:

  - a single class supporting binary input (resp. output) that is
implemented by various transports: files, sockets, mmap'd files,
memory and arrays.  Windowed mmap is an option here too.

  - layers of binary filters on top of this: you could add buffering,
and compression/decompression.

  - a layer of text translation at the top.

This is more or less how the Stream-based I/O library that I was working 
on is structured.


The binary I/O library would talk to a binary transport, perhaps with a 
layer of buffering, whereas text-based applications talk to the text layer.
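
In compressed form, that structure might look like the sketch below: one
binary-input class, a buffering filter over it, and a deliberately trivial
(Latin-1) text layer on top. All class and type names here are invented for
illustration:

  import qualified Data.ByteString as B
  import qualified Data.ByteString.Char8 as BC
  import Data.IORef

  -- Layer 1: a binary transport (file, socket, mmap window, memory, array...).
  class BinaryInput t where
    readBlock :: t -> Int -> IO B.ByteString   -- empty result = end of input

  -- Layer 2: a binary filter - here, read buffering over any transport.
  data Buffered t = Buffered t (IORef B.ByteString)

  instance BinaryInput t => BinaryInput (Buffered t) where
    readBlock (Buffered t ref) n = do
      pending <- readIORef ref
      chunk <- if B.null pending then readBlock t 65536 else return pending
      let (now, later) = B.splitAt n chunk
      writeIORef ref later
      return now

  -- Layer 3: text translation on top of any binary input (one byte per
  -- character here, just to keep the sketch short).
  class TextInput t where
    getTextLine :: t -> IO String

  newtype Latin1 t = Latin1 t

  instance BinaryInput t => TextInput (Latin1 t) where
    getTextLine (Latin1 t) = go []
      where
        go acc = do
          b <- readBlock t 1
          case BC.unpack b of
            []     -> return (reverse acc)   -- end of input
            '\n':_ -> return (reverse acc)
            c:_    -> go (c : acc)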


Cheers,
Simon