RE: openFile and threads
> > Haskell Strings are a common performance bottleneck; for example when
> > serving files in the Haskell web server I avoided the conversion to
> > Haskell Strings altogether by reading/writing arrays of bytes (see the
> > paper for details).
>
> I was curious to see if this is also the case here. Therefore I just
> pasted the GHC implementation of openFile into Peter's suspicious
> module ('openFile' obtained from
> http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/libraries/base/GHC/Handle.hs
> ---I hope this was the right one?) to be able to also profile the GHC
> internal openfile code. Here are the relevant parts of the resulting
> output of the profiler:
>
> COST CENTRE      MODULE      %time  %alloc
>
> withCString'     MailStore    39.1    19.7

Interesting - I just looked at the code for withCString and it is being
poorly optimised. There are several layers of FFI abstraction which
aren't being inlined/deforested away.

Thanks for the pointer, I'll take a look at this.

Cheers,
Simon
___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: openFile and threads
"Simon Marlow" <[EMAIL PROTECTED]> writes:

> > > You might consider bypassing the Handle interface and going to the bare
> > > metal using the Posix library, which will cut down on the overhead in
> > > openFile.
> >
> > That's what I was fearing. Is the conversion from Haskell Strings to
> > C strings a performance problem?
>
> Haskell Strings are a common performance bottleneck; for example when
> serving files in the Haskell web server I avoided the conversion to
> Haskell Strings altogether by reading/writing arrays of bytes (see the
> paper for details).

I was curious to see if this is also the case here. Therefore I just
pasted the GHC implementation of openFile into Peter's suspicious
module ('openFile' obtained from
http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/libraries/base/GHC/Handle.hs
---I hope this was the right one?) to be able to also profile the GHC
internal openFile code. Here are the relevant parts of the resulting
output of the profiler:

COST CENTRE      MODULE      %time  %alloc

withCString'     MailStore    39.1    19.7
f1               MailStore    26.1    40.9
f9               MailStore    21.7     8.8
getBuffer        MailStore     4.3     0.1
f6.2             MailStore     4.3     4.0
f6               MailStore     4.3     2.3
f6.3             MailStore     0.0     1.6
allocateBuffer   MailStore     0.0    19.4

...

                                                  individual      inherited
COST CENTRE           MODULE      no.  entries  %time  %alloc   %time  %alloc

f6.1                  MailStore   361        0    0.0     0.1    43.5    41.2
openFile              MailStore   362     1154    0.0     0.1    43.5    41.1
openFile'             MailStore   365     1154    0.0     0.0    43.5    40.9
withCString'          MailStore   367        0   39.1    19.7    39.1    19.7
openFd                MailStore   371     1154    0.0     0.7     4.3    20.9
mkFileHandle          MailStore   372     1154    0.0     0.3     4.3    20.2
initBufferState       MailStore   387     1154    0.0     0.0     0.0     0.0
newFileHandle         MailStore   376     1154    0.0     0.1     0.0     0.3
handleFinalizer       MailStore   377        0    0.0     0.1     0.0     0.2
flushWriteBufferOnly  MailStore   389     1154    0.0     0.0     0.0     0.0
getBuffer             MailStore   373     1154    4.3     0.1     4.3    19.6
allocateBuffer        MailStore   374     1154    0.0    19.4     0.0    19.5
newEmptyBuffer        MailStore   375        0    0.0     0.1     0.0     0.1

...

The cost centre "f6.1" is the location of the recurring call of
"openFile".

As you can see, almost all of the time spent in openFile (39.1% of the
43.5% it inherits) goes to the function "withCString", which translates
the Haskell strings representing the file names to the C representation.
I knew that Haskell strings are bad, but I really did not expect them to
cause such a huge time penalty ...

Cheers,

Matthias

--
Matthias Neubauer                                       |
Universität Freiburg, Institut für Informatik           | tel +49 761 203 8060
Georges-Köhler-Allee 79, 79110 Freiburg i. Br., Germany | fax +49 761 203 8052
RE: openFile and threads
Hi,

one thing that I gleaned from Matthias's strace output is the fcntl
system calls to obtain the locks. They are not really needed for my
application and could be avoided using the POSIX library. However, this
does not seem to be a big issue just from the cost of the system calls,
because there is no noticeable difference between Matthias's C programs.

> But it sounds like in your case you need to open lots of (small?) files.
> What do you do with the contents of the files?

Yes, the files are small. They typically contain less than 10 words.
These words are key words and the Select program selects file names
based on boolean expressions formed from the key words.

What I'm actually doing is implementing a data base for email messages,
where these file names are the primary keys for the messages. Each
message is represented by two files of the same name (but in different
directories), the data file containing the raw message, and the meta
data file that just contains a list of key words. Keeping the meta data
in this way allows for smooth interaction with file synchronization
across several computers. So it's necessary that each set of key words
resides in its own file.

I have created a web page with a little more argumentation about it:
http://www.informatik.uni-freiburg.de/~thiemann/MailStore
The downloadables may be a bit out of date, but you get the idea.

Hope that helps.
-Peter
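A minimal sketch of the selection step described above: boolean expressions over key words, evaluated against the key-word list of each meta data file. The type and function names (Query, matches, select) are hypothetical and chosen for illustration; they are not taken from the MailStore sources.

```haskell
-- Hypothetical query type; the names are illustrative, not MailStore's.
data Query
  = Keyword String
  | And Query Query
  | Or Query Query
  | Not Query

-- Does the key-word list of one meta data file satisfy the query?
matches :: Query -> [String] -> Bool
matches (Keyword k) ws = k `elem` ws
matches (And p q)   ws = matches p ws && matches q ws
matches (Or p q)    ws = matches p ws || matches q ws
matches (Not p)     ws = not (matches p ws)

-- Keep the names of the files whose key words satisfy the query.
select :: Query -> [(FilePath, [String])] -> [FilePath]
select q files = [name | (name, ws) <- files, matches q ws]
```

With fewer than ten words per file, a plain list lookup with `elem` is probably sufficient; a set would only matter for much longer key-word lists.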
RE: openFile and threads
> > You might consider bypassing the Handle interface and going to the bare
> > metal using the Posix library, which will cut down on the overhead in
> > openFile.
>
> That's what I was fearing. Is the conversion from Haskell Strings to
> C strings a performance problem?

Haskell Strings are a common performance bottleneck; for example when
serving files in the Haskell web server I avoided the conversion to
Haskell Strings altogether by reading/writing arrays of bytes (see the
paper for details).

But it sounds like in your case you need to open lots of (small?) files.
What do you do with the contents of the files?

Cheers,
Simon
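A minimal sketch of reading file contents as bytes rather than as a Haskell String, assuming the bytestring package that ships with current GHC (it is not the array-of-bytes code from the web server paper). The function name readKeywords is hypothetical. Note that the file name is still passed as a String, so this only avoids String construction for the contents.

```haskell
import qualified Data.ByteString.Char8 as B
import Control.Exception (IOException, try)

-- Read a small file as strict bytes and split its first line into
-- words, without building a Haskell String for the contents.
-- Returns [] if the file cannot be read (hypothetical error policy).
readKeywords :: FilePath -> IO [B.ByteString]
readKeywords path = do
  r <- try (B.readFile path) :: IO (Either IOException B.ByteString)
  return $ case r of
    Left _         -> []
    Right contents -> B.words (B.takeWhile (/= '\n') contents)
```

An empty or unreadable file yields an empty list, which matches the fallback used in the original f6 from this thread.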
RE: openFile and threads
Hi Simon,

> If the openFile isn't on the critical path, then it can help to do it in
> another thread, but only if the openFile is blocking (not usually the
> case if you're opening a normal file).

Actually, we tried that using mergeIO, but there was no real
improvement.

> You might consider bypassing the Handle interface and going to the bare
> metal using the Posix library, which will cut down on the overhead in
> openFile.

That's what I was fearing. Is the conversion from Haskell Strings to
C strings a performance problem?

> The best solution is to avoid the open altogether if possible (eg. with
> caching), but I don't know about your particular application so this
> might not be a possibility.

No, that's not possible. The program reads every file at most once and
then terminates. Although one might contemplate writing a separate
server that keeps all the information in memory and refreshes it
regularly.

Thanks
-Peter
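For readers who want to repeat the threading experiment, here is a minimal sketch that runs an IO action per input in its own lightweight thread and collects the results in input order. It uses forkIO and MVar from base rather than mergeIO, so it is not the code that was actually tried; the name concurrentMap is hypothetical. As noted in the thread, little gain should be expected when the action does not block.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, try)

-- Run the action on every input in a separate thread; each thread
-- stores its result (or the exception it raised) in its own MVar.
-- Results come back in the order of the inputs.
concurrentMap :: (a -> IO b) -> [a] -> IO [Either SomeException b]
concurrentMap act xs = do
  vars <- mapM spawn xs
  mapM takeMVar vars
  where
    spawn x = do
      var <- newEmptyMVar
      _ <- forkIO (try (act x) >>= putMVar var)
      return var
```

Under these assumptions, something like `concurrentMap f6 names` would start one thread per file name; comparing its profile with a plain `mapM f6 names` shows whether the opens overlap at all.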
RE: openFile and threads
> Now I'm wondering about ways to cut that down:
> 1. How can I avoid the allocations inside of openFile?
> 2. Would it help to call f6 in different threads? That is, does a
>    thread yield when it calls an IO function?

openFile is quite expensive because it normally allocates a buffer,
which I suspect accounts for most of the allocation you're seeing. It
also sets up a finalizer, allocates the Handle, and does a couple of
extra system calls.

If the openFile isn't on the critical path, then it can help to do it in
another thread, but only if the openFile is blocking (not usually the
case if you're opening a normal file).

You might consider bypassing the Handle interface and going to the bare
metal using the Posix library, which will cut down on the overhead in
openFile.

The best solution is to avoid the open altogether if possible (eg. with
caching), but I don't know about your particular application so this
might not be a possibility.

Cheers,
Simon
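A minimal sketch of what bypassing the Handle interface could look like with the present-day unix package (System.Posix.IO), which differs from the Posix library of the time. The helper names openReadOnly and readSmallFile are hypothetical. The CPP guard is there because openFd dropped its Maybe FileMode argument in unix 2.8; it assumes GHC 8.0 or later, which defines MIN_VERSION_unix when CPP is enabled.

```haskell
{-# LANGUAGE CPP #-}
import qualified Data.ByteString as B
import Control.Exception (bracket)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (castPtr)
import System.Posix.IO
  (OpenMode (ReadOnly), closeFd, defaultFileFlags, fdReadBuf, openFd)
import System.Posix.Types (Fd)

-- Open a file read-only as a raw descriptor.
-- The openFd argument list changed in unix 2.8, hence the guard.
openReadOnly :: FilePath -> IO Fd
#if MIN_VERSION_unix(2,8,0)
openReadOnly path = openFd path ReadOnly defaultFileFlags
#else
openReadOnly path = openFd path ReadOnly Nothing defaultFileFlags
#endif

-- Read at most 'limit' bytes with a single read call: no Handle, no
-- Handle buffer and no finalizer. The descriptor is closed even if
-- the read throws. Intended for small files and a positive limit.
readSmallFile :: Int -> FilePath -> IO B.ByteString
readSmallFile limit path =
  bracket (openReadOnly path) closeFd $ \fd ->
    allocaBytes limit $ \buf -> do
      n <- fdReadBuf fd buf (fromIntegral limit)
      B.packCStringLen (castPtr buf, fromIntegral n)
```

The file name is still a Haskell String here and has to be marshalled to a C string on every open, so this removes the buffer, Handle and finalizer costs listed above but not the string conversion itself.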
openFile and threads
Folks,

here is the piece of code that takes most of the time in a program I
have:

    f6 = {-# SCC "f6" #-}\gumd ->
      let fileName = usermetadir ++ gumd in
      catch (do h <- {-# SCC "f6.1" #-} openFile fileName ReadMode
                str <- {-# SCC "f6.2" #-} hGetLine h
                _ <- {-# SCC "f6.2a" #-} hClose h
                return $ {-# SCC "f6.3" #-} words str)
            (const $ return [])

Profiling yields this output:

                                        individual      inherited
COST CENTRE  MODULE      no.  entries  %time  %alloc   %time  %alloc

f6           MailStore   346      577    0.0     3.5    87.5    85.1
f6.3         MailStore   351        0    0.0     9.9     0.0     9.9
f6.2a        MailStore   350        0    0.0     0.7     0.0     0.7
f6.2         MailStore   349        0    0.0     7.5     0.0     7.5
f6.1         MailStore   347        0   87.5    63.5    87.5    63.5

If I read this correctly, openFile performs 63.5% of all allocations
and takes 87.5% of the runtime.

Now I'm wondering about ways to cut that down:
1. How can I avoid the allocations inside of openFile?
2. Would it help to call f6 in different threads? That is, does a
   thread yield when it calls an IO function?

Any help appreciated.
-Peter