RE: openFile and threads

2003-01-13 Thread Simon Marlow
> > Haskell Strings are a common performance bottleneck; for example when
> > serving files in the Haskell web server I avoided the conversion to
> > Haskell Strings altogether by reading/writing arrays of bytes (see the
> > paper for details).
> 
> I was curious to see if this is also the case here. Therefore I just
> pasted the GHC implementation of openFile into Peter's suspicious
> module ('openFile' obtained from
> http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/libraries/base/GHC/Handle.hs
> - I hope this was the right one?) to be able to also profile the
> GHC-internal openFile code. Here are the relevant parts of the resulting
> output of the profiler:
> 
> COST CENTRE      MODULE     %time %alloc
> 
> withCString'     MailStore   39.1   19.7

Interesting - I just looked at the code for withCString and it is being
poorly optimised.  There are several layers of FFI abstraction which
aren't being inlined/deforested away.

Thanks for the pointer, I'll take a look at this.

Cheers,
Simon

___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users



Re: openFile and threads

2003-01-13 Thread Matthias Neubauer
"Simon Marlow" <[EMAIL PROTECTED]> writes:

> > > You might consider bypassing the Handle interface and going to the bare
> > > metal using the Posix library, which will cut down on the overhead in
> > > openFile.
> > 
> > That's what I was fearing. Is the conversion from Haskell Strings to
> > C strings a performance problem?
> 
> Haskell Strings are a common performance bottleneck; for example when
> serving files in the Haskell web server I avoided the conversion to
> Haskell Strings altogether by reading/writing arrays of bytes (see the
> paper for details).

I was curious to see if this is also the case here. Therefore I just
pasted the GHC implementation of openFile into Peter's suspicious
module ('openFile' obtained from
http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/libraries/base/GHC/Handle.hs
- I hope this was the right one?) to be able to also profile the
GHC-internal openFile code. Here are the relevant parts of the resulting
output of the profiler:

COST CENTRE      MODULE     %time %alloc

withCString'     MailStore   39.1   19.7
f1               MailStore   26.1   40.9
f9               MailStore   21.7    8.8
getBuffer        MailStore    4.3    0.1
f6.2             MailStore    4.3    4.0
f6               MailStore    4.3    2.3
f6.3             MailStore    0.0    1.6
allocateBuffer   MailStore    0.0   19.4

...
COST CENTRE            MODULE     no.  entries  %time %alloc   %time %alloc

f6.1                   MailStore  361        0    0.0    0.1    43.5   41.2
 openFile              MailStore  362     1154    0.0    0.1    43.5   41.1
  openFile'            MailStore  365     1154    0.0    0.0    43.5   40.9
   withCString'        MailStore  367        0   39.1   19.7    39.1   19.7
 openFd                MailStore  371     1154    0.0    0.7     4.3   20.9
  mkFileHandle         MailStore  372     1154    0.0    0.3     4.3   20.2
   initBufferState     MailStore  387     1154    0.0    0.0     0.0    0.0
   newFileHandle       MailStore  376     1154    0.0    0.1     0.0    0.3
handleFinalizer        MailStore  377        0    0.0    0.1     0.0    0.2
 flushWriteBufferOnly  MailStore  389     1154    0.0    0.0     0.0    0.0
   getBuffer           MailStore  373     1154    4.3    0.1     4.3   19.6
allocateBuffer         MailStore  374     1154    0.0   19.4     0.0   19.5
 newEmptyBuffer        MailStore  375        0    0.0    0.1     0.0    0.1

...

The cost centre "f6.1" marks the repeated call to "openFile". As you
can see, almost all of its time (39.1% out of 43.5%) is spent in the
function "withCString", which translates the Haskell strings
representing the file names to the C representation.

I knew that Haskell strings are bad, but I really did not expect them
to cause such a huge time penalty ...
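
To make the cost a little more concrete: a Haskell String is a linked list of Char, so every withCString call has to walk that list, encode the characters, and copy them into a temporary NUL-terminated buffer before any C function can see them. The following is only a minimal sketch of that marshalling step, not the code from GHC.Handle; the name cLength and the choice of strlen as the C function are illustrative assumptions.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CSize (..))

-- The C library's strlen, imported only so that there is some C function
-- to hand the marshalled string to.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- Copy a Haskell String into a temporary NUL-terminated C buffer, pass
-- the buffer to C, and release it again when the callback returns.
-- This copy happens on every call, which is the work the profile above
-- attributes to withCString'.
cLength :: String -> IO Int
cLength path = withCString path $ \cstr ->
  fromIntegral <$> c_strlen cstr
```

In a program that opens many files, this per-call copy is repeated for each file name, which is one plausible reason the conversion shows up so prominently in the profile.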

Cheers,

Matthias

-- 
Matthias Neubauer   |
Universität Freiburg, Institut für Informatik   | tel +49 761 203 8060
Georges-Köhler-Allee 79, 79110 Freiburg i. Br., Germany | fax +49 761 203 8052



RE: openFile and threads

2003-01-10 Thread Peter Thiemann
Hi,

one thing that I gleaned from Matthias's strace output is that fcntl
system calls are made to obtain the locks. They are not really needed
for my application and could be avoided using the POSIX library.
However, judging from the cost of the system calls alone, this does not
seem to be a big issue, because there is no noticeable difference
between Matthias's C programs.
>
> But it sounds like in your case you need to open lots of (small?) files.
> What do you do with the contents of the files?

Yes, the files are small. They typically contain less than 10 words. 
These words are key words and the Select program selects file names based
on boolean expressions formed from the key words.
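
As an illustration of the kind of selection described here, the following is a rough sketch of boolean expressions over key words and a filter that applies them. The thread does not show the real Select program, so the type Expr and the functions matches and select are hypothetical names invented for this example.

```haskell
-- Hypothetical expression type; the actual Select program may differ.
data Expr
  = Keyword String
  | Not Expr
  | And Expr Expr
  | Or Expr Expr

-- Does a message carrying the given key words satisfy the expression?
matches :: [String] -> Expr -> Bool
matches kws (Keyword k) = k `elem` kws
matches kws (Not e)     = not (matches kws e)
matches kws (And a b)   = matches kws a && matches kws b
matches kws (Or a b)    = matches kws a || matches kws b

-- Keep the file names of the messages whose key words satisfy the expression.
select :: Expr -> [(FilePath, [String])] -> [FilePath]
select e msgs = [name | (name, kws) <- msgs, matches kws e]
```

With this shape, the expensive part is producing the list of (file name, key words) pairs, since that is where every metadata file has to be opened and read.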

What I'm actually doing is implementing a data base for email messages, where
these file names are the primary keys for the messages. Each message is 
represented by two files of the same name (but in different directories),
the data file containing the raw message, and the meta data file that just
contains a list of key words.

Keeping the meta data in this way allows for smooth interaction with file
synchronization across several computers. So it's necessary that each set
of key words resides in its own file.
I have created a web page with a little more argumentation about it:
http://www.informatik.uni-freiburg.de/~thiemann/MailStore
The downloadables may be a bit out of date, but you get the idea.

Hope that helps.
-Peter



RE: openFile and threads

2003-01-10 Thread Simon Marlow

> > You might consider bypassing the Handle interface and going to the bare
> > metal using the Posix library, which will cut down on the overhead in
> > openFile.
> 
> That's what I was fearing. Is the conversion from Haskell Strings to
> C strings a performance problem?

Haskell Strings are a common performance bottleneck; for example when
serving files in the Haskell web server I avoided the conversion to
Haskell Strings altogether by reading/writing arrays of bytes (see the
paper for details).
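
For readers who want to try the "bytes instead of Strings" idea on file contents, here is a minimal sketch using the bytestring package. That package is an assumption of this example and is not mentioned in the thread, and the code is not what the Haskell web server did; readKeywords is a hypothetical name. Note also that the file name is still an ordinary String here, so this sketch only avoids Strings for the contents, not for the path.

```haskell
import qualified Data.ByteString.Char8 as BC

-- Read a small file as raw bytes and split its first line into words,
-- without ever building a String for the contents.
readKeywords :: FilePath -> IO [BC.ByteString]
readKeywords path = do
  contents <- BC.readFile path
  pure $ case BC.lines contents of
    []          -> []             -- empty file: no key words
    (first : _) -> BC.words first -- only the first line is used
```

Whether this helps in a given program depends on where the time actually goes; it does nothing about the cost of opening the file in the first place.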

But it sounds like in your case you need to open lots of (small?) files.
What do you do with the contents of the files?

Cheers,
Simon



RE: openFile and threads

2003-01-10 Thread Peter Thiemann
Hi Simon,
>
> If the openFile isn't on the critical path, then it can help to do it in
> another thread, but only if the openFile is blocking (not usually the
> case if you're opening a normal file).

Actually, we tried that using mergeIO, but there was no real improvement.
>
> You might consider bypassing the Handle interface and going to the bare
> metal using the Posix library, which will cut down on the overhead in
> openFile.

That's what I was fearing. Is the conversion from Haskell Strings to
C strings a performance problem?
>
> The best solution is to avoid the open altogether if possible (eg. with
> caching), but I don't know about your particular application so this
> might not be a possibility.

No, that's not possible. The program reads every file at most once and then
terminates. Although one might contemplate writing a separate server that
keeps all the information in memory and refreshes it regularly.

Thanks
-Peter



RE: openFile and threads

2003-01-10 Thread Simon Marlow
> Now I'm wondering about ways to cut that down:
> 1. How can I avoid the allocations inside of openFile?
> 2. Would it help to call f6 in different threads? That is, does a
>    thread yield when it calls an IO function?

openFile is quite expensive because it normally allocates a buffer,
which I suspect accounts for most of the allocation you're seeing.  It
also sets up a finalizer, allocates the Handle, and does a couple of
extra system calls.

If the openFile isn't on the critical path, then it can help to do it in
another thread, but only if the openFile is blocking (not usually the
case if you're opening a normal file).
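
A minimal sketch of running several IO actions in separate threads, assuming only forkIO and MVar from the base library; the name forkAll is hypothetical. As the paragraph above says, this is only likely to pay off when the actions really block, which opening a regular file usually does not.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, try)

-- Run every action in its own lightweight thread and collect the results
-- in the original order. An exception in one action is returned as a
-- Left value instead of leaving the collecting thread waiting forever.
forkAll :: [IO a] -> IO [Either SomeException a]
forkAll actions = do
  vars <- mapM spawn actions
  mapM takeMVar vars
  where
    spawn act = do
      var <- newEmptyMVar
      _ <- forkIO (try act >>= putMVar var)
      pure var
```

For example, forkAll (map readOne names) would start one thread per file name, where readOne stands for whatever per-file action the program performs.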

You might consider bypassing the Handle interface and going to the bare
metal using the Posix library, which will cut down on the overhead in
openFile.

The best solution is to avoid the open altogether if possible (eg. with
caching), but I don't know about your particular application so this
might not be a possibility.

Cheers,
Simon



openFile and threads

2003-01-09 Thread Peter Thiemann
Folks,

here is the piece of code that takes most of the time in a program I
have:

  f6 = {-# SCC "f6" #-} \gumd ->
    let fileName = usermetadir ++ gumd in
    catch (do h <- {-# SCC "f6.1" #-} openFile fileName ReadMode
              str <- {-# SCC "f6.2" #-} hGetLine h
              _ <- {-# SCC "f6.2a" #-} hClose h
              return $ {-# SCC "f6.3" #-} words str)
          (const $ return [])

Profiling yields this output:
                                       individual      inherited
COST CENTRE  MODULE     no.  entries   %time %alloc   %time %alloc
f6           MailStore  346      577     0.0    3.5    87.5   85.1
 f6.3        MailStore  351        0     0.0    9.9     0.0    9.9
 f6.2a       MailStore  350        0     0.0    0.7     0.0    0.7
 f6.2        MailStore  349        0     0.0    7.5     0.0    7.5
 f6.1        MailStore  347        0    87.5   63.5    87.5   63.5

If I read this correctly, openFile performs 63.5% of all allocations
and takes 87.5% of the runtime.

Now I'm wondering about ways to cut that down:
1. How can I avoid the allocations inside of openFile?
2. Would it help to call f6 in different threads? That is, does a
   thread yield when it calls an IO function?
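
For readers on a current GHC, where the Prelude no longer exports catch, the f6 function above could be restated roughly as follows. This is a sketch under assumptions, not a fix for the performance question: the name readMetaWords and the explicit metaDir argument (standing in for usermetadir) are hypothetical, and the SCC annotations are left out.

```haskell
import Control.Exception (IOException, catch)
import System.IO (IOMode (ReadMode), hGetLine, withFile)

-- metaDir plays the role of usermetadir in the original code.
readMetaWords :: FilePath -> FilePath -> IO [String]
readMetaWords metaDir gumd =
  readFirstLine `catch` handler
  where
    -- withFile closes the handle even if hGetLine throws.
    readFirstLine = withFile (metaDir ++ gumd) ReadMode $ \h ->
      words <$> hGetLine h

    -- Like the original, treat any I/O error (missing file, empty file)
    -- as "no key words".
    handler :: IOException -> IO [String]
    handler _ = pure []
```

It still goes through the Handle interface, so it opens each file in the same way as the original.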

Any help appreciated.

-Peter