On Sunday 28 November 2010, Larry Garfield <la...@garfieldtech.com> wrote:
> There are many things that everybody "knows" about optimizing PHP code.
> One of them is that one of the most expensive parts of the process is
> loading code off of disk and compiling it, which is why opcode caches
> are such a big performance boost. The corollary to that, of course, is
> that more files = more IO and therefore more of a performance hit.

It depends on the implementation that PHP uses to open the file. For
example, on Linux and similar operating systems, PHP uses mmap(2)
instead of read(2) or fread(3), so it maps the complete file into
memory, which is faster than reading the file in partial chunks.

>
> But... this is 'effin 2010. It's almost bloody 2011. Operating systems
> are smart. They already have 14 levels of caching built into them from
> hard drive micro-controller to RAM to CPU cache to OS. I've heard from
> other people (who should know) that the IO cost of doing a file_exists()
> or other stat calls is almost non-existent because a modern OS caches
> that, and with OS-based file caching even reading small files off disk
> (the size that most PHP source files are) is not as slow as we think.

Yes, that's true. This point depends on how the operating system has
implemented its VFS. Linux, FreeBSD and other platforms have
well-implemented VFS layers with a good cache for repeated reads: the
first read of a file is done the /hard/ way, from disk, and after that
the cached file location (inode data) is used.

>
> Personally, I don't know. I am not an OS engineer and haven't
> benchmarked such things, nor am I really competent to do so. However,
> it makes a huge impact on the way one structures a large PHP program as
> the performance trade-offs of huge files with massive unused code (that
> has to be compiled) vs the cost of reading lots of separate files from
> disk (more IO) is highly dependent on the speed of the aforementioned IO
> and of compilation.
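
A rough way to get numbers for that trade-off is a small script like the
one below. It is only a sketch under assumptions: PHP 5.3 or later run
from the command line, no opcode cache, and made-up names
(make_fixtures(), time_it(), the small_N()/big_N() functions and the
file count of 200 are all hypothetical). It writes 200 tiny files and
one combined file to a temporary directory, then times the
file_exists() calls, 200 require calls and a single require. Since the
files were just written, they will most likely be served from the OS
cache, so this measures the warm case, not cold disk reads.

```php
<?php
// Hypothetical micro-benchmark: all names and the file count are illustrative.

// Write $n tiny PHP files plus one file holding the same amount of code.
function make_fixtures($dir, $n)
{
    $small = array();
    $combined = "<?php\n";
    for ($i = 0; $i < $n; $i++) {
        $path = "$dir/small_$i.php";
        file_put_contents($path, "<?php\nfunction small_$i() { return $i; }\n");
        $small[] = $path;
        $combined .= "function big_$i() { return $i; }\n";
    }
    $big = "$dir/big.php";
    file_put_contents($big, $combined);
    return array($small, $big);
}

// Return the wall-clock seconds taken by $fn.
function time_it($fn)
{
    $start = microtime(true);
    $fn();
    return microtime(true) - $start;
}

$dir = sys_get_temp_dir() . '/io_bench_' . uniqid();
mkdir($dir);
list($small, $big) = make_fixtures($dir, 200);

$results = array(
    // stat-style calls only, no reading or compiling
    'file_exists x200' => time_it(function () use ($small) {
        foreach ($small as $p) { file_exists($p); }
    }),
    // many small files: one open/read/compile per file
    'require 200 small files' => time_it(function () use ($small) {
        foreach ($small as $p) { require $p; }
    }),
    // one combined file with the same functions under different names
    'require 1 combined file' => time_it(function () use ($big) {
        require $big;
    }),
);

foreach ($results as $label => $seconds) {
    printf("%-26s %.6f s\n", $label, $seconds);
}

// Remove the temporary files and directory again.
array_map('unlink', array_merge($small, array($big)));
rmdir($dir);
```

Run it several times and compare the three lines. To approximate the
cold case on Linux you would have to drop the OS cache between runs (as
root, by writing 3 to /proc/sys/vm/drop_caches), which this script does
not do.
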
You can run your own benchmarks, from a high-level or a low-level
perspective. If you want to trace how PHP reads files from the hard
drive, you can use an extension like xdebug. For example, if you use
require() and include() instead of require_once() and include_once()
where the same file gets included repeatedly, you will probably get
lower performance, because each call really reads the file again.

>
> So... does anyone have any actual, hard data here? I don't mean "I
> think" or "in my experience". I am looking for hard benchmarks,
> profiling, or writeups of how OS (Linux specifically if it matters) file
> caching works in 2010, not in 1998.

Well, it also depends on the operating system configuration. If you
just want to know the performance of the IO functions in PHP, I suggest
using an extension like xdebug. It can generate profiling information
for use with kcachegrind, so you can properly visualize how the IO
functions behave in PHP. A mmap(2) extension for PHP would probably be
useful for certain kinds of files. The file_get_contents() function
uses open(2)/read(2), so you can't do a quick read on a file.

>
> Modernizing what "everyone knows" is important for the general community,
> and the quality of our code.
>
> --Larry Garfield

Best regards,
-- 
Daniel Molina Wegener <dmw [at] coder [dot] cl>
System Programmer & Web Developer
Phone: +56 (2) 979-0277 | Blog: http://coder.cl/