Re: [HACKERS] Performance patch for Win32
On Thu, Aug 30, 2012 at 04:37:37PM -0400, Bruce Momjian wrote: On Tue, May 29, 2012 at 03:54:59PM -0700, Mark Dilger wrote: I was imagining that this would be a trap for linux developers who saw nothing wrong with their code until it made it to the build/test farm. That's pretty far down the development process. Of course, it is also a trap in the other direction, for Windows developers who use the pattern but do not include anything equivalent for the non-Windows execution path. On the whole, however, your argument in favor of tighter patterns might be more convincing than my argument in favor of catching bugs sooner. I will start implementing your suggestion for patch v2. Any progress on this? I have added this to the TODO list: Reduce file statistics overhead on directory reads http://www.postgresql.org/message-id/1338325561.82125.yahoomail...@web39304.mail.mud.yahoo.com -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance patch for Win32
On Tue, May 29, 2012 at 03:54:59PM -0700, Mark Dilger wrote: I was imagining that this would be a trap for linux developers who saw nothing wrong with their code until it made it to the build/test farm. That's pretty far down the development process. Of course, it is also a trap in the other direction, for Windows developers who use the pattern but do not include anything equivalent for the non-Windows execution path. On the whole, however, your argument in favor of tighter patterns might be more convincing than my argument in favor of catching bugs sooner. I will start implementing your suggestion for patch v2. Any progress on this? --- ━━━ From: Tom Lane t...@sss.pgh.pa.us To: Mark Dilger markdil...@yahoo.com Cc: pgsql-hackers@postgresql.org pgsql-hackers@postgresql.org Sent: Tuesday, May 29, 2012 3:42 PM Subject: Re: [HACKERS] Performance patch for Win32 Mark Dilger markdil...@yahoo.com writes: I am hesitant to write a function like AllocateDirWithFilePattern if the pattern is simply ignored on non-Windows. In my patch, the pattern underspecified the files, and the ad-hoc matching code applied to all the returned files tightened that up. But a person could just as well overspecify the pattern and then they would get different behavior on Windows vs. non-Windows, with fewer files returned by FindNextFile() than would have matched the ad-hoc pattern. Well, if you're imagining that we wouldn't need to test carefully on both Windows and non-Windows, I think that's a pipe dream. As an example, your proposal of AllocateDirWithFilePrefix would only work consistently across platforms if the prefix didn't contain anything that Windows thought was a pattern metacharacter. (This might never come up, but I'm not too sure what the metacharacters are on Windows.) Having said that, I have nothing particularly against the idea of specifying a prefix rather than an arbitrary pattern. I'm just saying it'll still need testing. Also, I wonder how many of the potential stat-equivalent operations we'll be unable to optimize away with the more restricted definition. Using a tighter pattern on Windows seems basically free (modulo testing) if we accept that it's Windows-only. regards, tom lane -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Performance patch for Win32
This is a patch against src/backend/storage/file/fd.c taken from 9.2beta1. This patch is submitted for review and comments, not for application to the code base. *WIP* This patch addresses a performance problem stemming from the use of FindFirstFile() and FindNextFile() to iterate over a directory in Windows. These two functions are used in the port of readdir for Windows. Unfortunately, unlike Linux, these Windows directory iteration functions return the equivalent of a stat() call for each file iterated. Hence, if a directory contains many tens of thousands of files, the iteration can take several minutes to complete. In RemovePgTempFile(), multiple directories are iterated and all files found which match the pattern for a temporary file are unlinked. The pattern matching is performed *outside* the directory iteration. This patch uses a file pattern like t* to match all temporary files, rather than iterating over all files in the directory, thus pushing the pattern match *inside* the directory iteration and gaining significant startup time performance. This is not theoretical. The real-world database where I found this problem is on a Windows 2003 server running PostgreSQL 9.1.3 and having 56,000 tables. I was able to duplicate the problem on a Windows 2008 server. To reproduce, you will need a database on Windows with tens of thousands of tables and a recent version of PostgreSQL. Reboot the Windows server so that the filesystem is guaranteed not to be in the filesystem cache. Start postgres using pg_ctl, and note that it takes several minutes to start. After applying the patch and re-running these steps, the server should not take so long to start. I have the following reservations about my design, and solicit comments and suggestions for improvement: 1) The changes I made in fd.c pass a pattern rather than a name into ReadDir *knowing the details* of how ReadDir on Windows will use the port of readdir in src/port/dirent.c and that in that code FindFirstFile() and FindNextFile() will be called. This knowledge about the inner workings of the port of readdir() is not appropriate inside fd.c, IMHO. 2) I used a fair amount of #ifdef WIN32 to avoid adding unnecessary variables or branches to the non-windows code. Since this code is probably not on the critical path performancewise, this may be overkill. 3) The pattern passed to ReadDir of the form t* should probably be something closer to (in pcre form): m/^t\d+_\d+/, rather than m/^t.*/. I am not sufficiently familiar with how Windows interprets file patterns, and whether it interprets them differently from one version of Windows to another, to be comfortable making a more precise pattern. 4) Other places in the PostgreSQL sources where directory iteration is needed should probably use a pattern if possible when running on Windows. Thus, it might make more sense to have a version of ReadDir that explicitly takes a pattern, and use that version of ReadDir elsewhere in the codebase. fd.diffs Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance patch for Win32
Mark Dilger markdil...@yahoo.com writes: 4) Other places in the PostgreSQL sources where directory iteration is needed should probably use a pattern if possible when running on Windows. Thus, it might make more sense to have a version of ReadDir that explicitly takes a pattern, and use that version of ReadDir elsewhere in the codebase. Yeah, I think a separate argument passed to an AllocateDir variant would be a less ugly way to deal with this. For example, in place of your first #ifdef block just write temp_dir = AllocateDirWithFilePattern(tmpdirname, PG_TEMP_FILE_PREFIX *); What is not immediately obvious to me is whether we should try to make the pattern argument do something useful on non-Windows platforms (and thus allow removal of the ad-hoc pattern match code in the loops where this is used); versus just treating it as a no-op on non-Windows. If we did that, we'd have to consider that Windows gets to define what the pattern language is and try to emulate that; which is likely to be expensive enough to not be a win, not to mention that non-Windows developers might not be terribly familiar with the finer points of the pattern language. I'm kind of inclined to consider that we should just treat the pattern option as a Windows-specific wart and keep the ad-hoc matching code as it is. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance patch for Win32
I am hesitant to write a function like AllocateDirWithFilePattern if the pattern is simply ignored on non-Windows. In my patch, the pattern underspecified the files, and the ad-hoc matching code applied to all the returned files tightened that up. But a person could just as well overspecify the pattern and then they would get different behavior on Windows vs. non-Windows, with fewer files returned by FindNextFile() than would have matched the ad-hoc pattern. In this particular instance, AllocateDirWithFilePrefix would work, and could be applied on all platforms, using something like: temp_dir = AllocateDirWithFilePrefix(tmpdirname, PG_TEMP_FILE_PREFIX); and that could be converted to PG_TEMP_FILE_PREFIX * on Windows, and on non-Windows the function itself could apply the prefix check easily enough. Is AllocateDirWithFilePrefix overly specific? Clearly we could also take a Suffix argument, but if we go too far down this road we just reinvent regular expressions mark From: Tom Lane t...@sss.pgh.pa.us To: Mark Dilger markdil...@yahoo.com Cc: pgsql-hackers@postgresql.org pgsql-hackers@postgresql.org Sent: Tuesday, May 29, 2012 2:30 PM Subject: Re: [HACKERS] Performance patch for Win32 Mark Dilger markdil...@yahoo.com writes: 4) Other places in the PostgreSQL sources where directory iteration is needed should probably use a pattern if possible when running on Windows. Thus, it might make more sense to have a version of ReadDir that explicitly takes a pattern, and use that version of ReadDir elsewhere in the codebase. Yeah, I think a separate argument passed to an AllocateDir variant would be a less ugly way to deal with this. For example, in place of your first #ifdef block just write temp_dir = AllocateDirWithFilePattern(tmpdirname, PG_TEMP_FILE_PREFIX *); What is not immediately obvious to me is whether we should try to make the pattern argument do something useful on non-Windows platforms (and thus allow removal of the ad-hoc pattern match code in the loops where this is used); versus just treating it as a no-op on non-Windows. If we did that, we'd have to consider that Windows gets to define what the pattern language is and try to emulate that; which is likely to be expensive enough to not be a win, not to mention that non-Windows developers might not be terribly familiar with the finer points of the pattern language. I'm kind of inclined to consider that we should just treat the pattern option as a Windows-specific wart and keep the ad-hoc matching code as it is. regards, tom lane
Re: [HACKERS] Performance patch for Win32
Mark Dilger markdil...@yahoo.com writes: I am hesitant to write a function like AllocateDirWithFilePattern if the pattern is simply ignored on non-Windows. In my patch, the pattern underspecified the files, and the ad-hoc matching code applied to all the returned files tightened that up. But a person could just as well overspecify the pattern and then they would get different behavior on Windows vs. non-Windows, with fewer files returned by FindNextFile() than would have matched the ad-hoc pattern. Well, if you're imagining that we wouldn't need to test carefully on both Windows and non-Windows, I think that's a pipe dream. As an example, your proposal of AllocateDirWithFilePrefix would only work consistently across platforms if the prefix didn't contain anything that Windows thought was a pattern metacharacter. (This might never come up, but I'm not too sure what the metacharacters are on Windows.) Having said that, I have nothing particularly against the idea of specifying a prefix rather than an arbitrary pattern. I'm just saying it'll still need testing. Also, I wonder how many of the potential stat-equivalent operations we'll be unable to optimize away with the more restricted definition. Using a tighter pattern on Windows seems basically free (modulo testing) if we accept that it's Windows-only. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance patch for Win32
I was imagining that this would be a trap for linux developers who saw nothing wrong with their code until it made it to the build/test farm. That's pretty far down the development process. Of course, it is also a trap in the other direction, for Windows developers who use the pattern but do not include anything equivalent for the non-Windows execution path. On the whole, however, your argument in favor of tighter patterns might be more convincing than my argument in favor of catching bugs sooner. I will start implementing your suggestion for patch v2. From: Tom Lane t...@sss.pgh.pa.us To: Mark Dilger markdil...@yahoo.com Cc: pgsql-hackers@postgresql.org pgsql-hackers@postgresql.org Sent: Tuesday, May 29, 2012 3:42 PM Subject: Re: [HACKERS] Performance patch for Win32 Mark Dilger markdil...@yahoo.com writes: I am hesitant to write a function like AllocateDirWithFilePattern if the pattern is simply ignored on non-Windows. In my patch, the pattern underspecified the files, and the ad-hoc matching code applied to all the returned files tightened that up. But a person could just as well overspecify the pattern and then they would get different behavior on Windows vs. non-Windows, with fewer files returned by FindNextFile() than would have matched the ad-hoc pattern. Well, if you're imagining that we wouldn't need to test carefully on both Windows and non-Windows, I think that's a pipe dream. As an example, your proposal of AllocateDirWithFilePrefix would only work consistently across platforms if the prefix didn't contain anything that Windows thought was a pattern metacharacter. (This might never come up, but I'm not too sure what the metacharacters are on Windows.) Having said that, I have nothing particularly against the idea of specifying a prefix rather than an arbitrary pattern. I'm just saying it'll still need testing. Also, I wonder how many of the potential stat-equivalent operations we'll be unable to optimize away with the more restricted definition. Using a tighter pattern on Windows seems basically free (modulo testing) if we accept that it's Windows-only. regards, tom lane