Re: [HACKERS] Performance patch for Win32

2013-01-24 Thread Bruce Momjian
On Thu, Aug 30, 2012 at 04:37:37PM -0400, Bruce Momjian wrote:
 On Tue, May 29, 2012 at 03:54:59PM -0700, Mark Dilger wrote:
  I was imagining that this would be a trap for linux developers
  who saw nothing wrong with their code until it made it to the
  build/test farm.  That's pretty far down the development
  process.  Of course, it is also a trap in the other direction, for
  Windows developers who use the pattern but do not include
  anything equivalent for the non-Windows execution path.
  
  On the whole, however, your argument in favor of tighter
  patterns might be more convincing than my argument in favor
  of catching bugs sooner.
  
  I will start implementing your suggestion for patch v2.
 
 Any progress on this?

I have added this to the TODO list:

  Reduce file statistics overhead on directory reads


http://www.postgresql.org/message-id/1338325561.82125.yahoomail...@web39304.mail.mud.yahoo.com

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance patch for Win32

2012-08-30 Thread Bruce Momjian
On Tue, May 29, 2012 at 03:54:59PM -0700, Mark Dilger wrote:
 I was imagining that this would be a trap for linux developers
 who saw nothing wrong with their code until it made it to the
 build/test farm.  That's pretty far down the development
 process.  Of course, it is also a trap in the other direction, for
 Windows developers who use the pattern but do not include
 anything equivalent for the non-Windows execution path.
 
 On the whole, however, your argument in favor of tighter
 patterns might be more convincing than my argument in favor
 of catching bugs sooner.
 
 I will start implementing your suggestion for patch v2.

Any progress on this?

---


 
 ━━━
 From: Tom Lane t...@sss.pgh.pa.us
 To: Mark Dilger markdil...@yahoo.com
 Cc: pgsql-hackers@postgresql.org pgsql-hackers@postgresql.org
 Sent: Tuesday, May 29, 2012 3:42 PM
 Subject: Re: [HACKERS] Performance patch for Win32
 
 Mark Dilger markdil...@yahoo.com writes:
  I am hesitant to write a function like AllocateDirWithFilePattern
  if the pattern is simply ignored on non-Windows.  In my patch,
  the pattern underspecified the files, and the ad-hoc matching code
  applied to all the returned files tightened that up.  But a person
  could just as well overspecify the pattern and then they would get
  different behavior on Windows vs. non-Windows, with fewer
  files returned by FindNextFile() than would have matched the
  ad-hoc pattern.
 
 Well, if you're imagining that we wouldn't need to test carefully on
 both Windows and non-Windows, I think that's a pipe dream.  As an
 example, your proposal of AllocateDirWithFilePrefix would only work
 consistently across platforms if the prefix didn't contain anything
 that Windows thought was a pattern metacharacter.  (This might never
 come up, but I'm not too sure what the metacharacters are on Windows.)
 
 Having said that, I have nothing particularly against the idea of
 specifying a prefix rather than an arbitrary pattern.  I'm just
 saying it'll still need testing.  Also, I wonder how many of the
 potential stat-equivalent operations we'll be unable to optimize
 away with the more restricted definition.  Using a tighter pattern
 on Windows seems basically free (modulo testing) if we accept that
 it's Windows-only.
 
 regards, tom lane
 
 

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Performance patch for Win32

2012-05-29 Thread Mark Dilger
This is a patch against src/backend/storage/file/fd.c
taken from 9.2beta1.

This patch is submitted for review and comments, not
for application to the code base.  *WIP*


This patch addresses a performance problem stemming
from the use of FindFirstFile() and FindNextFile() to
iterate over a directory in Windows.  These two functions
are used in the port of readdir for Windows.  Unfortunately,
unlike Linux, these Windows directory iteration functions
return the equivalent of a stat() call for each file iterated.
Hence, if a directory contains many tens of thousands of
files, the iteration can take several minutes to complete.

In RemovePgTempFile(), multiple directories are iterated
and all files found which match the pattern for a temporary
file are unlinked.  The pattern matching is performed
*outside* the directory iteration.  This patch uses a file
pattern like t* to match all temporary files, rather than
iterating over all files in the directory, thus pushing the
pattern match *inside* the directory iteration and gaining
significant startup time performance.

This is not theoretical.  The real-world database where I
found this problem is on a Windows 2003 server running
PostgreSQL 9.1.3 and having 56,000 tables.  I was able
to duplicate the problem on a Windows 2008 server.


To reproduce, you will need a database on Windows with
tens of thousands of tables and a recent version of
PostgreSQL.  Reboot the Windows server so that the
filesystem is guaranteed not to be in the filesystem cache.
Start postgres using pg_ctl, and note that it takes several
minutes to start.  After applying the patch and re-running
these steps, the server should not take so long to start.

I have the following reservations about my design, and
solicit comments and suggestions for improvement:

1)  The changes I made in fd.c pass a pattern rather than
a name into ReadDir *knowing the details* of how ReadDir
on Windows will use the port of readdir in src/port/dirent.c
and that in that code FindFirstFile() and FindNextFile() will
be called.  This knowledge about the inner workings of
the port of readdir() is not appropriate inside fd.c, IMHO.

2) I used a fair amount of #ifdef WIN32 to avoid adding
unnecessary variables or branches to the non-windows
code.  Since this code is probably not on the critical path
performancewise, this may be overkill.

3) The pattern passed to ReadDir of the form t* should
probably be something closer to (in pcre form):
m/^t\d+_\d+/, rather than m/^t.*/.  I am not sufficiently
familiar with how Windows interprets file patterns, and
whether it interprets them differently from one version of
Windows to another, to be comfortable making a more
precise pattern.


4) Other places in the PostgreSQL sources where directory
iteration is needed should probably use a pattern if possible
when running on Windows.  Thus, it might make more
sense to have a version of ReadDir that explicitly takes a
pattern, and use that version of ReadDir elsewhere in the
codebase.

fd.diffs
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance patch for Win32

2012-05-29 Thread Tom Lane
Mark Dilger markdil...@yahoo.com writes:
 4) Other places in the PostgreSQL sources where directory
 iteration is needed should probably use a pattern if possible
 when running on Windows.  Thus, it might make more
 sense to have a version of ReadDir that explicitly takes a
 pattern, and use that version of ReadDir elsewhere in the
 codebase.

Yeah, I think a separate argument passed to an AllocateDir variant
would be a less ugly way to deal with this.  For example, in place
of your first #ifdef block just write

temp_dir = AllocateDirWithFilePattern(tmpdirname,
  PG_TEMP_FILE_PREFIX *);


What is not immediately obvious to me is whether we should try to make
the pattern argument do something useful on non-Windows platforms
(and thus allow removal of the ad-hoc pattern match code in the loops
where this is used); versus just treating it as a no-op on non-Windows.
If we did that, we'd have to consider that Windows gets to define what
the pattern language is and try to emulate that; which is likely to be
expensive enough to not be a win, not to mention that non-Windows
developers might not be terribly familiar with the finer points of the
pattern language.

I'm kind of inclined to consider that we should just treat the pattern
option as a Windows-specific wart and keep the ad-hoc matching code as
it is.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance patch for Win32

2012-05-29 Thread Mark Dilger
I am hesitant to write a function like AllocateDirWithFilePattern
if the pattern is simply ignored on non-Windows.  In my patch,
the pattern underspecified the files, and the ad-hoc matching code
applied to all the returned files tightened that up.  But a person
could just as well overspecify the pattern and then they would get
different behavior on Windows vs. non-Windows, with fewer
files returned by FindNextFile() than would have matched the
ad-hoc pattern.


In this particular instance, AllocateDirWithFilePrefix would
work, and could be applied on all platforms, using something
like:

    temp_dir = AllocateDirWithFilePrefix(tmpdirname,
                                        PG_TEMP_FILE_PREFIX);

and that could be converted to PG_TEMP_FILE_PREFIX *
on Windows, and on non-Windows the function itself could
apply the prefix check easily enough.

Is AllocateDirWithFilePrefix overly specific?  Clearly we
could also take a Suffix argument, but if we go too far down
this road we just reinvent regular expressions


mark




 From: Tom Lane t...@sss.pgh.pa.us
To: Mark Dilger markdil...@yahoo.com 
Cc: pgsql-hackers@postgresql.org pgsql-hackers@postgresql.org 
Sent: Tuesday, May 29, 2012 2:30 PM
Subject: Re: [HACKERS] Performance patch for Win32 
 
Mark Dilger markdil...@yahoo.com writes:
 4) Other places in the PostgreSQL sources where directory
 iteration is needed should probably use a pattern if possible
 when running on Windows.  Thus, it might make more
 sense to have a version of ReadDir that explicitly takes a
 pattern, and use that version of ReadDir elsewhere in the
 codebase.

Yeah, I think a separate argument passed to an AllocateDir variant
would be a less ugly way to deal with this.  For example, in place
of your first #ifdef block just write

        temp_dir = AllocateDirWithFilePattern(tmpdirname,
                                              PG_TEMP_FILE_PREFIX *);


What is not immediately obvious to me is whether we should try to make
the pattern argument do something useful on non-Windows platforms
(and thus allow removal of the ad-hoc pattern match code in the loops
where this is used); versus just treating it as a no-op on non-Windows.
If we did that, we'd have to consider that Windows gets to define what
the pattern language is and try to emulate that; which is likely to be
expensive enough to not be a win, not to mention that non-Windows
developers might not be terribly familiar with the finer points of the
pattern language.

I'm kind of inclined to consider that we should just treat the pattern
option as a Windows-specific wart and keep the ad-hoc matching code as
it is.

            regards, tom lane

Re: [HACKERS] Performance patch for Win32

2012-05-29 Thread Tom Lane
Mark Dilger markdil...@yahoo.com writes:
 I am hesitant to write a function like AllocateDirWithFilePattern
 if the pattern is simply ignored on non-Windows.  In my patch,
 the pattern underspecified the files, and the ad-hoc matching code
 applied to all the returned files tightened that up.  But a person
 could just as well overspecify the pattern and then they would get
 different behavior on Windows vs. non-Windows, with fewer
 files returned by FindNextFile() than would have matched the
 ad-hoc pattern.

Well, if you're imagining that we wouldn't need to test carefully on
both Windows and non-Windows, I think that's a pipe dream.  As an
example, your proposal of AllocateDirWithFilePrefix would only work
consistently across platforms if the prefix didn't contain anything
that Windows thought was a pattern metacharacter.  (This might never
come up, but I'm not too sure what the metacharacters are on Windows.)

Having said that, I have nothing particularly against the idea of
specifying a prefix rather than an arbitrary pattern.  I'm just
saying it'll still need testing.  Also, I wonder how many of the
potential stat-equivalent operations we'll be unable to optimize
away with the more restricted definition.  Using a tighter pattern
on Windows seems basically free (modulo testing) if we accept that
it's Windows-only.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance patch for Win32

2012-05-29 Thread Mark Dilger
I was imagining that this would be a trap for linux developers
who saw nothing wrong with their code until it made it to the
build/test farm.  That's pretty far down the development
process.  Of course, it is also a trap in the other direction, for
Windows developers who use the pattern but do not include
anything equivalent for the non-Windows execution path.


On the whole, however, your argument in favor of tighter
patterns might be more convincing than my argument in favor
of catching bugs sooner.

I will start implementing your suggestion for patch v2.




 From: Tom Lane t...@sss.pgh.pa.us
To: Mark Dilger markdil...@yahoo.com 
Cc: pgsql-hackers@postgresql.org pgsql-hackers@postgresql.org 
Sent: Tuesday, May 29, 2012 3:42 PM
Subject: Re: [HACKERS] Performance patch for Win32 
 
Mark Dilger markdil...@yahoo.com writes:
 I am hesitant to write a function like AllocateDirWithFilePattern
 if the pattern is simply ignored on non-Windows.  In my patch,
 the pattern underspecified the files, and the ad-hoc matching code
 applied to all the returned files tightened that up.  But a person
 could just as well overspecify the pattern and then they would get
 different behavior on Windows vs. non-Windows, with fewer
 files returned by FindNextFile() than would have matched the
 ad-hoc pattern.

Well, if you're imagining that we wouldn't need to test carefully on
both Windows and non-Windows, I think that's a pipe dream.  As an
example, your proposal of AllocateDirWithFilePrefix would only work
consistently across platforms if the prefix didn't contain anything
that Windows thought was a pattern metacharacter.  (This might never
come up, but I'm not too sure what the metacharacters are on Windows.)

Having said that, I have nothing particularly against the idea of
specifying a prefix rather than an arbitrary pattern.  I'm just
saying it'll still need testing.  Also, I wonder how many of the
potential stat-equivalent operations we'll be unable to optimize
away with the more restricted definition.  Using a tighter pattern
on Windows seems basically free (modulo testing) if we accept that
it's Windows-only.

            regards, tom lane