Re: [HACKERS] Commitfest: patches Ready for Committer
On Mon, Oct 6, 2014 at 10:53 PM, Tom Lane wrote: > The levenshtein-distance thingy seems to still be a topic of debate > as well, both as to how we're going to refactor the code and as to > what the exact hinting rules ought to be. If some committer wants > to take charge of it and resolve those issues, fine; but I don't want > to see it done by just blindly committing whatever the last submitted > version was. My 2c: I imagine that in this case the consensus is going to be, first: - Move a minimum of the core functions out of fuzzystrmatch and rename them (str_distance?) - Keep fuzzystrmatch, and have its levenshtein functions call the moved code. In any case the levenshtein code needs careful refactoring and some additional thought before the hint code is touched. Regards. -- Michael
Re: [HACKERS] Commitfest: patches Ready for Committer
Heikki Linnakangas writes: > Committers: Could you please pick a patch, and commit if appropriate? Or > if there's a patch there that you think should *not* be committed, > please speak up. The "custom plan API" thing may be marked ready for committer, but that doesn't mean it's committable, or even that there is any consensus about whether we want it or what it should look like. The levenshtein-distance thingy seems to still be a topic of debate as well, both as to how we're going to refactor the code and as to what the exact hinting rules ought to be. If some committer wants to take charge of it and resolve those issues, fine; but I don't want to see it done by just blindly committing whatever the last submitted version was. I've not paid much of any attention to the other four. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commitfest patches mostly assigned ... status
Josh Berkus wrote: Hackers, At this point, almost all patches have been assigned to reviewers. If you submitted a patch and don't get feedback by Saturday, take a look at who's in the "reviewers" column and send them a query. Since I have no way to track when patches are assigned to reviewers, I have no idea if some of them are sitting on their hands. Note that patches need not have only one reviewer! If you have time, please take on testing some of the more complex patches, especially: Column-level Permissions Common Table Expressions SE-PostgreSQL patches I updated my patches: http://archives.postgresql.org/pgsql-hackers/2008-09/msg00859.php It contains a rebase to the latest CVS HEAD, a new hook to prevent bypassing access controls, and a small fix. In addition, I attached several short explanations of the key points needed to understand the big patch. I wrote the documentation for SE-PostgreSQL, but it is not a description of *the patch* itself. If anything is unclear, reviewers should feel free to ask me. Thanks, -- OSS Platform Development Division, NEC KaiGai Kohei <[EMAIL PROTECTED]>
Re: [HACKERS] Commitfest patches mostly assigned ... status
Josh Berkus wrote: Josh is probably basing that on the long history of discussion/revision cycles. Yep, and that *I* don't understand what the patch does, so I'm not going to turn a newbie reviewer loose on it. Here is a quick overview; there are two parts to the patch: 1. event system This allows one to register (via "PQregisterEventProc") a per-conn callback function with libpq that gets called when particular events occur. Currently, the events tell you when a conn or result is created, reset, destroyed or copied. It is generic enough to add more events in the future. By receiving events about libpq objects, you can properly allocate and free userland memory (PQfinish, PQexec, PQclear, etc... trigger events) and associate it with the libpq object: thus PQsetInstanceData(conn...) or PQsetResultInstanceData(res). This is basically adding members to the opaque PGconn or PGresult at runtime. This "instance data" can be retrieved via PQinstanceData and PQresultInstanceData. To shine a different light on it: apps normally wrap a PGconn or PGresult within their own structures, but now you can wrap the app structures inside a PGconn or PGresult. This may help; it's the libpqtypes PGEventProc implementation. http://libpqtypes.esilo.com/browse_source.html?file=events.c Also check out the patch's sgml docs if you get a chance. 2. result management There are also four functions that provide more control over PGresult; if you need to create a result from scratch, these expand on the PQmakeEmptyPGResult idea. PQcopyResult - copies a given source result; a flags argument determines what portions of the result are copied. PQsetResultAttrs - sets the attributes of the result (its columns). PQsetvalue - sets a tuple value in a result. PQresultAlloc - thin wrapper around the internal pqResultAlloc. Uses the result's block allocator, which allows PQclear to properly free all memory associated with a PGresult.
-- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/
Re: [HACKERS] Commitfest patches mostly assigned ... status
> Josh is probably basing that on the long history of > discussion/revision cycles. Yep, and that *I* don't understand what the patch does, so I'm not going to turn a newbie reviewer loose on it. -- --Josh Josh Berkus PostgreSQL San Francisco
Re: [HACKERS] Commitfest patches mostly assigned ... status
On Thu, Sep 11, 2008 at 5:06 PM, Tom Lane <[EMAIL PROTECTED]> wrote: >> The patch looks OK to me as it was the last time I looked at it; maybe >> Tom has more comments, so I decided against just committing it. > > I haven't got round to looking at it in this fest. If anyone else wants > to look it over, feel free. I think Josh is overrating the patch's > review difficulty --- anyone who uses libpq a lot could review it, > IMHO. You certainly wouldn't need any backend-internals knowledge. > Josh is probably basing that on the long history of discussion/revision cycles. There is very little change from the design that was hammered out except for the late-breaking tweak of how the passthru pointer works. I think the subtext here is that we are waiting on you to sign off on the patch (or not). merlin
Re: [HACKERS] Commitfest patches mostly assigned ... status
Alvaro Herrera <[EMAIL PROTECTED]> writes: > Merlin Moncure escribió: >> Alvaro actually performed a review on libpq events. We are waiting on >> his feedback on our changes (based on his review) and the newly >> submitted documentation. I'll update the wiki accordingly. I wasn't >> sure if Alvaro was claiming reviewer status or not. It may be ready >> to go in as-is unless we draw any new objections. > The patch looks OK to me as it was the last time I looked at it; maybe > Tom has more comments, so I decided against just committing it. I haven't got round to looking at it in this fest. If anyone else wants to look it over, feel free. I think Josh is overrating the patch's review difficulty --- anyone who uses libpq a lot could review it, IMHO. You certainly wouldn't need any backend-internals knowledge. regards, tom lane
Re: [HACKERS] Commitfest patches mostly assigned ... status
Alvaro Herrera wrote: Actually a minor gripe ... should PQsetvalue be PQsetValue? :-) We were mimicking PQgetvalue, which uses a lowercase 'v'. Is there a preference? None on this end. -- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/
Re: [HACKERS] Commitfest patches mostly assigned ... status
Merlin Moncure escribió: > On Thu, Sep 11, 2008 at 1:53 AM, Josh Berkus <[EMAIL PROTECTED]> wrote: > > Some patches have not been assigned to reviewers for various reasons. The > > following weren't assigned because they are complex and really need a > > high-end hacker or committer to take them on: > > > > libpq events > > Alvaro actually performed a review on libpq events. We are waiting on > his feedback on our changes (based on his review) and the newly > submitted documentation. I'll update the wiki accordingly. I wasn't > sure if Alvaro was claiming reviewer status or not. It may be ready > to go in as-is unless we draw any new objections. The patch looks OK to me as it was the last time I looked at it; maybe Tom has more comments, so I decided against just committing it. I admit I haven't checked the docs. Actually a minor gripe ... should PQsetvalue be PQsetValue? :-) -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] Commitfest patches mostly assigned ... status
On Thu, Sep 11, 2008 at 1:53 AM, Josh Berkus <[EMAIL PROTECTED]> wrote: > Some patches have not been assigned to reviewers for various reasons. The > following weren't assigned because they are complex and really need a > high-end hacker or committer to take them on: > > libpq events Alvaro actually performed a review on libpq events. We are waiting on his feedback on our changes (based on his review) and the newly submitted documentation. I'll update the wiki accordingly. I wasn't sure if Alvaro was claiming reviewer status or not. It may be ready to go in as-is unless we draw any new objections. Anybody who wants to look should ping Alvaro and/or Tom. merlin
Re: [HACKERS] Commitfest patches mostly assigned ... status
Josh Berkus <[EMAIL PROTECTED]> writes: > Also, note that the following patches need performance testing on a > variety of platforms. Everyone should help with this. > GSoC Improved Hash Indexing > posix fadvises > operator restrictivity function for text search > CLUSTER using sort instead of index scan The last two of those are not code-complete so I'm not sure that performance testing is all that meaningful for them. But by all means, test the first two ... regards, tom lane
Re: [HACKERS] Commitfest patches
"Greg Smith" <[EMAIL PROTECTED]> writes: > On Fri, 28 Mar 2008, Gregory Stark wrote: > >> I described which interfaces worked on Linux and Solaris based on empirical >> tests. I posted source code for synthetic benchmarks so we could test it on a >> wide range of hardware. I posted graphs based on empirical results. > > Is it possible to post whatever script that generates the graph (gnuplot?) so > people can compare the results they get to yours? Unfortunately I couldn't figure out how to get gnuplot to do this. I ended up doing it in gnumeric. I'll look into fixing the script up to be more automatic. Maybe look at gnuplot again. > I can run some tests on smaller Linux/Solaris systems to see if they don't > show a regression, that was my main concern about this experiment. Heikki suggested some things to test for regressions. I'll look at that. What I'm more curious about, and hoped I could get data from others on, is whether posix_fadvise(POSIX_FADV_WILLNEED) does anything useful on various versions of FreeBSD, OSX, etc. Afaict the syscall doesn't exist at all on Solaris though, which surprises me :( -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!
Re: [HACKERS] Commitfest patches
"Martijn van Oosterhout" <[EMAIL PROTECTED]> writes: > - I think normal index scans could benefit from this (it was measurable > when I was playing with AIO in postgres a few years back). I don't want to torture any existing code paths to get prefetching to work. Heikki suggested I take advantage of the page-at-a-time index scanning though which does sound like it could be convenient. > - I think the number of preread_count is far too small, given you get a > benefit even if you only have one spindle. > - I don't understand the ramp-up method either. So the idea is that I was deathly afraid of being accused of doing unnecessary additional I/O. So I was worried about the user doing something like SELECT ... LIMIT 1. Or worse, just starting a select and discarding it after only selecting a handful of records. I also didn't want to start the bitmap scan by doing hundreds of syscalls before we even return the first record. So I figured it's safer to read only the first block at first. Only if the user goes on to the next record do we try prefetching the next block. Each record the user reads we bump up the prefetch amount by 1 until we hit the goal value for the size raid array we're using. That also nicely spreads out the syscalls so we get one prefetch between each record returned. We also don't know how densely packed the records are on the pages. If they're densely packed then we'll have prefetched a whole bunch of records in time for the second page. But if there's only one record on a page and we're already on the second page I figured that indicated we would be reading many pages with sparsely distributed records. So we ramp up the prefetch exponentially by multiplying by two each time we move to the next page. The net result is that if you do a bitmap scan which is processing lots of pointers it'll quickly reach the point where it's prefetching the number of blocks based on effective_spindle_count. 
If you're just processing the first few tuples it'll only read a small number of extra pages. And if you're only processing one record it won't prefetch at all. > People spend a lot of time worrying about hundreds of posix_fadvise() > calls but you don't need anywhere near that much to be effective. With > AIO I limited the number of outstanding requests to a dozen and it was > still useful. You lose nothing by capping the number of requests at any > point. Well, you leave money on the table. But yes, I'm trying to be conservative about how much to prefetch when we don't know that it's in our favour. >> I want to know if we're interested in the more invasive patch restructuring >> the buffer manager. My feeling is that we probably are eventually. But I >> wonder if people wouldn't feel more comfortable taking baby steps at first >> which will have less impact in cases where it's not being heavily used. > > I think the way it is now is neat and simple and enough for now. Thanks. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!
Re: [HACKERS] Commitfest patches
On Fri, 28 Mar 2008, Gregory Stark wrote: I described which interfaces worked on Linux and Solaris based on empirical tests. I posted source code for synthetic benchmarks so we could test it on a wide range of hardware. I posted graphs based on empirical results. Is it possible to post whatever script that generates the graph (gnuplot?) so people can compare the results they get to yours? It's when I realized I didn't have that and would have to recreate one myself that my intention to just run some quick results and ship them to you lost momentum and I never circled back to it. It would be nice if you made it easier for people to generate fancy results here as immediate gratification for them (while of course just asking them to send the raw data). I can run some tests on smaller Linux/Solaris systems to see if they don't show a regression, that was my main concern about this experiment. Some of the discussion that followed your original request for tests was kind of confusing as far as how to interpret the results as well; I think I know what to look for but certainly wouldn't mind some more guidance there, too. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Commitfest patches
On Fri, Mar 28, 2008 at 05:34:30PM +, Gregory Stark wrote: > But what I really need is someone to read the patch and say "looks good" or > point out things they don't like. In particular, what I really, really want is > some guidance on the singular key question I asked. I was going to write all sorts of stuff, till I noticed Heikki said basically everything I was going to say: - I think normal index scans could benefit from this (it was measurable when I was playing with AIO in postgres a few years back). - The integration with the bitmap scan is good, neat even - I think the number of preread_count is far too small, given you get a benefit even if you only have one spindle. - I don't understand the ramp-up method either. People spend a lot of time worrying about hundreds of posix_fadvise() calls but you don't need anywhere near that much to be effective. With AIO I limited the number of outstanding requests to a dozen and it was still useful. You lose nothing by capping the number of requests at any point. > I want to know if we're interested in the more invasive patch restructuring > the buffer manager. My feeling is that we probably are eventually. But I > wonder if people wouldn't feel more comfortable taking baby steps at first > which will have less impact in cases where it's not being heavily used. I think the way it is now is neat and simple and enough for now. Have a nice day, -- Martijn van Oosterhout <[EMAIL PROTECTED]> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines.
Re: [HACKERS] Commitfest patches
Gregory Stark wrote: I want to know if we're interested in the more invasive patch restructuring the buffer manager. My feeling is that we probably are eventually. But I wonder if people wouldn't feel more comfortable taking baby steps at first which will have less impact in cases where it's not being heavily used. It's a lot more work to do the invasive patch and there's not much point in doing it if we're not interested in taking it when it's ready. I like the simplicity of this patch. My biggest worry is the impact on performance when the posix_fadvise calls are not helping. I'd like to see some testing of that. How expensive are the extra bufmgr hash table lookups, compared to all the other stuff that's involved in a bitmap heap scan? Another thing I wonder is whether this approach can easily be extended to other types of scans than bitmap heap scans. Normal index scans seem most interesting; they can generate a lot of random I/O. I don't see any big problems there, except again the worst-case performance: If you issue AdviseBuffer calls for all the heap tuples in an index page, in the naive way, you can issue hundreds of posix_fadvise calls. But what if they're all actually on the same page? Surely you only want to go through the buffer manager once (like we do in the scan, thanks to ReleaseAndReadBuffer()) in that case, and only call posix_fadvise once. But again, maybe you can convince me that it's a non-issue by some benchmarking. Here's an updated patch. It's mainly just a CVS update but it also includes some window dressing: an autoconf test, some documentation, and some fancy arithmetic to auto-tune the amount of prefetching based on a more real-world parameter "effective_spindle_count". I like the "effective_spindle_count". That's very intuitive. The formula to turn that into preread_pages looks complex, though. I can see the reasoning behind it, but I doubt things behave that beautifully in the real world.
(That's not important right now, we have plenty of time to tweak that after more testing.) It also includes printing out how much the prefetching target got ramped up to in the explain output -- though I'm not sure we really want that in the tree, but it's nice for performance testing. I don't understand the ramp up logic. Can you explain what that's for and how it works? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Commitfest patches
"Bruce Momjian" <[EMAIL PROTECTED]> writes: > Gregory Stark wrote: >> >> Bruce, you seem to have removed one of my three patches from the queue. I >> would actually prefer you remove the other two and put back that one. It's >> the >> one I most urgently need feedback on to continue. > > I talked to Greg on IM. The complaint was that his posix_fadvise() > patch was added to the TODO list as a URL, rather than getting him > feedback. He is getting feedback now in another thread. Sort of. The feedback I'm getting is from people who want to discuss the "easy" parameters of the patch like how much prefetching to do and when without actually reading to see what's already been done. I'm happy to discuss them as I find these things interesting too. But what I really need is someone to read the patch and say "looks good" or point out things they don't like. In particular, what I really, really want is some guidance on the singular key question I asked. I want to know if we're interested in the more invasive patch restructuring the buffer manager. My feeling is that we probably are eventually. But I wonder if people wouldn't feel more comfortable taking baby steps at first which will have less impact in cases where it's not being heavily used. It's a lot more work to do the invasive patch and there's not much point in doing it if we're not interested in taking it when it's ready. Here's an updated patch. It's mainly just a CVS update but it also includes some window dressing: an autoconf test, some documentation, and some fancy arithmetic to auto-tune the amount of prefetching based on a more real-world parameter "effective_spindle_count". It also includes printing out how much the prefetching target got ramped up to in the explain output -- though I'm not sure we really want that in the tree, but it's nice for performance testing. 
bitmap-preread-v9.diff.gz Description: Binary data -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!
Re: Prereading using posix_fadvise (was Re: [HACKERS] Commitfest patches)
Heikki Linnakangas wrote: > > Should we consider only telling the kernel X pages ahead, meaning when > > we are on page 10 we tell it about page 16? > > Yes. You don't want to fire off thousands of posix_fadvise calls > upfront. That'll just flood the kernel, and it will most likely ignore > any advice after the first few hundred or so. I'm not sure what the > appropriate amount of read ahead would be, though. Probably depends a > lot on the OS and hardware, and needs to be adjustable. > > In some cases we can't easily read ahead more than a certain number of > pages. For example, in a regular index scan, we can easily fire off > posix_fadvise calls for all the heap pages referenced by a single index > page, but reading ahead more than that becomes much more complex. And if you read ahead too far the pages might get pushed out of the kernel cache before you ask to read them. -- Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: Prereading using posix_fadvise (was Re: [HACKERS] Commitfest patches)
Bruce Momjian wrote: Heikki Linnakangas wrote: So it has nothing to do with table size. The fadvise calls need to be (and are) limited by what can be used in the near future, and not for the whole statement. Right, I was sloppy. Instead of table size, what matters is the amount of data the scan needs to access. The point remains that if the data is already in OS cache, the posix_fadvise calls are a waste of time, regardless of how many pages ahead you advise. I now understand what posix_fadvise() is allowing us to do. posix_fadvise(POSIX_FADV_WILLNEED) allows us to tell the kernel we will need a certain block in the future --- this seems much cheaper than a background reader. Yep. We know we will need the blocks, and telling the kernel can't hurt, except that there is overhead in telling the kernel. Has anyone measured how much overhead? I would be interested in a test program that read the same page over and over again from the kernel, with and without a posix_fadvise() call. Agreed, that needs to be benchmarked next. There's also some overhead in doing the buffer manager hash table lookup to check whether the page is in shared_buffers. We could reduce that by the more complex approach Greg mentioned of allocating a buffer in shared_buffers when we do posix_fadvise. Should we consider only telling the kernel X pages ahead, meaning when we are on page 10 we tell it about page 16? Yes. You don't want to fire off thousands of posix_fadvise calls upfront. That'll just flood the kernel, and it will most likely ignore any advice after the first few hundred or so. I'm not sure what the appropriate amount of read ahead would be, though. Probably depends a lot on the OS and hardware, and needs to be adjustable. In some cases we can't easily read ahead more than a certain number of pages.
For example, in a regular index scan, we can easily fire off posix_fadvise calls for all the heap pages referenced by a single index page, but reading ahead more than that becomes much more complex. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: Prereading using posix_fadvise (was Re: [HACKERS] Commitfest patches)
On Fri, Mar 28, 2008 at 11:41:58AM -0400, Bruce Momjian wrote: > Should we consider only telling the kernel X pages ahead, meaning when > we are on page 10 we tell it about page 16? It's not so interesting for sequential reads, the kernel can work that out for itself. Disk reads are usually in blocks of at least 128K anyway, so there's a real good chance you have block 16 already. The interesting case is index scan, where you do a posix_fadvise() on the next block *before* returning the items in the current block. Then by the time you've processed these tuples, the next block will hopefully have been read in and we can proceed without delay. Or fadvising all the tuples referred to from an index page at once so the kernel can determine the optimal order to fetch them. The read() calls will still be issued in tuple order, but the delay will (hopefully) be less. Have a nice day, -- Martijn van Oosterhout <[EMAIL PROTECTED]> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines.
Re: [HACKERS] Commitfest patches
Gregory Stark wrote: > > Bruce, you seem to have removed one of my three patches from the queue. I > would actually prefer you remove the other two and put back that one. It's the > one I most urgently need feedback on to continue. I talked to Greg on IM. The complaint was that his posix_fadvise() patch was added to the TODO list as a URL, rather than getting him feedback. He is getting feedback now in another thread. -- Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: Prereading using posix_fadvise (was Re: [HACKERS] Commitfest patches)
Heikki Linnakangas wrote: > > So it has nothing to do with table size. The fadvise calls need to be > > (and are) limited by what can be used in the near future, and not for the whole > > statement. > Right, I was sloppy. Instead of table size, what matters is the amount > of data the scan needs to access. The point remains that if the data is > already in OS cache, the posix_fadvise calls are a waste of time, > regardless of how many pages ahead you advise. I now understand what posix_fadvise() is allowing us to do. posix_fadvise(POSIX_FADV_WILLNEED) allows us to tell the kernel we will need a certain block in the future --- this seems much cheaper than a background reader. We know we will need the blocks, and telling the kernel can't hurt, except that there is overhead in telling the kernel. Has anyone measured how much overhead? I would be interested in a test program that read the same page over and over again from the kernel, with and without a posix_fadvise() call. Should we consider only telling the kernel X pages ahead, meaning when we are on page 10 we tell it about page 16? -- Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: Prereading using posix_fadvise (was Re: [HACKERS] Commitfest patches)
Zeugswetter Andreas OSB SD wrote: Heikki wrote: It seems that the worst case for this patch is a scan on a table that doesn't fit in shared_buffers, but is fully cached in the OS cache. In that case, the posix_fadvise calls would be a certain waste of time. I think this is a misunderstanding, the fadvise is not issued to read the whole table and is not issued for table scans at all (and if it were it would only advise for the next N pages). So it has nothing to do with table size. The fadvise calls need to be (and are) limited by what can be used in the near future, and not for the whole statement. Right, I was sloppy. Instead of table size, what matters is the amount of data the scan needs to access. The point remains that if the data is already in OS cache, the posix_fadvise calls are a waste of time, regardless of how many pages ahead you advise. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: Prereading using posix_fadvise (was Re: [HACKERS] Commitfest patches)
Heikki wrote: > It seems that the worst case for this patch is a scan on a table that > doesn't fit in shared_buffers, but is fully cached in the OS cache. In > that case, the posix_fadvise calls would be a certain waste of time. I think this is a misunderstanding, the fadvise is not issued to read the whole table and is not issued for table scans at all (and if it were it would only advise for the next N pages). So it has nothing to do with table size. The fadvise calls need to be (and are) limited by what can be used in the near future, and not for the whole statement. e.g. N next-level index pages that are relevant, or N relevant heap pages one index leaf page points at. Maybe in the index case N does not need to be limited, since we have a natural limit on how many pointers fit on one page. In general I think separate reader processes (or threads :-) that directly read into the bufferpool would be a more portable and efficient implementation. E.g. it could use ScatterGather IO. So I think we should check that the fadvise solution does not obstruct that path, but I think it does not. Gregory wrote: >> A more invasive form of this patch would be to assign and pin a buffer when >> the preread is done. That would mean subsequently we would have a pinned buffer >> ready to go and not need to go back to the buffer manager a second time. We >> would instead just "complete" the i/o by issuing a normal read call. I guess you would rather need to mark the buffer for use for this page, but let whichever backend needs it first pin it and issue the read. I think the fadviser should not pin it in advance, since it cannot guarantee to actually read the page [soon]. Rather, remember the buffer and later check and pin it for the read. Else you might be blocking the buffer. But I think doing something like this might be good since it avoids issuing duplicate fadvises.
Andreas
Prereading using posix_fadvise (was Re: [HACKERS] Commitfest patches)
Gregory Stark wrote:
> I described which interfaces worked on Linux and Solaris based on
> empirical tests. I posted source code for synthetic benchmarks so we
> could test it on a wide range of hardware. I posted graphs based on
> empirical results. I posted mathematical formulas analysing just how much
> preread would be expected to exercise a raid array fully. I'm not sure
> what else I can do to effect a more thorough examination.

I'm sure posix_fadvise is a win in the case where it's supposed to help: a scan that does a lot of random reads on a RAID array. And you've posted results demonstrating that. What we need to make sure is that there's no significant loss when it's not helping.

It seems that the worst case for this patch is a scan on a table that doesn't fit in shared_buffers, but is fully cached in the OS cache. In that case, the posix_fadvise calls would be a certain waste of time. That could be alleviated by deciding at plan time whether to preread or not, based on effective_cache_size.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
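The plan-time decision suggested above might, very roughly, look like the following. This is a hypothetical heuristic for illustration only; the function name and the bare page-count comparison are mine, and real planner logic would have to weigh much more than this.

```c
#include <stdbool.h>

/* Hypothetical plan-time check: only bother issuing posix_fadvise
 * hints when the data the scan will touch is unlikely to already be
 * sitting in the OS cache.  Both arguments are in pages. */
bool should_preread(double pages_to_access, double effective_cache_size)
{
    /* If everything we plan to touch plausibly fits in cache, the
     * advisory calls are likely just wasted syscalls. */
    return pages_to_access > effective_cache_size;
}
```

The point of gating at plan time is that the per-page fadvise overhead then never arises at all in the fully-cached worst case.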
Re: [HACKERS] Commitfest patches
On Fri, 2008-03-28 at 11:26 +, Gregory Stark wrote:
> "Simon Riggs" <[EMAIL PROTECTED]> writes:
>
> For heaven's sake. I've been posting about this for months.

Any chance of getting all that together on a single wiki page, so we can review everything? We'll need those docs after it's committed anyway.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk
Re: [HACKERS] Commitfest patches
"Simon Riggs" <[EMAIL PROTECTED]> writes:
> On Fri, 2008-03-28 at 09:08 +, Gregory Stark wrote:
>
>> A more invasive form of this patch would be to assign and pin a buffer
>> when the preread is done. That would mean subsequently we would have a
>> pinned buffer ready to go and not need to go back to the buffer manager
>> a second time. We would instead just "complete" the I/O by issuing a
>> normal read call.
>
> So if posix_fadvise did nothing or there was a longer than optimal
> delay, this would be a net loss.
>
> You'd need reasonable evidence that the posix_fadvise facility was a win
> on all platforms and recent release levels before we could agree with
> that.

I posted a test program and asked for others to test it on various platforms and various-sized RAID arrays. [EMAIL PROTECTED] was the only person who bothered to run it, and he only found a 16x speedup on his hardware.

http://archives.postgresql.org/pgsql-performance/2008-01/msg00285.php
(unfortunately our mail archives seem to have failed to archive his message; this is my followup to it)

> I think we need a more thorough examination of this area before we
> commit anything. Maybe you've done this, but I haven't seen the
> analysis. Can you say more, please? Or at least say what you don't know,
> so other experts listening can fill in the blanks.

For heaven's sake. I've been posting about this for months. Search the archives for "RAID arrays and performance", "There's random access and then there's random access", and "Bitmap index scan preread using posix_fadvise".

I described which interfaces worked on Linux and Solaris based on empirical tests. I posted source code for synthetic benchmarks so we could test it on a wide range of hardware. I posted graphs based on empirical results. I posted mathematical formulas analysing just how much preread would be expected to exercise a RAID array fully. I'm not sure what else I can do to effect a more thorough examination.
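The kind of synthetic test described above can be sketched like this: batch the WILLNEED hints for a known block list (as a bitmap heap scan could), then read the blocks. This harness is mine, not the posted test program; a real benchmark would time the two loops on uncached data, since on a fully cached file the two modes are expected to be indistinguishable.

```c
/* Skeleton of a fadvise-vs-plain random-read comparison over a temp
 * file.  Only correctness (total bytes read) is checked here; timing
 * results depend entirely on hardware and cache state. */
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>

#define BLCKSZ  8192
#define NBLOCKS 256
#define NREADS  64

/* Read NREADS random blocks; optionally batch WILLNEED hints first,
 * the way a scan that knows its block list up front could. */
long random_read_bytes(int fd, const off_t *blocks, int use_fadvise)
{
#ifdef POSIX_FADV_WILLNEED
    if (use_fadvise)
        for (int i = 0; i < NREADS; i++)
            (void) posix_fadvise(fd, blocks[i] * (off_t) BLCKSZ,
                                 BLCKSZ, POSIX_FADV_WILLNEED);
#endif
    char buf[BLCKSZ];
    long total = 0;
    for (int i = 0; i < NREADS; i++)
        total += pread(fd, buf, BLCKSZ, blocks[i] * (off_t) BLCKSZ);
    return total;
}

/* Build a temp file, run both modes, verify every read completed. */
int bench_demo(void)
{
    char path[] = "/tmp/fadvbenchXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return 0;
    unlink(path);

    char block[BLCKSZ];
    memset(block, 'y', sizeof block);
    for (int i = 0; i < NBLOCKS; i++)
        if (pwrite(fd, block, BLCKSZ, (off_t) i * BLCKSZ) != BLCKSZ)
        {
            close(fd);
            return 0;
        }

    off_t blocks[NREADS];
    srand(42);                          /* fixed seed: repeatable pattern */
    for (int i = 0; i < NREADS; i++)
        blocks[i] = rand() % NBLOCKS;

    long with_hints = random_read_bytes(fd, blocks, 1);
    long without    = random_read_bytes(fd, blocks, 0);
    close(fd);
    return with_hints == (long) NREADS * BLCKSZ &&
           without    == (long) NREADS * BLCKSZ;
}
```

Since the hints are advisory, both modes must return identical data; any difference a real run measures is purely in elapsed time.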
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!
Re: [HACKERS] Commitfest patches
On Fri, 2008-03-28 at 09:08 +, Gregory Stark wrote:
> A more invasive form of this patch would be to assign and pin a buffer
> when the preread is done. That would mean subsequently we would have a
> pinned buffer ready to go and not need to go back to the buffer manager
> a second time. We would instead just "complete" the I/O by issuing a
> normal read call.

So if posix_fadvise did nothing, or there was a longer-than-optimal delay, this would be a net loss.

You'd need reasonable evidence that the posix_fadvise facility was a win on all platforms and recent release levels before we could agree with that.

I think we need a more thorough examination of this area before we commit anything. Maybe you've done this, but I haven't seen the analysis. Can you say more, please? Or at least say what you don't know, so other experts listening can fill in the blanks.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk