Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]
On 17.01.2011 22:33, Tom Lane wrote:
> Peter Eisentraut pete...@gmx.net writes:
>> On mån, 2011-01-17 at 07:35 +0100, Magnus Hagander wrote:
>>> In fact, aren't there cases where the *length test* also fails?
>> Currently, two text values are only equal if strcoll() considers them equal and the bits are the same. So this patch is safe in that regard. There is, however, some desire to loosen this.
> That isn't ever going to happen, unless you'd like to give up hash joins and hash aggregation on text values.

You could canonicalize the string first in the hash function. I'm not sure if we have all the necessary information at hand there, but at least with some encoding/locale-specific support functions it'd be possible.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
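[For illustration only, a minimal C sketch of the canonicalize-before-hashing idea. This is not the backend's texteq/hash code; hash_bytes() is a placeholder stand-in, and a real implementation would use the server's own hash function and handle encodings and memory contexts properly.]

    #include <stdlib.h>
    #include <string.h>

    static unsigned int
    hash_bytes(const unsigned char *buf, size_t len)
    {
        unsigned int h = 5381;          /* simple djb2-style placeholder */

        while (len-- > 0)
            h = (h << 5) + h + *buf++;
        return h;
    }

    unsigned int
    hash_text_canonical(const char *s)
    {
        size_t  need = strxfrm(NULL, s, 0);     /* length of canonical form */
        char   *buf = malloc(need + 1);
        unsigned int h;

        if (buf == NULL)
            return 0;                   /* real code would report an error */
        /* canonical form: strcmp() on these matches strcoll() on the originals */
        strxfrm(buf, s, need + 1);
        h = hash_bytes((unsigned char *) buf, need);
        free(buf);
        return h;
    }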
Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]
On Tue, Jan 18, 2011 at 05:39, Tom Lane t...@sss.pgh.pa.us wrote:
> I haven't looked at this patch, but it seems to me that it would be reasonable to conclude A != B if the va_extsize values in the toast pointers don't agree.

It's a very light-weight alternative to memcmp'ing the byte data, but there is still the same issue -- we might get different compressed results if we use a different algorithm for TOASTing. So, it would be better to apply the present patch as-is. We can improve the comparison logic over the patch in another development cycle if possible.

-- Itagaki Takahiro -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
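[A purely illustrative sketch of the quick "provably unequal" test Tom describes. The struct is a simplified stand-in for the backend's varatt_external, and, per the caveat above, the va_extsize shortcut is only sound when both values were toasted with the same compression algorithm and settings.]

    #include <stdint.h>

    typedef struct toast_pointer_sketch
    {
        int32_t     va_rawsize;     /* original (uncompressed) size */
        int32_t     va_extsize;     /* size as stored externally */
    } toast_pointer_sketch;

    /* Returns 1 when the two values are provably unequal, 0 when unknown. */
    static int
    provably_unequal(const toast_pointer_sketch *a, const toast_pointer_sketch *b)
    {
        if (a->va_rawsize != b->va_rawsize)
            return 1;               /* different decompressed lengths */
        if (a->va_extsize != b->va_extsize)
            return 1;               /* different stored lengths -- see caveat */
        return 0;                   /* still need to fetch and compare bytes */
    }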
Re: [HACKERS] pg_filedump moved to pgfoundry
On 18/01/11 18:04, Tom Lane wrote:
> David Fetter da...@fetter.org writes:
>> Who's the copyright holder(s)? If it's all individual contributors, Red Hat policy is not in play.
> Sorry David, it was written on the company's dime.

However, I doubt that Red Hat derives any value from this useful product being excluded from contrib by the choice of license - would they be receptive to the idea that it would be free marketing to have it in the main tarball/rpm/deb (etc) with merely a decision to change it from GPL to BSD?

regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication logging
On Tue, Jan 18, 2011 at 08:21, Fujii Masao masao.fu...@gmail.com wrote:
> On Tue, Jan 18, 2011 at 4:15 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
>> I also find it weird that incoming replication connections are logged by default. In the standby, we also log "streaming replication successfully connected to primary", which serves much of the same debugging purpose.
> That standby-side message is ok, we have a tradition of being more verbose during recovery; we also emit the restored log file "%s" from archive message for every WAL segment restored from archive, for example. We could turn log_connections into an enum, like log_statement: log_connections = 'none' # none, replication, regular, all  It almost seems overkill, but probably less so than a completely new guc :)

We should treat log_disconnections the same? We could keep it a boolean, but then only log disconnections for the cases that are mentioned in log_connections? It doesn't make sense to log a disconnection for a connection we didn't log the connection for...

-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
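[For illustration, the enum GUC floated above could be declared along the same lines as log_statement. The struct below mirrors config_enum_entry from utils/guc.h; the option names and LOGCONN_* values are hypothetical, not committed code.]

    #include <stdbool.h>
    #include <stddef.h>

    /* mirrors struct config_enum_entry from utils/guc.h */
    struct config_enum_entry
    {
        const char *name;
        int         val;
        bool        hidden;
    };

    typedef enum
    {
        LOGCONN_NONE,               /* log no connections */
        LOGCONN_REPLICATION,        /* log only replication connections */
        LOGCONN_REGULAR,            /* log only regular connections */
        LOGCONN_ALL                 /* log everything */
    } LogConnectionLevel;

    static const struct config_enum_entry log_connections_options[] = {
        {"none", LOGCONN_NONE, false},
        {"replication", LOGCONN_REPLICATION, false},
        {"regular", LOGCONN_REGULAR, false},
        {"all", LOGCONN_ALL, false},
        {NULL, 0, false}
    };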
Re: [HACKERS] REVIEW: Extensions support for pg_dump
On 01/17/2011 07:58 PM, Kääriäinen Anssi wrote: The issue I saw was this: assume you have an extension foo, containing one function, test(). CREATE EXTENSION foo; DROP FUNCTION test(); -- restricted due to dependency ALTER FUNCTION test() RENAME TO test2; DROP FUNCTION test2(); -- not restricted! The same can be done using CREATE OR REPLACE. I hope this is not an error on my part. It is possible because I had a lot of schemas and my search_path might have been wrong... The rename is an error on my part, sorry for that. Renaming can be done, but dropping is not possible even after rename. But a function in a package can be CREATE OR REPLACEd, and after that the function can be dropped. COR should be restricted in the same way as DROP is. I will check if this is still a problem with the latest patch. Another problem is that you can ALTER FUNCTION test() SET SCHEMA = something_else, and you can alter the functions search_path, which could be a problem for non-relocatable extensions. Even if the schema is changed, dropping extension / altering extension will work as expected. The function is just in different schema than the extension. But, both of these IMO fall in the category don't do that. - Anssi -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] limiting hint bit I/O
On Jan 16, 2011, at 4:37 PM, Kevin Grittner wrote:
> Robert Haas wrote:
>> a quick-and-dirty attempt to limit the amount of I/O caused by hint bits. I'm still very interested in knowing what people think about that.
> I found the elimination of the response-time spike promising. I don't think I've seen enough data yet to feel comfortable endorsing it, though. I guess the question in my head is: how much of the lingering performance hit was due to having to go to clog and how much was due to competition with the deferred writes? If much of it is due to repeated recalculation of visibility based on clog info, I think there would need to be some way to limit how many times that happened before the hint bits were saved.

What if we sped up the case where hint bits aren't set? Has anyone collected data on the actual pain points of checking visibility when hint bits aren't set? How about when setting hint bits is intentionally delayed? I wish we had some more infrastructure around the XIDCACHE counters; having that info available for people's general workloads might be extremely valuable. Even if I were to compile with it turned on, it seems the only way to get at it is via stderr, which is very hard to deal with.

Lacking performance data (and for my own education), I've spent the past few hours studying HeapTupleSatisfiesNow(). If I'm understanding it correctly, the three critical functions from a performance standpoint are TransactionIdIsCurrentTransactionId, TransactionIdIsInProgress and TransactionIdDidCommit. Note that all 3 can potentially be called twice; once to check xmin and once to check xmax.

ISTM TransactionIdIsCurrentTransactionId is missing a shortcut: shouldn't we be able to immediately return false if the XID we're checking is older than some value, like global xmin? Maybe it's only worth checking that case if we hit a subtransaction, but if the check is faster than one or two loops through the binary search... I would think this at least warrants a one-XID cache a la cachedFetchXidStatus (though it would need to be a different cache...)

Another issue is that TransactionIdIsInProgress will call this function as well, unless it skips out because the transaction is < RecentXmin. TransactionIdIsInProgress does a fair amount of easy checking already... the biggest thing is that if it's less than RecentXmin we bounce out immediately. If we can't bounce out immediately though, this routine gets pretty expensive unless the XID is currently running and is top-level. It's worse if there are subxacts and can be horribly bad if any subxact caches have overflowed. Note that if anything has overflowed, then we end up going to clog and possibly pg_subtrans.

Finally, TransactionIdDidCommit hits clog.

So the degenerate cases seem to be:

- Really old XIDs. These suck because there's a good chance we'll have to read from clog.
- XIDs > RecentXmin that are not currently running top-level transactions. The pain here increases with subtransaction use.

For the second case, if we can ensure that RecentXmin is not very old then there's generally a smaller chance that TransactionIdIsInProgress has to do a lot of work. My experience is that most systems that have a high transaction rate don't end up with a lot of long-running transactions. Storing a list of the X oldest transactions would allow us to keep RecentXmin closer to the most recent XID.

For the first case, we should be able to create a more optimized clog lookup method that works for older XIDs.
If we restrict this to XIDs that are older than GlobalXmin then we can simplify things because we don't have to worry about transactions that are in-progress. We also don't need to differentiate between subtransactions and their parents (though we obviously need to figure out whether a subtransaction is considered to be committed or not). Because we're restricting this to XIDs that we know we can determine the state of, we only need to store a maximum of 1 bit per XID. That's already half the size of clog. But because we don't have to build this list on the fly (we don't need to update it on every commit/abort as long as we know the range of XIDs that are stored), we don't have to support random writes. That means we can use a structure that's more complex to maintain than a simple bitmap. Or maybe we stick with a bitmap but compress it.

-- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
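[A toy sketch of that one-bit-per-old-XID lookup. Names and layout are invented for illustration; this is not a proposal for an actual on-disk format.]

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct OldXidCommitMap
    {
        uint32_t        oldest_xid;     /* first XID covered by the map */
        uint32_t        next_xid;       /* one past the last XID covered */
        const uint8_t  *bits;           /* 1 bit per XID, set when committed */
    } OldXidCommitMap;

    /*
     * Answer "did this XID commit?" for XIDs the map covers; *covered tells
     * the caller whether the map could answer at all (if not, fall back to a
     * regular clog lookup).
     */
    static bool
    old_xid_did_commit(const OldXidCommitMap *map, uint32_t xid, bool *covered)
    {
        uint32_t    off;

        if (xid < map->oldest_xid || xid >= map->next_xid)
        {
            *covered = false;
            return false;
        }
        *covered = true;
        off = xid - map->oldest_xid;
        return (map->bits[off / 8] >> (off % 8)) & 1;
    }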
Re: [HACKERS] Confusing comment in TransactionIdIsInProgress
On 18.01.2011 07:15, Jim Nasby wrote:
> Shouldn't the comment read "If first time through"?
>
>     /*
>      * If not first time through, get workspace to remember main XIDs in. We
>      * malloc it permanently to avoid repeated palloc/pfree overhead.
>      */
>     if (xids == NULL)
>     {
>         ...
>         xids = (TransactionId *) malloc(maxxids * sizeof(TransactionId));

Huh, yes, I'm amazed that no-one has noticed. I must've read that piece of code dozens of times in the last couple of years myself, and that sentence was even copy-pasted to GetConflictingVirtualXIDs() later in that file, including that thinko. Thanks, fixed both copies.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: Extensions support for pg_dump
On 01/17/2011 06:53 PM, Dimitri Fontaine wrote:

Usability review: The patch implements a way to create extensions. While the patch is labeled extensions support for pg_dump, it actually implements more. It implements a new way to package and install extensions, and changes contrib extensions to use the new way.

Well, all there is in the patch is infrastructure to be able to issue those single lines in your dump :)

Is this supposed to be used mainly by contrib and PGXN extensions? When I saw the documentation, I immediately thought that this is a nice way to package my application's stored procedures. If this is not one of the intended usages, it should be documented. I can see that this could be problematic when updating PostgreSQL and when recovering from backups. When recovering from backup, you need to have the locally created extension available. But it might be that the extension files are lost when the system went down in flames. Now, the backup is unusable (right?) until extension files can be recovered from source control or wherever they might be stored. This is why I suggested having multiple locations for the extensions. It would be easy to back up locally created extensions if those were in a single directory. All in all, I have a nervous feeling that somebody someday will have an unusable dump because they used this feature, but do not have the extension files available...

Also, this part of documentation: "The goal of using extensions is so that pg_dump knows not to dump all the object definitions into your backups, but rather issue a single CREATE EXTENSION command." From that, it is not entirely clear to me why this is actually wanted in PostgreSQL. I suppose this will make dump/restore easier to use. But from that paragraph, I get the feeling the only advantage is that your dump will be smaller. And I still have a feeling this implements more. Not that it is a bad thing at all.

It used to work this way with \i, obviously. Should the extension patch care about that and how? Do we want to RESET search_path or to manually manage it like we do for the client_min_messages and log_min_messages?

It was unintuitive to me to have search_path changed by SQL command as a side effect. When using \i, not so much.

When trying to load extensions which contain identically-signatured functions, whether the loading will succeed depends on the search path. If there is a conflicting function in the search path (first extension loaded to schema public), then the second extension load will fail. But if the order is reversed (first extension loaded to schema foo which is not in the search path), then the second extension can be loaded to the public schema.

Well I think that's an expected limitation here. In the future we might want to add support for inter-extension dependencies and conflicts, but we're certainly not there yet.

OK with this as is. It is just a bit surprising that you can create the extensions in one order but not in another.

There is no validation for the extension names in share/contrib/. It is possible to have extension control files named .control, .name.control, name''.control etc. While it is stupid to name them so, it might be better to give an explicit warning / error in these cases. It is of course possible to use these extensions.

I don't have a strong opinion here, will wait for some votes.

Yeah, this is not a big problem in practice. Just don't name them like that. And if you do, you will find out soon enough that you made a mistake.
By the way, in my comment above, "It is of course possible to use these extensions" should read "It is of course NOT possible".

I haven't had time to review the pg_dump part of the patch yet, will do that next (tomorrow). I hope it is OK to post a partial review...

> From here, it's very good! Will you continue from the git repository, where the fixes are available, or do you want a v26 already?

It is easy for me to continue from the Git repo. I will next continue doing the pg_dump part of the review. I hope I have time to complete that today.

- Anssi -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Determining client_encoding from client locale
On 17.01.2011 20:18, Peter Eisentraut wrote: That may be worth investigating, but I don't think it's related to the present patch. As I already said - not at all. The patch was ok for me. Susanne -- Susanne Ebrecht - 2ndQuadrant PostgreSQL Development, 24x7 Support, Training and Services www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Warning compiling pg_dump (MinGW, Windows XP)
Hello, Andrew. You wrote:

AD On 01/17/2011 03:51 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 01/17/2011 07:18 AM, Pavel Golub wrote: So you think I should just ignore these warnings? Because I can't remember the same behaviour on 8.x branches... We've had them all along, as I said. See http://www.pgbuildfarm.org/cgi-bin/show_stage_log.pl?nm=frogmouth&dt=2011-01-04%2023%3A54%3A16&stg=make for the same thing in an 8.2 build. I wonder why mingw's gcc is complaining about %m when other versions of gcc do not? If you can't get it to shut up about that, there's not going to be much percentage in silencing warnings about %lld.

AD We could add -Wno-format to the flags. That makes it shut up, but maybe AD we don't want to use such a sledgehammer.

I just want to understand one thing. Are these warnings really dangerous? Or should I just ignore them?

AD cheers AD andrew

-- With best wishes, Pavel mailto:pa...@gf.microolap.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MingW + GCC 4.5.2 + Relocate libpq.dll = SegFault
Hello, Charlie. Can you please express your opinion about my request "Warning compiling pg_dump (MinGW, Windows XP)" to pgsql-hackers on Thu, 13 Jan 2011. Do you have the same warnings using the MinGW environment? You wrote:

CS I'm compiling postgresql 9.0.2 using msys + mingw + gcc 4.5.2 (latest CS official release from mingw). This is on Windows 7 64-bit. CS Unfortunately the built dlls, at least libpq.dll, crash if they need to CS be relocated. This happens to me when loading libpq.dll into a project CS that has a number of other dll requirements. Note this does NOT happen CS when building with gcc 3.4.5. CS Using GDB to track down the problem, the error occurs in CS __gcc_register_frame and looks to be the same error described here: CS http://old.nabble.com/Bad-DLL-relocation---reproducible-w--test-case-td18292380.html CS Note a similar sounding error described, and fixed, in newer releases of CS binutils (which mingw provides and I am using) is described here: CS http://lists-archives.org/mingw-users/11369-error-0xc005-is-now-fixed.html CS Looking at the postgresql Makefiles, the dlls are built using dllwrap. CS In particular, see src/Makefile.shlib line 413 and src/Makefile.port CS line 25. CS When I change these to not use dllwrap and instead just use gcc -shared CS the problem goes away. Therefore I'd like to propose:

CS 1. Change line 413 in src/Makefile.shlib
CS     $(DLLWRAP) -o $@ --dllname $(shlib) $(DLLWRAP_FLAGS) --def $(DLL_DEFFILE) $(OBJS) $(LDFLAGS) $(LDFLAGS_SL) $(SHLIB_LINK)
CS To:
CS     $(CC) -shared -o $@ $(DLL_DEFFILE) $(OBJS) $(LDFLAGS) $(LDFLAGS_SL) $(SHLIB_LINK)

CS 2. Change line 73 in src/Makefile.port:
CS     $(DLLWRAP) -o $@ --def $*.def $< $(LDFLAGS) $(LDFLAGS_SL) $(BE_DLLLIBS)
CS To:
CS     $(CC) -shared -o $@ $*.def $< $(LDFLAGS) $(LDFLAGS_SL) $(BE_DLLLIBS)

CS I tested this by intentionally compiling libpq.dll using the link flag CS -Wl,--image-base to make sure that its base address conflicted with CS another dll loaded by my program. With the proposed changes, windows CS successfully relocated libpq.dll without causing a segmentation fault. CS I don't claim to know why dllwrap is producing dlls that can't be CS relocated while gcc -shared is. But it appears to have been deprecated CS back in 2006 according to the binutils mailing list: CS http://www.mail-archive.com/bug-binutils@gnu.org/msg01470.html CS So between being deprecated and not producing relocatable dlls, it seems CS like its best to stop using dllwrap. If this seems like a reasonable CS change I can put together a quick patch if that helps. CS Thanks, CS Charlie

-- With best wishes, Pavel mailto:pa...@gf.microolap.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
On Mon, Jan 17, 2011 at 10:30 PM, Fujii Masao masao.fu...@gmail.com wrote:
> Though I haven't seen the core part of the patch (i.e., ReceiveTarFile, etc..) yet, here are the comments against the others.

Here are more comments:

When I untar the tar file taken by pg_basebackup, I got the following messages:

    $ tar xf base.tar
    tar: Skipping to next header
    tar: Archive contains obsolescent base-64 headers
    tar: Error exit delayed from previous errors

Is this a bug? This happens only when I create $PGDATA by using initdb -X (i.e., I relocated the pg_xlog directory elsewhere than $PGDATA).

    + if (compresslevel > 0)
    + {
    +     snprintf(fn, sizeof(fn), "%s/%s.tar.gz", tardir, PQgetvalue(res, rownum, 0));
    +     ztarfile = gzopen(fn, "wb");

Though I'm not familiar with zlib, isn't gzsetparams() required here?

    +#ifdef HAVE_LIBZ
    + if (!tarfile && !ztarfile)
    +#else
    + if (!tarfile)
    +#endif
    + {
    +     fprintf(stderr, _("%s: could not create file \"%s\": %s\n"),
    +             progname, fn, strerror(errno));

Instead of strerror, get_gz_error seems required when using zlib.

    + if (!res || PQresultStatus(res) != PGRES_COPY_OUT)

The check for !res is not needed since PQresultStatus checks that.

    + r = PQgetCopyData(conn, &copybuf, 0);
    + if (r == -1)

Since -1 of PQgetCopyData might indicate an error, in this case, wouldn't we need to call PQgetResult?

ReceiveTarFile seems refactorable by using GZWRITE and GZCLOSE macros.

    + fprintf(stderr, _("%s: could not write to file '%s': %m\n"),

Is %m in fprintf portable? Can't you change '%s' to \"%s\" for consistency?

    + /*
    +  * Make sure we're unpacking into an empty directory
    +  */
    + verify_dir_is_empty_or_create(current_path);

Can pg_basebackup take a backup of $PGDATA including a tablespace directory, without an error? The above code seems to prevent that.

    + if (compresslevel <= 0)
    + {
    +     fprintf(stderr, _("%s: invalid compression level \"%s\"\n"),

It's better to check compresslevel > 9 here.

    +/*
    + * Print a progress report based on the global variables. If verbose output
    + * is disabled, also print the current file name.

Typo: s/disabled/enabled

I request a new option which specifies whether pg_start_backup executes an immediate checkpoint or not. Currently it always executes an immediate one. But I'd like to run a smoothed one on a busy system. What's your opinion?

Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
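[A side note on the gzsetparams() point, against plain zlib rather than the patch's ReceiveTarFile code: gzopen() alone leaves the stream at the default compression level, so the requested level has to be applied explicitly (or baked into the mode string, e.g. "wb9"). Illustrative sketch only.]

    #include <stdio.h>
    #include <zlib.h>

    int
    write_compressed(const char *path, int compresslevel,
                     const void *buf, unsigned int len)
    {
        gzFile  f = gzopen(path, "wb");

        if (f == NULL)
            return -1;
        /* apply the level the user asked for; gzopen() used the default */
        if (gzsetparams(f, compresslevel, Z_DEFAULT_STRATEGY) != Z_OK)
        {
            gzclose(f);
            return -1;
        }
        if (gzwrite(f, buf, len) != (int) len)
        {
            gzclose(f);
            return -1;
        }
        return gzclose(f) == Z_OK ? 0 : -1;
    }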
Re: [HACKERS] estimating # of distinct values
On Jan 17, 2011, at 6:36 PM, Tomas Vondra wrote: 1) Forks are 'per relation' but the distinct estimators are 'per column' (or 'per group of columns') so I'm not sure whether the file should contain all the estimators for the table, or if there should be one fork for each estimator. The former is a bit difficult to manage, the latter somehow breaks the current fork naming convention. Yeah, when I looked at the fork stuff I was disappointed to find out there's essentially no support for dynamically adding forks. There's two other possible uses for that I can think of: - Forks are very possibly a more efficient way to deal with TOAST than having separate tables. There's a fair amount of overhead we pay for the current setup. - Dynamic forks would make it possible to do a column-store database, or at least something approximating one. Without some research, there's no way to know if either of the above makes sense; but without dynamic forks we're pretty much dead in the water. So I wonder what it would take to support dynamically adding forks... Interesting ideas, but a bit out of scope. I think I'll go with one fork containing all the estimators for now, although it might be inconvenient in some cases. I was thinking about rebuilding a single estimator with increased precision - in that case the size changes so that all the other data has to be shifted. But this won't be very common (usually all the estimators will be rebuilt at the same time), and it's actually doable. So the most important question is how to intercept the new/updated rows, and where to store them. I think each backend should maintain it's own private list of new records and forward them only in case of commit. Does that sound reasonable? regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
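[A rough, backend-flavored sketch of the "private list forwarded on commit" idea above. The sample_store_* names and forward_samples_to_estimator() are invented for illustration; only RegisterXactCallback() and the XACT_EVENT_* symbols are existing infrastructure, and memory-context handling is glossed over.]

    #include "postgres.h"
    #include "access/xact.h"
    #include "nodes/pg_list.h"

    static List *pending_samples = NIL;     /* backend-private, per transaction */

    /* hypothetical: would merge the sampled rows into the estimator's fork */
    static void
    forward_samples_to_estimator(List *samples)
    {
    }

    static void
    sample_xact_callback(XactEvent event, void *arg)
    {
        if (event == XACT_EVENT_COMMIT && pending_samples != NIL)
            forward_samples_to_estimator(pending_samples);
        /* on either commit or abort, drop the private list */
        pending_samples = NIL;
    }

    void
    sample_store_init(void)
    {
        RegisterXactCallback(sample_xact_callback, NULL);
    }

    void
    sample_store_remember(void *sampled_row)
    {
        pending_samples = lappend(pending_samples, sampled_row);
    }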
Re: [HACKERS] REVIEW: Extensions support for pg_dump
On 01/18/2011 11:42 AM, Dimitri Fontaine wrote: I've fixed the case by having the code remember the function's extension if any, and restore it along with the other dependencies. The only question here is should CREATE OR REPLACE be allowed. I just realized this could present a new problem. If I am not mistaken, when loading from dump, you suddenly get the extension's version back, not the one you defined in CREATE OR REPLACE. If this is the case, this should NOT be allowed. And by the same reasoning, ALTER FUNCTION [anything] should not be allowed either. Or at least then the function/(or any object for that matter) should be restored somehow from the backup, not from the extension files. I still haven't had the time to start pg_dump reviewing, so I haven't verified if this is really a problem. But I suspect so... - Anssi -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication logging
On Tue, Jan 18, 2011 at 5:17 PM, Magnus Hagander mag...@hagander.net wrote: We should treat log_disconnections the same? We could keep it a boolean, but then only log disconnections for the cases that are mentioned in log_connections? It doesn't make sense to log disconnection for a connection we didn't log the connection for... Maybe true. But, at least for me, it's more intuitive to provide both as enum parameters. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: Extensions support for pg_dump
Anssi Kääriäinen anssi.kaariai...@thl.fi writes:
> Is this supposed to be used mainly by contrib and PGXN extensions? When I saw the documentation, I immediately thought that this is a nice way to package my application's stored procedures. If this is not one of the intended usages, it should be documented. I can see that this could be problematic when updating PostgreSQL and when recovering from backups.

Sure, private application's stored procedures are meant to be fully supported by the extension facility.

> When recovering from backup, you need to have the locally created extension available. But it might be that the extension files are lost when the system went down in flames. Now, the backup is unusable (right?) until extension files can be recovered from source control or wherever they might be stored. This is why I suggested having multiple locations for the extensions. It would be easy to back up locally created extensions if those were in a single directory. All in all, I have a nervous feeling that somebody someday will have an unusable dump because they used this feature, but do not have the extension files available...

Well, as said in the documentation, extensions are to be used for objects you are *not* maintaining in your database, but elsewhere. Typically you are maintaining your stored procedure code in some VCS, and you have some packaging (cat *.sql > my-ext.sql in the Makefile would be the simplest to imagine). So yes, if you tell PostgreSQL that your procedures are managed elsewhere so that their code won't be part of your dumps, and then fail to manage them anywhere else, you're hosed.

My viewpoint here is that when you want to use extensions, you want to package them for your OS of choice (mine is debian, and I've been working on easing things on this side too with pg_buildext, to be found in the postgresql-server-dev-all package). If you're an occasional user just wanting to use new shining facilities… well, think twice…

> Also, this part of documentation: "The goal of using extensions is so that pg_dump knows not to dump all the object definitions into your backups, but rather issue a single CREATE EXTENSION command."

So maybe we want to extend this little sentence to add the warnings around it, that if you're not packaging your extension's delivery to your servers, you're likely shooting yourself in the foot?

> From that, it is not entirely clear to me why this is actually wanted in PostgreSQL. I suppose this will make dump/restore easier to use. But from that paragraph, I get the feeling the only advantage is that your dump will be smaller. And I still have a feeling this implements more. Not that it is a bad thing at all.

Well, try to upgrade from 8.4 to 9.0 with some extensions installed in there and used in your tables. Pick any contrib, such as hstore or ltree or cube, or some external code, such as ip4r or prefix or such. Then compare to upgrading with the extension facility, and tell me what's best :)

Hint: the dump contains the extension's script, but does not contain the shared object file. If you're upgrading the OS and the contribs, as you often do when upgrading major versions, you're hosed. You would think that pg_upgrade alleviates the concerns here, but you still have to upgrade the extension's .so manually.

All in all, those extensions (contribs, ip4r, etc) are *not* maintained in your database and pretending they are by putting their scripts in your dumps is only building problems. This patch aims to offer a solution here.
It used to work this way with \i, obviously. Should the extension patch care about that and how? Do we want to RESET search_path or to manually manage it like we do for the client_min_messages and log_min_messages? It was unintuitive to me to have search_path changed by SQL command as a side effect. When using \i, not so much. Agreed. Will code the manual management way (that is already used for log settings) later today, unless told to see RESET and how to do that at the statement level rather than the transaction one. It is easy for me to continue from the Git repo. I will next continue doing the pg_dump part of the review. I hope I have time to complete that today. Excellent, will try to continue following your pace :) Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Spread checkpoint sync
2011/1/18 Greg Smith g...@2ndquadrant.com:
> Bruce Momjian wrote:
>> Should we be writing until 2:30 then sleep 30 seconds and fsync at 3:00?
> The idea of having a dead period doing no work at all between the write phase and the sync phase may have some merit. I don't have enough test data yet on some more fundamental issues in this area to comment on whether that smaller optimization would be valuable. It may be a worthwhile concept to throw into the sequencing.

Can we have some pause without strict rules like 'stop for 30 sec'? (Case: my hardware is very good and I can write 400MB/sec with no interruption, XXX IOPS.) I wonder if we are not going to have issues with RAID firmware + BBU + Linux scheduler because we are adding 'unexpected' behavior in the middle.

-- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
2011/1/18 Fujii Masao masao.fu...@gmail.com:
> [...]
> I request a new option which specifies whether pg_start_backup executes an immediate checkpoint or not. Currently it always executes an immediate one. But I'd like to run a smoothed one on a busy system. What's your opinion?

*if* it is possible, this is welcome; the checkpoint hit due to pg_start_backup is visible even outside pg_basebackup (it syncs everything and then blasts the cache memory).

-- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: Extensions support for pg_dump
On 01/18/2011 12:11 PM, Anssi Kääriäinen wrote:
>> The only question here is should CREATE OR REPLACE be allowed.
> I just realized this could present a new problem. If I am not mistaken, when loading from dump, you suddenly get the extension's version back, not the one you defined in CREATE OR REPLACE. If this is the case, this should NOT be allowed. And by the same reasoning, ALTER FUNCTION [anything] should not be allowed either. Or at least then the function (or any object, for that matter) should be restored somehow from the backup, not from the extension files. I still haven't had the time to start pg_dump reviewing, so I haven't verified if this is really a problem. But I suspect so...

Ok, verified at least for CREATE OR REPLACE, ALTER FUNCTION RENAME and ALTER FUNCTION SET search_path. You will get the extension's version back when restoring from a plain sql dump, not the CORed function; the rename is lost, and the same goes for search_path. I suspect this is a problem for any object type supported in extensions. But unfortunately I do not have time to verify that.

One more problem with pg_dump. If you have CREATE EXTENSION in your extension's .sql file, you will have problems restoring. I know it is stupid to do so, but still, CREATE EXTENSION inside CREATE EXTENSION should be disallowed, as it is possible you find out too late that this is a stupid thing to do. Also, the functions created in the recursive CREATE EXTENSION will be dumped, and the dependencies are not created correctly.

Unfortunately I have used up all the time I have for reviewing this patch. I can arrange some more time, maybe late this week, maybe a bit later. So, I do not have time to do the pg_dump part review in full detail right now. Still, I hope the work I have done is helpful. Should I write up a post that contains all the current outstanding issues in one post, or is it enough to just link this thread in the CommitFest application?

- Anssi -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: Extensions support for pg_dump
Anssi Kääriäinen anssi.kaariai...@thl.fi writes: The only question here is should CREATE OR REPLACE be allowed. I just Yes. Think ALTER EXTENSION UPGRADE, the next patch in the series (already proposed for this CF too). realized this could present a new problem. If I am not mistaken, when loading from dump, you suddenly get the extension's version back, not the one you defined in CREATE OR REPLACE. If this is the case, this should NOT be allowed. And by the same reasoning, ALTER FUNCTION [anything] should not be allowed either. Or at least then the function/(or any object for that matter) should be restored somehow from the backup, not from the extension files. Well ideally those will get into extension's upgrade scripts, not be typed interactively by superusers. But I don't think we should limit the capability of superusers to quickly fix a packaging mistake… I still haven't had the time to start pg_dump reviewing, so I haven't verified if this is really a problem. But I suspect so... Both a problem when badly used and a good thing to have sometime, as in the upgrade scripts :) -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: Extensions support for pg_dump
Anssi Kääriäinen anssi.kaariai...@thl.fi writes: Ok, verified at least for CREATE OR REPLACE, ALTER FUNCTION RENAME and ALTER FUNCTION SET search_path. You will get the extensions version back when restoring from plain sql dump, not the CORed function, rename is lost and same for search_path. I suspect this is a problem for any object type supported in extensions. But unfortunately I do not have time to verify that. Yes it's the same, if you edit an extension's object directly the edited version is not in the dump. There's provision to have those objects in the dump but the function to make it so is currently restricted to only be called from the extension's script. pg_extension_flag_dump() changes the dependency type between an extension's and one of its object, given by oid. The default behavior of an extension is to skip objects in pg_dump, but in some cases that's not what you want to achieve, as an extension author. If your extension provides user data (in a table, for example), then flag this table so that it's part of the backups, like so: SELECT pg_extension_flag_dump('schema.tablename'::regclass); Maybe we should open this function so that it's usable even outside of the extension's script, but I'm not keen on doing so. Again, editing the extension's objects out of the scripts is still limited to superusers and not the only way to shoot yourself in the foot… One more problem with pg_dump. If you have CREATE EXTENSION in you extensions .sql file, you will have problems restoring. I know it is stupid to do so, but still CREATE EXTENSION inside CREATE EXTENSION should be disallowed, as it is possible you find out too late that this is stupid thing to do. Also, the functions created in the recursive CREATE EXTENSION will be dumped, and the dependencies are not created correctly. That will be handled later, it's called inter-extension dependencies. We said we won't address that yet… Unfortunately I have used up all the time I have for reviewing this patch. I can arrange some more time, maybe late this week, maybe a bit later. So, I do not have time to do the pg_dump part review in full detail right now. Still, I hope the work I have done is helpful. Very much so, thanks a lot for the time you already spent on it! Should I write up a post that contains all the current outstanding issues in one post, or is it enough to just link this thread in the CommitFest application? I'd appreciate a list of yet-to-fix items. What I have is the search_path issue where CREATE EXTENSION foo; can leave it changed for the current session, I intend to fix that later today. Other than that, I have no further already agreed on code fix to make. What's your list? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Spread checkpoint sync
Robert Haas wrote:
> Idea #4: For ext3 filesystems that like to dump the entire buffer cache instead of only the requested file, write a little daemon that runs alongside of (and completely independently of) PostgreSQL. Every 30 s, it opens a 1-byte file, changes the byte, fsyncs the file, and closes the file, thus dumping the cache and preventing a ridiculous growth in the amount of data to be sync'd at checkpoint time.

Today's data suggests this problem has been resolved in the latest kernels. I saw the giant flush/series of small flushes pattern quite easily on the CentOS5 system I last did heavy pgbench testing on. The one I'm testing now has kernel 2.6.32 (Ubuntu 10.04), and it doesn't show it at all. Here's what a bad checkpoint looks like on this system:

LOG: checkpoint starting: xlog
DEBUG: checkpoint sync: number=1 file=base/24746/36596.8 time=7651.601 msec
DEBUG: checkpoint sync: number=2 file=base/24746/36506 time=0.001 msec
DEBUG: checkpoint sync: number=3 file=base/24746/36596.2 time=1891.695 msec
DEBUG: checkpoint sync: number=4 file=base/24746/36596.4 time=7431.441 msec
DEBUG: checkpoint sync: number=5 file=base/24746/36515 time=0.216 msec
DEBUG: checkpoint sync: number=6 file=base/24746/36596.9 time=4422.892 msec
DEBUG: checkpoint sync: number=7 file=base/24746/36596.12 time=954.242 msec
DEBUG: checkpoint sync: number=8 file=base/24746/36237_fsm time=0.002 msec
DEBUG: checkpoint sync: number=9 file=base/24746/36503 time=0.001 msec
DEBUG: checkpoint sync: number=10 file=base/24746/36584 time=41.401 msec
DEBUG: checkpoint sync: number=11 file=base/24746/36596.7 time=885.921 msec
DEBUG: checkpoint sync: number=12 file=base/24813/30774 time=0.002 msec
DEBUG: checkpoint sync: number=13 file=base/24813/24822 time=0.005 msec
DEBUG: checkpoint sync: number=14 file=base/24746/36801 time=49.801 msec
DEBUG: checkpoint sync: number=15 file=base/24746/36601.2 time=610.996 msec
DEBUG: checkpoint sync: number=16 file=base/24746/36596 time=16154.361 msec
DEBUG: checkpoint sync: number=17 file=base/24746/36503_vm time=0.001 msec
DEBUG: checkpoint sync: number=18 file=base/24746/36508 time=0.000 msec
DEBUG: checkpoint sync: number=19 file=base/24746/36596.10 time=9759.898 msec
DEBUG: checkpoint sync: number=20 file=base/24746/36596.3 time=3392.727 msec
DEBUG: checkpoint sync: number=21 file=base/24746/36237 time=0.150 msec
DEBUG: checkpoint sync: number=22 file=base/24746/36596.11 time=9153.437 msec
DEBUG: could not forward fsync request because request queue is full
CONTEXT: writing block 1057833 of relation base/24746/36596
[800 more of these]
DEBUG: checkpoint sync: number=23 file=base/24746/36601.1 time=48697.179 msec
DEBUG: could not forward fsync request because request queue is full
DEBUG: checkpoint sync: number=24 file=base/24746/36597 time=0.080 msec
DEBUG: checkpoint sync: number=25 file=base/24746/36237_vm time=0.001 msec
DEBUG: checkpoint sync: number=26 file=base/24813/24822_fsm time=0.001 msec
DEBUG: checkpoint sync: number=27 file=base/24746/36503_fsm time=0.000 msec
DEBUG: could not forward fsync request because request queue is full
CONTEXT: writing block 20619 of relation base/24746/36601
DEBUG: checkpoint sync: number=28 file=base/24746/36506_fsm time=0.000 msec
DEBUG: checkpoint sync: number=29 file=base/24746/36596_vm time=0.040 msec
DEBUG: could not forward fsync request because request queue is full
CONTEXT: writing block 278967 of relation base/24746/36596
DEBUG: could not forward fsync request because request queue is full
CONTEXT: writing block 1582400 of relation base/24746/36596
DEBUG: checkpoint sync: number=30 file=base/24746/36596.6 time=0.002 msec
DEBUG: checkpoint sync: number=31 file=base/24813/11647 time=0.004 msec
DEBUG: checkpoint sync: number=32 file=base/24746/36601 time=201.632 msec
DEBUG: checkpoint sync: number=33 file=base/24746/36801_fsm time=0.001 msec
DEBUG: checkpoint sync: number=34 file=base/24746/36596.5 time=0.001 msec
DEBUG: checkpoint sync: number=35 file=base/24746/36599 time=0.000 msec
DEBUG: checkpoint sync: number=36 file=base/24746/36587 time=0.005 msec
DEBUG: checkpoint sync: number=37 file=base/24746/36596_fsm time=0.001 msec
DEBUG: checkpoint sync: number=38 file=base/24746/36596.1 time=0.001 msec
DEBUG: checkpoint sync: number=39 file=base/24746/36801_vm time=0.001 msec
LOG: checkpoint complete: wrote 9515 buffers (29.0%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=32.409 s, sync=111.615 s, total=144.052 s; sync files=39, longest=48.697 s, average=2.853 s

Here the file that's been brutally delayed via backend contention is #23, after already seeing quite long delays on the earlier ones. That I've never seen with earlier kernels running ext3. This is good in that it makes it more likely a spread sync approach that works on XFS will also work on these newer kernels with ext4. Then the only group we wouldn't be able to help if that works the ext3
Re: [HACKERS] pg_basebackup for streaming base backups
On Tue, Jan 18, 2011 at 03:14, Fujii Masao masao.fu...@gmail.com wrote: On Mon, Jan 17, 2011 at 10:50 PM, Magnus Hagander mag...@hagander.net wrote:

+ printf(_(" -D, --pgdata=directory receive base backup into directory\n"));
+ printf(_(" -T, --tardir=directory receive base backup into tar files\n"
+          "                        stored in specified directory\n"));
+ printf(_(" -Z, --compress=0-9 compress tar output\n"));
+ printf(_(" -l, --label=label set backup label\n"));
+ printf(_(" -p, --progress show progress information\n"));
+ printf(_(" -v, --verbose output verbose messages\n"));

Can we list those options in alphabetical order as other tools do? Certainly. But it makes more sense to have -D and -T next to each other, I think - they'd end up way elsewhere. Perhaps we need a group that says "target"? I agree with you if we end up choosing -D and -T for a target rather than the pg_dump-like options I proposed.

Yeah. What do others think between those two options? -D/-T followed by a directory, or -D dir and -F format?

Requiring PQfinish() might be more reasonable since it will give you a log on the server if you don't, but I'm not convinced that's necessary either? At least it's required for each password-retry. Otherwise, previous connections remain during backup. You can see this problem easily Oh yeah, I've put that one in my git branch already. by repeating the password input in pg_basebackup. LOG: could not send data to client: Connection reset by peer LOG: could not send data to client: Broken pipe FATAL: base backup could not send data, aborting backup As you said, if PQfinish is not called at exit(1), the above messages would be output. Those messages look ugly and should be suppressed whenever we *can*. Also I believe other tools would do that.

Yeah, agreed. I'll add that, shouldn't be too hard.

+ keywords[2] = "fallback_application_name";
+ values[2] = "pg_basebackup";

Using the progname variable seems better rather than the fixed word "pg_basebackup". I don't think so - that turns into argv[0], which means that if you use the full path of the program (/usr/local/pgsql/bin/pg_basebackup for example), the entire path goes into fallback_application_name - not just the program name. No. get_progname extracts the actual name.

Hmm. I see it does. I wonder what I did to make that not work. Then I agree with the change :-)

-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
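[For reference, the get_progname() suggestion in a standalone libpq client might look like the sketch below; connect_with_appname() is an invented name for illustration, not the patch's actual connection code.]

    #include <stdio.h>
    #include <stdlib.h>
    #include "libpq-fe.h"

    extern const char *get_progname(const char *argv0);    /* from src/port */

    PGconn *
    connect_with_appname(const char *dbname, const char *argv0)
    {
        const char *keywords[4];
        const char *values[4];

        keywords[0] = "dbname";
        values[0] = dbname;
        keywords[1] = "replication";
        values[1] = "true";
        keywords[2] = "fallback_application_name";
        values[2] = get_progname(argv0);    /* "pg_basebackup", not the full path */
        keywords[3] = NULL;                 /* terminate the keyword list */
        values[3] = NULL;

        return PQconnectdbParams(keywords, values, 0);
    }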
Re: [HACKERS] pg_basebackup for streaming base backups
On Tue, Jan 18, 2011 at 10:49, Fujii Masao masao.fu...@gmail.com wrote:
> On Mon, Jan 17, 2011 at 10:30 PM, Fujii Masao masao.fu...@gmail.com wrote: Though I haven't seen the core part of the patch (i.e., ReceiveTarFile, etc..) yet, here are the comments against the others.
> Here are more comments:

Thanks! These are all good and useful comments!

> When I untar the tar file taken by pg_basebackup, I got the following messages:
>     $ tar xf base.tar
>     tar: Skipping to next header
>     tar: Archive contains obsolescent base-64 headers
>     tar: Error exit delayed from previous errors
> Is this a bug? This happens only when I create $PGDATA by using initdb -X (i.e., I relocated the pg_xlog directory elsewhere than $PGDATA).

Interesting. What version of tar and what platform? I can't reproduce that here... It certainly is a bug, that should not happen.

> + if (compresslevel > 0)
> + {
> +     snprintf(fn, sizeof(fn), "%s/%s.tar.gz", tardir, PQgetvalue(res, rownum, 0));
> +     ztarfile = gzopen(fn, "wb");
> Though I'm not familiar with zlib, isn't gzsetparams() required here?

Uh. It certainly is! I clearly forgot it there...

> +#ifdef HAVE_LIBZ
> + if (!tarfile && !ztarfile)
> +#else
> + if (!tarfile)
> +#endif
> + {
> +     fprintf(stderr, _("%s: could not create file \"%s\": %s\n"),
> +             progname, fn, strerror(errno));
> Instead of strerror, get_gz_error seems required when using zlib.

Indeed it is. I think it needs to be this:

    #ifdef HAVE_LIBZ
        if (compresslevel > 0 && !ztarfile)
        {
            /* Compression is in use */
            fprintf(stderr, _("%s: could not create compressed file \"%s\": %s\n"),
                    progname, fn, get_gz_error(ztarfile));
            exit(1);
        }
        else
    #endif
        {
            /* Either no zlib support, or zlib support but compresslevel = 0 */
            if (!tarfile)
            {
                fprintf(stderr, _("%s: could not create file \"%s\": %s\n"),
                        progname, fn, strerror(errno));
                exit(1);
            }
        }

> + if (!res || PQresultStatus(res) != PGRES_COPY_OUT)
> The check for !res is not needed since PQresultStatus checks that.

Hah. I still keep doing that from old habit. I know you've pointed that out before, with libpqwalreceiver :-)

> + r = PQgetCopyData(conn, &copybuf, 0);
> + if (r == -1)
> Since -1 of PQgetCopyData might indicate an error, in this case, wouldn't we need to call PQgetResult?

Uh, -1 means end of data, no? -2 means error?

> ReceiveTarFile seems refactorable by using GZWRITE and GZCLOSE macros.

You mean the ones from pg_dump? I don't think so. We can't use gzwrite() with compression level 0 on the tar output, because it will still write a gz header. With pg_dump, that is ok because it's our format, but with a .tar (without .gz) I don't think it is. At least that's how I interpreted the function.

> + fprintf(stderr, _("%s: could not write to file '%s': %m\n"),
> Is %m in fprintf portable?

Hmm. I just assumed it was because we use it elsewhere, but I now see we only really use it for ereport() stuff. Bottom line is, I don't know - perhaps it needs to be changed to use strerror()?

> Can't you change '%s' to \"%s\" for consistency?

Yeah, absolutely. Clearly I was way inconsistent, and it got worse with some copy/paste :-(

> + /*
> +  * Make sure we're unpacking into an empty directory
> +  */
> + verify_dir_is_empty_or_create(current_path);
> Can pg_basebackup take a backup of $PGDATA including a tablespace directory, without an error? The above code seems to prevent that.

Uh, how do you mean it would prevent that? It requires that the directory you're writing the tablespace to is empty or nonexistent, but that shouldn't prevent a backup, no? It will prevent you from overwriting things with your backup, but that's intentional - if you don't need the old dir, just remove it.
> + if (compresslevel <= 0)
> + {
> +     fprintf(stderr, _("%s: invalid compression level \"%s\"\n"),
> It's better to check compresslevel > 9 here.

Agreed (well, check for both of them of course).

> +/*
> + * Print a progress report based on the global variables. If verbose output
> + * is disabled, also print the current file name.
> Typo: s/disabled/enabled

Indeed.

> I request a new option which specifies whether pg_start_backup executes an immediate checkpoint or not. Currently it always executes an immediate one. But I'd like to run a smoothed one on a busy system. What's your opinion?

Yeah, that sounds like a good idea. Shouldn't be too hard to do (will require a backend patch as well, of course). Should we use -f for fast? Though
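[On the PQgetCopyData() question discussed above, a generic libpq consumer loop that distinguishes the two negative return values might look like this sketch; drain_copy_stream() is an invented name, not code from the patch. -1 means the COPY stream ended normally (and PQgetResult() then carries the final command status), while -2 means an error.]

    #include <stdio.h>
    #include "libpq-fe.h"

    static int
    drain_copy_stream(PGconn *conn, FILE *out)
    {
        char   *copybuf = NULL;
        int     r;
        PGresult *res;

        for (;;)
        {
            r = PQgetCopyData(conn, &copybuf, 0);
            if (r == -1)
                break;                      /* end of copy stream */
            if (r == -2)
            {
                fprintf(stderr, "could not read COPY data: %s",
                        PQerrorMessage(conn));
                return -1;
            }
            fwrite(copybuf, 1, r, out);
            PQfreemem(copybuf);
        }

        /* After -1, PQgetResult() reports whether the command itself succeeded */
        res = PQgetResult(conn);
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
        {
            fprintf(stderr, "final COPY status: %s", PQerrorMessage(conn));
            PQclear(res);
            return -1;
        }
        PQclear(res);
        return 0;
    }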
Re: [HACKERS] auto-sizing wal_buffers
Fujii Masao wrote:
> +/* Minimum setting used for a lower bound on wal_buffers */
> +#define XLOG_BUFFER_MIN 4
> Why didn't you use XLOG_BUFFER_MIN instead of XLOGbuffersMin? XLOG_BUFFER_MIN is not used anywhere for now.

That's a typo; will fix.

> + if (XLOGbuffers < (XLOGbuffersMin * 2))
> +     XLOGbuffers = XLOGbuffersMin * 2;
> + }
> Why is the minimum value 64kB only when wal_buffers is set to -1? This seems confusing for users.

That's because the current default on older versions is 64kB. Since the automatic selection is going to be the new default, I hope, I don't want it to be possible for it to pick a number smaller than the default of older versions. So the automatic lower limit is 64kB, while the actual manually set lower limit remains 32kB, as before.

-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
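[For readers following along, the auto-tuning rule as described in this thread boils down to something like the sketch below. Illustrative only; the constants and function name are not the actual patch code, and units are 8kB pages.]

    #define PAGES_PER_WAL_SEGMENT  2048     /* 16MB / 8kB */
    #define XLOGbuffersMin         4        /* 32kB: lower limit for manual settings */

    static int
    choose_wal_buffers(int requested, int shared_buffers_pages)
    {
        int     xlogbuffers = requested;

        if (xlogbuffers == -1)              /* automatic selection */
        {
            xlogbuffers = shared_buffers_pages / 32;
            if (xlogbuffers > PAGES_PER_WAL_SEGMENT)
                xlogbuffers = PAGES_PER_WAL_SEGMENT;    /* cap at one segment */
            if (xlogbuffers < (XLOGbuffersMin * 2))
                xlogbuffers = XLOGbuffersMin * 2;       /* never below the old 64kB default */
        }
        return xlogbuffers;
    }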
Re: [HACKERS] auto-sizing wal_buffers
Josh Berkus wrote:
> I think we can be more specific on that last sentence; is there even any *theoretical* benefit to settings above 16MB, the size of a WAL segment? Certainly there have been no test results to show any.

There was the set Marti just reminded us about. The old wording suggested "big enough to fit a single transaction" was big enough, and that led to many people being confused and setting this parameter way too low. Since it's possible I'm wrong about 16MB being the upper limit, I didn't want the wording to specifically rule out people testing that size to see what happens. We believe there's never any advantage above that, due to the forced WAL segment switch, but having test results to the contrary floating around keeps me from being too aggressive in how the wording there goes.

-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Warning compiling pg_dump (MinGW, Windows XP)
On 01/18/2011 04:40 AM, Pavel Golub wrote: AD We could add -Wno-format to the flags. That makes it shut up, but maybe AD we don't want to use such a sledgehammer. I want to understand the only thing. Are these warnings really dangerous? Or I should just ignore them? As I pointed out previously, we have had these warnings for years and years. I am not aware of a single issue that has been reported as arising from them. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: Extensions support for pg_dump
On 01/18/2011 01:03 PM, Dimitri Fontaine wrote:
> I'd appreciate a list of yet-to-fix items. What I have is the search_path issue where CREATE EXTENSION foo; can leave it changed for the current session, I intend to fix that later today. Other than that, I have no further already-agreed-on code fix to make. What's your list?

There are only documentation fixes, and I am not sure if even those are agreed to be necessary. It might be good if the documentation contained:

- A warning that you need to have the files for your extensions readily available to be able to restore from a dump. This might be obvious, but better safe than sorry...
- A warning that a function will be restored to the extension's version if you ALTER or CREATE OR REPLACE it.

From the current documentation, it is maybe too easy to miss these risks. I am seeing this from a non-experienced user's angle, and thus see these as potential foot guns. Other than that, I don't think there is anything more. I am a little nervous about restoring to the extension's version of a function when the function has been CREATE OR REPLACEd, but that might be just me overthinking this.

Also, from the previous posts, there is just the control file naming issue, and the issue of load order if two extensions contain similarly named and signatured functions. But these were agreed to be issues not needing any further work.

Now, I need help deciding what to do next. Should I leave the status as Needs Review, as the pg_dump part is almost completely non-reviewed? And then attach this thread as a comment? Or as a review?

- Anssi -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: Extensions support for pg_dump
Excerpts from Dimitri Fontaine's message of mar ene 18 07:01:55 -0300 2011: Anssi Kääriäinen anssi.kaariai...@thl.fi writes: It used to work this way with \i, obviously. Should the extension patch care about that and how? Do we want to RESET search_path or to manually manage it like we do for the client_min_messages and log_min_messages? It was unintuitive to me to have search_path changed by SQL command as a side effect. When using \i, not so much. If the CREATE EXTENSION stuff all works in a transaction, perhaps it would be easier if you used SET LOCAL. At the end of the transaction it would reset to the original value automatically. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
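A minimal sketch of the SET LOCAL behaviour being suggested:

    BEGIN;
    SET LOCAL search_path TO public;
    -- create the extension's objects here ...
    COMMIT;
    SHOW search_path;   -- back to whatever the session had before the transaction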
Re: [HACKERS] pg_basebackup for streaming base backups
On Mon, Jan 17, 2011 at 16:27, Simon Riggs si...@2ndquadrant.com wrote: On Mon, 2011-01-17 at 16:20 +0100, Magnus Hagander wrote: On Mon, Jan 17, 2011 at 16:18, Robert Haas robertmh...@gmail.com wrote: On Mon, Jan 17, 2011 at 8:55 AM, Magnus Hagander mag...@hagander.net wrote: Hmm. I don't like those names at all :( I agree. I don't think your original names are bad, as long as they're well-documented. I sympathize with Simon's desire to make it clear that these use the replication framework, but I really don't want the command names to be that long. Actually, after some IM chats, I think pg_streamrecv should be renamed, probably to pg_walstream (or pg_logstream, but pg_walstream is a lot more specific than that) pg_stream_log pg_stream_backup Those seem better. Tom, would those solve your concerns about it being clear which side they are on? Or do you think you'd still risk reading them as the sending side? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
On Tue, Jan 18, 2011 at 12:40, Magnus Hagander mag...@hagander.net wrote: On Tue, Jan 18, 2011 at 10:49, Fujii Masao masao.fu...@gmail.com wrote: Yeah that sounds like a good idea. Shouldn't be too hard to do (will require a backend patch as well, of course). Should we use -f for fast? Though that may be an unfortunate overload of the usual use case for -f, so maybe -c fast|slow for checkpoint fast/slow? Was easy, done with -c fast|slow. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
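So an invocation might look roughly like this (the host name and target directory are made up, and the option spelling is the one discussed in this thread, which could still change):

    pg_basebackup -h primary.example.com -D /srv/backup -c fast   # checkpoint immediately instead of spreading it out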
Re: [HACKERS] REVIEW: Extensions support for pg_dump
Anssi Kääriäinen anssi.kaariai...@thl.fi writes: On 01/18/2011 01:03 PM, Dimitri Fontaine wrote: I'd appreciate a list of yet-to-fix items. What I have is the search_path issue where CREATE EXTENSION foo; can leave it changed for the current session, I intend to fix that later today. After some reading of backend/catalog/namespace.c and some testing, my take would be to have extension authors use the following form when that is necessary: SET LOCAL search_path TO public; Now, none of the contribs use that form (they all are relocatable except for adminpack which forces its objects to get installed in pg_catalog), so I've only updated the documentation. There is only documentation fixes, and I am not sure if even those are agreed to be necessary. It might be good if the documentation contained: - A warning that you need to have the files for your extensions readily available to be able to restore from a dump. This might be obvious, but better safe than sorry... Added a § about that in the docs. - A warning that you will be restored to the extension's version if you ALTER or CREATE OR REPLACE a function. Well I don't want to encourage people to manually edit any extension objects directly, so I've added a note about not doing that. Other than that, I don't think there is anything more. I am a little nervous of restoring to extension's version of a function when the function has been CREATE OR REPLACEd, but that might be just me over thinking this. Also, from Well with the next patch in the series, ALTER EXTENSION UPGRADE, you will have a way to edit extension's objects and have the new version installed next time, but there's no magic bullet here, it means work to do by the extension's author (preparing upgrade scripts and new version's install-from-scratch scripts). the previous posts, there is just the control file naming issue, and the issue of load order if two extensions contain similarly named and signatured functions. But these were agreed to be issues not needing any further work. Yes, extension dependencies and conflicts are not meant to be covered yet. Now, I need help what to do next. Should I leave the status as Needs Review as the pg_dump part is almost completely non-reviewed? And then attach this thread as a comment? Or as a review? My guess would be to leave it as Needs Review and add this thread as a review. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
Excerpts from Magnus Hagander's message of mar ene 18 08:40:50 -0300 2011: On Tue, Jan 18, 2011 at 10:49, Fujii Masao masao.fu...@gmail.com wrote: + fprintf(stderr, _(%s: could not write to file '%s': %m\n), %m in fprintf is portable? Hmm. I just assumed it was because we use it elsewhere, but I now see we only really use it for ereport() stuff. Bottom line is, I don't know - perhaps it needs to be changed to use strerror()? Some libc's (such as glibc) know about %m, others presumably don't (it's a GNU extension, according to my manpage). ereport does the expansion by itself, see expand_fmt_string(). Probably just using strerror() is the easiest. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
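For reference, the portable form of such a message expands the errno string explicitly instead of relying on the glibc-only %m, roughly like this (the file name variable is a placeholder; requires <errno.h> and <string.h>):

    fprintf(stderr, _("%s: could not write to file \"%s\": %s\n"),
            progname, filename, strerror(errno));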
Re: [HACKERS] ToDo List Item - System Table Index Clustering
Excerpts from Simone Aiken's message of dom ene 16 02:11:26 -0300 2011: Hello Postgres Hackers, In reference to this todo item about clustering system table indexes, ( http://archives.postgresql.org/pgsql-hackers/2004-05/msg00989.php ) I have been studying the system tables to see which would benefit from clustering. I have some index suggestions and a question if you have a moment. Wow, this is really old stuff. I don't know if this is really of any benefit, given that these catalogs are loaded into syscaches anyway. Furthermore, if you cluster at initdb time, they will soon lose the ordering, given that updates move tuples around and inserts put them anywhere. So you'd need the catalogs to be re-clustered once in a while, and I don't see how you'd do that (except by asking the user to do it, which doesn't sound so great). I think you need some more discussion on the operational details before engaging in the bootstrap bison stuff (unless you just want to play with Bison for educational purposes, of course, which is always a good thing to do). -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ToDo List Item - System Table Index Clustering
Excerpts from Simone Aiken's message of dom ene 16 02:11:26 -0300 2011: Hello Postgres Hackers, BTW whatever you do, don't start a new thread by replying to an existing message and just changing the subject line. It will mess up the threading for some readers, and some might not even see your message. Compose a fresh message instead. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] limiting hint bit I/O
On Tue, Jan 18, 2011 at 3:47 AM, Jim Nasby j...@nasby.net wrote: On Jan 16, 2011, at 4:37 PM, Kevin Grittner wrote: Robert Haas wrote: a quick-and-dirty attempt to limit the amount of I/O caused by hint bits. I'm still very interested in knowing what people think about that. I found the elimination of the response-time spike promising. I don't think I've seen enough data yet to feel comfortable endorsing it, though. I guess the question in my head is: how much of the lingering performance hit was due to having to go to clog and how much was due to competition with the deferred writes? If much of it is due to repeated recalculation of visibility based on clog info, I think there would need to be some way to limit how many times that happened before the hint bits were saved. What if we sped up the case where hint bits aren't set? Has anyone collected data on the actual pain points of checking visibility when hint bits aren't set? How about when setting hint bits is intentionally delayed? I wish we had some more infrastructure around the XIDCACHE counters; having that info available for people's general workloads might be extremely valuable. Even if I was to compile with it turned on, it seems the only way to get at it is via stderr, which is very hard to deal with. Lacking performance data (and for my own education), I've spent the past few hours studying HeapTupleSatisfiesNow(). If I'm understanding it correctly, the three critical functions from a performance standpoint are TransactionIdIsCurrentTransactionId, TransactionIdIsInProgress and TransactionIdDidCommit. Note that all 3 can potentially be called twice; once to check xmin and once to check xmax. hint bits give you two benefits: you don't have to lwlock the clog and you don't have to go look them up. a lookup is either a lru cache lookup or an i/o lookup on the clog. the cost of course is extra writing out the bits. in most workloads they are not even noticed but in particular cases they are an i/o multiplier. a few weeks back I hacked an experimental patch that removed the hint bit action completely. the results were very premature and/or incorrect, but my initial findings suggested that hint bits might not be worth the cost from performance standpoint. i'd like to see some more investigation in this direction before going with a complex application mechanism (although that would be beneficial vs the status quo). an ideal testing environment to compare would be a mature database (full clog) with some verifiable performance tests and a mixed olap/oltp workload. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
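For readers not familiar with that code, here is a heavily simplified sketch (not the actual HeapTupleSatisfies* source, and the real checks happen in a somewhat different order) of the xmin side of the test, showing where the hint bit short-circuits the clog lookup and why setting it dirties the page:

    if (tuple->t_infomask & HEAP_XMIN_COMMITTED)
        ;   /* cheap bit test: no lock, no clog lookup */
    else if (TransactionIdIsCurrentTransactionId(xmin))
        ;   /* inserted by our own transaction */
    else if (TransactionIdIsInProgress(xmin))
        ;   /* inserter still running; not visible yet */
    else if (TransactionIdDidCommit(xmin))          /* this is the clog lookup */
        SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED, xmin);   /* dirties the page */
    else
        SetHintBits(tuple, buffer, HEAP_XMIN_INVALID, InvalidTransactionId);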
Re: [HACKERS] pg_filedump moved to pgfoundry
On Tue, Jan 18, 2011 at 09:14:41PM +1300, Mark Kirkwood wrote: On 18/01/11 18:04, Tom Lane wrote: David Fetterda...@fetter.org writes: Who's the copyright holder(s)? If it's all individual contributors, Red Hat policy is not in play. Sorry David, it was written on the company's dime. However, I doubt that Red Hat derives any value from this useful product being excluded from contrib by the choice of license - would they be receptive to the idea that it would be free marketing to have it in the main tarball/rpm/deb (etc) with merely a decision to change it GPL-BSD? I'm guessing there's a Policy® at Red Hat that software made on its dime be GPL (v2, I'd guess), and that getting an exception would involve convening its board or similarly drastic action. Is that about right? Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
Excerpts from Magnus Hagander's message of mar ene 18 10:47:03 -0300 2011: Ok, thanks for clarifying. I've updated to use strerror(). Guess it's time for another patch, PFA :-) Thanks ... Message nitpick: + if (compresslevel > 0) + { + fprintf(stderr, + _("%s: this build does not support compression\n"), + progname); + exit(1); + } pg_dump uses the following wording: WARNING: archive is compressed, but this installation does not support compression -- no data will be available\n So perhaps yours should s/build/installation/ Also, in messages of this kind, + if (gzsetparams(ztarfile, compresslevel, Z_DEFAULT_STRATEGY) != Z_OK) + { + fprintf(stderr, _("%s: could not set compression level %i\n"), + progname, compresslevel); Shouldn't you also be emitting the gzerror()? ... oh I see you're already doing it for most gz calls. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
On Tue, Jan 18, 2011 at 15:49, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Magnus Hagander's message of mar ene 18 10:47:03 -0300 2011: Ok, thanks for clarifying. I've updated to use strerror(). Guess it's time for another patch, PFA :-) Thanks ... Message nitpick: + if (compresslevel > 0) + { + fprintf(stderr, + _("%s: this build does not support compression\n"), + progname); + exit(1); + } pg_dump uses the following wording: WARNING: archive is compressed, but this installation does not support compression -- no data will be available\n So perhaps yours should s/build/installation/ That shows up when the *archive* is compressed, though? There are a number of other cases that use build in the backend, such as: src/backend/utils/misc/guc.c: errmsg("assertion checking is not supported by this build"))); src/backend/utils/misc/guc.c: errmsg("Bonjour is not supported by this build"))); src/backend/utils/misc/guc.c: errmsg("SSL is not supported by this build"))); Also, in messages of this kind, + if (gzsetparams(ztarfile, compresslevel, Z_DEFAULT_STRATEGY) != Z_OK) + { + fprintf(stderr, _("%s: could not set compression level %i\n"), + progname, compresslevel); Shouldn't you also be emitting the gzerror()? ... oh I see you're already doing it for most gz calls. It's not clear from the zlib documentation I have that gzerror() works after a gzsetparams(). Do you have any docs that say differently by any chance? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: FDW API
On Sun, 16 Jan 2011 01:55:19 +0100 Jan Urbański wulc...@wulczer.org wrote: snip In general, the feature looks great and I hope it'll make it into 9.1. And if we'd get the possibility to write FDW handlers in other PLs than C, it would rock so hard... I'm going to mark this a Waiting for Author because of the typos, the BeginScan/EndScan issue, and the nested loop stopping halfway issue. The rest are suggestions or just thoughts, and if you don't see them as justified, I'll mark the next patch Ready for Committer. Thanks a lot for the comments. I've (hopefully) fixed the issues above. Please find the attached patches. == patch list == 1) 20110118-no_fdw_perm_check.patch - This patch was not included in the last post. It was first proposed on 2011-01-05, but may not have been reviewed yet. I re-propose this patch for SQL standard conformance. This patch removes the permission check that requires USAGE on the foreign-data wrapper at CREATE FOREIGN TABLE. Please see the original post for details. http://archives.postgresql.org/message-id/20110105145206.30fd.69899...@metrosystems.co.jp 2) 20110118-fdw_catalog_lookup.patch - This patch adds GetForeignTables. Fixed the missing pg_foreign_table.h inclusion. 3) 20110118-fdw_handler.patch - This patch adds support for the HANDLER option on FOREIGN DATA WRAPPER objects. 4) 20110118-foreign_scan.patch - This patch adds a ForeignScan executor node and FDW API hooks based on Heikki's proposal. As Itagaki-san suggested on 2010-12-21, the FDW must generate information for EXPLAIN VERBOSE on every PlanRelScan() call. It's better to avoid such overhead, so a new EXPLAIN hook would be needed. I'll try to make it cleaner. And I'll reply to the rest of your comments. We could use comments about how to return tuples from Iterate and how to finish returning them. I had to look at the example to figure out that you need ExecClearTuple(slot) in your last Iterate. If Iterate's interface were to change, we could just return NULL instead of a tuple to say that we're done. I've added some comments for FDW developers to fdwapi.h, though they wouldn't be enough. We could be a bit more explicit about how to allocate objects, for instance if I'm allocating a FdwPlan in my PlanRelScan with a palloc, will it not go away too soon, or too late (IOW leak)? For that example, the answer is no. Objects are allocated in MessageContext if you don't switch context, and released when the query has finished. I agree that more documentation or comments for FDW developers should be added. Maybe PlanNative should get the foreign table OID, not the server OID, to resemble PlanRelScan more. The server OID is just a syscache lookup away, anyway. That would miss the case where multiple foreign tables are used in a query. The main purpose of PlanNative is to support pass-through execution of a remote query. In pass-through mode, you can use syntax as if you had connected directly to the external server, so you can't use PostgreSQL's parser. If you do EXPLAIN SELECT * FROM foreigntable, BeginScan is not called, but EndScan is. That's weird, and I noticed because I got a segfault when EndScan tried to free things that BeginScan allocated. Maybe just don't call EndScan for EXPLAIN? Fixed so that EndScan is not called for EXPLAIN execution. 
Regards, -- Shigeru Hanada 20110118-no_fdw_perm_check.patch.gz Description: Binary data 20110118-fdw_catalog_lookup.patch.gz Description: Binary data 20110118-fdw_handler.patch.gz Description: Binary data 20110118-foreign_scan.patch.gz Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
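For context, the end-of-scan convention discussed above (return an empty slot from the final Iterate call) looks roughly like this; the function name and the no_more_rows condition are placeholders, and the callback signature follows the proposal in this thread rather than a settled API:

    static TupleTableSlot *
    example_iterate(ForeignScanState *node)
    {
        TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;

        if (no_more_rows)
            return ExecClearTuple(slot);   /* empty slot signals end of scan */

        /* ... otherwise fill the slot with the next row and return it ... */
        return slot;
    }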
Re: [HACKERS] SQL/MED - file_fdw
On Tue, 18 Jan 2011 15:17:12 +0900 Itagaki Takahiro itagaki.takah...@gmail.com wrote: On Sat, Jan 15, 2011 at 08:35, Shigeru HANADA han...@metrosystems.co.jp wrote: The interface of NextCopyFrom() is fixed to return HeapTuple, to support tableoid without any change to TupleTableSlot. 3) 20110114-copy_export_HeapTupe.patch This patch fixes the interface of NextCopyFrom() to return results as HeapTuple. I think file_fdw can return tuples in virtual tuple form, and ForeignNext() calls ExecMaterializeSlot() to store tableoid. Thanks for the comment. I've fixed file_fdw to use tts_values and tts_isnull to receive results from NextCopyFrom, and store a virtual tuple in the slot. The attached patch requires the FDW API patches and copy_export-20110114.patch. Please see also: http://archives.postgresql.org/message-id/20110119002615.8316.69899...@metrosystems.co.jp Also, the cost estimation of file_fdw has been simplified along the lines of your comments below. On Tue, 21 Dec 2010 21:32:17 +0900 Itagaki Takahiro itagaki.takah...@gmail.com wrote: #3. Why do you re-open a foreign table in estimate_costs()? Since the caller seems to have the options for them, you can pass them directly, no? In addition, passing a half-initialized fplan to estimate_costs() is a bad idea. If you think it is an OUT parameter, the OUT params should be *startup_cost and *total_cost. In that message, you also pointed out that the FDW must generate explainInfo in every PlanRelScan call even if the planning is not for EXPLAIN. I'll try to defer generating explainInfo until EXPLAIN VERBOSE really uses it. It might need a new hook point in explain.c, though. Regards, -- Shigeru Hanada 20110118-file_fdw.patch.gz Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
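The virtual-tuple approach described here amounts to roughly the following in the scan callback (a sketch only; festate is a placeholder for the FDW's private state, and the NextCopyFrom() argument list shown follows the patch under discussion, which may differ from what is eventually committed):

    if (NextCopyFrom(festate->cstate, slot->tts_values, slot->tts_isnull, NULL))
        ExecStoreVirtualTuple(slot);   /* no HeapTuple formed, no extra copying */
    else
        ExecClearTuple(slot);          /* end of file */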
Re: [HACKERS] Replication logging
On Tue, Jan 18, 2011 at 10:56, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Jan 18, 2011 at 5:17 PM, Magnus Hagander mag...@hagander.net wrote: We should treat log_disconnections the same? We could keep it a boolean, but then only log disconnections for the cases that are mentioned in log_connections? It doesn't make sense to log disconnection for a connection we didn't log the connection for... Maybe true. But, at least for me, it's more intuitive to provide both as enum parameters. Is there *any* use case for setting them differently though? (other than connections being something and disconnections being none?) If not, aren't we just encouraging people to configure in a way that makes no sense? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_filedump moved to pgfoundry
David Fetter da...@fetter.org writes: I'm guessing there's a Policy® at Red Hat that software made on its dime be GPL (v2, I'd guess), and that getting an exception would involve convening its board or similarly drastic action. It's company policy, and while it *might* be possible to get an exception, the effort involved would far exceed the benefit we'd get out of it. Moreover, despite Mark's creative argument, I really doubt that Red Hat would perceive any benefit to themselves in making an exception. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication logging
Magnus Hagander mag...@hagander.net writes: Is there *any* usecase for setting them differently though? I can't believe we're still engaged in painting this bikeshed. Let's just control it off log_connections and have done. BTW, what about log_disconnections --- does a walsender emit a message according to that? If not, why not? If we go through with something fancy on the connection side, are we going to invent the same extra complexity for log_disconnections? And if we do, what happens when they're set inconsistently? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]
Itagaki Takahiro itagaki.takah...@gmail.com writes: On Tue, Jan 18, 2011 at 05:39, Tom Lane t...@sss.pgh.pa.us wrote: I haven't looked at this patch, but it seems to me that it would be reasonable to conclude A != B if the va_extsize values in the toast pointers don't agree. It's a very light-weight alternative of memcmp the byte data, but there is still the same issue -- we might have different compressed results if we use different algorithm for TOASTing. Which makes it a lightweight waste of cycles. So, it would be better to apply the present patch as-is. No, I don't think so. Has any evidence been submitted that that part of the patch is of benefit? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
Magnus Hagander mag...@hagander.net writes: Actually, after some IM chats, I think pg_streamrecv should be renamed, probably to pg_walstream (or pg_logstream, but pg_walstream is a lot more specific than that) pg_stream_log pg_stream_backup Those seem better. Tom, would those solve your concerns about it being clear which side they are on? Or do you think you'd still risk reading them as the sending side? It's still totally unclear what they do. How about pg_receive_log etc? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]
On Tue, Jan 18, 2011 at 11:15 AM, Tom Lane t...@sss.pgh.pa.us wrote: It's a very light-weight alternative of memcmp the byte data, but there is still the same issue -- we might have different compressed results if we use different algorithm for TOASTing. Which makes it a lightweight waste of cycles. So, it would be better to apply the present patch as-is. No, I don't think so. Has any evidence been submitted that that part of the patch is of benefit? I think you might be mixing up what's actually in the patch with another idea that was proposed but isn't actually in the patch. The patch itself does nothing controversial. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
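For readers following along, the uncontroversial part that is in the patch is a length pre-check that avoids detoasting when the raw sizes already differ; a rough sketch (not the exact submitted code, and omitting the usual PG_FREE_IF_COPY cleanup):

    Datum
    texteq(PG_FUNCTION_ARGS)
    {
        Datum   arg1 = PG_GETARG_DATUM(0);
        Datum   arg2 = PG_GETARG_DATUM(1);
        bool    result;

        /* raw sizes are available from the toast pointer without detoasting */
        if (toast_raw_datum_size(arg1) != toast_raw_datum_size(arg2))
            result = false;
        else
        {
            text   *t1 = DatumGetTextPP(arg1);   /* detoast only when needed */
            text   *t2 = DatumGetTextPP(arg2);

            result = (memcmp(VARDATA_ANY(t1), VARDATA_ANY(t2),
                             VARSIZE_ANY_EXHDR(t1)) == 0);
        }
        PG_RETURN_BOOL(result);
    }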
Re: [HACKERS] Replication logging
On Tue, Jan 18, 2011 at 2:15 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: I also find it weird that incoming replication connections are logged by default. In the standby, we also log streaming replication successfully connected to primary, which serves much of the same debugging purpose. Oh, good point. All right, I withdraw my objection. Let's just make it all controlled by log_connections and go home. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ToDo List Item - System Table Index Clustering
On Tue, Jan 18, 2011 at 8:35 AM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Simone Aiken's message of dom ene 16 02:11:26 -0300 2011: Hello Postgres Hackers, In reference to this todo item about clustering system table indexes, ( http://archives.postgresql.org/pgsql-hackers/2004-05/msg00989.php ) I have been studying the system tables to see which would benefit from clustering. I have some index suggestions and a question if you have a moment. Wow, this is really old stuff. I don't know if this is really of any benefit, given that these catalogs are loaded into syscaches anyway. Furthermore, if you cluster at initdb time, they will soon lose the ordering, given that updates move tuples around and inserts put them anywhere. So you'd need the catalogs to be re-clustered once in a while, and I don't see how you'd do that (except by asking the user to do it, which doesn't sound so great). The idea of the TODO seems to have been to set the default clustering to something reasonable. That doesn't necessarily seem like a bad idea even if we can't automatically maintain the cluster order, but there's some question in my mind whether we'd get any measurable benefit from the clustering. Even on a database with a gigantic number of tables, it seems likely that the relevant system catalogs will stay fully cached and, as you point out, the system caches will further blunt the impact of any work in this area. I think the first thing to do would be to try to come up with a reproducible test case where clustering the tables improves performance. If we can't, that might mean it's time to remove this TODO. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]
Robert Haas robertmh...@gmail.com writes: On Tue, Jan 18, 2011 at 11:15 AM, Tom Lane t...@sss.pgh.pa.us wrote: No, I don't think so. Has any evidence been submitted that that part of the patch is of benefit? I think you might be mixing up what's actually in the patch with another idea that was proposed but isn't actually in the patch. The patch itself does nothing controversial. Oh, I misread Itagaki-san's comment to imply that that *was* in the patch. Maybe I should go read it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] test_fsync open_sync test
Greg Smith wrote: Bruce Momjian wrote: Is there a value to this test_fsync test? Compare open_sync with different sizes: (This is designed to compare the cost of one large sync'ed write and two smaller sync'ed writes.) open_sync 16k write 242.563 ops/sec 2 open_sync 8k writes 752.752 ops/sec It compares the cost of doing larger vs. two smaller open_sync writes. Might be some value for determining things like what the optimal WAL block size to use is. All these tests are kind of hard to use effectively still, I'm not sure if it's time to start trimming tests yet until we've made more progress on interpreting results first. OK, thanks for the feedback. I just wanted to make sure it wasn't a stupid test. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]
On Tue, Jan 18, 2011 at 11:44 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Jan 18, 2011 at 11:15 AM, Tom Lane t...@sss.pgh.pa.us wrote: No, I don't think so. Has any evidence been submitted that that part of the patch is of benefit? I think you might be mixing up what's actually in the patch with another idea that was proposed but isn't actually in the patch. The patch itself does nothing controversial. Oh, I misread Itagaki-san's comment to imply that that *was* in the patch. Maybe I should go read it. Perhaps. :-) While you're at it you might commit it. :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_filedump moved to pgfoundry
On Tue, Jan 18, 2011 at 10:53 AM, Tom Lane t...@sss.pgh.pa.us wrote: David Fetter da...@fetter.org writes: I'm guessing there's a Policy® at Red Hat that software made on its dime be GPL (v2, I'd guess), and that getting an exception would involve convening its board or similarly drastic action. It's company policy, and while it *might* be possible to get an exception, the effort involved would far exceed the benefit we'd get out of it. Moreover, despite Mark's creative argument, I really doubt that Red Hat would perceive any benefit to themselves in making an exception. I'm not sure why they'd care, but it certainly doesn't seem worth spending the amount of time arguing about it that we are. David and Mark are, of course, free to spend their time petitioning Red Hat for relicensing if they are so inclined, but they aren't entitled to tell you how to spend yours. And even if they were, I would hope that they'd want you to spend it committing patches rather than arguing with your employer about relicensing of a utility that's freely available anyway and of use to 0.1% of our user base. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]
Robert Haas robertmh...@gmail.com writes: On Tue, Jan 18, 2011 at 11:44 AM, Tom Lane t...@sss.pgh.pa.us wrote: Oh, I misread Itagaki-san's comment to imply that that *was* in the patch. Maybe I should go read it. Perhaps. :-) While you're at it you might commit it. :-) Yeah, as penance I'll take this one. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
On Tue, Jan 18, 2011 at 17:31, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: Actually, after some IM chats, I think pg_streamrecv should be renamed, probably to pg_walstream (or pg_logstream, but pg_walstream is a lot more specific than that) pg_stream_log pg_stream_backup Those seem better. Tom, would those solve your concerns about it being clear which side they are on? Or do you think you'd still risk reading them as the sending side? It's still totally unclear what they do. How about pg_receive_log etc? I agree with whoever said using wal is better than log to be unambiguous. So it'd be pg_receive_wal and pg_receive_base_backup then? Votes from others? (it's easy to rename so far, so I'll keep plugging away under the name pg_basebackup based on Fujii-san's comments until such a time as we have a reasonable consensus :-) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
2011/1/18 Magnus Hagander mag...@hagander.net: On Tue, Jan 18, 2011 at 17:31, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: Actually, after some IM chats, I think pg_streamrecv should be renamed, probably to pg_walstream (or pg_logstream, but pg_walstream is a lot more specific than that) pg_stream_log pg_stream_backup Those seem better. Tom, would those solve your concerns about it being clear which side they are on? Or do you think you'd still risk reading them as the sending side? It's still totally unclear what they do. How about pg_receive_log etc? I agree with whoever said using wal is better than log to be unambiguous. So it'd be pg_receive_wal and pg_receive_base_backup then? Votes from others? (it's easy to rename so far, so I'll keep plugging away under the name pg_basebackup based on Fujii-san's comments until such a time as we have a reasonable consensus :-) pg_receive_wal is good for me. As for pg_receive_base_backup: in French, base is a shortcut for database. Since here we back up the whole cluster, I would suggest pg_receive_cluster(_backup ?) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] estimating # of distinct values
On Tue, Jan 18, 2011 at 4:53 AM, t...@fuzzy.cz wrote: So the most important question is how to intercept the new/updated rows, and where to store them. I think each backend should maintain it's own private list of new records and forward them only in case of commit. Does that sound reasonable? At the risk of sounding demoralizing, nothing about this proposal sounds very promising to me, and that sounds like a particularly bad idea. What happens if the transaction updates a billion records? Or even a million records? Are you going to store all of those in backend-local memory until commit time? Or spool them in a potentially-gigantic disk file somewhere? That memory allocation - or file - could grow to be larger than the size of the entire database in the worst case. And COMMIT could take an awfully long time if it has to spool megabytes or gigabytes of data off to some other process. And what happens if there's a crash after the COMMIT but before all that data is sent? The estimates become permanently wrong? And are we doing all of this just to get a more accurate estimate of ndistinct? For the amount of effort that it will probably take to get this working at all, you could probably implement index-only scans and have enough bandwidth left over to tackle global temporary tables. And unless I'm missing something, the chances that the performance consequences will be tolerable are pretty much zero. And it would only benefit the tiny fraction of users for whom bad n_distinct estimates cause bad plans, and then the even tinier percentage of those who can't conveniently fix it by using the manual override that we already have in place - which presumably means people who have gigantic tables that are regularly rewritten with massive changes in the data distribution that affect plan choice. Is that more than the empty set? Maybe the idea here is that this wouldn't fix just ndistinct estimates but would also help with multi-column statistics. Even if that's the case, I think it's almost certainly a dead end from a performance standpoint. Some kind of manual ANALYZE process that can be invoked when needed to scan the entire table would be painful but maybe acceptable for certain people with a problem they can't fix any other way, but doing this automatically for everyone seems like a really bad idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_filedump moved to pgfoundry
2011/1/18 Robert Haas robertmh...@gmail.com: On Tue, Jan 18, 2011 at 10:53 AM, Tom Lane t...@sss.pgh.pa.us wrote: David Fetter da...@fetter.org writes: I'm guessing there's a Policy® at Red Hat that software made on its dime be GPL (v2, I'd guess), and that getting an exception would involve convening its board or similarly drastic action. It's company policy, and while it *might* be possible to get an exception, the effort involved would far exceed the benefit we'd get out of it. Moreover, despite Mark's creative argument, I really doubt that Red Hat would perceive any benefit to themselves in making an exception. I'm not sure why they'd care, but it certainly doesn't seem worth spending the amount of time arguing about it that we are. David and Mark are, of course, free to spend their time petitioning Red Hat for relicensing if they are so inclined, but they aren't entitled to tell you how to spend yours. And even if they were, I would hope that they'd want you to spend it committing patches rather than arguing with your employer about relicensing of a utility that's freely available anyway and of use to 0.1% of our user base. Still, it's good; thanks to Tom and RH for having pushed it closer to the other PostgreSQL tools. -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] limiting hint bit I/O
On Jan 18, 2011, at 8:24 AM, Merlin Moncure wrote: a few weeks back I hacked an experimental patch that removed the hint bit action completely. the results were very premature and/or incorrect, but my initial findings suggested that hint bits might not be worth the cost from performance standpoint. i'd like to see some more investigation in this direction before going with a complex application mechanism (although that would be beneficial vs the status quo). If you're not finding much benefit to hint bits, that's *very* interesting. Everything I outlined certainly looks like a pretty damn expensive code path; it's really surprising that hint bits don't help. I think it would be very valuable to profile the cost of the different code paths involved in the HeapTupleSatisfies* functions, even if the workload is just pgBench. an ideal testing environment to compare would be a mature database (full clog) with some verifiable performance tests and a mixed olap/oltp workload. We're working on setting such a framework up. Unfortunately it will only be 8.3 to start, but we hope to be on 9.0 soon. -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] estimating # of distinct values
On Jan 17, 2011, at 8:11 PM, Robert Haas wrote: On Mon, Jan 17, 2011 at 7:56 PM, Jim Nasby j...@nasby.net wrote: - Forks are very possibly a more efficient way to deal with TOAST than having separate tables. There's a fair amount of overhead we pay for the current setup. That seems like an interesting idea, but I actually don't see why it would be any more efficient, and it seems like you'd end up reinventing things like vacuum and free space map management. The FSM would take some effort, but I don't think vacuum would be that hard to deal with; you'd just have to free up the space in any referenced toast forks at the same time that you vacuumed the heap. - Dynamic forks would make it possible to do a column-store database, or at least something approximating one. I've been wondering whether we could do something like this by treating a table t with columns pk, a1, a2, a3, b1, b2, b3 as two tables t1 and t2, one with columns pk, a1, a2, a3 and the other with columns pk, b1, b2, b3. SELECT * FROM t would be translated into SELECT * FROM t1, t2 WHERE t1.pk = t2.pk. Possibly, but you'd be paying tuple overhead twice, which is what I was looking to avoid with forks. -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] estimating # of distinct values
On Tue, Jan 18, 2011 at 12:23 PM, Jim Nasby j...@nasby.net wrote: On Jan 17, 2011, at 8:11 PM, Robert Haas wrote: On Mon, Jan 17, 2011 at 7:56 PM, Jim Nasby j...@nasby.net wrote: - Forks are very possibly a more efficient way to deal with TOAST than having separate tables. There's a fair amount of overhead we pay for the current setup. That seems like an interesting idea, but I actually don't see why it would be any more efficient, and it seems like you'd end up reinventing things like vacuum and free space map management. The FSM would take some effort, but I don't think vacuum would be that hard to deal with; you'd just have to free up the space in any referenced toast forks at the same time that you vacuumed the heap. How's that different from what vacuum does on a TOAST table now? - Dynamic forks would make it possible to do a column-store database, or at least something approximating one. I've been wondering whether we could do something like this by treating a table t with columns pk, a1, a2, a3, b1, b2, b3 as two tables t1 and t2, one with columns pk, a1, a2, a3 and the other with columns pk, b1, b2, b3. SELECT * FROM t would be translated into SELECT * FROM t1, t2 WHERE t1.pk = t2.pk. Possibly, but you'd be paying tuple overhead twice, which is what I was looking to avoid with forks. What exactly do you mean by tuple overhead? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] estimating # of distinct values
Robert Haas robertmh...@gmail.com writes: On Tue, Jan 18, 2011 at 12:23 PM, Jim Nasby j...@nasby.net wrote: On Jan 17, 2011, at 8:11 PM, Robert Haas wrote: On Mon, Jan 17, 2011 at 7:56 PM, Jim Nasby j...@nasby.net wrote: - Forks are very possibly a more efficient way to deal with TOAST than having separate tables. There's a fair amount of overhead we pay for the current setup. That seems like an interesting idea, but I actually don't see why it would be any more efficient, and it seems like you'd end up reinventing things like vacuum and free space map management. The FSM would take some effort, but I don't think vacuum would be that hard to deal with; you'd just have to free up the space in any referenced toast forks at the same time that you vacuumed the heap. How's that different from what vacuum does on a TOAST table now? Even more to the point: Jim hasn't provided one single reason to suppose that this would be better-performing than the existing approach. It looks to me like a large amount of work, and loss of on-disk compatibility, for nothing at all except the sake of change. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] estimating # of distinct values
On Jan 18, 2011, at 11:24 AM, Robert Haas wrote: On Tue, Jan 18, 2011 at 12:23 PM, Jim Nasby j...@nasby.net wrote: On Jan 17, 2011, at 8:11 PM, Robert Haas wrote: On Mon, Jan 17, 2011 at 7:56 PM, Jim Nasby j...@nasby.net wrote: - Forks are very possibly a more efficient way to deal with TOAST than having separate tables. There's a fair amount of overhead we pay for the current setup. That seems like an interesting idea, but I actually don't see why it would be any more efficient, and it seems like you'd end up reinventing things like vacuum and free space map management. The FSM would take some effort, but I don't think vacuum would be that hard to deal with; you'd just have to free up the space in any referenced toast forks at the same time that you vacuumed the heap. How's that different from what vacuum does on a TOAST table now? TOAST vacuum is currently an entirely separate vacuum. It might run at the same time as the main table vacuum, but it still has all the work that would be associated with vacuuming a table with the definition of a toast table. In fact, at one point vacuuming toast took two passes: the first deleted the toast rows that were no longer needed, then you had to go back and vacuum those deleted rows. - Dynamic forks would make it possible to do a column-store database, or at least something approximating one. I've been wondering whether we could do something like this by treating a table t with columns pk, a1, a2, a3, b1, b2, b3 as two tables t1 and t2, one with columns pk, a1, a2, a3 and the other with columns pk, b1, b2, b3. SELECT * FROM t would be translated into SELECT * FROM t1, t2 WHERE t1.pk = t2.pk. Possibly, but you'd be paying tuple overhead twice, which is what I was looking to avoid with forks. What exactly do you mean by tuple overhead? typedef struct HeapTupleHeaderData. With only two tables it might not be that bad, depending on the fields. Beyond two tables it's almost certainly a loser. -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] estimating # of distinct values
On Jan 18, 2011, at 11:32 AM, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Jan 18, 2011 at 12:23 PM, Jim Nasby j...@nasby.net wrote: On Jan 17, 2011, at 8:11 PM, Robert Haas wrote: On Mon, Jan 17, 2011 at 7:56 PM, Jim Nasby j...@nasby.net wrote: - Forks are very possibly a more efficient way to deal with TOAST than having separate tables. There's a fair amount of overhead we pay for the current setup. That seems like an interesting idea, but I actually don't see why it would be any more efficient, and it seems like you'd end up reinventing things like vacuum and free space map management. The FSM would take some effort, but I don't think vacuum would be that hard to deal with; you'd just have to free up the space in any referenced toast forks at the same time that you vacuumed the heap. How's that different from what vacuum does on a TOAST table now? Even more to the point: Jim hasn't provided one single reason to suppose that this would be better-performing than the existing approach. It looks to me like a large amount of work, and loss of on-disk compatibility, for nothing at all except the sake of change. Yes, I was only pointing out that there were possible uses for allowing a variable number of forks per relation if Tomas felt the need to go that direction. Changing toast would certainly require some very convincing arguments as to the benefits. -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] limiting hint bit I/O
On Tue, Jan 18, 2011 at 3:47 AM, Jim Nasby j...@nasby.net wrote: On Jan 16, 2011, at 4:37 PM, Kevin Grittner wrote: Robert Haas wrote: a quick-and-dirty attempt to limit the amount of I/O caused by hint bits. I'm still very interested in knowing what people think about that. I found the elimination of the response-time spike promising. I don't think I've seen enough data yet to feel comfortable endorsing it, though. I guess the question in my head is: how much of the lingering performance hit was due to having to go to clog and how much was due to competition with the deferred writes? If much of it is due to repeated recalculation of visibility based on clog info, I think there would need to be some way to limit how many times that happened before the hint bits were saved. What if we sped up the case where hint bits aren't set? Has anyone collected data on the actual pain points of checking visibility when hint bits aren't set? I think that's worth looking into, but I don't have any present plan to actually do it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] limiting hint bit I/O
On Tue, Jan 18, 2011 at 9:24 AM, Merlin Moncure mmonc...@gmail.com wrote: a few weeks back I hacked an experimental patch that removed the hint bit action completely. the results were very premature and/or incorrect, but my initial findings suggested that hint bits might not be worth the cost from performance standpoint. i'd like to see some more investigation in this direction before going with a complex application mechanism (although that would be beneficial vs the status quo). I think it's not very responsible to allege that hint bits aren't providing a benefit without providing the patch that you used and the tests that you ran. This is a topic that needs careful analysis, and I think that saying hint bits don't provide a benefit... maybe... doesn't do anything but confuse the issue. How about doing some tests with the patch from my OP and posting the results? If removing hint bits entirely doesn't degrade performance, then surely the less-drastic approach I've taken here ought to be OK too. But in my testing, it didn't look too good. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] estimating # of distinct values
On Tue, Jan 18, 2011 at 12:32 PM, Jim Nasby j...@nasby.net wrote: How's that different from what vacuum does on a TOAST table now? TOAST vacuum is currently an entirely separate vacuum. It might run at the same time as the main table vacuum, but it still has all the work that would be associated with vacuuming a table with the definition of a toast table. In fact, at one point vacuuming toast took two passes: the first deleted the toast rows that were no longer needed, then you had to go back and vacuum those deleted rows. I don't know whether it still works that way or not, but if it does, fixing it could perhaps be done with far less drastic surgery than what you're proposing here. Possibly, but you'd be paying tuple overhead twice, which is what I was looking to avoid with forks. What exactly do you mean by tuple overhead? typedef struct HeapTupleHeaderData. With only two tables it might not be that bad, depending on the fields. Beyond two tables it's almost certainly a loser. A struct definition by itself doesn't cause overhead. Are you talking about storage on disk? CPU usage for MVCC visibility testing? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
On Tue, Jan 18, 2011 at 12:03 PM, Magnus Hagander mag...@hagander.net wrote: So it'd be pg_receive_wal and pg_receive_base_backup then? Votes from others? (it's easy to rename so far, so I'll keep plugging away under the name pg_basebackup based on Fujii-sans comments until such a time as we have a reasonable consensus :-) I like pg_receive_wal. pg_receive_base_backup I would be inclined to shorten to pg_basebackup or pg_streambackup, but I just work here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ToDo List Item - System Table Index Clustering
On Jan 18, 2011, at 6:35 AM, Alvaro Herrera wrote: Wow, this is really old stuff. I don't know if this is really of any benefit, given that these catalogs are loaded into syscaches anyway. The benefit is educational primarily. I was looking for a todo list item that would expose me to the system tables. Learning the data model of a new system is always step 1 for me. So that one was perfect as it would have me study and consider each one to determine if there was any benefit from clustering on its initial load into cache. Furthermore, if you cluster at initdb time, they will soon lose the ordering, given that updates move tuples around and inserts put them anywhere. So you'd need the catalogs to be re-clustered once in a while, and I don't see how you'd do that (except by asking the user to do it, which doesn't sound so great). I did discover that last night. I'm used to databases that keep up their clustering. One that falls apart over time is distinctly strange. And the way you guys do your re-clustering logic is overkill if just a few rows are out of place. On the upside, a call to mass re-clustering goes and updates all the clustered indexes in the system and that includes these tables. Will have to study auto-vacuum as well to consider that. (unless you just want to play with Bison for educational purposes, of course, which is always a good thing to do). Pretty much, yeah. - Simone Aiken
Re: [HACKERS] ToDo List Item - System Table Index Clustering
On Tue, Jan 18, 2011 at 8:35 AM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Simone Aiken's message of dom ene 16 02:11:26 -0300 2011: Hello Postgres Hackers, In reference to this todo item about clustering system table indexes, ( http://archives.postgresql.org/pgsql-hackers/2004-05/msg00989.php ) Wow, this is really old stuff. I don't know if this is really of any ... If we can't, that might mean it's time to remove this TODO. When I'm learning a new system I like to first learn how to use it, second learn its data model, third start seriously looking at the code. So that TODO is ideal for my learning method. If there is something else that would also involve studying all the system tables it would also be great. For example, I noticed we have column-level comments on the web but not in the database itself. This seems silly. Why not have the comments in the database and have the web query the tables of template databases for the given versions? That way \d+ pg_tablename would provide instant gratification for users. And we all like our gratification to be instant. They could be worked into the .h files as inserts to pg_description, though they wouldn't provide an excuse to learn Bison. I'm open to other suggestions as well. -Simone Aiken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
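A minimal illustration of what that would buy (the comment text is made up for the example, not taken from the web documentation):

    COMMENT ON COLUMN pg_class.relnatts IS 'Number of user columns in the relation';
    -- afterwards, \d+ pg_class shows that description next to the column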
Re: [HACKERS] pg_basebackup for streaming base backups
Excerpts from Magnus Hagander's message of mar ene 18 11:53:55 -0300 2011: On Tue, Jan 18, 2011 at 15:49, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Magnus Hagander's message of mar ene 18 10:47:03 -0300 2011: Ok, thanks for clarifying. I've updated to use strerror(). Guess it's time for another patch, PFA :-) Thanks ...

Message nitpick:

+   if (compresslevel > 0)
+   {
+       fprintf(stderr,
+               _("%s: this build does not support compression\n"),
+               progname);
+       exit(1);
+   }

pg_dump uses the following wording: "WARNING: archive is compressed, but this installation does not support compression -- no data will be available\n" So perhaps yours should s/build/installation/

That shows up when the *archive* is compressed, though? There are a number of other cases that use "build" in the backend, such as:

src/backend/utils/misc/guc.c: errmsg("assertion checking is not supported by this build")));
src/backend/utils/misc/guc.c: errmsg("Bonjour is not supported by this build")));
src/backend/utils/misc/guc.c: errmsg("SSL is not supported by this build")));

Hmm. I think I'd s/build/installation/ on all those messages for consistency.

Also, in messages of this kind,

+   if (gzsetparams(ztarfile, compresslevel, Z_DEFAULT_STRATEGY) != Z_OK)
+   {
+       fprintf(stderr, _("%s: could not set compression level %i\n"),
+               progname, compresslevel);

Shouldn't you also be emitting the gzerror()? ... oh I see you're already doing it for most gz calls.

It's not clear from the zlib documentation I have that gzerror() works after a gzsetparams(). Do you have any docs that say differently by any chance?

Ah, no. I was reading zlib.h, which is ambiguous as you say:

ZEXTERN int ZEXPORT gzsetparams OF((gzFile file, int level, int strategy));
/*
     Dynamically update the compression level or strategy.  See the description
     of deflateInit2 for the meaning of these parameters.  gzsetparams returns
     Z_OK if success, or Z_STREAM_ERROR if the file was not opened for writing.
*/

ZEXTERN const char * ZEXPORT gzerror OF((gzFile file, int *errnum));
/*
     Returns the error message for the last error which occurred on the given
     compressed file.  errnum is set to zlib error number.  If an error occurred
     in the file system and not in the compression library, errnum is set to
     Z_ERRNO and the application may consult errno to get the exact error code.
     ...

... but a quick look at the code says that it sets gz_stream->z_err, which is what gzerror returns:

int ZEXPORT gzsetparams (file, level, strategy)
    gzFile file;
    int level;
    int strategy;
{
    gz_stream *s = (gz_stream*)file;

    if (s == NULL || s->mode != 'w') return Z_STREAM_ERROR;

    /* Make room to allow flushing */
    if (s->stream.avail_out == 0) {

        s->stream.next_out = s->outbuf;
        if (fwrite(s->outbuf, 1, Z_BUFSIZE, s->file) != Z_BUFSIZE) {
            s->z_err = Z_ERRNO;
        }
        s->stream.avail_out = Z_BUFSIZE;
    }

    return deflateParams (&(s->stream), level, strategy);
}

-- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
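[For illustration only -- this is not the pg_basebackup patch, and the helper name is invented -- the error path being discussed could look roughly like this, relying on gzerror() picking up whatever z_err the failed gzsetparams() left behind:

    #include <stdio.h>
    #include <stdlib.h>
    #include <zlib.h>

    /* Hypothetical helper showing gzsetparams() failure reported via gzerror() */
    static void
    set_compression_or_die(gzFile zf, int compresslevel, const char *progname)
    {
        if (gzsetparams(zf, compresslevel, Z_DEFAULT_STRATEGY) != Z_OK)
        {
            int         errnum;
            const char *msg = gzerror(zf, &errnum);

            fprintf(stderr, "%s: could not set compression level %d: %s\n",
                    progname, compresslevel, msg);
            exit(1);
        }
    }
]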
Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid
David Fetter wrote: I think I haven't communicated clearly what I'm suggesting, which is that we ship with both an UPSERT and a MERGE, the former being ugly, crude and simple, and the latter festooned with dire warnings about isolation levels and locking. I don't know that I completely agree with the idea that the UPSERT version should always be crude and the MERGE one necessarily heavy from a performance perspective. However, I do feel you raise a legitimate point that once the hard stuff is done, and the locking issues around SQL-proper MERGE sorted, having an UPSERT/REPLACE synonym that maps to it under the hood may be a useful way to provide a better user experience. The way I see this, the right goal is to first provide the full spec behavior with good performance, and get all that plumbing right. There's nothing that says we can't provide an easier syntax than the admittedly complicated MERGE one to the users though. (I am tempted to make a joke about how you could probably

So, as for this patch...there's about half a dozen significant open issues with the current code, along with a smattering of smaller ones. The problems remaining are deep enough that I think it would be challenging to work through them for this CF under the best conditions. And given that we haven't heard from Boxuan since early December, we're definitely understaffed to tackle major revisions. My hope was that we'd get an updated patch from him before the CF deadline. Even without it, maybe get some more internal help here. Given my focus on checkpoint issues, Simon on Sync Rep, and Dimitri still on his Extensions push, that second part isn't going to happen. I am marking MERGE officially Returned With Feedback for 9.1. Lots of progress made, much better community understanding of the unresolved issues now than when we started, but not in shape to go into this release. Since we still have some significant interest here in getting this finished, I'll see if I can put together a better game-plan for how to get this actually done for 9.2 in time to discuss at release planning time. The main thing that's become much more obvious to me just recently is how the remaining issues here relate to the true serialization work, so worrying about that first is probably the right order anyway. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] limiting hint bit I/O
Robert Haas robertmh...@gmail.com writes: I think you may be confused about what the patch does - currently, pages with hint bit changes are considered dirty, period. Therefore, they are written whenever any other dirty page would be written: by the background writer cleaning scan, at checkpoints, and when a backend must write a dirty buffer before reallocating it to hold a different page. The patch keeps the first of these and changes the second two While I was trying to performance-test the texteq patch, it occurred to me that this proposed hint-bit change has got a serious drawback. To wit, that it will totally destroy reproducibility of any performance test that involves table scans. Right now, you know that you can take hint bits out of the equation by doing a vacuum analyze and checkpoint; after that, all hint bits in the table are known to be set and written to disk. Then you can get on with comparing the effects of some patch or other. With the proposed patch, it will never be clear whether all the hint bits are set, because the patch specifically removes the deterministic ways to get a hint bit written out. So you'll never be very sure whether a performance difference you think you see is real, or whether one case or the other got affected by extra clog lookups. It's hard enough already to be sure about performance changes on the order of 1%, but this will make it impossible. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql: Add \dL to show languages
Hi Josh, Nope, I do not have any better ideas than DO Blocks?. Everything looks good with the exception of one bug now.

\dL foo
********* QUERY **********
SELECT l.lanname AS "Name",
       pg_catalog.pg_get_userbyid(l.lanowner) as "Owner",
       l.lanpltrusted AS "Trusted"
FROM pg_catalog.pg_language lWHERE l.lanname ~ '^(foo)$'
ORDER BY 1;
**************************

ERROR:  syntax error at or near "l"
LINE 4: FROM pg_catalog.pg_language lWHERE l.lanname ~ '^(foo)$'

I believe the fix is to move the \n from before the WHERE clause to after the FROM, and from before ORDER BY to after WHERE. Fix this bug and I believe this patch is ready for a committer.

PS. You added some trailing whitespace after printACLColumn. A recommendation, if you want to avoid that, is to either have a git commit hook which checks for it and/or have colouring of git diffs so you can see it marked in red. I use both. :)

Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
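[For clarity, after moving the newlines as described, the generated query should presumably come out something like this -- illustrative only; the exact whitespace depends on the patch:

SELECT l.lanname AS "Name",
       pg_catalog.pg_get_userbyid(l.lanowner) as "Owner",
       l.lanpltrusted AS "Trusted"
FROM pg_catalog.pg_language l
WHERE l.lanname ~ '^(foo)$'
ORDER BY 1;
]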
Re: [HACKERS] pg_basebackup for streaming base backups
On Tue, 2011-01-18 at 18:03 +0100, Magnus Hagander wrote: So it'd be pg_receive_wal and pg_receive_base_backup then? OK for me. Maybe even pg_receive_wal_stream Don't see any reason why command names can't be long. We have many function names already that long. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] pgbench to the MAXINT
Euler Taveira de Oliveira wrote: (i) If we want to support a scale factor greater than 21474 we have to convert some columns to bigint; it will change the test. From the portability point of view it is a pity but as we have never supported it I'm not too worried about it. Why? Because it will use bigint columns only if the scale factor is greater than 21474. Is it a problem? I don't think so because generally people compare tests with the same scale factor. (ii) From the performance perspective, we need to test that the modifications don't impact performance. I didn't create another code path for 64-bit modifications (it is too ugly) and I'm afraid some modifications affect the 32-bit performance. I'm not in a position to test it though because I don't have a big machine ATM. Greg, could you lead these tests? (iii) I decided to copy scanint8() (called strtoint64 there) from the backend (Robert's suggestion [1]) because Tom pointed out that strtoll() has portability issues. I replaced atoi() with strtoint64() but didn't do any performance tests. (i): Completely agreed. (ii): There is no such thing as a big machine that is 32 bits now; anything that's 32-bit is a tiny system here in 2011. What I can do is check for degradation on the only 32-bit system I have left here, my laptop. I'll pick a sensitive test case and take a look. (iii): This is an important thing to test, particularly given it has the potential to impact 64-bit results too. Thanks for picking this up again and finishing the thing off. I'll add this into my queue of performance tests to run and we can see if this is worth applying. It will probably take a little longer than the usual CF review time. But as this doesn't interfere with other code people are working on and is sort of a bug fix, I don't think it will be a problem if it takes a little longer to get this done. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
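[To make the 21474 limit concrete: pgbench populates 100,000 accounts rows per unit of scale factor, so the row count stops fitting in a signed 32-bit integer once the scale exceeds INT32_MAX / 100000 = 21474. A quick standalone illustration of the arithmetic (not pgbench code):

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        int32_t scales[] = {21474, 21475};

        for (int i = 0; i < 2; i++)
        {
            /* accounts rows = scale * 100,000; compute in 64 bits */
            int64_t rows = (int64_t) scales[i] * 100000;

            printf("scale %" PRId32 ": %" PRId64 " rows, fits in int32: %s\n",
                   scales[i], rows, rows <= INT32_MAX ? "yes" : "no");
        }
        return 0;
    }
]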
[HACKERS] Performance bug in DO blocks
I just noticed that if you execute the same DO command over and over within a session, it gets slower and slower. And if you keep it up you'll notice the backend's RAM consumption bloating too. The cause appears to be that we leak the cached plans created for any SQL statements or expressions within the DO command --- the next iteration won't reuse those, but rather create its own set. Probably ought to look into releasing those when the DO block is over. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] test_fsync open_sync test
Greg Smith wrote: Bruce Momjian wrote: Is there a value to this test_fsync test?

Compare open_sync with different sizes:
(This is designed to compare the cost of one large sync'ed write and two smaller sync'ed writes.)
        open_sync 16k write            242.563 ops/sec
        2 open_sync 8k writes          752.752 ops/sec

It compares the cost of doing larger vs. two smaller open_sync writes. Might be some value for determining things like what the optimal WAL block size to use is. All these tests are kind of hard to use effectively still, I'm not sure if it's time to start trimming tests yet until we've made more progress on interpreting results first.

FYI, I beefed up the testing for this in test_fsync:

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16k in different write open_sync sizes.)
        1 16k open_sync write          723.824 ops/sec
        2 8k open_sync writes          347.979 ops/sec
        4 4k open_sync writes          215.114 ops/sec
        8 2k open_sync writes          174.613 ops/sec
        16 1k open_sync writes          87.496 ops/sec

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
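[For anyone curious what one of those numbers actually measures, here is a stripped-down sketch of the timing loop -- illustrative only, not the real test_fsync source, and it omits O_DIRECT and the other sync methods:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/time.h>
    #include <unistd.h>

    /* Time 2000 operations of two O_SYNC 8k writes each and report ops/sec,
     * roughly what the "2 open_sync 8k writes" line reports. */
    int
    main(void)
    {
        static char buf[8192];
        const int   ops = 2000, writes_per_op = 2;
        int         fd;
        struct timeval start, stop;
        double      secs;

        memset(buf, 'x', sizeof(buf));
        fd = open("test_fsync.out", O_RDWR | O_CREAT | O_SYNC, 0600);
        if (fd < 0)
            return 1;

        gettimeofday(&start, NULL);
        for (int i = 0; i < ops; i++)
        {
            for (int j = 0; j < writes_per_op; j++)
                if (write(fd, buf, 8192) != 8192)
                    return 1;
            if (lseek(fd, 0, SEEK_SET) < 0)   /* overwrite the same blocks */
                return 1;
        }
        gettimeofday(&stop, NULL);

        secs = (stop.tv_sec - start.tv_sec) +
               (stop.tv_usec - start.tv_usec) / 1000000.0;
        printf("%d ops in %.3f s = %.3f ops/sec\n", ops, secs, ops / secs);
        close(fd);
        unlink("test_fsync.out");
        return 0;
    }
]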
[HACKERS] PgEast: 2011, 2nd call for papers
Hey folks, PgEast is being held in NYC this year from 03/22-03/25. Get your papers in, the deadline is soon! http://www.postgresqlconference.org/ Joshua D. Drake -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication logging
On Tue, Jan 18, 2011 at 17:33, Robert Haas robertmh...@gmail.com wrote: On Tue, Jan 18, 2011 at 2:15 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: I also find it weird that incoming replication connections are logged by default. In the standby, we also log streaming replication successfully connected to primary, which serves much of the same debugging purpose. Oh, good point. All right, I withdraw my objection. Let's just make it all controlled by log_connections and go home. Done. Oh, and the proper answer is yellow. Everybody knows that. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Spread checkpoint sync
To be frank, I really don't care about fixing this behavior on ext3, especially in the context of that sort of hack. That filesystem is not the future, it's not possible to ever really make it work right, and every minute spent on pandering to its limitations would be better spent elsewhere IMHO. I'm starting with the ext3 benchmarks just to provide some proper context for the worst-case behavior people can see right now, and to make sure refactoring here doesn't make things worse on it. My target is same or slightly better on ext3, much better on XFS and ext4. Please don't forget that we need to avoid performance regressions on NTFS and ZFS as well. They don't need to improve, but we can't let them regress. I think we can ignore BSD/UFS and Solaris/UFS, as well as HFS+, though. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] texteq/byteaeq: avoid detoast
Noah Misch n...@leadboat.com writes: texteq, textne, byteaeq and byteane detoast their arguments, then check for equality of length. Unequal lengths imply the answer trivially; given equal lengths, the functions proceed to compare the actual bytes. We can skip detoasting entirely when the lengths are unequal. The attached patch implements this. Applied with stylistic changes. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] limiting hint bit I/O
On Tue, Jan 18, 2011 at 1:32 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I think you may be confused about what the patch does - currently, pages with hint bit changes are considered dirty, period. Therefore, they are written whenever any other dirty page would be written: by the background writer cleaning scan, at checkpoints, and when a backend must write a dirty buffer before reallocating it to hold a different page. The patch keeps the first of these and changes the second two While I was trying to performance-test the texteq patch, it occurred to me that this proposed hint-bit change has got a serious drawback. To wit, that it will totally destroy reproducibility of any performance test that involves table scans. Right now, you know that you can take hint bits out of the equation by doing a vacuum analyze and checkpoint; after that, all hint bits in the table are known to be set and written to disk. Then you can get on with comparing the effects of some patch or other. With the proposed patch, it will never be clear whether all the hint bits are set, because the patch specifically removes the deterministic ways to get a hint bit written out. So you'll never be very sure whether a performance difference you think you see is real, or whether one case or the other got affected by extra clog lookups. It's hard enough already to be sure about performance changes on the order of 1%, but this will make it impossible. True. You could perhaps fix that by adding a GUC, but that feels awfully like making it the user's problem to fix our broken implementation. Maybe we could live with it if the GUC were only something developers ever needed to use, but I expect different people would have different ideas about the correct setting in production. If I'm not failing to understand the situation, the problem with the first sequential scan after a bulk load is that we're cycling through a ring of buffers that all have hint-bit changes and therefore all have to be written. The first pass through the ring is OK, but after that every new buffer we bring in requires evicting a buffer that we first have to write. Of course, with the patch, this bottleneck is removed by skipping all those writes, but that now causes a second problem: the pages only get written if the background writer happens to notice them before the backend gets all the way around the ring, and that's pretty hit-or-miss, so we basically dribble hint bits out to disk here and there but the steady state never really converges to all hint bits on disk. Maybe we could work around this by making the algorithm a little more sophisticated. Instead of the rather unilateral policy backends don't write pages that are only dirty due to hint bit changes! we could have some more nuanced rules. For example, we might decree that a backend will maintain a counter of the number of non-dirty pages it's allocated. Once it's allocated 20 pages that are either clean or dirty-only-for-hint-bits, it writes that (or the next) dirty-only-for-hint-bits it encounters. That way, the effort of hint bit setting would be spread out over the first 20 table scans, and after that you converge to steady state. We could also possibly special-case vacuum to always write dirty-only-for-hint bits pages, on the theory that the work is going to have to be done at some point, and we're better off doing it during a maintenance task than elsewhere. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
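[A rough sketch of that counting rule, just to make the idea concrete -- the function, variable names, and the threshold of 20 are invented for illustration and are not code from the patch:

    #include <stdbool.h>

    /* Per-backend policy: every buffer allocation that finds a clean or
     * hint-dirty-only page bumps a counter; once the counter reaches the
     * interval, the next hint-dirty-only page gets written, so a table
     * converges to "all hint bits on disk" over roughly 20 scans. */
    #define HINT_WRITE_INTERVAL 20

    static int  nonDirtyAllocCount = 0;

    static bool
    ShouldWriteHintOnlyBuffer(bool is_clean, bool is_hint_dirty_only)
    {
        if (!is_clean && !is_hint_dirty_only)
            return false;           /* genuinely dirty pages use the normal path */

        if (++nonDirtyAllocCount < HINT_WRITE_INTERVAL)
            return false;           /* still within the free allowance */

        if (is_hint_dirty_only)
        {
            nonDirtyAllocCount = 0; /* pay for one hint-bit-only write now */
            return true;
        }
        return false;               /* clean page: nothing to write, keep counting */
    }
]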
Re: [HACKERS] Replication logging
All, Just speaking as someone who does a lot of log-crunching for performance review, I don't find having a separate log_connections option for replication terribly useful. It's easy enough for me to just log all connections and filter for the type of connections I want. Even in cases where there's a ton of connection activity (e.g. PHP webapp without a connection pool) logging all connections still doesn't generate that many MB of output, in my experience. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
Noah Misch n...@leadboat.com writes: On Sun, Jan 16, 2011 at 06:49:27PM +0100, Pavel Stehule wrote: I am sending a updated version with little bit more comments. But I am sure, so somebody with good English have to edit my comments. Minimally I hope, so your questions will be answered. Thanks. I edited the comments and white space somewhat, as attached. I'll go ahead and mark it Ready for Committer. I looked at this patch and found it fairly awkward. What is the point of adding an additional flag to every variable, as opposed to just forcibly detoasting during assignment? If it's to avoid detoasting when the variable is read in a way that doesn't require detoasting, it fails rather completely IMO, since exec_eval_datum certainly doesn't know that. The added memory context variable seems redundant as well. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] limiting hint bit I/O
On 18.01.2011 21:16, Robert Haas wrote: On Tue, Jan 18, 2011 at 1:32 PM, Tom Lanet...@sss.pgh.pa.us wrote: While I was trying to performance-test the texteq patch, it occurred to me that this proposed hint-bit change has got a serious drawback. To wit, that it will totally destroy reproducibility of any performance test that involves table scans. Right now, you know that you can take hint bits out of the equation by doing a vacuum analyze and checkpoint; after that, all hint bits in the table are known to be set and written to disk. Then you can get on with comparing the effects of some patch or other. With the proposed patch, it will never be clear whether all the hint bits are set, because the patch specifically removes the deterministic ways to get a hint bit written out. So you'll never be very sure whether a performance difference you think you see is real, or whether one case or the other got affected by extra clog lookups. It's hard enough already to be sure about performance changes on the order of 1%, but this will make it impossible. True. You could perhaps fix that by adding a GUC, but that feels awfully like making it the user's problem to fix our broken implementation. Maybe we could live with it if the GUC were only something developers ever needed to use, but I expect different people would have different ideas about the correct setting in production. VACUUM (SET HINT BITS) table -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI patch version 12
Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: There's a few remaining TODO comments in the code, which obviously need to be resolved one way or another Not all of these are must haves for 9.1. Here's how they break down: The two in predicate_internals.h mark places which would need to be touched if we further modify the btree AM to do index-tuple level predicate locking. I know that Dan was eager to tackle that because his group at MIT has predicate locking working at that granularity for their transaction-aware caching which works with PostgreSQL. At this point, though, I'm very skeptical about starting that effort for 9.1; it seems like something for 9.2. There's one TODO related to whether we can drop the volatile modifier from MySerializableXact. I've started reviewing code around this, but am not yet done. It wouldn't be terrible to leave it there and worry about this for 9.2, but it will be even better if we can clear it up one way or the other now. Three are marking the spots we want to touch if a patch becomes available which lets us cancel the transaction on one connection from another. I'd love to take care of these, but we can't without a separate patch that I know *I'm* not in a position to write for 9.1. We may need to leave these for 9.2, as well. One is about the desirability of taking some action (probably reporting a warning) in the SLRU summarization code if we've burned through over a billion transaction IDs while one read write transaction has remained active. That puts us close to where we'd need to start canceling attempts to start a new transaction due to resource issues. Seems like we should issue that warning for 9.1. Not big or dangerous. I'll do it. One is related to the heuristics for promoting the granularity of predicate locks. We picked numbers out of the air which seemed reasonable at the time. I'm sure we can make this more sophisticated, but what we have seems to be working OK. I'm tempted to suggest that we leave this alone until 9.2 unless someone has a suggestion for a particular improvement. One is related to the number of structures to allocate for detailed conflict checking. We've never seen a problem with this limit, at least that has reached me. On the other hand, it's certainly at least theoretically possible to hit the limit and cause confusing errors. Perhaps we can put this one on a GUC and make sure the error message is clear with a hint to think about adjusting the GUC? The GUC would be named something like max_conflicts_per_serializable_transaction (or something to that general effect). We should probably do that or bump up the limit. If my back-of-the-envelope math is right, a carefully constructed pessimal load could need up to (max_connections / 2)^2 -- so 100 connections could conceivably require 2500 structures, although such a scenario would be hard to create. Current picked from thin air numbers would have space for 500. The structure is six times the pointer size, so 24 bytes each at 32 bit and 48 bytes each at 64 bit. Three TODOs have popped up since Heikki's post because I pushed Dan's 2PC WIP to the repo -- it's at a point where behavior should be right for cases where the server keeps running. He's working on the persistence aspect now, and there are TODOs in the new stubs he's filling out. Definitely 9.1. Then there's one still lurking outside the new predicate* files, in heapam.c. It's about 475 lines into the heap_update function (25 lines from the bottom). 
In reviewing where we needed to acquire predicate locks, this struck me as a place where we might need to duplicate the predicate lock from one tuple to another, but I wasn't entirely sure. I tried for a few hours one day to construct a scenario which would show a serialization anomaly if I didn't do this, and failed create one. If someone who understands both the patch and heapam.c wants to look at this and offer an opinion, I'd be most grateful. I'll take another look and see if I can get my head around it better, but failing that, I'm disinclined to either add locking I'm not sure we need or to remove the comment that says we *might* need it there. That's all of them. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] limiting hint bit I/O
On Tue, Jan 18, 2011 at 3:00 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 18.01.2011 21:16, Robert Haas wrote: On Tue, Jan 18, 2011 at 1:32 PM, Tom Lanet...@sss.pgh.pa.us wrote: While I was trying to performance-test the texteq patch, it occurred to me that this proposed hint-bit change has got a serious drawback. To wit, that it will totally destroy reproducibility of any performance test that involves table scans. Right now, you know that you can take hint bits out of the equation by doing a vacuum analyze and checkpoint; after that, all hint bits in the table are known to be set and written to disk. Then you can get on with comparing the effects of some patch or other. With the proposed patch, it will never be clear whether all the hint bits are set, because the patch specifically removes the deterministic ways to get a hint bit written out. So you'll never be very sure whether a performance difference you think you see is real, or whether one case or the other got affected by extra clog lookups. It's hard enough already to be sure about performance changes on the order of 1%, but this will make it impossible. True. You could perhaps fix that by adding a GUC, but that feels awfully like making it the user's problem to fix our broken implementation. Maybe we could live with it if the GUC were only something developers ever needed to use, but I expect different people would have different ideas about the correct setting in production. VACUUM (SET HINT BITS) table Something along those lines could work too, but I don't see much problem with making VACUUM doing it unconditionally. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI patch version 12
On Tue, Jan 18, 2011 at 3:18 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: That's all of them. Our existing code has plenty of TODOs in it already, so I see no problem with continuing to comment places where future enhancements are possible, as long as they don't reflect deficiencies that are crippling in the present. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI patch version 12
Kevin Grittner kevin.gritt...@wicourts.gov wrote: If my back-of-the-envelope math is right, a carefully constructed pessimal load could need up to (max_connections / 2)^2 -- so 100 connections could conceivably require 2500 structures, although such a scenario would be hard to create. Current picked from thin air numbers would have space for 500. Er, actually, we would have space for 5000, because it's five times the number of SERIALIZABLEXACT structures which is ten times max_connections. I guess that would explain why I've never seen a report of a problem. Still, someone who creates a very large number of connections and pounds them heavily with SERIALIZABLE transactions might conceivably run into it. Since that's something the docs explicitly warn you *not* to do with serializable transactions, I'm not sure we need to do more than make sure the error message and hint are good. Thoughts? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] test_fsync label adjustments
I have modified test_fsync to use test labels that match wal_sync_method values, and and added more tests for open_sync with different sizes. This should make the program easier for novices to understand. Here is a test run for Ubuntu 11.04: $ ./test_fsync 2000 operations per test Compare file sync methods using one 8k write: (in wal_sync_method preference order, except fdatasync is Linux's default) open_datasync (non-direct I/O)*85.127 ops/sec open_datasync (direct I/O) 87.119 ops/sec fdatasync 81.006 ops/sec fsync 82.621 ops/sec fsync_writethroughn/a open_sync (non-direct I/O)*84.412 ops/sec open_sync (direct I/O) 91.006 ops/sec * This non-direct I/O mode is not used by Postgres. Compare file sync methods using two 8k writes: (in wal_sync_method preference order, except fdatasync is Linux's default) open_datasync (non-direct I/O)*42.721 ops/sec open_datasync (direct I/O) 45.296 ops/sec fdatasync 76.665 ops/sec fsync 78.361 ops/sec fsync_writethroughn/a open_sync (non-direct I/O)*42.311 ops/sec open_sync (direct I/O) 45.247 ops/sec * This non-direct I/O mode is not used by Postgres. Compare open_sync with different write sizes: (This is designed to compare the cost of writing 16k in different write open_sync sizes.) 1 16k open_sync write 86.740 ops/sec 2 8k open_sync writes44.709 ops/sec 4 4k open_sync writes22.096 ops/sec 8 2k open_sync writes10.856 ops/sec 16 1k open_sync writes 5.434 ops/sec Test if fsync on non-write file descriptor is honored: (If the times are similar, fsync() can sync data written on a different descriptor.) write, fsync, close86.802 ops/sec write, close, fsync85.766 ops/sec Non-sync'ed 8k writes: write 83.068 ops/sec Applied patch attached. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + diff --git a/src/tools/fsync/test_fsync.c b/src/tools/fsync/test_fsync.c index b1cec74..9c829ba 100644 *** /tmp/pgdiff.3331/yCQf2a_test_fsync.c Tue Jan 18 15:06:39 2011 --- src/tools/fsync/test_fsync.c Tue Jan 18 14:43:58 2011 *** void test_open(void); *** 47,52 --- 47,53 void test_non_sync(void); void test_sync(int writes_per_op); void test_open_syncs(void); + void test_open_sync(const char *msg, int writes_size); void test_file_descriptor_sync(void); void print_elapse(struct timeval start_t, struct timeval stop_t); void die(char *str); *** main(int argc, char *argv[]) *** 61,68 test_open(); - test_non_sync(); - /* Test using 1 8k write */ test_sync(1); --- 62,67 *** main(int argc, char *argv[]) *** 73,78 --- 72,79 test_file_descriptor_sync(); + test_non_sync(); + unlink(filename); return 0; *** handle_args(int argc, char *argv[]) *** 105,111 } while ((option = getopt_long(argc, argv, f:o:, ! long_options, optindex)) != -1) { switch (option) { --- 106,112 } while ((option = getopt_long(argc, argv, f:o:, ! long_options, optindex)) != -1) { switch (option) { *** handle_args(int argc, char *argv[]) *** 126,132 } } ! printf(%d operations per test\n\n, ops_per_test); } void --- 127,133 } } ! 
printf(%d operations per test\n, ops_per_test); } void *** test_open(void) *** 162,201 } void - test_non_sync(void) - { - int tmpfile, ops; - - /* - * Test a simple write without fsync - */ - printf(Simple non-sync'ed write:\n); - printf(LABEL_FORMAT, 8k write); - fflush(stdout); - - gettimeofday(start_t, NULL); - for (ops = 0; ops ops_per_test; ops++) - { - if ((tmpfile = open(filename, O_RDWR, 0)) == -1) - die(Cannot open output file.); - if (write(tmpfile, buf, WRITE_SIZE) != WRITE_SIZE) - die(write failed); - close(tmpfile); - } - gettimeofday(stop_t, NULL); - print_elapse(start_t, stop_t); - } - - void test_sync(int writes_per_op) { int tmpfile, ops, writes;
Re: [HACKERS] ToDo List Item - System Table Index Clustering
Robert Haas wrote: On Tue, Jan 18, 2011 at 8:35 AM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Simone Aiken's message of dom ene 16 02:11:26 -0300 2011: Hello Postgres Hackers, In reference to this todo item about clustering system table indexes, ( http://archives.postgresql.org/pgsql-hackers/2004-05/msg00989.php ) I have been studying the system tables to see which would benefit from clustering. I have some index suggestions and a question if you have a moment. Wow, this is really old stuff. I don't know if this is really of any benefit, given that these catalogs are loaded into syscaches anyway. Furthermore, if you cluster at initdb time, they will soon lose the ordering, given that updates move tuples around and inserts put them anywhere. So you'd need the catalogs to be re-clustered once in a while, and I don't see how you'd do that (except by asking the user to do it, which doesn't sound so great). The idea of the TODO seems to have been to set the default clustering to something reasonable. That doesn't necessarily seem like a bad idea even if we can't automatically maintain the cluster order, but there's some question in my mind whether we'd get any measurable benefit from the clustering. Even on a database with a gigantic number of tables, it seems likely that the relevant system catalogs will stay fully cached and, as you point out, the system caches will further blunt the impact of any work in this area. I think the first thing to do would be to try to come up with a reproducible test case where clustering the tables improves performance. If we can't, that might mean it's time to remove this TODO. I think CLUSTER is a win when you are looking up multiple rows in the same table, either using a non-unique index or a range search. What places do such lookups? Having them all in adjacent pages would be a win --- single-row lookups are usually not. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fixing GIN for empty/null/full-scan cases
David E. Wheeler da...@kineticode.com writes: One of the reasons our client wants GIN for the integer[] column so bad is because recreating the GiST integer[] index is quite painful. Before I duped the table, I was just dropping and recreating the index on the original table. It was great to create the GIN index; for 400K rows, it took 1300 ms. After my initial round of testing, I dropped it and created the GiST index. That ran for, well, *hours*. Four at least, maybe six or seven (I forgot \timing and was letting it run on screen while I did some iOS hacking). I think something might be really wrong with GiST index creation for integer arrays, because the difference was just appalling.

I poked into this a bit, although it's really off-topic for what I was doing in GIN, and I agree that there is something fundamentally rotten in the gist__int_ops logic. oprofile shows that the code is spending all its time in g_int_compress and g_int_union during index build:

samples  %        app name         symbol name
52747    40.2486  _int.so          g_int_compress
27092    20.6726  libc-2.12.2.so   _wordcopy_fwd_aligned
18938    14.4506  _int.so          compASC
18256    13.9302  postgres         pg_qsort
5741      4.3807  no-vmlinux       /no-vmlinux
3297      2.5158  postgres         swapfunc
864       0.6593  _int.so          g_int_decompress
826       0.6303  _int.so          _int_unique
700       0.5341  _int.so          inner_int_union
617       0.4708  postgres         med3
588       0.4487  libc-2.12.2.so   memset
371       0.2831  _int.so          g_int_same
143       0.1091  libc-2.12.2.so   memcpy
123       0.0939  postgres         ArrayGetNItems
67        0.0511  _int.so          inner_int_inter
48        0.0366  postgres         AllocSetAlloc
47        0.0359  libc-2.12.2.so   memmove
40        0.0305  postgres         MemoryContextAllocZero

The time in g_int_compress is all on this loop:

    while (len > MAXNUMRANGE * 2)
    {
        min = 0x7fffffff;
        for (i = 2; i < len; i += 2)
            if (min > (dr[i] - dr[i - 1]))
            {
                min = (dr[i] - dr[i - 1]);
                cand = i;
            }
        memmove((void *) &dr[cand - 1], (void *) &dr[cand + 1],
                (len - cand - 1) * sizeof(int32));
        len -= 2;
    }

which is not too surprising since that's obviously O(N^2). The other high-percentage functions are qsort and subsidiaries, and those calls are coming from g_int_union. That part could probably be sped up, since most of the calls are unioning two inputs, which could be done a lot cheaper than this (the inputs are always presorted, no?). But there is something really fundamentally wrong in the logic, I think, because while poking at it with gdb I saw it union-ing fifty-thousand-element arrays. Surely it should get lossy before it gets to that. I'm also wondering how it tells the difference between lossy and non-lossy keys --- although it's hard to tell what such spectacularly uncommented code is supposed to be doing, it looks like the lossy case is supposed to be pairs of values representing ranges, in which case g_int_compress is surely doing the Wrong Thing with already-lossy inputs, and union'ing lossy inputs is an entirely insane operation as well. At the moment my opinion is that gist__int_ops is too broken to be usable, and it's also too uncommented to be fixable by anyone other than the original author. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
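[As an illustration of the cheaper approach hinted at above -- a generic sketch, not proposed intarray code -- unioning two already-sorted, duplicate-free integer arrays only needs one linear merge pass instead of concatenate-then-qsort:

    #include <stdint.h>

    /* Merge two sorted, duplicate-free int32 arrays into 'out' (which must
     * have room for na + nb elements); returns the number of elements
     * written.  O(na + nb), no qsort required. */
    static int
    merge_union(const int32_t *a, int na, const int32_t *b, int nb, int32_t *out)
    {
        int     i = 0, j = 0, n = 0;

        while (i < na && j < nb)
        {
            if (a[i] < b[j])
                out[n++] = a[i++];
            else if (a[i] > b[j])
                out[n++] = b[j++];
            else
            {
                out[n++] = a[i];    /* common value emitted once */
                i++;
                j++;
            }
        }
        while (i < na)
            out[n++] = a[i++];
        while (j < nb)
            out[n++] = b[j++];
        return n;
    }
]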
Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
2011/1/18 Tom Lane t...@sss.pgh.pa.us: Noah Misch n...@leadboat.com writes: On Sun, Jan 16, 2011 at 06:49:27PM +0100, Pavel Stehule wrote: I am sending a updated version with little bit more comments. But I am sure, so somebody with good English have to edit my comments. Minimally I hope, so your questions will be answered. Thanks. I edited the comments and white space somewhat, as attached. I'll go ahead and mark it Ready for Committer. I looked at this patch and found it fairly awkward. What is the point of adding an additional flag to every variable, as opposed to just forcibly detoasting during assignment? If it's to avoid detoasting when the variable is read in a way that doesn't require detoasting, it fails rather completely IMO, since exec_eval_datum certainly doesn't know that.

I am not sure about the extra overhead of detoasting; this is the safe variant, and I didn't want to decrease performance. Not sure if it's necessary. But detoasting on assignment isn't enough. The simplest example -- searching for a maximum:

for i in array_lower(a,1) .. array_upper(a,1)
loop
  if x < a[i] then
    x = a[i];
  end if;
end loop;

In this loop the variable a is never modified, yet any access to it means a detoast and decompress. So the process has to be modified: detoast not on assignment, but on usage. Regards Pavel Stehule

The added memory context variable seems redundant as well.

I didn't find a pointer to the top execution context available from the execution state; I am sure we have to switch to this context explicitly. Regards Pavel Stehule

regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_filedump moved to pgfoundry
On 19/01/11 05:51, Robert Haas wrote: I'm not sure why they'd care, but it certainly doesn't seem worth spending the amount of time arguing about it that we are. David and Mark are, of course, free to spend their time petitioning Red Hat for relicensing if they are so inclined, but they aren't entitled to tell you how to spend yours. And even if they were, I would hope that they'd want you to spend it committing patches rather than arguing with your employer about relicensing of a utility that's freely available anyway and of use to 0.1% of our user base. Funny how people can read an email and derive a completely different meaning from what you intended - just to be clear: it certainly wasn't my intention to tell anyone how to spend their time. Anyway, good to have pg_filedump on pgfoundry, thanks Tom and RH. regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ToDo List Item - System Table Index Clustering
On Tue, Jan 18, 2011 at 12:16 PM, Simone Aiken sai...@ulfheim.net wrote: When I'm learning a new system I like to first learn how to use it, second learn its data model, third start seriously looking at the code. So that Todo is ideal for my learning method. Sure - my point is just that we usually have as a criteria for any performance related patch that it actually does improve performance. So, we'd need a test case. If there is something else that would also involve studying all the system tables it would also be great. For example, I noticed we have column level comments on the web but not in the database itself. This seems silly. Why not have the comments in the database and have the web query the tables of template databases for the given versions? Uh... I don't know what this means. I'm open to other suggestions as well. Here are a few TODO items that look relatively easy to me (they may not actually be easy when you dig in, of course): Clear table counters on TRUNCATE Allow the clearing of cluster-level statistics Allow ALTER TABLE ... ALTER CONSTRAINT ... RENAME Allow ALTER TABLE to change constraint deferrability and actions Unfortunately we don't have a lot of easy TODOs. People keep doing the ones we think up... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] test_fsync label adjustments
On Jan 18, 2011, at 3:55 PM, Bruce Momjian wrote: I have modified test_fsync to use test labels that match wal_sync_method values, and and added more tests for open_sync with different sizes. This should make the program easier for novices to understand. Here is a test run for Ubuntu 11.04: $ ./test_fsync 2000 operations per test Compare file sync methods using one 8k write: (in wal_sync_method preference order, except fdatasync is Linux's default) open_datasync (non-direct I/O)*85.127 ops/sec open_datasync (direct I/O) 87.119 ops/sec fdatasync 81.006 ops/sec fsync 82.621 ops/sec fsync_writethroughn/a open_sync (non-direct I/O)*84.412 ops/sec open_sync (direct I/O) 91.006 ops/sec * This non-direct I/O mode is not used by Postgres. I am curious how this is targeted at novices. A naive user might enable the fastest option which could be exactly wrong. For this to be useful to novices, I suspect the tool will need to generate platform-specific suggestions, no? Cheers, M -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fixing GIN for empty/null/full-scan cases
David E. Wheeler da...@kineticode.com writes: These numbers are a bit crazy-making, but the upshot is that Gist is slow out of the gate, but with data cached, it's pretty speedy. With indexscan and bitmapscan disabled, these queries all took 300-400 ms. So GIN was never better performing than a table scan. I could not replicate that here at all --- GIN indexscans were consistently better than seqscans for me, eg regression=# set enable_bitmapscan TO 1; SET Time: 0.673 ms regression=# explain analyze SELECT count(*) FROM listings WHERE features @@ '(136879913688001369043)'::query_int AND deleted_at IS NULL AND status = 1; QUERY PLAN -- Aggregate (cost=1159.20..1159.21 rows=1 width=0) (actual time=23.964..23.964 rows=1 loops=1) - Bitmap Heap Scan on listings (cost=31.15..1158.18 rows=406 width=0) (actual time=23.014..23.876 rows=772 loops=1) Recheck Cond: ((features @@ '1368799 1368800 1369043'::query_int) AND (deleted_at IS NULL) AND (status = 1)) - Bitmap Index Scan on idx_gin_features (cost=0.00..31.05 rows=406 width=0) (actual time=22.913..22.913 rows=772 loops=1) Index Cond: (features @@ '1368799 1368800 1369043'::query_int) Total runtime: 24.040 ms (6 rows) Time: 24.968 ms regression=# set enable_bitmapscan TO 0; SET Time: 0.458 ms regression=# explain analyze SELECT count(*) FROM listings WHERE features @@ '(136879913688001369043)'::query_int AND deleted_at IS NULL AND status = 1; QUERY PLAN Aggregate (cost=9158.24..9158.25 rows=1 width=0) (actual time=145.121..145.121 rows=1 loops=1) - Seq Scan on listings (cost=0.00..9157.22 rows=406 width=0) (actual time=0.025..144.982 rows=772 loops=1) Filter: ((deleted_at IS NULL) AND (features @@ '1368799 1368800 1369043'::query_int) AND (status = 1)) Total runtime: 145.177 ms (4 rows) Time: 146.228 ms I'm noticing also that I get different rowcounts than you do, although possibly that has something to do with the partial-index conditions, which I'm not trying to duplicate here (all rows in my table pass those two tests). * Why does it take 3-4x longer to create the GIN than the GiST index on tsvector? Perhaps more maintenance_work_mem would help with that; although the fine manual says specifically that GIN text search indexes take about three times longer to build than equivalent GiST indexes, so maybe that behavior is as designed. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql
Pavel Stehule pavel.steh...@gmail.com writes: 2011/1/18 Tom Lane t...@sss.pgh.pa.us: I looked at this patch and found it fairly awkward. What is the point of adding an additional flag to every variable, as opposed to just forcibly detoasting during assignment? But detoasting on assignment isn't enough: for i in array_lower(a,1) .. array_upper(a,1) loop if x < a[i] then x = a[i]; end if; end loop; In this loop the variable a is never modified, yet any access to it means a detoast and decompress. How so? In what I'm envisioning, a would have been decompressed when it was originally assigned to. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fixing GIN for empty/null/full-scan cases
On Jan 18, 2011, at 1:39 PM, Tom Lane wrote: At the moment my opinion is that gist__int_ops is too broken to be usable, and it's also too uncommented to be fixable by anyone other than the original author. That seems to jibe with your comments from last year: http://archives.postgresql.org/pgsql-performance/2009-03/msg00265.php Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers