Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL
On Saturday 17 November 2012 22:57:49, Tom Lane wrote: Fujii Masao masao.fu...@gmail.com writes: Do we really need to store the settings in a system table? Since WAL would be generated when storing the settings in a system table, this approach seems to prevent us from changing the settings in the standby. That's a really good point: if we try to move all GUCs into a system table, there's no way for a standby to have different values; and for some of them different values are *necessary*. I think that shoots down this line of thought entirely. Can we go back to the plain write-a-file approach now? I think a SET PERSISTENT command that's disallowed in transaction blocks and just writes the file immediately is perfectly sensible.

I was justifying the use of a table structure, not to keep it in sync (just using it to hide the complexity of locks). Anyway, those were just comments. -- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Development, Expertise and Training
Re: [HACKERS] Materialized views WIP patch
On 16 November 2012 11:25, Kevin Grittner kgri...@mail.com wrote: 16. To get new data into the MV, the command is LOAD MATERIALIZED VIEW mat view_name. This seemed more descriptive to me than the alternatives and avoids declaring any new keywords beyond MATERIALIZED. If the MV is flagged as relisvalid == false, this will change it to true. UPDATE MATERIALIZED VIEW was problematic? Not technically, really, but I saw two reasons that I preferred LOAD MV: 1. It seems to me to better convey that the entire contents of the MV will be built from scratch, rather than incrementally adjusted. 2. We haven't hashed out the syntax for more aggressive maintenance of an MV, and it seemed like UPDATE MV might be syntax we would want to use for something which updated selected parts of an MV when we do. Does LOAD automatically TRUNCATE the view before reloading it? If not, why not? It builds a new heap and moves it into place. When the transaction running LMV commits, the old heap is deleted. In implementation it is closer to CLUSTER or the new VACUUM FULL than TRUNCATE followed by creating a new table. This allows all permissions, etc., to stay in place.

This seems very similar to the REPLACE command we discussed earlier, except this is restricted to Mat Views. If we're going to have this, I would prefer a whole command, e.g. REPLACE matviewname REFRESH, which would also allow REPLACE tablename AS query. Same thing under the covers, just more widely applicable and thus more useful. Either way, I don't much like overloading the use of LOAD, which already has a very different meaning. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parser - Query Analyser
On 11/17/2012 10:18 PM, Michael Giannakopoulos wrote: Hello guys, My name is Michail Giannakopoulos and I am a graduate student at University of Toronto. I have no previous experience in developing a system like postgreSQL before. What I am trying to explore is if it is possible to extend postgreSQL in order to accept queries of the form: Select function(att1, att2, att3) AS output(out1, out2, ..., outk) FROM [database_name];

I think you meant FROM [table_name]. It certainly seems like it, as you describe database_name as a relation a little later. If you really meant FROM [database_name], you're not going to have much luck. PostgreSQL backends are associated with a single database. They cannot query across databases without hacks like dblink, which internally opens a connection to another backend. So you really can't query FROM [database_name], you must connect to a database then issue queries against it.

If you meant FROM relation_name: it sounds like you are describing a stored procedure that returns SETOF RECORD. If so, you can already do this, though the syntax is a little different. You have to pass the relation *name* or regclass oid into the procedure, where it builds a dynamic SQL statement to SELECT from the table and return the result.

Alternately: Are you trying to describe a *row filter function*? Like a WHERE clause wrapped up in a function? It would really help if you could show some mock examples of what you're trying to achieve. Inputs as CREATE TABLE and INSERT statements, mock output, explanation of how you'd get that output, what problem you're trying to solve, etc. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Do we need so many hint bits?
On 17 November 2012 21:20, Jeff Davis pg...@j-davis.com wrote: ISTM that we should tune that specifically by performing a VM lookup for the next 32 pages (or more), so we reduce the lookups well below 1 per page. That way the overhead of using the VM will be similar to using the PD_ALL_VISIBLE.

That's another potential way to mitigate the effects during a scan, but it does add a little complexity. Right now, it share locks a buffer, and uses an array with one element for each tuple in the page. If PD_ALL_VISIBLE is set, then it marks all of the tuples *currently present* on the page as visible in the array, and then releases the share lock. Then, when reading the page, if another tuple is added (because we released the share lock and only have a pin), it doesn't matter because it's already invisible according to the array. With this approach, we'd need to keep a larger array to represent many pages. And it sounds like we'd need to share lock the pages ahead, and find out which items are currently present, in order to properly fill in the array. Not quite sure what to do there, but it would require some more thought.

Hmm, that's too much and not really what I was thinking, but I concede that was a little vague. No need for bigger arrays etc. If we check the VM for the next N blocks, then we know that all completed transactions are committed. Yes, the VM can change, but that is not a problem. What I mean is that we keep an array of boolean[N] that simply tracks what the VM said last time we checked it. If that is true for a block then we do special processing, similar to the current all-visible path and yet different, described below. What we want is to do a HeapTupleVisibility check that does not rely on tuple hints AND yet avoids all clog access. So when we scan a buffer in page mode and we know the VM said it was all visible we still check each tuple's visibility.
If xid is below snapshot xmin then the xid is known committed and the tuple is visible to this scan (not necessarily all scans). We know this because the VM said this page was all-visible AFTER our snapshot was taken. If tuple xid is within snapshot or greater than snapshot xmax then the tuple is invisible to our snapshot and we don't need to check clog. So once we know the VM said the page was all visible we do not need to check clog to establish visibility, we only need to check the tuple xmin against our snapshot xmin. So the VM can change under us and it doesn't matter. We don't need a pin or lock on the VM, we just read it and let go. No race conditions, no fuss. The difference here is that we still need to check visibility of each tuple, but that can be a very cheap check and never involves clog, nor does it dirty the page. Tuple access is reasonably expensive in comparison with a clog-less check on tuple xmin against snapshot xmin, so the extra work is negligible. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
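The clog-less check described above can be sketched in a few lines of C. This is an illustrative simplification, not PostgreSQL source: it uses a plain integer comparison (real code must use the wraparound-aware TransactionIdPrecedes()), it omits the snapshot's in-progress xid array, and the struct and function names are invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical, highly simplified snapshot: a real PostgreSQL snapshot
 * also carries an array of in-progress xids, omitted here. */
typedef struct
{
    TransactionId xmin;     /* every xid < xmin is already finished */
    TransactionId xmax;     /* every xid >= xmax is not yet visible */
} ExampleSnapshot;

/*
 * Clog-less visibility sketch for a page the VM reported all-visible
 * AFTER the snapshot was taken: a tuple xmin below the snapshot's xmin
 * must be committed and visible; per the two cases in the mail above,
 * anything at or above snapshot xmin is treated as invisible to this
 * snapshot.  Neither branch touches clog or hint bits.
 */
static bool
tuple_visible_on_allvisible_page(TransactionId tuple_xmin,
                                 const ExampleSnapshot *snap)
{
    if (tuple_xmin < snap->xmin)
        return true;
    return false;
}
```

The point of the sketch is that a single comparison against snapshot xmin settles visibility, which is why the extra per-tuple work is argued to be negligible.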
Re: [HACKERS] Do we need so many hint bits?
On Sunday, November 18, 2012 03:07:01 AM Jeff Davis wrote: Process A (process that clears a VM bit for a data page):
1. Acquires exclusive lock on data buffer
2. Acquires exclusive lock on VM buffer
3. Clears VM bit
4. Releases VM buffer lock
5. Releases data buffer lock

Well, right, but this is a rather big difference. If VM pages get unconditionally locked all the time we will have a huge source of new contention, as they are shared between so many heap pages. Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Do we need so many hint bits?
On 18 November 2012 08:52, Simon Riggs si...@2ndquadrant.com wrote: The difference here is that we still need to check visibility of each tuple, but that can be a very cheap check and never involves clog, nor does it dirty the page. Tuple access is reasonably expensive in comparison with a clog-less check on tuple xmin against snapshot xmin, so the extra work is negligible. The attached *partial* patch shows how the tuple checks would work. This should fit in neatly with the vismap skip code you've got already. Looks to me possible to skip the all-vis hint completely, as you suggest, plus avoid repeated checks of VM or clog. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services all_visible_cheap_check.v1.patch
[HACKERS] 9.3 pg_archivecleanup broken?
(In a test setup) I can't get pg_archivecleanup to remove WAL files in 9.3devel. (A very similar setup in 9.2 works fine.) In 9.3, pg_archivecleanup just keeps repeating lines like: pg_archivecleanup: keep WAL file /home/aardvark/pg_stuff/archive_dir93/ and later (and does not delete any files.)

Configuration:

# master pgsql.93_1/data/postgresql.conf:
data_directory = '/home/aardvark/pg_stuff/pg_installations/pgsql.93_1/data'
listen_addresses = '*'
max_connections = 100
shared_buffers = 128MB
wal_level = hot_standby
synchronous_commit = on
checkpoint_segments = 3
archive_mode = on
archive_command = 'cp %p /home/aardvark/pg_stuff/archive_dir93/%f /dev/null'
max_wal_senders = 3
synchronous_standby_names = '*'

# slave pgsql.93_2/data/postgresql.conf:
data_directory = '/home/aardvark/pg_stuff/pg_installations/pgsql.93_2/data'
listen_addresses = '*'
port = 6665
max_connections = 100
shared_buffers = 128MB
wal_level = hot_standby
synchronous_commit = on
checkpoint_segments = 3
max_wal_senders = 3
synchronous_standby_names = ''
hot_standby = on
wal_receiver_status_interval = 59

# pgsql.93_2/data/recovery.conf
primary_conninfo = 'host=127.0.0.1 port=6664 user=aardvark password=sekr1t application_name=wal_receiver_01'
standby_mode = 'on'
restore_command = 'cp /home/aardvark/pg_stuff/archive_dir93/%f %p /dev/null'
archive_cleanup_command = 'pg_archivecleanup -d /home/aardvark/pg_stuff/archive_dir93 %r'

Seeing that the same setup in 9.2 has pg_archivecleanup deleting files, it would seem that some bug exists, but I haven't followed changes regarding WAL too closely. (My apologies if it's a config mistake on my part after all; but in that case: I cannot find it and would be thankful for a hint.) Thanks, Erik Rijkers
[HACKERS] Re: [PATCH 05/14] Add a new relmapper.c function RelationMapFilenodeToOid that acts as a reverse of RelationMapOidToFilenode
On 2012-11-17 19:14:06 +0900, Michael Paquier wrote: On Fri, Nov 16, 2012 at 7:58 PM, Andres Freund and...@2ndquadrant.com wrote: Hi, On 2012-11-16 13:44:45 +0900, Michael Paquier wrote: This patch looks OK. I got 3 comments: 1) Why change the OID of pg_class_tblspc_relfilenode_index from 3171 to 3455? It does not look necessary. It's a mismerge and should have happened in Add a new RELFILENODE syscache to fetch a pg_class entry via (reltablespace, relfilenode), but it seems I squashed the wrong two commits. I had originally used 3171 but that since got used up for lo_tell64... 2) You should perhaps change the header of RelationMapFilenodeToOid so as not to mention it as the opposite operation of RelationMapOidToFilenode but as an operation that looks for the OID of a relation based on its relfilenode. Both functions are opposite but independent. I described it as the opposite because RelationMapOidToFilenode is the relmapper's stated goal and RelationMapFilenodeToOid is just some side-business. 3) Both functions are doing similar operations. Could it be possible to wrap them in the same central function? I don't really see how without making both quite a bit more complicated. The amount of if's needed seems to be too large to me.

OK, thanks for your answers. As this patch only adds a new function and is not that complicated, I think there is no problem in committing it. The only thing to remove is the diff in indexing.h. Could someone take care of that? I pushed a rebase to the git repository that fixed it... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby [Review]
On Tue, Oct 23, 2012 at 5:08 PM, Magnus Hagander mag...@hagander.net wrote: On Oct 23, 2012 4:52 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Boszormenyi Zoltan wrote: Also, the check for conflict between -R and -x/-X is now removed. The documentation for option -R has changed to reflect this and there is no different error code 2 now: it would make the behaviour inconsistent between -Fp and -Ft. The PQconninfo patch is also attached but didn't change since the last mail. Magnus, This patch is all yours to handle. I'm guessing nothing will happen until pgconf.eu is done and over, but hopefully you can share a few beers with Zoltan over the whole subject (and maybe with Peter about the PQconninfo stuff?) I'm not closing this just yet, but if you're not able to handle this soon, maybe it'd be better to punt it to the November commitfest. It's on my to do list for when I get back, but correct, won't get to it until after the conference.

I finally got around to looking at this patch now. Sorry about the way too long delay. A few thoughts: First, on the libpq patch: I'm not sure I like the for_replication flag to PQconninfo(). It seems like quite a limited API, and not very future-proof. What's to say that an app would only be interested in filtering based on for_replication? One idea might be to have a bitmap field there, and assign *all* conninfo options to a category. We could then have categories for NORMAL and REPLICATION. But we could also, for example, have a category for PASSWORD (or similar), so that you could get with and without those. Passing in a 32-bit integer would allow us to have 32 different categories, and we could then use a bitmask to pick things out. It might sound a bit like overengineering, but it's also an API, and it's a PITA to change it in the future if more needs show up.

Second, I wonder if we really need to add the code for requiressl=1, or if we should just remove it.
The spelling requiressl=1 was deprecated back in 7.4 - which has obviously been out of support for a long time now. Third, in fillPGconn: if mapping has both conninfoValue and connvalue, it does a free() on the old value in memberaddr, but if it doesn't it just overwrites memberaddr with strdup(). Shouldn't those two paths be the same, meaning shouldn't the if (*memberaddr) free(*memberaddr); check be outside the if block? Fourth, I'm not sure I like the memberaddr term. It's not wrong, but it threw me off a couple of times while reading it. It's not all that important, and I'm not sure about another idea for it though - maybe just connmember?

Then, about the pg_basebackup patch: What's the reason for the () around 512 for TARCHUNKSZ? We have a lot of hardcoded entries of the 512 for tar in that file. We should either keep using 512 as a constant, or change all those to *also* use the #define. Right now, the code will obviously break if you change the #define (for example, it compares to 512, but then uses hdrleft = TARCHUNKSZ - tarhdrsz; to do calculation). The name choice of basebackup for the bool in ReceiveTarFile() is unfortunate, since we use the term base backup to refer to the complete thing, not just the main tablespace. Probably basetablespace instead. And it should then be assigned at the top of the function (to the result of PQgetisnull()), and used in the main if() branch as well. Should we really print the status message even if not in verbose mode? We only print the base backup complete messages in verbose mode, for example. It seems wrong to free() recoveryconf in ReceiveTarFile(). It's allocated globally at the beginning. While that loop should only be called once (since only one tablespace can be the main one), it's a confusing location for the free. The whole tar writing part of the code needs a lot more comments. It's entirely unclear what the code does there.
Why does the recovery.conf writing code need to be split up in multiple locations inside ReceiveTarFile(), for example? It either needs to be simplified, or explained why it can't be simplified in comments. _tarCreateHeader() is really badly named, since it specifically creates a tar header for recovery.conf only. Either that needs to be a parameter, or it needs to have a name that indicates this is the only thing it does. The former is probably better. Much of the tar stuff is very similar (I haven't looked to see if it's identical) to the stuff in backend/replication/basebackup.c. Perhaps we should have a src/port/tarutil.c? escape_string() - already exists as escape_quotes() in initdb, AFAICT. We should either move it to src/port/, or at least copy it so it's exactly the same. CreateRecoveryConf() should just use PQExpBuffer (in libpq), I think - that does away with a lot of code. We already use this from e.g. pg_dump, so there's a precedent for using internal code from libpq in frontends. Again, my apologies for this review taking so long. I will try to be more
Re: [HACKERS] Do we need so many hint bits?
On Sun, 2012-11-18 at 15:19 +0100, Andres Freund wrote: On Sunday, November 18, 2012 03:07:01 AM Jeff Davis wrote: Process A (process that clears a VM bit for a data page): 1. Acquires exclusive lock on data buffer 2. Acquires exclusive lock on VM buffer 3. clears VM bit 4. Releases VM buffer lock 5. Releases data buffer lock Well, but right this is a rather big difference. If vm pages get unconditionally locked all the time we will have a huge source of new contention as they are shared between so many heap pages. No, that is only for the process *clearing* the bit, and this already happens. I am not planning on introducing any new locks, aside from the buffer header lock when acquiring a pin. And I plan to keep those pins for long enough that those don't matter, either. Regards, Jeff Davis
Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby [Review]
Hi, 2012-11-18 17:20 keltezéssel, Magnus Hagander írta: On Tue, Oct 23, 2012 at 5:08 PM, Magnus Hagander mag...@hagander.net wrote: On Oct 23, 2012 4:52 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Boszormenyi Zoltan escribió: Also, the check for conflict between -R and -x/-X is now removed. The documentation for option -R has changed to reflect this and there is no different error code 2 now: it would make the behaviour inconsistent between -Fp and -Ft. The PQconninfo patch is also attached but didn't change since the last mail. Magnus, This patch is all yours to handle. I'm guessing nothing will happen until pgconf.eu is done and over, but hopefully you can share a few beers with Zoltan over the whole subject (and maybe with Peter about the PQconninfo stuff?) I'm not closing this just yet, but if you're not able to handle this soon, maybe it'd be better to punt it to the November commitfest. It's on my to do list for when I get back, but correct, won't get to it until after the conference. I finally got around to looking at this patch now. Sorry about the way too long delay. No problem, thanks for looking at it. A few thoughts: First, on the libpq patch: I'm not sure I like the for_replication flag to PQconninfo(). It seems we're it's a quite limited API, and not very future proof. What's to say that an app would only be interested in filtering based on for_replication? One idea might be to have a bitmap field there, and assign *all* conninfo options to a category. We could then have categories for NORMAL and REPLICATION. But we could also for example have a category for PASSWORD (or similar), so that you could get with and without those. Passing in a 32-bit integer would allow us to have 32 different categories, and we could then use a bitmask to pick things out. It might sound a bit like overengineering, but it's also an API and it's a PITA to change it in the future if more needs show up.. You are right, I will change this accordingly. 
Second, I wonder if we really need to add the code for requiressl=1, or if we should just remove it. The spelling requiressl=1 was deprecated back in 7.4 - which has obviously been out of support for a long time now. This needs opinions from more people, I am not the one to decide it. The code would be definitely cleaner without processing this extra non-1:1 mapping. Third, in fillPGconn. If mapping has both conninfoValue and connvalue, it does a free() on the old value in memberaddr, but if it doesn't it just overwrites memberaddr with strdup(). Shouldn't those two paths be the same, meaning shouldn't the if (*memberaddr) free(*memberaddr); check be outside the if block? Yes, and set it to NULL too. Otherwise there might be a case when the free() leaves a stale pointer value if the extra mapping fails the strcmp() check. This is all unnecessary if the extra mapping for requiressl=1 is removed, the code would be straight. Fourth, I'm not sure I like the memberaddr term. It's not wrong, but it threw me off a couple of times while reading it. It's not all that important, and I'm not sure about another idea for it though - maybe just connmember? I am not attached to this variable name, I will change it. Then, about the pg_basebackup patch: What's the reason for the () around 512 for TARCHUNKSZ? It's simply a habit, to not forget it for more complex macros. We have a lot of hardcoded entries of the 512 for tar in that file. We should either keep using 512 as a constant, or change all those to *also* use the #define. Right now, the code will obviously break if you change the #define (for example, it compares to 512, but then uses hdrleft = TARCHUNKSZ - tarhdrsz; to do calculation). Yes, I left 5 pieces of the hardcoded value of 512, because I (maybe erroneously) distinguished between a file header and file chunks inside a TAR file, they are all 512. 
Is it okay to change every hardcoded 512 to TARCHUNKSZ, maybe adding a comment to the #define that it must not be modified ever? The name choice of basebackup for the bool in ReceiveTarFile() is unfortunate, since we use the term base backup to refer to the complete thing, not just the main tablespace. Probably basetablespcae instead. And it should then be assigned at the top of the function (to the result of PQgetisnull()), and used in the main if() branch as well. Will change it. Should we really print the status message even if not in verbose mode? We only print the base backup complete messages in verbose mode, for example. OK. It seems wrong to free() recoveryconf in ReceiveTarFile(). It's allocated globally at the beginning. While that loop should only be called once (since only one tablespace can be the main one), it's a confusing location for the free. The whole tar writing part of the code needs a lot more comments. It's entirely unclear what the code does there. Why does the recovery.conf writing code need to be split up in multiple locations inside
[HACKERS] Avoiding overflow in timeout-related calculations
The discussion of bug #7670 showed that what's happening there is that if you specify a log_rotation_age of more than 25 days (2^31 msec), WaitLatch will sometimes be passed a timeout of more than 2^31 msec, leading to unportable behavior. At least some kernels will return EINVAL for that, and it's not very clear what will happen on others. After some thought about this, I think the best thing to do is to tweak syslogger.c to clamp the requested sleep to INT_MAX msec. The fact that a couple of people have tried to set log_rotation_age to 30 days or more suggests that it's useful, so reducing the GUC's upper limit isn't a desirable fix. This should be an easy change since the logic in that loop will already behave correctly if it's woken up before the requested rotation time.

I went looking for other timeout-related GUC variables that might have overoptimistic upper limits, and found these cases:

PostAuthDelay is converted to microseconds, and therefore had better have an upper bound of INT_MAX/1000000 seconds. (Larger values would work on machines with 64-bit long, but it doesn't seem worth complicating the code to allow for that.)

contrib/auth_delay's auth_delay_milliseconds is likewise converted to microseconds, so its max had better be INT_MAX/1000 msec. (Likewise, I don't see any value in supporting larger limits.)

XLogArchiveTimeout, although it's never multiplied by anything, is compared to ((int) (now - last_xlog_switch_time)) which means that its nominal maximum of INT_MAX is problematic: if you actually set it to that it would be quite possible for the system to miss the timeout occurrence if it didn't chance to service the timeout signal in exactly the last second of the interval. (After that second, the above-quoted expression would overflow and lead to the wrong comparison result.) So it seems to me we'd better back it off some. I'm inclined to propose dropping it to INT_MAX/2 seconds. Comments, objections?
regards, tom lane
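The syslogger fix proposed above amounts to capping the computed sleep before handing it to WaitLatch. A minimal sketch, with an invented helper name; the actual patch would presumably do this inline in syslogger.c:

```c
#include <limits.h>

/*
 * Cap a sleep computed in milliseconds (possibly derived from a
 * rotation age of many days) to what WaitLatch's int-typed timeout can
 * represent.  Safe because the calling loop already copes with waking
 * up before the requested rotation time and simply re-computes the
 * remaining wait.  Helper name is illustrative, not from the patch.
 */
static int
clamp_timeout_ms(long long requested_ms)
{
    if (requested_ms > INT_MAX)
        return INT_MAX;
    return (int) requested_ms;
}
```

A 30-day rotation age (2,592,000,000 ms) exceeds INT_MAX (2,147,483,647), which is exactly the case from bug #7670.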
Re: [HACKERS] Enabling Checksums
On Wed, 2012-11-14 at 17:40 -0800, Jeff Davis wrote: I'll do another pass to make sure I update all of the comments, and try to self review it. Updated patches attached (the TLI patch wasn't changed though, only the main checksums patch). Changes:

* A lot of cleanup
* More testing
* Added check during pg_upgrade to make sure the checksum settings match.
* Fixed output of pg_resetxlog to include information about checksums.
* Fixed contrib/pageinspect, and included upgrade script for it
* Removed code to skip the page hole during the checksum calculation. We can reconsider if we think performance will be a real problem.
* I added the header bits back in, because we will need them when we want to support enabling/disabling checksums when the system is online.

I also did quite a bit more testing, although it could use some performance testing. I'll also probably do another review pass myself, but I think it's in good shape. Also, if performance of the checksum calculation itself turns out to be a problem, we might consider modifying the algorithm to do multiple bytes at a time. One purpose of this patch is to establish the on-disk format for checksums, so we shouldn't defer decisions that would affect that (e.g. doing checksum calculation in larger chunks, ignoring the page hole, or using a different scheme for the bits in the header). Regards, Jeff Davis

Attachments: replace-tli-with-checksums-20121118.patch.gz, checksums-20121118.patch.gz
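To illustrate the "multiple bytes at a time" idea mentioned above: the sketch below is NOT the algorithm in the patch, just a generic FNV-1a-style hash that consumes 32-bit words per iteration instead of single bytes, showing the shape of a larger-chunk calculation.

```c
#include <stdint.h>
#include <stddef.h>

/*
 * Illustrative word-at-a-time checksum: FNV-1a folded over 32-bit
 * words.  A real page checksum would also have to decide how to treat
 * the checksum field itself and the page hole; both are ignored here.
 */
static uint32_t
checksum_words(const uint32_t *data, size_t nwords)
{
    uint32_t hash = 2166136261u;    /* FNV-1a 32-bit offset basis */
    size_t   i;

    for (i = 0; i < nwords; i++)
    {
        hash ^= data[i];
        hash *= 16777619u;          /* FNV-1a 32-bit prime */
    }
    return hash;
}
```

Processing a word per loop iteration quarters the number of iterations over an 8 KB page relative to a byte-at-a-time loop, which is the performance angle being weighed against on-disk-format stability.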
Re: [HACKERS] Avoiding overflow in timeout-related calculations
On 2012-11-18 14:57:51 -0500, Tom Lane wrote: The discussion of bug #7670 showed that what's happening there is that if you specify a log_rotation_age of more than 25 days (2^31 msec), WaitLatch will sometimes be passed a timeout of more than 2^31 msec, leading to unportable behavior. At least some kernels will return EINVAL for that, and it's not very clear what will happen on others. After some thought about this, I think the best thing to do is to tweak syslogger.c to to clamp the requested sleep to INT_MAX msec. The fact that a couple of people have tried to set log_rotation_age to 30 days or more suggests that it's useful, so reducing the GUC's upper limit isn't a desirable fix. This should be an easy change since the logic in that loop will already behave correctly if it's woken up before the requested rotation time. Cool. Agreed. I went looking for other timeout-related GUC variables that might have overoptimistic upper limits, and found these cases: [sensible stuff] Lowering the maximum of those seems sensible to me. Anybody using that large value for those already had a problem even if it worked. I think at least wal_sender_timeout and wal_receiver_timeout are also problematic. Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Avoiding overflow in timeout-related calculations
Andres Freund and...@anarazel.de writes: I think at least wal_sender_timeout and wal_receiver_timeout are also problematic. I looked at those and didn't see a problem --- what are you worried about exactly? regards, tom lane
Re: [HACKERS] Avoiding overflow in timeout-related calculations
On 2012-11-18 15:21:34 -0500, Tom Lane wrote: Andres Freund and...@anarazel.de writes: I think at least wal_sender_timeout and wal_receiver_timeout are also problematic. I looked at those and didn't see a problem --- what are you worried about exactly? Forget it, too hungry to read the code properly... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
[HACKERS] WIP: store additional info in GIN index
Hackers, The attached patch enables GIN to store additional information with item pointers in posting lists and trees. Such additional information could be positions of words, positions of trigrams, lengths of arrays and so on. This is the first and largest patch of a series of GIN improvements which was presented at PGConf.EU: http://wiki.postgresql.org/images/2/25/Full-text_search_in_PostgreSQL_in_milliseconds-extended-version.pdf

The patch modifies the GIN interface as follows:
1) Two arguments are added to extractValue: Datum **addInfo, bool **addInfoIsNull
2) Two arguments are added to consistent: Datum addInfo[], bool addInfoIsNull[]
3) A new method, config, is introduced which returns the datatype oid of the additional information (analogous to the SP-GiST config method).

The patch completely changes storage in posting lists and leaf pages of posting trees. It uses varbyte encoding for BlockNumber and OffsetNumber. BlockNumbers are stored incrementally within a page. Additionally, one bit of OffsetNumber is reserved for the additional-information NULL flag. To be able to find a position in a leaf data page quickly, the patch introduces a small index at the end of the page. -- With best regards, Alexander Korotkov. ginaddinfo.1.patch.gz
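For readers unfamiliar with varbyte encoding, here is a generic sketch of the scheme (7 payload bits per byte, high bit as a continuation flag). It illustrates the general idea only; the patch's exact on-disk layout, including the incremental block numbers and the reserved NULL-flag bit, is more involved, and these function names are invented for the example.

```c
#include <stdint.h>

/* Encode value into out; returns the number of bytes written (1-5 for
 * a 32-bit value).  Small values take a single byte, which is why
 * storing block numbers as deltas pays off. */
static int
varbyte_encode(uint32_t value, unsigned char *out)
{
    int n = 0;

    while (value >= 0x80)
    {
        out[n++] = (unsigned char) (value | 0x80);  /* 7 bits + continuation */
        value >>= 7;
    }
    out[n++] = (unsigned char) value;               /* final byte, high bit clear */
    return n;
}

/* Decode one value from in; *len receives the number of bytes consumed. */
static uint32_t
varbyte_decode(const unsigned char *in, int *len)
{
    uint32_t value = 0;
    int      shift = 0;
    int      n = 0;

    while (in[n] & 0x80)
    {
        value |= (uint32_t) (in[n] & 0x7F) << shift;
        shift += 7;
        n++;
    }
    value |= (uint32_t) in[n] << shift;
    *len = n + 1;
    return value;
}
```

For example, 5 encodes to one byte while 300 needs two, so consecutive BlockNumbers stored as deltas typically cost one byte each.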
Re: [HACKERS] autovacuum stress-testing our system
Hi! On 26.9.2012 19:18, Jeff Janes wrote: On Wed, Sep 26, 2012 at 9:29 AM, Tom Lane t...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: Excerpts from Euler Taveira's message of mié sep 26 11:53:27 -0300 2012: On 26-09-2012 09:43, Tomas Vondra wrote:

5) splitting the single stat file into multiple pieces - e.g. per database, written separately, so that the autovacuum workers don't need to read all the data even for databases that don't need to be vacuumed. This might be combined with (4).

IMHO that's the definitive solution. It would be one file per database plus a global one. That way, the check would only read the global.stat and process those databases that were modified. Also, an in-memory map could store that information to speed up the checks.

+1

That would help for the case of hundreds of databases, but how much does it help for lots of tables in a single database?

It doesn't help that case, but that case doesn't need much help. If you have N statistics-kept objects in total spread over M databases, of which T objects need vacuuming per naptime, the stats file traffic is proportional to N*(M+T). If T is low, then there generally is no problem if M is also low. Or at least, the problem is much smaller than when M is high for a fixed value of N.

I've done some initial hacking on splitting the stat file into multiple smaller pieces over the weekend, and it seems promising (at least with respect to the issues we're having). See the patch attached, but be aware that this is a very early WIP (or rather a proof of concept), so it has many rough edges (read: sloppy coding). I haven't even added it to the commitfest yet ...

The two main changes are these:

(1) The stats file is split into a common db file, containing all the DB entries, and per-database files with tables/functions. The common file is still called pgstat.stat; the per-db files have the database OID appended, so for example pgstat.stat.12345 etc.
This was a trivial hack of the pgstat_read_statsfile/pgstat_write_statsfile functions, introducing two new functions, pgstat_read_db_statsfile and pgstat_write_db_statsfile, that do the trick of reading/writing the stat file for one database.

(2) pgstat_read_statsfile has an additional parameter onlydbs that says you don't need table/func stats - just the list of db entries. This is used by the autovacuum launcher, which does not need to read the table/func stats (if I'm reading the code in autovacuum.c correctly - it seems to be working as expected).

So what are the benefits?

(a) When the launcher asks for info about databases, something like this is called in the end: pgstat_read_db_statsfile(InvalidOid, false, true) which means all databases (InvalidOid) and only db info (true). So it reads only the one common file with db entries, not the table/func stats.

(b) When a worker asks for stats for a given DB, something like this is called in the end: pgstat_read_db_statsfile(MyDatabaseId, false, false) which reads only the common stats file (with db entries) and the one file for that one database.

With the current implementation (a single pgstat.stat file), all the data had to be read (and skipped silently) in both cases. That's a lot of CPU time, and we're seeing ~60% of CPU spent on doing just this (writing/reading a huge stats file). So with a lot of databases/objects, this pgstat.stat split saves us a lot of CPU ...

(c) This should lower the space requirements too - with a single file, you actually need at least 2x the disk space (or RAM, if you're using tmpfs as we are), because you need to keep two versions of the file at the same time (pgstat.stat and pgstat.tmp). Thanks to this split you only need additional space for a copy of the largest piece (with some reasonable safety reserve).
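The file-naming scheme described above (a common pgstat.stat plus per-database files with the OID appended) can be sketched like this; the directory name is illustrative, not taken from the patch:

```c
#include <stddef.h>
#include <stdio.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Build the stats file path: the common file for InvalidOid, otherwise a
 * per-database file with the database OID appended, e.g. pgstat.stat.12345. */
static void
stat_file_path(char *buf, size_t len, const char *dir, Oid dboid)
{
	if (dboid == InvalidOid)
		snprintf(buf, len, "%s/pgstat.stat", dir);
	else
		snprintf(buf, len, "%s/pgstat.stat.%u", dir, dboid);
}
```

With this split, a worker for database 12345 opens only the common file and pgstat.stat.12345, instead of parsing every other database's table entries.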
Well, it's a very early patch, so there are rough edges too:

(a) It does not solve the many-schema scenario at all - that'll need a completely new approach I guess :-(

(b) It does not solve the writing part at all - the current code uses a single timestamp (last_statwrite) to decide whether a new file needs to be written. That clearly is not enough for multiple files - there should be one timestamp for each database/file. I'm thinking about how to solve this and how to integrate it with pgstat_send_inquiry etc. One way might be adding the timestamp(s) into PgStat_StatDBEntry, and another is using an array of inquiries for each database. Yet another one I'm thinking about is using a fixed-length array of timestamps (e.g. 256), indexed by mod(dboid,256). That would mean stats for all databases with the same mod(oid,256) would be written at the same time. Seems like over-engineering though.

(c) I'm a bit worried about the number of files - right now there's one
Re: [HACKERS] pg_dump --split patch
Hi, On 16/11/2012 15:52, Dimitri Fontaine wrote: Marko Tiikkaja pgm...@joh.to writes: The general output scheme looks like this: schemaname/OBJECT_TYPES/object_name.sql

I like this feature, I actually did have to code it myself in the past and several other people did so, so we already have at least 3 copies of `getddl` variants around. I really think this feature should be shipped by default with PostgreSQL. I don't much care for the all-uppercase formatting of object type directories in your patch though.

*shrug* I have no real preference one way or the other.

Overloaded functions are dumped into the same file. Object names are encoded into the POSIX Portable Filename Character Set ([a-z0-9._-]) by replacing any characters outside that set with an underscore.

What happens if you have a table foo and another table FoO?

They would go to the same file. If you think there are technical issues behind that decision (e.g. the dump would not restore), I would like to hear an example case. On the other hand, some people might find it preferable to have them in different files (for example foo, foo.1, foo.2 etc). Or some might prefer some other naming scheme. One of the problems with this patch is exactly that people prefer different things, and providing switches for all of the different options people come up with would mean a lot of switches. :-(

Restoring the dump is supported through an index.sql file containing statements which include (through \i) the actual object files in the dump directory.

I think we should be using \ir now that we have that.

Good point, will have to get that fixed.

Any thoughts? Objections on the idea or the implementation?

As far as the implementation goes, someone with more experience on the Archiver Handles should have a look. To me, it looks like you are trying to shoehorn your feature into the current API and that doesn't feel good.
It feels a bit icky to me too, but I didn't feel comfortable putting in a lot of work to refactor the API because of how controversial this feature is.

The holy grail here that we've been speaking about in the past would be to separate out tooling and formats so that we have: pg_dump | pg_restore, pg_export | psql. In that case we would almost certainly need libpgdump to share the code, and we could maybe implement a binary output option for pg_dump too (yeah, last time it was proposed we ended up with bytea_output = 'hex').

While I agree that this idea - when implemented - would be nicer in practically every way, I'm not sure I want to volunteer to do all the necessary work. That libpgdump idea basically means we won't have the --split feature in 9.3, and that's really bad, as we already are some releases late on delivering that, in my opinion.

Maybe the pg_export and pg_dump tools could share code by just #include magic rather than a full-blown lib in a first incarnation?

That's one idea. Regards, Marko Tiikkaja -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum stress-testing our system
On 26.9.2012 18:29, Tom Lane wrote: What seems to me like it could help more is fixing things so that the autovac launcher needn't even launch a child process for databases that haven't had any updates lately. I'm not sure how to do that, but it probably involves getting the stats collector to produce some kind of summary file.

Couldn't we use the PgStat_StatDBEntry for this? By splitting the pgstat.stat file into multiple pieces (see my other post in this thread) there's a file with StatDBEntry items only, so maybe it could be used as the summary file ...

I've been thinking about this: (a) add a needs_autovacuuming flag to PgStat_(TableEntry|StatDBEntry); (b) when table stats are updated, run a quick check to decide whether the table needs to be processed by autovacuum (vacuumed or analyzed), and if so set needs_autovacuuming=true for both the table and the database.

The worker may read the DB entries from the file and act only on those that need to be processed (those with needs_autovacuuming=true). Maybe the DB-level field might be a counter of tables that need to be processed, and the autovacuum daemon might act on those first? Although the simpler the better I guess. Or did you mean something else? regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
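The "quick check" step (b) proposed above could use autovacuum's usual trigger rule: vacuum once dead tuples exceed a base threshold plus a scale factor times the table size. This is a hypothetical sketch of that check only; the parameter names are illustrative and not from any patch:

```c
#include <stdbool.h>

/* Decide whether a table should get the needs_autovacuuming flag set,
 * using autovacuum's familiar threshold rule:
 *   vacuum when n_dead_tuples > base_thresh + scale_factor * reltuples */
static bool
table_needs_autovacuum(double n_dead_tuples, double reltuples,
					   double vac_base_thresh, double vac_scale_factor)
{
	return n_dead_tuples > vac_base_thresh + vac_scale_factor * reltuples;
}
```

With the stock defaults (threshold 50, scale factor 0.2), a 1000-tuple table would be flagged once it accumulates more than 250 dead tuples; the launcher could then skip any database whose DB entry has no flagged tables.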
[HACKERS] [RFC] Fix div/mul crash and more undefined behavior
The INT_MIN / -1 crash problem was partially addressed in 2006 in commit 9fc6f4e1ae107f44807c4906105e1f7eb052ecb1. http://archives.postgresql.org/pgsql-patches/2006-06/msg00102.php However, the fix is incomplete and incorrect for some cases.

64-bit crash
============

Below is an example that crashes PostgreSQL on Windows, using the 64-bit binary from http://www.postgresql.org/download/windows/.

SELECT ((-9223372036854775808)::int8) % (-1);

Note that the previous discussion concluded that int8 (64-bit) division didn't crash, which is incorrect. http://archives.postgresql.org/pgsql-patches/2006-06/msg00104.php My guess is that the previous test was carried out on a 32-bit Windows, where there's no 64-bit division instruction. In that case, the generated code calls a runtime function (e.g., lldiv), which doesn't trap. However, on a 64-bit system, the compiler generates the idivq instruction, which leads to a crash.

Note that the problem is not limited to division. The following multiplication crashes as well:

SELECT ((-9223372036854775808)::int8) * ((-1)::int8);

The reason is that the multiplication overflow check uses a division.

32-bit crash
============

The previous fix doesn't fix all possible crashes, even on a 32-bit Windows. Below is an example to crash a 32-bit PostgreSQL:

SELECT ((-2147483648)::int4) % ((-1)::int2);

Portability
===========

The previous fix uses #ifdef WIN32 ... #endif, which is not portable, as noted below: http://archives.postgresql.org/pgsql-patches/2006-06/msg00103.php Using a SIGFPE handler is also dangerous (e.g., sometimes causing an infinite loop): https://www.securecoding.cert.org/confluence/display/seccode/SIG35-C.+Do+not+return+from+SIGSEGV,+SIGILL,+or+SIGFPE+signal+handlers Strictly speaking, using postcondition checking to detect signed division overflows (actually, all signed integer overflows) violates the C language standard, because signed integer overflow is undefined behavior in C and we cannot first compute the result and then check for overflow.
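The precondition style of check argued for here can be sketched as follows: reject the single overflowing input pair before executing the division, instead of dividing first and inspecting the result (which is undefined behavior in C and traps on x86-64's idivq). This is an illustrative standalone sketch, not the patch itself; in the backend the error branch would be an ereport(ERROR, ...).

```c
#include <stdbool.h>
#include <stdint.h>

/* The only int64 division/modulo that overflows: INT64_MIN / -1,
 * whose true result (-INT64_MIN) is unrepresentable in two's complement. */
static bool
int8_div_overflows(int64_t arg1, int64_t arg2)
{
	return arg1 == INT64_MIN && arg2 == -1;
}

/* Divide with the precondition check; sets *error instead of ereport()ing. */
static int64_t
safe_int8_div(int64_t arg1, int64_t arg2, bool *error)
{
	if (arg2 == 0 || int8_div_overflows(arg1, arg2))
	{
		*error = true;			/* division by zero or overflow */
		return 0;
	}
	*error = false;
	return arg1 / arg2;
}
```

The same guard also fixes the `%` case, since INT64_MIN % -1 executes the identical trapping instruction even though its mathematical result (0) is representable.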
Compilers can do a lot of funny and crazy things (e.g., generating traps or even optimizing away overflow checks). BTW, PostgreSQL currently uses gcc's workaround option -fwrapv to disable offending optimizations based on integer overflows. The problems are: (1) it doesn't always work (e.g., for division), and (2) many other C compilers do not even support this workaround option and can perform offending optimizations.

Patching
========

Below is a patch that fixes division crashes. It removes #ifdef WIN32 and tries to use portable checks. I'll send more (e.g., for fixing multiplication crashes) if this looks good. I understand that the existing integer overflow checks tried to avoid dependency on constants like INT64_MIN. But I'm not sure how to perform simpler and portable overflow checks without using such constants. Also, I could use SHRT_MIN rather than introducing INT16_MIN; I just feel like using INT16_MIN with int16 is clearer and less confusing.

diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index 9f752ed..d7867cb 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -732,30 +732,18 @@ int4div(PG_FUNCTION_ARGS)
 		PG_RETURN_NULL();
 	}
 
-#ifdef WIN32
-
-	/*
-	 * Win32 doesn't throw a catchable exception for SELECT -2147483648 /
-	 * (-1); -- INT_MIN
+	/*
+	 * Overflow check.  The only possible overflow case is for arg1 =
+	 * INT32_MIN, arg2 = -1, where the correct result is -INT32_MIN, which
+	 * can't be represented on a two's-complement machine.
 	 */
-	if (arg2 == -1 && arg1 == INT_MIN)
+	if (arg1 == INT32_MIN && arg2 == -1)
 		ereport(ERROR,
 				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
 				 errmsg("integer out of range")));
-#endif
 
 	result = arg1 / arg2;
 
-	/*
-	 * Overflow check.  The only possible overflow case is for arg1 = INT_MIN,
-	 * arg2 = -1, where the correct result is -INT_MIN, which can't be
-	 * represented on a two's-complement machine.  Most machines produce
-	 * INT_MIN but it seems some produce zero.
-	 */
-	if (arg2 == -1 && arg1 < 0 && result <= 0)
-		ereport(ERROR,
-				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
-				 errmsg("integer out of range")));
 
 	PG_RETURN_INT32(result);
 }
@@ -877,18 +865,17 @@ int2div(PG_FUNCTION_ARGS)
 		PG_RETURN_NULL();
 	}
 
-	result = arg1 / arg2;
-
-	/*
-	 * Overflow check.  The only possible overflow case is for arg1 =
-	 * SHRT_MIN, arg2 = -1, where the correct result is -SHRT_MIN, which can't
-	 * be represented on a two's-complement machine.  Most machines produce
-	 * SHRT_MIN but it seems some produce zero.
+	/*
Re: [HACKERS] [RFC] Fix div/mul crash and more undefined behavior
Xi Wang xi.w...@gmail.com writes: [ patch adding a bunch of explicit INT_MIN/MAX constants ] I was against this style of coding before, and I still am. For one thing, it's just about certain to introduce conflicts against system headers. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3 pg_archivecleanup broken?
On Mon, Nov 19, 2012 at 12:43 AM, Erik Rijkers e...@xs4all.nl wrote: (In a test setup) I can't get pg_archivecleanup to remove WAL files in 9.3devel. (A very similar setup in 9.2 works fine). In 9.3 pg_archivecleanup just keeps repeating lines like: pg_archivecleanup: keep WAL file /home/aardvark/pg_stuff/archive_dir93/ and later (and does not delete any files.)

Configuration:

# master pgsql.93_1/data/postgresql.conf:
data_directory = '/home/aardvark/pg_stuff/pg_installations/pgsql.93_1/data'
listen_addresses = '*'
max_connections = 100
shared_buffers = 128MB
wal_level = hot_standby
synchronous_commit = on
checkpoint_segments = 3
archive_mode = on
archive_command = 'cp %p /home/aardvark/pg_stuff/archive_dir93/%f /dev/null'
max_wal_senders = 3
synchronous_standby_names = '*'

# slave pgsql.93_2/data/postgresql.conf:
data_directory = '/home/aardvark/pg_stuff/pg_installations/pgsql.93_2/data'
listen_addresses = '*'
port = 6665
max_connections = 100
shared_buffers = 128MB
wal_level = hot_standby
synchronous_commit = on
checkpoint_segments = 3
max_wal_senders = 3
synchronous_standby_names = ''
hot_standby = on
wal_receiver_status_interval = 59

# pgsql.93_2/data/recovery.conf
primary_conninfo = 'host=127.0.0.1 port=6664 user=aardvark password=sekr1t application_name=wal_receiver_01'
standby_mode = 'on'
restore_command = 'cp /home/aardvark/pg_stuff/archive_dir93/%f %p /dev/null'
archive_cleanup_command = 'pg_archivecleanup -d /home/aardvark/pg_stuff/archive_dir93 %r'

Seeing that the same setup in 9.2 has pg_archivecleanup deleting files, it would seem that some bug exists, but I haven't followed changes regarding WAL too closely.

Thanks for the report! I was able to reproduce this problem. What's broken is not pg_archivecleanup itself but %r in archive_cleanup_command, which is replaced by the name of the file containing the last valid restart point. In 9.3dev, %r is always wrongly replaced by an invalid WAL filename.
This bug is derived from commit d5497b95f3ca2fc50c6eef46d3394ab6e6855956. This commit changed ExecuteRecoveryCommand() so that it calculates the last valid restart point by using GetOldestRestartPoint(), even though GetOldestRestartPoint() only works in the startup process and only while WAL replay is in progress (i.e., InRedo = true). For archive_cleanup_command, ExecuteRecoveryCommand() is executed by the checkpointer process, so the problem happened. I found that recovery_end_command has the same bug, because it calls ExecuteRecoveryCommand() after WAL replay is completed. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parser - Query Analyser
On 11/18/2012 09:57 PM, Michael Giannakopoulos wrote: Hi guys, Thanks for your answers. Yes, what I meant is to create a function that takes as an input rows of a specific relation, does something and returns as an output rows with different attributes. I am experimenting right now with the 'CREATE TYPE' and 'CREATE FUNCTION' commands in order to see what I can get out of them! I have replied on pgsql-general. Please do not top-post, and reply to the list not directly to me. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management
On Sun, Oct 21, 2012 at 12:59 AM, Amit kapila amit.kap...@huawei.com wrote: On Saturday, October 20, 2012 11:03 PM Jeff Janes wrote: Run the modes in reciprocating order? Sorry, I didn't understand this. What do you mean by modes in reciprocating order?

Sorry for the long delay. In your scripts, it looks like you always run the unpatched version first, and then the patched one second. By reciprocating, I mean to run them in the reverse order, or in random order.

Also, for the select-only transactions, I think that 20 minutes is much longer than necessary. I'd rather see many more runs, each one being shorter. Because you can't restart the server without wiping out the shared_buffers, what I would do is make a test patch which introduces a new guc.c setting which allows the behavior to be turned on and off with a SIGHUP (pg_ctl reload).

I haven't been able to detect any reliable difference in performance with this patch. I've been testing with 150 scale factor with 4GB of RAM and 4 cores, over a variety of shared_buffers and concurrencies.

I think the main reason for this is that when shared buffers are small, there is no performance gain; the same was observed by me when I ran this test with shared_buffers=2G, where there is no performance gain. Please see the results of shared_buffers=2G in the mail below: http://archives.postgresql.org/pgsql-hackers/2012-09/msg00422.php

True, but I think that testing with shared_buffers=2G when RAM is 4GB (and pgbench scale is also lower) should behave differently than doing so when RAM is 24GB.

The reason I can think of is that when shared buffers are small, the clock sweep runs very fast and there is no bottleneck. Only when shared buffers increase above some threshold does it spend a reasonable amount of time in the clock sweep.

I am rather skeptical of this.
When the working set doesn't fit in memory under a select-only workload, then about half the buffers will be evictable at any given time, half will have usagecount=1, and a handful will have usagecount=4 (index meta, root and branch blocks). This will be the case over a wide range of shared_buffers, as long as it is big enough to hold all index branch blocks but not big enough to hold everything. Given this state of affairs, the average clock sweep should be about 2, regardless of the exact size of shared_buffers.

The one wrinkle I could think of is if all the usagecount=1 buffers are grouped into a contiguous chunk, and all the usagecount=0 ones are in another chunk. The average would still be 2, but the average would be made up of N/2 runs of length 1, followed by one run of length N/2. Now if 1 process is stuck in the N/2 stretch and all other processes are waiting on that, maybe that somehow escalates the waits so that they are larger when N is larger, but I still don't see how the math works on that.

Are you working on this just because it was on the ToDo list, or because you have actually run into a problem with it? I've never seen freelist lock contention be a problem on machines with less than 8 CPUs, but both of us are testing on smaller machines. I think we really need to test this on something bigger. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
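Jeff's steady-state estimate above (average sweep length of about 2 when half the buffers are evictable) can be checked with a toy simulation. This is a deliberately simplified clock sweep, not PostgreSQL's freelist.c: each eviction advances the hand, decrementing nonzero usage counts, until it finds a usagecount=0 buffer, which is then reused with usagecount 1.

```c
#define NBUF 16

/* One eviction pass: returns the number of buffers inspected. */
static int
clock_sweep_one(int *usage, int nbuf, int *hand)
{
	int			inspected = 0;

	for (;;)
	{
		int			b = *hand;

		*hand = (*hand + 1) % nbuf;
		inspected++;
		if (usage[b] == 0)
		{
			usage[b] = 1;		/* victim reused; a new page starts at 1 */
			return inspected;
		}
		usage[b]--;
	}
}

/* Run many evictions from a half-evictable initial state and return the
 * average number of buffers inspected per eviction. */
static double
average_sweep_length(int evictions)
{
	int			usage[NBUF];
	int			hand = 0;
	long		total = 0;

	for (int i = 0; i < NBUF; i++)
		usage[i] = i % 2;		/* half at usagecount 0, half at 1 */
	for (int i = 0; i < evictions; i++)
		total += clock_sweep_one(usage, NBUF, &hand);
	return (double) total / evictions;
}
```

The invariant behind the estimate: each eviction adds one usage point (the reused victim) and each inspected-but-kept buffer removes one, so in steady state inspections per eviction must converge to 2, independent of NBUF.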
Re: [HACKERS] logical changeset generation v3
On Fri, Nov 16, 2012 at 5:16 PM, Andrea Suisani sick...@opinioni.netwrote: Il 16/11/2012 05:34, Michael Paquier ha scritto: Do you have a git repository or something where all the 14 patches are applied? I would like to test the feature globally. Sorry I recall that you put a link somewhere but I cannot remember its email... http://archives.postgresql.org/pgsql-hackers/2012-11/msg00686.php Thanks Andrea. I am pretty sure I will be able to provide some feedback by Friday. -- Michael Paquier http://michael.otacoo.com
Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management
On Mon, Oct 22, 2012 at 10:51 AM, Amit kapila amit.kap...@huawei.com wrote: Today again I have again collected the data for configuration Shared_buffers = 7G along with vmstat. The data and vmstat information (bi) are attached with this mail. It is observed from vmstat info that I/O is happening for both cases, however after running for long time, the I/O is also comparatively less with new patch. What I see in the vmstat report is that it takes 5.5 runs to get really good and warmed up, and so it crawls for the first 5.5 benchmarks and then flies for the last 0.5 benchmark. The way you have your runs ordered, that last 0.5 of a benchmark is for the patched version, and this drives up the average tps for the patched case. Also, there is no theoretical reason to think that your patch would decrease the amount of IO needed (in fact, by invalidating buffers early, it could be expected to increase the amount of IO). So this also argues that the increase in performance is caused by the decrease in IO, but the patch isn't causing that decrease, it merely benefits from it due to an accident of timing. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] pg_ping utility
On Sat, Nov 17, 2012 at 2:48 AM, Phil Sorber p...@omniti.com wrote: On Thu, Nov 15, 2012 at 10:55 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Fri, Nov 16, 2012 at 12:34 PM, Phil Sorber p...@omniti.com wrote: On Thu, Nov 15, 2012 at 9:23 PM, Michael Paquier michael.paqu...@gmail.com wrote: 3) Having an output close to what ping actually does would also be nice; the current output like Accepting/Rejecting Connections is not that.

Could you be more specific? Are you saying you don't want to see accepting/rejecting info output?

OK sorry. I meant something like this for an accessible server:

$ pg_ping -c 3 -h server.com
PING server.com (192.168.1.3)
accept from 192.168.1.3: icmp_seq=0 time=0.241 ms
accept from 192.168.1.3: icmp_seq=0 time=0.240 ms
accept from 192.168.1.3: icmp_seq=0 time=0.242 ms

Like that for a rejected connection: reject from 192.168.1.3: icmp_seq=0 time=0.241 ms Like that for a timeout: timeout from 192.168.1.3: icmp_seq=0 Then print 1 line for each ping taken to stdout.

How does icmp_seq fit into this? Or was that an oversight?

Yes, sorry, it doesn't fit in this model. Please forget about it.

Also, in standard ping if you don't pass -c it will continue to loop until interrupted. Would you suggest that pg_ping mimic that, or that we add an additional flag for that behavior?

By positioning pg_ping as a clone of ping, yes, it would mean that we target it to loop indefinitely if no -c flag is given.

FWIW, I would use 'watch' with the existing output for cases where I would need something like that.

watch allows you to re-run a program at a given interval. I am not sure this is related to pinging a server. When pinging a server, what you are looking for is not only the connectivity to it but also the latency you have with it, no? -- Michael Paquier http://michael.otacoo.com
Re: [HACKERS] [WIP] pg_ping utility
On Fri, Nov 16, 2012 at 2:28 PM, Tom Lane t...@sss.pgh.pa.us wrote: Maybe I missed something here, but I believe it's standard that program --help should result in exit(0), no matter what the program's exitcode conventions are for live-fire exercises. Yes indeed you are right. Thanks. -- Michael Paquier http://michael.otacoo.com
Re: [HACKERS] [RFC] Fix div/mul crash and more undefined behavior
On 11/18/12 6:47 PM, Tom Lane wrote: Xi Wang xi.w...@gmail.com writes: [ patch adding a bunch of explicit INT_MIN/MAX constants ] I was against this style of coding before, and I still am. For one thing, it's just about certain to introduce conflicts against system headers.

I totally agree. I would be happy to rewrite the integer overflow checks without using these explicit constants, but it seems extremely tricky to do so. One possible check without using INTn_MIN is:

    if (arg1 < 0 && -arg1 < 0 && arg2 == -1) { ... }

Compared to (arg1 == INTn_MIN && arg2 == -1), the above check is not only more confusing and difficult to understand, but it also invokes undefined behavior (-INT_MIN overflow), which is dangerous: many C compilers will optimize away the check. I've tried gcc, clang, PathScale, and AMD's Open64, all of which perform such optimizations.

Since INTn_MIN and INTn_MAX are standard macros from the C library, can we assume that every C compiler should provide them in stdint.h? At least this is true for gcc, clang, and Visual C++. Then we don't have to define them and worry about possible conflicts (though I think using #ifndef...#endif should be able to avoid conflicts). - xi -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
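The #ifndef idea mentioned at the end can be sketched as follows: guarded fallback definitions for the limits, written as (-MAX - 1) so that no literal ever overflows, combined with the well-defined precondition check. This sketch assumes a 32-bit int and two's complement, as the surrounding discussion does.

```c
#include <stdbool.h>

/* Fallback definitions if <stdint.h> is unavailable.  Spelling the
 * minimum as (-MAX - 1) avoids writing a literal whose magnitude the
 * type cannot hold. */
#ifndef INT16_MIN
#define INT16_MIN (-0x7FFF - 1)
#endif
#ifndef INT32_MIN
#define INT32_MIN (-0x7FFFFFFF - 1)
#endif

/* Well-defined overflow precondition for int4 division: no arithmetic
 * is performed on arg1, so there is no -INT_MIN undefined behavior for
 * a compiler to exploit. */
static bool
int4_div_overflows(int arg1, int arg2)
{
	return arg1 == INT32_MIN && arg2 == -1;
}
```

Because the guards yield to any prior definition, this coexists with compilers that do provide stdint.h, which addresses the header-conflict concern while keeping the readable constant-based check.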
Re: [HACKERS] pg_trgm partial-match
On 15.11.2012 20:39, Fujii Masao wrote: Hi, I'd like to propose to extend pg_trgm so that it can compare a partial-match query key to a GIN index. IOW, I'm thinking to implement the 'comparePartial' GIN method for pg_trgm. Currently, when the query key is less than three characters, we cannot use a GIN index (+ pg_trgm) efficiently, because pg_trgm doesn't support a partial-match method. In this case, seq scan or index full scan would be executed, and its response time would be very slow. I'd like to alleviate this problem. Note that we cannot do a partial-match if KEEPONLYALNUM is disabled, i.e., if query key contains multibyte characters. In this case, byte length of the trigram string might be larger than three, and its CRC is used as a trigram key instead of the trigram string itself. Because of using CRC, we cannot do a partial-match. Attached patch extends pg_trgm so that it compares a partial-match query key only when KEEPONLYALNUM is enabled. Attached patch is WIP yet. What I should do next is: * version up pg_trgm from 1.0 to 1.1, i.e., create pg_trgm--1.1.sql, etc. * write the regression test Comments? Review? Objection? Hi, I've done a quick review of the current patch: (1) It applies cleanly on the current master and builds fine. (2) In pg_trgm--1.0.sql the gin_trgm_compare_partial is indented differently (using tabs instead of spaces). (3) In trgm_gin.c, function gin_extract_value_trgm contains #ifdef KEEPONLYALNUM, although trgm_pmatch is not used at all. (4) The patch removes some commented-out variables, but there still remain various commented-out variables. Will this be cleaned too? 
(5) I've done basic functionality testing of the patch, and it really seems to work:

CREATE EXTENSION pg_trgm;
CREATE TABLE test (val TEXT);
INSERT INTO test SELECT md5(i::text) FROM generate_series(1,100) s(i);
CREATE INDEX trgm_idx ON test USING gin (val gin_trgm_ops);
ANALYZE test;

EXPLAIN SELECT * FROM test WHERE val LIKE '%aa%';

                                  QUERY PLAN
------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=1655.96..11757.63 rows=141414 width=33)
   Recheck Cond: (val ~~ '%aa%'::text)
   ->  Bitmap Index Scan on trgm_idx  (cost=0.00..1620.61 rows=141414 width=0)
         Index Cond: (val ~~ '%aa%'::text)
(4 rows)

Without the patch, this gives a seq scan (as expected).

Do you expect to update the docs too? IMHO it's worth mentioning that pg_trgm can handle even patterns shorter than 2 chars ... regards Tomas Vondra -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
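To see why a 2-character key like '%aa%' can match index entries at all, it helps to look at how trigrams are formed. This is a sketch of pg_trgm-style extraction for a short ASCII word on the indexing side (the documented behavior pads each word with two leading blanks and one trailing blank before taking 3-character windows); it is an illustration, not the extension's actual code:

```c
#include <stdio.h>
#include <string.h>

/* Extract pg_trgm-style trigrams from one ASCII word: pad with two
 * leading and one trailing blank, then slide a 3-character window.
 * Writes NUL-terminated trigrams into out[]; returns how many. */
static int
make_trigrams(const char *word, char out[][4], int maxout)
{
	char		padded[64];
	int			len;
	int			n = 0;

	snprintf(padded, sizeof(padded), "  %s ", word);
	len = (int) strlen(padded);
	for (int i = 0; i + 3 <= len && n < maxout; i++)
	{
		memcpy(out[n], padded + i, 3);
		out[n][3] = '\0';
		n++;
	}
	return n;
}
```

For "aa" the padded string is "  aa ", yielding the trigrams "  a", " aa" and "aa " — so even a two-character search key has concrete trigrams for the partial-match machinery to probe.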
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL
On Sunday, November 18, 2012 3:08 AM Fujii Masao wrote: On Sat, Nov 17, 2012 at 10:25 PM, Amit Kapila amit.kap...@huawei.com wrote: 1. have a system table pg_global_system_settings(key,value) Do we really need to store the settings in a system table? Since WAL would be generated when storing the settings in a system table, this approach seems to prevent us from changing the settings in the standby.

Thanks for your feedback. It is one of the approaches we were evaluating for this feature. However, this feature can also be implemented by operating directly on the file. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [PATCH] Patch to fix missing libecpg_compat.lib and libpgtypes.lib.
Hi, I installed postgresql-9.1.5 from source code on Windows successfully, but libecpg_compat.lib and libpgtypes.lib are not in the installed lib directory. If an ecpg program uses functions from the pgtypes or ecpg_compat libraries, its build will fail. So I modified src/tools/msvc/Install.pm to resolve this problem. The diff is attached as Install.pm.patch.

Regards, Jiang Guiqing

diff --git a/src/tools/msvc/Install.pm b/src/tools/msvc/Install.pm
index 6036714..91e6a53 100644
--- a/src/tools/msvc/Install.pm
+++ b/src/tools/msvc/Install.pm
@@ -86,7 +86,7 @@ sub Install
 		'Import libraries', $target . '/lib/', "$conf\\",
 		"postgres\\postgres.lib", "libpq\\libpq.lib", "libecpg\\libecpg.lib",
-		"libpgport\\libpgport.lib");
+		"libpgport\\libpgport.lib", "libpgtypes\\libpgtypes.lib", "libecpg_compat\\libecpg_compat.lib");
 
 	CopySetOfFiles(
 		'timezone names', [ glob('src\timezone\tznames\*.txt') ],

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_trgm partial-match
Hi! On Thu, Nov 15, 2012 at 11:39 PM, Fujii Masao masao.fu...@gmail.com wrote: Note that we cannot do a partial-match if KEEPONLYALNUM is disabled, i.e., if query key contains multibyte characters. In this case, byte length of the trigram string might be larger than three, and its CRC is used as a trigram key instead of the trigram string itself. Because of using CRC, we cannot do a partial-match. Attached patch extends pg_trgm so that it compares a partial-match query key only when KEEPONLYALNUM is enabled.

Didn't get this point. How does KEEPONLYALNUM guarantee that each trigram character is single-byte?

CREATE TABLE test (val TEXT);
INSERT INTO test VALUES ('aa'), ('aaa'), ('шaaш');
CREATE INDEX trgm_idx ON test USING gin (val gin_trgm_ops);
ANALYZE test;

test=# SELECT * FROM test WHERE val LIKE '%aa%';
 val
------
 aa
 aaa
 шaaш
(3 rows)

test=# set enable_seqscan = off;
SET
test=# SELECT * FROM test WHERE val LIKE '%aa%';
 val
-----
 aa
 aaa
(2 rows)

I think we can use partial match only for single-byte encodings. Or, at most, in cases when all alphanumeric characters are single-byte (I have no idea how to check this). -- With best regards, Alexander Korotkov.
Re: [HACKERS] [PATCH 13/14] Introduce pg_receivellog, the pg_receivexlog equivalent for logical changes
Hi,

I am just looking at this patch and will provide some comments. By the way, you forgot the installation part of pg_receivellog; please see the patch attached.

Thanks,

On Thu, Nov 15, 2012 at 10:17 AM, Andres Freund and...@2ndquadrant.com wrote:
> ---
>  src/bin/pg_basebackup/Makefile         |   7 +-
>  src/bin/pg_basebackup/pg_receivellog.c | 717 +
>  src/bin/pg_basebackup/streamutil.c     |   3 +-
>  src/bin/pg_basebackup/streamutil.h     |   1 +
>  4 files changed, 725 insertions(+), 3 deletions(-)
>  create mode 100644 src/bin/pg_basebackup/pg_receivellog.c

--
Michael Paquier
http://michael.otacoo.com

20121119_pg_receivellog_install.patch
Description: Binary data
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL
On Sunday, November 18, 2012 3:28 AM Tom Lane wrote:
> Fujii Masao masao.fu...@gmail.com writes:
>> Do we really need to store the settings in a system table? Since WAL would be generated when storing the settings in a system table, this approach seems to prevent us from changing the settings in the standby.
>
> That's a really good point: if we try to move all GUCs into a system table, there's no way for a standby to have different values; and for some of them different values are *necessary*. I think that shoots down this line of thought entirely. Can we go back to the plain write-a-file approach now?

Sure.

> I think a SET PERSISTENT command that's disallowed in transaction blocks and just writes the file immediately is perfectly sensible.

I got the point that we can disallow it inside transaction blocks. Just to clarify: by the above, do you mean that the file write (the rename from .conf.lock to .conf) should be done as soon as command execution ends, rather than at transaction end?

Still, I think we are not able to completely rule out one option or the other purely on syntax grounds. We have discussed three different syntaxes for this command:
1. ALTER SYSTEM
2. SET PERSISTENT
3. pg_change_conf()

To conclude, I think we need to evaluate which syntax has more advantages:

1. If we want to allow multiple configuration parameters, now or in the future, to be updated in a single command:
   a. ALTER SYSTEM SET port = 5435, max_connections = 100;
   b. SELECT pg_change_conf('port', 5435), pg_change_conf('max_connections', 100);
   I think the command syntax might be more convenient for users.

2. If we provide a built-in function, users can try to use it in complicated statements:
   SELECT 1/0 FROM tbl WHERE a = pg_change_conf('max_connections', 100);
   The design and tests need to take care of such usage so that it doesn't create any problem.

3. Using the SET command syntax can create some confusion for users, as the SET SESSION | LOCAL options work in transaction blocks, but this feature may not be required to work in transaction blocks, since it changes the config file, which can take effect only on restart or SIGHUP.

I believe some more thoughts and suggestions are required to conclude. Thoughts/Suggestions/Comments?

With Regards,
Amit Kapila.
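The write-a-file approach debated above (write the new contents under a temporary/lock name, then rename it into place, mirroring the .conf.lock-to-.conf rename mentioned in the thread) can be sketched as follows. This is an illustrative Python sketch, not PostgreSQL's implementation; the function name and the file name "postgresql.auto.conf" are assumptions made for the example.

```python
import os

def set_persistent(conf_path, name, value):
    """Sketch of the write-a-file approach: rewrite the config file
    under a lock/temp name, fsync it, then atomically rename it into
    place so a crash or concurrent reader never sees a torn file."""
    lock_path = conf_path + ".lock"
    settings = {}
    # Re-read any existing settings so a single-key update preserves the rest.
    if os.path.exists(conf_path):
        with open(conf_path) as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#"):
                    key, _, val = line.partition("=")
                    settings[key.strip()] = val.strip()
    settings[name] = str(value)
    with open(lock_path, "w") as f:
        for key, val in sorted(settings.items()):
            f.write(f"{key} = {val}\n")
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes hit disk before the rename
    os.rename(lock_path, conf_path)  # atomic on POSIX filesystems

set_persistent("postgresql.auto.conf", "port", 5435)
set_persistent("postgresql.auto.conf", "max_connections", 100)
```

Because the rename is the commit point, the operation needs no transactional machinery, which is consistent with disallowing the command inside transaction blocks.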
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL
On Sunday, November 18, 2012 3:22 PM Cédric Villemain wrote:
> On Saturday, November 17, 2012 22:57:49, Tom Lane wrote:
>> Fujii Masao masao.fu...@gmail.com writes:
>>> Do we really need to store the settings in a system table? Since WAL would be generated when storing the settings in a system table, this approach seems to prevent us from changing the settings in the standby.
>>
>> That's a really good point: if we try to move all GUCs into a system table, there's no way for a standby to have different values; and for some of them different values are *necessary*. I think that shoots down this line of thought entirely. Can we go back to the plain write-a-file approach now? I think a SET PERSISTENT command that's disallowed in transaction blocks and just writes the file immediately is perfectly sensible.
>
> I was justifying the usage of a table structure, not to keep it in sync (just using it to hide the complexity of locks). Anyway, those were just comments.

Thanks. Your comments are thought-provoking. I was able to make progress on the table-related approach based on your suggestions.

With Regards,
Amit Kapila.
Re: [HACKERS] logical changeset generation v3
Hi Andres,

I have been able to fetch your code (thanks Andres!) and build it. For the time being I am spending some time reading the code and understanding the whole set of features you are trying to implement inside core, even if I got some background from what you presented at PGCon and from the hackers ML.

Btw, as a first approach, I tried to run the logical log receiver plugged into a postgres server, and I am not able to make it work. Well, I am using settings similar to yours:

# Run master
rm -r ~/bin/pgsql/master/
initdb -D ~/bin/pgsql/master/
echo "local replication $USER trust" >> ~/bin/pgsql/master/pg_hba.conf
postgres -D ~/bin/pgsql/master \
  -c wal_level=logical \
  -c max_wal_senders=10 \
  -c max_logical_slots=10 \
  -c wal_keep_segments=100 \
  -c log_line_prefix="[%p %x] "

# Logical log receiver
pg_receivellog -f $HOME/output.txt -d postgres -v

After launching some SQLs, the logical receiver is stuck just after sending INIT_LOGICAL_REPLICATION; please see the backtrace of the waiting process:

(gdb) bt
#0  0x7f1bbc13b170 in __poll_nocancel () from /usr/lib/libc.so.6
#1  0x7f1bbc43072d in pqSocketPoll (sock=3, forRead=1, forWrite=0, end_time=-1) at fe-misc.c:1089
#2  0x7f1bbc43060d in pqSocketCheck (conn=0x1dd0290, forRead=1, forWrite=0, end_time=-1) at fe-misc.c:1031
#3  0x7f1bbc4304dd in pqWaitTimed (forRead=1, forWrite=0, conn=0x1dd0290, finish_time=-1) at fe-misc.c:963
#4  0x7f1bbc4304af in pqWait (forRead=1, forWrite=0, conn=0x1dd0290) at fe-misc.c:946
#5  0x7f1bbc42c64c in PQgetResult (conn=0x1dd0290) at fe-exec.c:1709
#6  0x7f1bbc42cd62 in PQexecFinish (conn=0x1dd0290) at fe-exec.c:1974
#7  0x7f1bbc42c9c8 in PQexec (conn=0x1dd0290, query=0x406c60 "INIT_LOGICAL_REPLICATION 'test_decoding'") at fe-exec.c:1808
#8  0x00402370 in StreamLog () at pg_receivellog.c:263
#9  0x004030c9 in main (argc=6, argv=0x7fff44edb288) at pg_receivellog.c:694

So I am not able to output any results using pg_receivellog.

Also, I noticed 2 errors in your set of tests.
On Thu, Nov 15, 2012 at 9:27 AM, Andres Freund and...@anarazel.de wrote:
> -- wrapped in a transaction
> BEGIN;
> INSERT INTO replication_example(somedata, text) VALUES (1, 1);
> UPDATE replication_example SET somedate = - somedata WHERE id = (SELECT currval('replication_example_id_seq'));

In the SET clause, the column name is *somedata* and not *somedate*.

> -- dont write out aborted data
> BEGIN;
> INSERT INTO replication_example(somedata, text) VALUES (2, 1);
> UPDATE replication_example SET somedate = - somedata WHERE id = (SELECT currval('replication_example_id_seq'));

Same error here: *somedata* and not *somedate*. Not a big deal, but it made the transactions fail.

--
Michael Paquier
http://michael.otacoo.com