Re: [HACKERS] 9.3 pg_archivecleanup broken?
On 19.11.2012 02:17, Fujii Masao wrote:
> This bug is derived from commit d5497b95f3ca2fc50c6eef46d3394ab6e6855956.
> This commit changed ExecuteRecoveryCommand() so that it calculates the
> last valid restart file by using GetOldestRestartPoint(), even though
> GetOldestRestartPoint() only works in the startup process and only while
> WAL replay is in progress (i.e., InRedo = true). In
> archive_cleanup_command, ExecuteRecoveryCommand() is executed by the
> checkpointer process, so the problem happened.

Fixed. Thanks for the diagnosis!

- Heikki
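[For context, the code path in question is reached through the archive_cleanup_command recovery setting on a standby. A minimal sketch of such a configuration; the archive directory path is a placeholder:]

    # recovery.conf on a standby; %r expands to the name of the file
    # containing the last valid restart point, which pg_archivecleanup
    # uses as its cutoff when removing no-longer-needed WAL segments.
    archive_cleanup_command = 'pg_archivecleanup /mnt/server/archivedir %r'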
Re: [HACKERS] gset updated patch
Hello

2012/11/16 Karl O. Pinc k...@meme.com:
> Hi Pavel,
>
> On 11/16/2012 12:21:11 AM, Pavel Stehule wrote:
>> 2012/11/16 Karl O. Pinc k...@meme.com:
>>> As long as I'm talking crazy talk, why not abandon psql as a shell
>>> language and instead make a pl/pgsql interpreter with readline() in
>>> front of it? Solve all these language-related issues by using an
>>> actual programming language. :-)
>>
>> I thought about it some more, but I don't think this makes sense.
>> We cannot do perfect autocompletion even for significantly simpler
>> SQL, and there are already a lot of client-side interpreters - that
>> is no reason for another one. I use psql together with bash and it
>> works well. But even a very simple task such as storing some volatile
>> data for repetitive use is relatively laborious, and that is the
>> motivation for this patch. In psql I can simply work with any field
>> of a returned record - outside psql that is much more tedious work.
>
> You might consider using do.

That is the reason why I am not thinking about plpgsql on the client side. But I don't understand how it is related to \gset? I remember there is one significant limitation of the DO statement - it cannot return a table - so it cannot substitute for simple psql scripts. But I would not open that topic now - it is related to a real stored-procedure implementation, and that is a long-term task. So \gset allows some simple tasks to be solved simply.

Regards

Pavel Stehule

> http://www.postgresql.org/docs/9.1/static/sql-do.html
>
> If you need to maintain a single connection you can do some
> interesting things with socat to feed a running psql in the
> background:
>
>     socat -u UNIX-RECV:/tmp/msock EXEC:psql
>
> followed by lots of
>
>     echo bar | socat -u STDIN UNIX-SENDTO:/tmp/msock
>
> \o can be used to send output for pickup, although you do need to
> fuss around with the asynchronous nature of things to be sure you're
> waiting for output. I used inotifywait for this. YMMV.
>
> Regards,
> Karl k...@meme.com
> Free Software: You don't pay back, you pay forward.
>              -- Robert A. Heinlein
Re: [HACKERS] pg_dump --split patch
Marko Tiikkaja pgm...@joh.to writes:
>> What happens if you have a table foo and another table FoO?
>
> They would go to the same file. If you think there are technical
> issues behind that decision (e.g. the dump would not restore), I would
> like to hear an example case.

I didn't try the patch itself yet, so I wanted to hear whether that's something you actually tried out; that's about it :) As soon as we're past the initial agreement on this patch landing (which I really want to see happen), I'll devote some time to testing it.

> On the other hand, some people might find it preferable to have them
> in different files (for example foo, foo.1, foo.2, etc.). Or some
> might prefer some other naming scheme. One of the problems with this
> patch is exactly that people prefer different things, and providing
> switches for all of the different options people come up with would
> mean a lot of switches. :-(

I think this facility should provide something simple and useful, not something tasty. Database backups and exports are not meant to cater to taste; they are meant to be easy to restore. In this very case, we want to have a set of properly organized SQL files for doing partial restores, right?

> It feels a bit icky to me too, but I didn't feel comfortable with
> putting in a lot of work to refactor the API because of how
> controversial this feature is.

+1

    pg_dump   | pg_restore
    pg_export | psql

> While I agree that this idea - when implemented - would be nicer in
> practically every way, I'm not sure I want to volunteer to do all the
> necessary work.

What I think needs to happen now is a committer's buy-in that we want to get there at some point and that your current patch is not painting us into any corner now, so that we can accept it and have a documented path forward.

Regards,
--
Dimitri Fontaine                        06 63 07 10 78
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Materialized views WIP patch
Josh Berkus wrote:
> It would be nice for the user to have some way to know that a matview
> is empty due to never being LOADed or recently being TRUNCATEd.
> However, I don't think that a relisvalid flag -- and preventing
> scanning of the relation -- is a good solution. What I'd rather have
> instead is a timestamp of when the MV was last LOADed. If the MV was
> never loaded (or was truncated) that timestamp would be NULL. Such a
> timestamp would allow users to construct all kinds of ad-hoc refresh
> schemes for MVs which would not be possible without it.

+1

Kevin Grittner wrote:
> I see your point there; I'll think about that. My take was more that
> MVs would often be refreshed by crontab, and that you would want to
> keep subsequent steps from running and generating potentially
> plausible but completely inaccurate results if the LMV failed.

If one of these subsequent steps doesn't care that the refresh failed once, it shouldn't be forced to fail. I imagine that for many applications yesterday's data can be good enough. Those that care should check the timestamp.

Yours,
Laurenz Albe
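[As an illustration of the kind of ad-hoc refresh scheme such a timestamp would enable. Everything here is hypothetical: pg_matview_last_load() does not exist, LOAD MATERIALIZED VIEW follows the WIP patch's terminology, and my_mv is a placeholder name:]

    DO $$
    BEGIN
        -- NULL means never loaded (or truncated since); otherwise
        -- refresh only when the contents are more than a day old.
        IF pg_matview_last_load('my_mv') IS NULL
           OR pg_matview_last_load('my_mv') < now() - interval '1 day'
        THEN
            EXECUTE 'LOAD MATERIALIZED VIEW my_mv';
        END IF;
    END;
    $$;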
Re: [HACKERS] logical changeset generation v3
Hi Michael,

On 2012-11-19 16:28:55 +0900, Michael Paquier wrote:
> I have been able to fetch your code (thanks Andrea!) and run some of
> it. For the time being I am spending some time reading the code and
> understanding the whole set of features you are trying to implement
> inside core, even if I got some background from what you presented at
> PGCon and from the hackers ML.

Cool.

> Btw, as a first approach, I tried to run the logical log receiver
> plugged on a postgres server, and I am not able to make it work. Well,
> I am using settings similar to yours.
>
>     # Run master
>     rm -r ~/bin/pgsql/master/
>     initdb -D ~/bin/pgsql/master/
>     echo "local replication $USER trust" >> ~/bin/pgsql/master/pg_hba.conf
>     postgres -D ~/bin/pgsql/master \
>         -c wal_level=logical \
>         -c max_wal_senders=10 \
>         -c max_logical_slots=10 \
>         -c wal_keep_segments=100 \
>         -c log_line_prefix="[%p %x] "
>
>     # Logical log receiver
>     pg_receivellog -f $HOME/output.txt -d postgres -v
>
> After launching some SQLs, the logical receiver is stuck just after
> sending INIT_LOGICAL_REPLICATION, please see bt of process waiting:

It's waiting until it sees an initial xl_running_xacts record. The easiest way to get one is to manually issue a checkpoint. Sorry, I should have included that in the description. Otherwise you can wait till the next routine checkpoint comes around...

I plan to cause more xl_running_xacts records to be logged in the future. I think the timing of those currently is non-optimal; you have the same problem as here in normal streaming replication as well :(

> -- wrapped in a transaction
> BEGIN;
> INSERT INTO replication_example(somedata, text) VALUES (1, 1);
> UPDATE replication_example SET somedate = - somedata
>     WHERE id = (SELECT currval('replication_example_id_seq'));
>
> In the SET clause, the column name is *somedata* and not *somedate*

Crap. Sorry for that. I wrote the example in the mail client and then executed it, and I seem to have forgotten to put back some of the fixes...

> I am just looking at this patch and will provide some comments. By the
> way, you forgot the installation part of pg_receivellog, please see
> patch attached.

That actually was somewhat intended; I thought people wouldn't like the name and I didn't want a binary that's going to be replaced anyway lying around ;)

Greetings,

Andres
--
Andres Freund    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
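[To spell out Andres' suggestion above: until those extra records are logged automatically, the stuck receiver can be unblocked from any ordinary session on the master:]

    -- Forces an xl_running_xacts (running-transactions) record to be
    -- written, which the waiting logical receiver needs to proceed.
    CHECKPOINT;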
Re: [HACKERS] pg_trgm partial-match
On Mon, Nov 19, 2012 at 10:05 AM, Alexander Korotkov aekorot...@gmail.com wrote:
> On Thu, Nov 15, 2012 at 11:39 PM, Fujii Masao masao.fu...@gmail.com wrote:
>> Note that we cannot do a partial match if KEEPONLYALNUM is disabled,
>> i.e., if the query key contains multibyte characters. In this case,
>> the byte length of the trigram string might be larger than three, and
>> its CRC is used as a trigram key instead of the trigram string
>> itself. Because of using CRC, we cannot do a partial match. Attached
>> patch extends pg_trgm so that it compares a partial-match query key
>> only when KEEPONLYALNUM is enabled.
>
> Didn't get this point. How does KEEPONLYALNUM guarantee that each
> trigram character is singlebyte?

CREATE TABLE test (val TEXT);
INSERT INTO test VALUES ('aa'), ('aaa'), ('шaaш');
CREATE INDEX trgm_idx ON test USING gin (val gin_trgm_ops);
ANALYZE test;

test=# SELECT * FROM test WHERE val LIKE '%aa%';
 val
------
 aa
 aaa
 шaaш
(3 rows)

test=# set enable_seqscan = off;
SET
test=# SELECT * FROM test WHERE val LIKE '%aa%';
 val
-----
 aa
 aaa
(2 rows)

I think we can use partial match only for singlebyte encodings. Or, at most, in cases where all alphanumeric characters are singlebyte (I have no idea how to check this). Actually, I was also fiddling around with the idea of a partial match on trigrams when I was working on the initial LIKE patch. But I concluded that we would need a separate opclass which always keeps the full trigram in the entry.

--
With best regards,
Alexander Korotkov.
Re: [HACKERS] too much pgbench init output
Hi,

I went through the discussion for this patch and here is my review:

The main aim of this patch is to reduce the number of log lines. It was also suggested to use an option to provide the interval, but a few of us did not quite agree with that. So the final discussion ended at keeping a 5 second interval between each log line. However, I see there are two types of users here:
1. Those who like these log lines, so that they can troubleshoot slowness and the like.
2. Those who do not like these log lines.

So keeping these in mind, I would rather go for an option which controls this. People falling in the first category can set this option very low, whereas users falling in the second category can keep it high. However, assuming we settled on a 5 sec delay, here are a few comments on the patch attached:

Comments:
=========

The patch applies cleanly with some whitespace errors. make and make install went smoothly. make check was smooth too. Rather, it should be smooth, since there are NO changes in any part of the code other than pgbench.c, and we do not have any test case either. However, here are a few comments on the changes in pgbench.c:

1. Since the final discussion ended at a 5 second interval being good enough, the author used a global int variable for that. Given that it's just a constant, #define would be a better choice.

2.
    +/* let's not call the timing for each row, but only each 100 rows */
Why only 100 rows? Have you done any testing to come up with the number 100? To me it seems very low. It would be good to test with 1K or even 10K. On my machine (2.4 GHz Intel Core 2 Duo MacBook Pro, running Ubuntu in a VM with 4GB RAM, 1067 DDR3), approximately 1M rows were inserted in 5 seconds, so checking every 100 rows looks like overkill.

3. Please indent the following block as per the indentation just above it:
    /* used to track elapsed time and estimate of the remaining time */
    instr_time  start, diff;
    double      elapsed_sec, remaining_sec;
    int         log_interval = 1;

4.
    +/* have ve reached the next interval? */
Do you mean "have WE reached"?

5. While applying the patch, I got a few whitespace errors. But I think every patch goes through pgindent, which might take care of this.

Thanks

On Sun, Nov 11, 2012 at 11:02 PM, Tomas Vondra t...@fuzzy.cz wrote:
> On 23.10.2012 18:21, Robert Haas wrote:
>> On Tue, Oct 23, 2012 at 12:02 PM, Alvaro Herrera
>> alvhe...@2ndquadrant.com wrote:
>>> Tomas Vondra wrote:
>>>> I've been thinking about this a bit more, and I propose to use an
>>>> option that determines the logging step, i.e. the number of items
>>>> (either directly or as a percentage) between log lines. The attached
>>>> patch defines a new option --logging-step that accepts either
>>>> integers or percentages. For example if you want to print a line
>>>> every 1000 lines, you can do this:
>>>>
>>>>     $ pgbench -i -s 1000 --logging-step 1000 testdb
>>>
>>> I find it hard to get excited about having to specify a command line
>>> argument to tweak this. Would it work to have it emit messages
>>> depending on elapsed time and log scale of tuples emitted? So for
>>> example emit the first message after 5 seconds or 100k tuples, then
>>> back off until (say) 15 seconds have elapsed and 1M tuples, etc? The
>>> idea is to make it verbose enough to keep a human satisfied with what
>>> he sees, but not flood the terminal with pointless updates. (I think
>>> printing the ETA might be nice as well, not sure.)
>>
>> I like this idea. One of the times when the more verbose output is
>> really useful is when you expect it to run fast but then it turns out
>> that for some reason it runs really slowly. If you make the output too
>> terse, then you end up not really knowing what's going on. Having it
>> give an update at least every 5 seconds would be a nice way to give
>> the user a heads-up if things aren't going as planned, without
>> cluttering the normal case.
>
> I've prepared a patch along these lines. The attached version uses
> only elapsed time to print the log messages every 5 seconds, so now it
> prints a message every 5 seconds no matter what, along with an
> estimate of the remaining time. I've removed the config option,
> although it might be useful to specify the interval?
>
> I'm not entirely sure how the 'log scale of tuples' should work - for
> example when the 15 second limit is reached, should it be reset back
> to the previous step (5 seconds) to give more detailed info, or should
> it be kept at 15 seconds?
>
> Tomas

--
Jeevan B Chalke
Senior Software Engineer, R&D
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Phone: +91 20 30589500
Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb
Re: [HACKERS] foreign key locks
On 2012-11-14 13:27:26 -0300, Alvaro Herrera wrote:
> Alvaro Herrera wrote:
>>> * In heap_lock_tuple's XMAX_IS_MULTI case
>>> [snip]
>>> why is it "membermode > mode" and not "membermode >= mode"?
>>
>> Uh, that's a bug. Fixed.
>
> As noticed in the comment above that snippet, there was a deadlock
> possible here. Maybe I should add a test to ensure this doesn't
> happen.

Done: https://github.com/alvherre/postgres/commit/df2847e38198e99f57e52490e1e9391ebb70d770

Some more review bits, based on ffd6250d1d393f2ecb9bfc55c2c6f715dcece557:

- if oldestMultiXactId + db is set and then that database is dropped, we seem to have a problem, because MultiXactAdvanceOldest won't overwrite those values. Should probably use SetMultiXactIdLimit directly.

- what stops multixacts only being filled out (i.e. RecordNewMultiXact()) *after* the XLogInsert() *and* after a MultiXactGetCheckptMulti()? Afaics MultiXactGenLock is not held in CreateMultiXactId(). If we crash at that moment we lose the multixact data, which now means potential data loss...

- multixact member group data crossing 512-byte sector boundaries makes me uneasy (as it's 5 bytes). I don't really see a scenario where it's dangerous, but ... Does anybody see a problem here?

- there are quite some places that do

    multiStopLimit = multiWrapLimit - 100;
    if (multiStopLimit < FirstMultiXactId)
        multiStopLimit -= FirstMultiXactId;

  perhaps MultiXactIdAdvance and MultiXactIdRetreat macros are in order?

- I find using a default: clause in switches with enum types where everything is expected to be handled, like the following, a bad idea; this way the compiler won't warn you if you've missed cases, which makes changing/extending code harder:

    switch (rc->strength)
    {
        case LCS_FORNOKEYUPDATE:
            newrc->markType = ROW_MARK_EXCLUSIVE;
            break;
        case LCS_FORSHARE:
            newrc->markType = ROW_MARK_SHARE;
            break;
        case LCS_FORKEYSHARE:
            newrc->markType = ROW_MARK_KEYSHARE;
            break;
        case LCS_FORUPDATE:
            newrc->markType = ROW_MARK_KEYEXCLUSIVE;
            break;
        default:
            elog(ERROR, "unsupported rowmark type %d", rc->strength);
    }

-
    #if 0
        /*
         * The idea here is to remove the IS_MULTI bit, and replace the
         * xmax with the updater's Xid. However, we can't really do it:
         * modifying the Xmax is not allowed under our buffer locking
         * rules, unless we have an exclusive lock; but we don't know
         * that we have it. So the multi needs to remain in place :-(
         */
        ResetMultiHintBit(tuple, buffer, xmax, true);
    #endif

  Three things:
  - HeapTupleSatisfiesUpdate is actually always called exclusively locked ;)
  - Extending something like LWLockHeldByMe to also return the current lockmode doesn't sound hard
  - we seem to be using #ifdef NOT_YET for such cases

- Using a separate production for the lockmode seems to be nicer imo, not really important though:

    for_locking_item:
        FOR UPDATE locked_rels_list opt_nowait ...
        | FOR NO KEY UPDATE locked_rels_list opt_nowait ...
        | FOR SHARE locked_rels_list opt_nowait ...
        | FOR KEY SHARE locked_rels_list opt_nowait
        ;

- not really padding, MultiXactStatus is 4 bytes...

    /*
     * XXX Note: there's a lot of padding space in MultiXactMember. We
     * could find a more compact representation of this Xlog record --
     * perhaps all the status flags in one XLogRecData, then all the xids
     * in another one? Not clear that it's worth the trouble though.
     */

- why

    #define SizeOfMultiXactCreate (offsetof(xl_multixact_create, nmembers) + sizeof(int32))

  and not

    #define SizeOfMultiXactCreate offsetof(xl_multixact_create, members)

- starting a critical section in GetNewMultiXactId but not ending it is ugly, but not new

Greetings,

Andres
--
Andres Freund    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] logical changeset generation v3 - Source for Slony
First, you can add me to the list of people saying 'wow', I'm impressed.

The approach I am taking to reviewing this is to try and answer the following questions:

1) How might a future version of slony be able to use logical replication as described by your patch and design documents, and what would that look like?

2) What functionality is missing from the patch set that would stop me from implementing or prototyping the above?

Connecting slon to the remote postgresql
========================================

Today the slony remote listener thread queries a bunch of events from sl_event for a batch of SYNC events. Then the remote helper thread queries data from sl_log_1 and sl_log_2. I see this changing; instead, the slony remote listener thread would connect to the remote system and get a logical replication stream.

1) Would slony connect as a normal client connection and call something like 'select pg_slony_process_xlog(...)' to get a bunch of logical replication change records to process, OR

2) Would slony connect as a replication connection, similar to how the pg_receivelog program does today, and then process the logical changeset outputs instead of writing them to a file (as pg_receivelog does)?

It seems that the second approach is what is encouraged. I think we would put a lot of the pg_receivelog functionality into slon and it would issue a command like INIT_LOGICAL_REPLICATION 'slony' to use the slony logical replication plugin. Slon would also have to provide feedback to the walsender about what it has processed, so the origin database knows which catalog snapshots can be expired. Based on eyeballing pg_receivelog.c, it seems that about half the code in the 700-line file is related to command line arguments etc., and the other half is related to looping over the copy-out stream, sending feedback and other things that we would need to duplicate in slon.

pg_receivelog.c has this comment:

    /*
     * We have to use postgres.h not postgres_fe.h here, because there's
     * so much backend-only stuff in the XLOG include files we need. But
     * we need a frontend-ish environment otherwise. Hence this ugly hack.
     */

This looks like more of a carryover from pg_receivexlog.c. From what I can tell we can eliminate the postgres.h include if we also eliminate the utils/datetime.h and utils/timestamp.h includes and instead add in:

    #include "postgres_fe.h"

    #define POSTGRES_EPOCH_JDATE 2451545
    #define UNIX_EPOCH_JDATE     2440588
    #define SECS_PER_DAY         86400
    #define USECS_PER_SEC        INT64CONST(1000000)

    typedef int64 XLogRecPtr;
    #define InvalidXLogRecPtr 0

If there is a better way of getting these defines, someone should speak up. I recall that in the past slon actually did include postgres.h and it caused some issues (I think with MSVC win32 builds). Since pg_receivelog.c will be used as a starting point/sample for third parties to write client programs, it would be better if it didn't encourage client programs to include postgres.h.

The Slony Output Plugin
=======================

Once we've modified slon to connect as a logical replication client, we will need to write a slony plugin. As I understand the plugin API:

* A walsender processes WAL records; each time it sees a COMMIT WAL record it will call my plugin's .begin, .change (for each change in the transaction), and .commit.

* The plugin for a particular stream/replication client will see one transaction at a time, passed to it in commit order. It won't see .change(t1) followed by .change(t2), followed by a second .change(t1). The reorder buffer code hides all that complexity from me (yay!).

From a slony point of view I think the output of the plugin will be rows, suitable to be passed to COPY IN, of the form (see the sketch after this message):

    origin_id, table_namespace, table_name, command_type,
    cmd_updatencols, command_args

This is basically the Slony 2.2 sl_log format minus a few columns we no longer need (txid, actionseq). command_args is a postgresql text array of column=value pairs, i.e. [{id=1},{name='steve'},{project='slony'}].

I don't think our output plugin will be much more complicated than the test_decoding plugin. I suspect we will want to give it the ability to filter out non-replicated tables. We will also have to filter out change records that didn't originate on the local node and that aren't part of a cascaded subscription. Remember that in a two-node cluster slony will have connections from A-->B and from B-->A even if user tables only flow one way. Data that is replicated from A into B will show up in the WAL stream for B. Exactly how we do this filtering is an open question; I think the output plugin will at a minimum need to know:

a) What the slony node id is of the node it is running on. This is easy to figure out if the output plugin is able/allowed to query its database. Will this be possible? I would expect to be able to query the database as it exists now (at plugin invocation time), not as it existed in the past
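[A sketch of that row shape as a table definition. This is illustrative only: the column names follow Steve's description of the Slony 2.2 sl_log-style format above, and this is not an actual Slony catalog:]

    CREATE TABLE sl_log_stream (
        origin_id        int4,    -- slony node id the change originated on
        table_namespace  text,
        table_name       text,
        command_type     "char",  -- e.g. I/U/D
        cmd_updatencols  int4,    -- number of updated columns
        command_args     text[]   -- column=value pairs,
                                  -- e.g. '{id=1,name=steve,project=slony}'
    );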
Re: [HACKERS] logical changeset generation v3 - Source for Slony
Hi Steve!

On 2012-11-17 22:50:35 -0500, Steve Singer wrote:
> First, you can add me to the list of people saying 'wow', I'm
> impressed.

Thanks!

> The approach I am taking to reviewing this is to try and answer the
> following questions:
>
> 1) How might a future version of slony be able to use logical
> replication as described by your patch and design documents, and what
> would that look like?
>
> 2) What functionality is missing from the patch set that would stop me
> from implementing or prototyping the above?

Sounds like a good plan to me.

> Connecting slon to the remote postgresql
> ========================================
>
> 1) Would slony connect as a normal client connection and call
> something like 'select pg_slony_process_xlog(...)' to get a bunch of
> logical replication change records to process, OR
>
> 2) Would slony connect as a replication connection, similar to how the
> pg_receivelog program does today, and then process the logical
> changeset outputs instead of writing them to a file (as pg_receivelog
> does)?

It would need to be the latter. We need the feedback messages it sends for several purposes:
- increasing the lowered xmin
- implementing optionally synchronous replication at some point
- using 1) would mean having transactions open...

> It seems that the second approach is what is encouraged. [...] Based
> on eyeballing pg_receivelog.c, it seems that about half the code in
> the 700-line file is related to command line arguments etc., and the
> other half is related to looping over the copy-out stream, sending
> feedback and other things that we would need to duplicate in slon.

I think we should provide some glue code to do this, otherwise people will start replicating all the bugs I hacked into this... More seriously: I think we should have support code here; no user will want to learn the intricacies of feedback messages and such. Where that would live? No idea.

> pg_receivelog.c has this comment:

(it's pg_receivellog btw. ;))

> [...] If there is a better way of getting these defines, someone
> should speak up. I recall that in the past slon actually did include
> postgres.h and it caused some issues (I think with MSVC win32 builds).
> Since pg_receivelog.c will be used as a starting point/sample for
> third parties to write client programs, it would be better if it
> didn't encourage client programs to include postgres.h.

I wholeheartedly agree. It should also be cleaned up a fair bit before others copy it, should we not go for having some client-side library.

Imo the library could very roughly be something like:

    state = SetupStreamingLLog(replication_slot, ...);
    while ((message = StreamingLLogNextMessage(state)))
    {
        write(outfd, message->data, message->length);
        if (received_100_messages)
        {
            fsync(outfd);
            StreamingLLogConfirm(message);
        }
    }

Although I guess that's not good enough, because StreamingLLogNextMessage would be blocking, but that shouldn't be too hard to work around.

> The Slony Output Plugin
> =======================
>
> [...]
>
> * The plugin for a particular stream/replication client will see one
> transaction at a time, passed to it in commit order. It won't see
> .change(t1) followed by .change(t2), followed by a second .change(t1).
> The reorder buffer code hides all that complexity from me (yay!).

Correct.

> From a slony point of view I think the output of the plugin will be
Re: [HACKERS] foreign key locks
On 2012-11-14 13:27:26 -0300, Alvaro Herrera wrote:
> Alvaro Herrera wrote:
>>> * In heap_lock_tuple's XMAX_IS_MULTI case
>>> [snip]
>>> why is it "membermode > mode" and not "membermode >= mode"?
>>
>> Uh, that's a bug. Fixed.
>
> As noticed in the comment above that snippet, there was a deadlock
> possible here. Maybe I should add a test to ensure this doesn't
> happen.

Done: https://github.com/alvherre/postgres/commit/df2847e38198e99f57e52490e1e9391ebb70d770 (I don't think this is worth a v24 submission.)

One more observation:

      /*
       * Get and lock the updated version of the row; if fail, return NULL.
       */
    - copyTuple = EvalPlanQualFetch(estate, relation, LockTupleExclusive,
    + copyTuple = EvalPlanQualFetch(estate, relation, LockTupleNoKeyExclusive,

That doesn't seem to be correct to me. Why is it ok to acquire a potentially too low locklevel here?

Greetings,

Andres Freund
--
Andres Freund    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] review: Reduce palloc's in numeric operations
Hello all

I tested this patch and I can confirm that it increases speed by about 16-22% (depending on IO waits and load type). I tested a real query (anonymized):

    SELECT SUM(COALESCE(a * b * c * (d + ((1 - d) * (1 - e))), 0)) m1 FROM tab;

-- patched

    4493 26.5591 postgres slot_deform_tuple
    1327  7.8442 postgres AllocSetAlloc
    1313  7.7614 postgres ExecMakeFunctionResultNoSets
    1105  6.5319 postgres set_var_from_num_nocopy
     924  5.4620 postgres make_result
     637  3.7654 postgres mul_var
     635  3.7536 postgres numeric_mul
     560  3.3103 postgres MemoryContextAlloc
     405  2.3940 postgres AllocSetFree
     389  2.2995 postgres ExecEvalScalarVarFast
     332  1.9625 postgres slot_getsomeattrs
     322  1.9034 postgres numeric_add
     313  1.8502 postgres add_abs
     304  1.7970 postgres pfree
     238  1.4069 postgres slot_getattr
     216  1.2768 postgres numeric_sub
     200  1.1822 postgres heap_tuple_untoast_attr
     183  1.0818 postgres strip_var
     180  1.0640 postgres ExecEvalConst
     173  1.0226 postgres alloc_var
     172  1.0167 postgres check_stack_depth

-- origin

    4419 22.8325 postgres slot_deform_tuple
    2041 10.5456 postgres AllocSetAlloc
    1248  6.4483 postgres set_var_from_num
    1186  6.1279 postgres ExecMakeFunctionResultNoSets
     886  4.5779 postgres MemoryContextAlloc
     856  4.4229 postgres make_result
     757  3.9113 postgres numeric_mul
     731  3.7770 postgres AllocSetFree
     625  3.2293 postgres mul_var
     601  3.1053 postgres alloc_var
     545  2.8160 postgres pfree
     503  2.5989 postgres free_var
     458  2.3664 postgres slot_getsomeattrs
     378  1.9531 postgres numeric_add
     360  1.8601 postgres add_abs
     336  1.7361 postgres ExecEvalScalarVarFast
     266  1.3744 postgres slot_getattr
     221  1.1419 postgres numeric_sub

Review:

1) this patch applied cleanly and compiled without warnings

2) regression tests - all 133 tests passed

4) I don't see any memory leaks there (verified by the following code):

    CREATE OR REPLACE FUNCTION public.fx(_m integer)
     RETURNS void
     LANGUAGE plpgsql
    AS $function$
    declare
      m numeric = 10;
      n numeric = 20022.222;
      r numeric;
    begin
      for i in 1.._m
      loop
        r := m * n;
      end loop;
    end;
    $function$

    postgres=# select fx(1000);
     fx
    ----

    (1 row)

    Time: 5312.623 ms   -- original (4798.103 ms patched: ~10% speedup)

5) it respects PostgreSQL's coding standards

6) we want to accept this patch - it can bring a 10% speedup to calculations with numerics.

Regards

Pavel Stehule
Re: [HACKERS] pg_dump --split patch
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
>>     pg_dump   | pg_restore
>>     pg_export | psql
>>
>> While I agree that this idea - when implemented - would be nicer in
>> practically every way, I'm not sure I want to volunteer to do all the
>> necessary work.
>
> What I think needs to happen now is a committer's buy-in that we want
> to get there at some point and that your current patch is not painting
> us into any corner now, so that we can accept it and have a documented
> path forward.

Just stumbled across this message while reading some older threads about the current topic:

    http://archives.postgresql.org/pgsql-hackers/2010-12/msg02496.php

where Robert Treat said:

> I've both enjoyed reading this thread and seeing this wheel reinvented
> yet again, and wholeheartedly +1 the idea of building this directly
> into pg_dump. (The only thing better would be to make everything sql
> callable, but that's a problem for another day.)

I know Andrew has been working on his Retail DDL project, which is basically a bunch of server-side functions that spit out SQL object definitions. Andrew, were you able to make progress on that project?

On the other hand, pg_dump -Fs still is something I would like to have as a complement to Andrew's facility.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL
Amit Kapila amit.kap...@huawei.com writes:
> We have a discussion about the below 3 different syntaxes for this
> command:
>   1. ALTER SYSTEM
>   2. SET PERSISTENT
>   3. pg_change_conf()
> I think to conclude, we need to evaluate which syntax has more
> advantages. Comparison for above syntax

I think ALTER SYSTEM should be what Peter Eisentraut proposed in another thread, using system catalogs and thus not supporting the whole range of parameters and reset behavior on SIGHUP. That's still very useful, and seems to me clear enough to document.

Then, I think you could implement a SET PERSISTENT command that calls the pg_change_conf() function. The problem is that you then can't have the command unavailable in a transaction block if all it does is call the function, because the function call needs to happen in a transaction.

I'd vote for having a lock that serializes any calls to that function. My understanding of the use cases makes it really OK not to accept much concurrency here.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Doc patch, put RAISE USING keywords into a table
On 11/17/2012 12:16:06 AM, Peter Eisentraut wrote:
> I'm unsure whether splitting this out into a list (or table) is an
> improvement. Other opinions? This page is written as a narrative
> explanation of the RAISE feature, but there is probably a desire to
> also have it serve as a reference page in the style of the SQL command
> reference pages.

Yes. I do want to be able to scan the page quickly to pick out the keywords and their use. There is no reference page.

FWIW, there are lists and tables in narrative parts of the documentation. The very next section, (PL/pgSQL) Trigger Procedures, has a variablelist of special variables, and The SQL Language top-level chapter, which claims to be narrative, has many tables.

Regards,
Karl k...@meme.com
Free Software: You don't pay back, you pay forward.
             -- Robert A. Heinlein
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL
On Monday, November 19, 2012 7:53 PM Dimitri Fontaine wrote:
> I think ALTER SYSTEM should be what Peter Eisentraut proposed in
> another thread, using system catalogs and thus not supporting the
> whole range of parameters and reset behavior on SIGHUP. That's still
> very useful, and seems to me clear enough to document.
>
> Then, I think you could implement a SET PERSISTENT command that calls
> the pg_change_conf() function. The problem is that you then can't have
> the command unavailable in a transaction block if all it does is call
> the function, because the function call needs to happen in a
> transaction.

If we want to go for SET PERSISTENT, then there is no need for a built-in function call pg_change_conf(); the functionality can be implemented in some internal function. I believe starting a transaction is still unavoidable, as even the parse of the statement happens after StartTransaction.

The only point I can see against SET PERSISTENT is that the other variants of the SET command can be used in transaction blocks, which means ROLLBACK TO SAVEPOINT works for them, but for SET PERSISTENT it can't be done. So to handle that we might need to mention this point in the user manual, so that users are aware of this usage. If that is okay, then I think SET PERSISTENT is good to go.

With Regards,
Amit Kapila.
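[For reference, a rough sketch of how the three syntax variants under discussion would look. None of this is implemented; the names and semantics are exactly what is being debated:]

    -- 1. ALTER SYSTEM (catalog/file-based, per Peter Eisentraut's proposal)
    ALTER SYSTEM SET max_connections = 200;

    -- 2. SET PERSISTENT (writes the setting durably, applied on SIGHUP)
    SET PERSISTENT max_connections = 200;

    -- 3. plain function call
    SELECT pg_change_conf('max_connections', '200');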
Re: [HACKERS] ALTER command reworks
Hi,

Thanks for working on that cleanup job!

Kohei KaiGai kai...@kaigai.gr.jp writes:
> The attached patch is the revised version of the ALTER RENAME TO
> consolidation. According to the previous suggestion, it uses a common
> logic to check object-naming duplication at
> check_duplicate_objectname().

I think you need to better comment which object types are supported by the new generic function and which are not in AlterObjectRename().

> At the code path in AlterObjectNamespace_internal, the ObjectType
> information for the supplied object is lost, so I also added a
> get_object_type() function at objectaddress.c for inverse translation
> from a classId/objectId pair to an OBJECT_* label.

I don't understand that part, and it looks like it would be way better to find a way to pass down the information you already have earlier in the code than to compute it again doing syscache lookups…

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL
Amit Kapila wrote:
> The only point I can see against SET PERSISTENT is that the other
> variants of the SET command can be used in transaction blocks, which
> means ROLLBACK TO SAVEPOINT works for them, but for SET PERSISTENT it
> can't be done. So to handle that we might need to mention this point
> in the user manual, so that users are aware of this usage. If that is
> okay, then I think SET PERSISTENT is good to go.

I think that's okay. There are other commands which have some forms that can run inside a transaction block and others not. CLUSTER is one example (maybe the only one? Not sure).

--
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
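[To illustrate the CLUSTER precedent; table and index names here are placeholders:]

    BEGIN;
    CLUSTER mytable USING mytable_pkey;  -- single-table form: allowed here
    COMMIT;

    BEGIN;
    CLUSTER;  -- bare all-tables form: fails with
              -- ERROR:  CLUSTER cannot run inside a transaction block
    ROLLBACK;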
Re: [HACKERS] pg_dump --split patch
On 11/19/2012 09:07 AM, Dimitri Fontaine wrote:
> Just stumbled across this message while reading some older threads
> about the current topic:
>
>     http://archives.postgresql.org/pgsql-hackers/2010-12/msg02496.php
>
> where Robert Treat said:
>
>> I've both enjoyed reading this thread and seeing this wheel
>> reinvented yet again, and wholeheartedly +1 the idea of building this
>> directly into pg_dump. (The only thing better would be to make
>> everything sql callable, but that's a problem for another day.)
>
> I know Andrew has been working on his Retail DDL project, which is
> basically a bunch of server-side functions that spit out SQL object
> definitions. Andrew, were you able to make progress on that project?
>
> On the other hand, pg_dump -Fs still is something I would like to have
> as a complement to Andrew's facility.

No, sorry, it's on hold - I'm having trouble finding time to work on it.

cheers

andrew
Re: [HACKERS] gset updated patch
On 11/19/2012 02:30:49 AM, Pavel Stehule wrote:
> Hello
>
> 2012/11/16 Karl O. Pinc k...@meme.com:
>> Hi Pavel,
>>
>> On 11/16/2012 12:21:11 AM, Pavel Stehule wrote:
>>> 2012/11/16 Karl O. Pinc k...@meme.com:
>>>> As long as I'm talking crazy talk, why not abandon psql as a shell
>>>> language and instead make a pl/pgsql interpreter with readline() in
>>>> front of it? Solve all these language-related issues by using an
>>>> actual programming language. :-)
>>>
>>> I thought about it some more, but I don't think this makes sense.
>>
>> You might consider using do.
>
> That is the reason why I am not thinking about plpgsql on the client
> side. But I don't understand how it is related to \gset?

Because plpgsql's SELECT INTO sets variables from query results, exactly what \gset does. You have to use EXECUTE in plpgsql to do the substitution into statements, but that's syntax.

> I remember there is one significant limitation of the DO statement -
> it cannot return a table - so it cannot substitute for simple psql
> scripts.

Yes. I'm wrong. For some reason I thought you could use DO to make an anonymous code block that would act as a SETOF function, allowing RETURN NEXT expr (et al.) to be used in the plpgsql code, allowing DO to return table results. (Or, perhaps, instead, be used in place of a table in a SELECT statement.) Oh well.

Regards,
Karl k...@meme.com
Free Software: You don't pay back, you pay forward.
             -- Robert A. Heinlein
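[To make the parallel concrete, a sketch assuming the \gset behavior from the patch under review; mytable and id are placeholder names:]

    -- psql, with the proposed \gset: store the output columns of the
    -- query into psql variables named after the columns.
    SELECT count(*) AS n, max(id) AS maxid FROM mytable \gset
    \echo rows: :n  max id: :maxid

    -- The plpgsql analogue Karl is pointing at: SELECT INTO.
    DO $$
    DECLARE
        n      bigint;
        maxid  integer;
    BEGIN
        SELECT count(*), max(id) INTO n, maxid FROM mytable;
        RAISE NOTICE 'rows: %, max id: %', n, maxid;
    END;
    $$;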
Re: [HACKERS] ALTER command reworks
Hi Dimitri,

Thanks for your checks.

2012/11/19 Dimitri Fontaine dimi...@2ndquadrant.fr:
> Kohei KaiGai kai...@kaigai.gr.jp writes:
>> The attached patch is the revised version of the ALTER RENAME TO
>> consolidation. According to the previous suggestion, it uses a common
>> logic to check object-naming duplication at
>> check_duplicate_objectname().
>
> I think you need to better comment which object types are supported by
> the new generic function and which are not in AlterObjectRename().

OK. Are you suggesting adding a generic comment such as "Generic function to change the name of a given object, for simple cases ..." rather than a list of OBJECT_* at the head of this function?

>> At the code path in AlterObjectNamespace_internal, the ObjectType
>> information for the supplied object is lost, so I also added a
>> get_object_type() function at objectaddress.c for inverse translation
>> from a classId/objectId pair to an OBJECT_* label.
>
> I don't understand that part, and it looks like it would be way better
> to find a way to pass down the information you already have earlier in
> the code than to compute it again doing syscache lookups…

The pain point is that AlterObjectNamespace_internal is invoked not only by ExecAlterObjectSchemaStmt(), but by AlterObjectNamespace_oid() also. That is the reason why I had to add get_object_type() to find the OBJECT_* label from the supplied ObjectAddress: this code path has no way to pass down the ObjectType from the caller.

Thanks,
--
KaiGai Kohei kai...@kaigai.gr.jp
Re: [HACKERS] gset updated patch
2012/11/19 Karl O. Pinc k...@meme.com:
> On 11/19/2012 02:30:49 AM, Pavel Stehule wrote:
>> [...]
>
> Because plpgsql's SELECT INTO sets variables from query results,
> exactly what \gset does. You have to use EXECUTE in plpgsql to do the
> substitution into statements, but that's syntax.

Yes, but I cannot set a psql variable from it. The original proposal for \gset was "\exec into var" :) although \gset is simpler and there is no problem with multiline queries.

>> I remember there is one significant limitation of the DO statement -
>> it cannot return a table - so it cannot substitute for simple psql
>> scripts.
>
> Yes. I'm wrong. For some reason I thought you could use DO to make an
> anonymous code block that would act as a SETOF function, allowing
> RETURN NEXT expr (et al.) to be used in the plpgsql code, allowing DO
> to return table results. (Or, perhaps, instead, be used in place of a
> table in a SELECT statement.) Oh well.

Yes, I understand - a batch model for DO is more natural than a function - but that is not how it works :(. I hope that one day we will have real stored procedures with unbound queries. Then we can redefine how DO behaves, because it doesn't break compatibility. But that is a long way off - maybe 9.5.

Regards

Pavel
Re: [HACKERS] [v9.3] writable foreign tables
Kohei KaiGai wrote:
>> I am not so happy with GetForeignRelInfo:
>> - The name seems ill-chosen from the FDW API side. I guess that you
>>   chose the name because the function is called from
>>   get_relation_info, but I think the name should be more descriptive
>>   for the FDW API. Something like PlanForeignRelRowid.
>
> Indeed, GetForeignRelInfo might give the misleading impression that
> this routine collects widespread information about the target
> relation. So, how about GetForeignRelWidth() instead?

That would be better for the function as it is now.

>> - I guess that every FDW that only needs rowid will do exactly the
>>   same as your fileGetForeignRelInfo. Why can't that be done in core?
>>   The function could pass an AttrNumber for the rowid to the FDW, and
>>   will receive a boolean return code depending on whether the FDW
>>   plans to use rowid or not. That would be more convenient for FDW
>>   authors.
>
> This design tries to kill two birds with one stone. It enables adding
> multiple pseudo-columns, not only rowid, and makes it possible to push
> down complex calculations in the target list to an external computing
> resource. For example, when the user gives the following query:
>
>     SELECT ((c1 - c2) * (c2 - c3))^2 FROM ftable
>
> it contains a complex calculation in the target list; thus, it also
> takes CPU cycles of the local process. If we can replace the
> ((c1 - c2) * (c2 - c3))^2 by a reference to a pseudo-column that
> references the calculation result on the external node, it effectively
> off-loads CPU cycles. In this case, all we need to do is (1) acquire a
> slot for the pseudo-column at GetForeignRelInfo and (2) replace
> TargetEntry::expr by a Var node that references this pseudo-column. It
> makes sense for performance optimization, so I don't want to restrict
> this handler to rowid only.

I understand. But I think that you still can do that with the change that I suggest. I suggest that GetForeignRelInfo (or whatever the name ends up being) gets the AttrNumber of the proposed rowid column in addition to the parameters you need for what you want to do. Then nothing would keep you from defining those pseudo-columns, but all the setup necessary for the rowid column could be moved out of the FDW. So for the 99% of all FDWs which are only interested in the rowid, things would get much simpler, and they wouldn't all have to implement the same code. Did I make clear what I mean? Would that be difficult?

>> - I guess the order is dictated by planner steps, but it would be
>>   nice to have if GetForeignRelInfo were not the first function to be
>>   called during planning. That would make it easier for an FDW to
>>   support both 9.2 and 9.3 (fewer #ifdefs), because the FDW plan
>>   state will probably have to be created in the first API function.
>
> The baserel->fdw_private should be initialized to NULL, so it can
> serve as a marker of whether the private data has already been
> constructed or not.

Right, if that pointer is pre-initialized to NULL, that should work. Forget my quibble.

> In addition, I noticed my patch didn't update the documentation stuff.
> I also added mention of the new handlers.

I didn't get into documentation, comment and spelling issues since the patch was still called POC, but yes, eventually that would be necessary.

Yours,
Laurenz Albe
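[A concrete restatement of the off-loading idea, purely illustrative; pseudo-columns are internal planner objects, not user-visible syntax:]

    -- What the user writes; evaluating the expression costs local CPU:
    SELECT ((c1 - c2) * (c2 - c3))^2 FROM ftable;

    -- After the rewrite KaiGai describes, the local plan conceptually
    -- just fetches one precomputed value per row from the remote side,
    -- as if the query had been:
    --     SELECT pseudo_1 FROM ftable;   -- pseudo_1 computed remotely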
Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management
On Monday, November 19, 2012 6:05 AM Jeff Janes wrote:
> On Mon, Oct 22, 2012 at 10:51 AM, Amit kapila amit.kap...@huawei.com wrote:
>> Today again I have collected the data for the configuration
>> shared_buffers = 7G along with vmstat. The data and vmstat information
>> (bi) are attached with this mail. It is observed from the vmstat info
>> that I/O is happening in both cases; however, after running for a long
>> time, the I/O is also comparatively less with the new patch.
>
> What I see in the vmstat report is that it takes 5.5 runs to get
> really well warmed up, and so it crawls for the first 5.5 benchmarks
> and then flies for the last 0.5 benchmark. The way you have your runs
> ordered, that last 0.5 of a benchmark is for the patched version, and
> this drives up the average tps for the patched case.
>
> Also, there is no theoretical reason to think that your patch would
> decrease the amount of IO needed (in fact, by invalidating buffers
> early, it could be expected to increase the amount of IO). So this
> also argues that the increase in performance is caused by the decrease
> in IO, but the patch isn't causing that decrease; it merely benefits
> from it due to an accident of timing.

Today, I ran in the opposite order, and I still see a similar observation for some readings. I am also not sure about the IO part; just based on the data I was trying to interpret it that way. However, maybe for some particular scenario it behaves that way due to OS buffer management. As I am not aware of the OS buffer management algorithm, it's difficult to say that such a change would have any impact on OS buffer management which could yield better performance.

With Regards,
Amit Kapila.
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL
On Monday, November 19, 2012 8:36 PM Alvaro Herrera wrote:
> Amit Kapila wrote:
>> The only point I can see against SET PERSISTENT is that the other
>> variants of the SET command can be used in transaction blocks [...]
>> If that is okay, then I think SET PERSISTENT is good to go.
>
> I think that's okay. There are other commands which have some forms
> that can run inside a transaction block and others not. CLUSTER is one
> example (maybe the only one? Not sure).

In that case, it can have one more advantage: all configuration settings can be done with one command, and in the future we might want an option like BOTH, where the command takes effect in memory as well as in the file. Can you think of any strong reason not to go with an ALTER SYSTEM command? In any case, SET PERSISTENT is fine.

With Regards,
Amit Kapila.
Re: [HACKERS] ALTER command reworks
Kohei KaiGai kai...@kaigai.gr.jp writes:
> OK. Are you suggesting adding a generic comment such as "Generic
> function to change the name of a given object, for simple cases ..."
> rather than a list of OBJECT_* at the head of this function?

Just something like

    * Most simple objects are covered by a generic function, the others
    * still need special processing.

Mainly I was surprised to see collation not supported here, but I didn't take enough time to understand why that is the case. I will do that later in the review process.

> The pain point is that AlterObjectNamespace_internal is invoked not
> only by ExecAlterObjectSchemaStmt(), but by AlterObjectNamespace_oid()
> also.

I should remember better about that, as the use case is extensions…

> That is the reason why I had to add get_object_type() to find the
> OBJECT_* label from the supplied ObjectAddress: this code path has no
> way to pass down the ObjectType from the caller.

If we really want to do that, I think I would only do that in AlterObjectNamespace_oid and add an argument to AlterObjectNamespace_internal, which you already have in ExecAlterObjectSchemaStmt(). But really, what about just removing that part of your patch instead:

    @@ -396,14 +614,23 @@ AlterObjectNamespace_internal(Relation rel, Oid objid, Oid nspOid)
          * Check for duplicate name (more friendly than unique-index failure).
          * Since this is just a friendliness check, we can just skip it in cases
          * where there isn't a suitable syscache available.
    +     *
    +     * XXX - the caller should check object-name duplication, if the supplied
    +     * object type needs to take object arguments for identification, such as
    +     * functions.
          */
    -    if (nameCacheId >= 0 &&
    -        SearchSysCacheExists2(nameCacheId, name, ObjectIdGetDatum(nspOid)))
    -        ereport(ERROR,
    -                (errcode(ERRCODE_DUPLICATE_OBJECT),
    -                 errmsg("%s already exists in schema \"%s\"",
    -                        getObjectDescriptionOids(classId, objid),
    -                        get_namespace_name(nspOid))));
    +    if (get_object_catcache_name(classId) >= 0)
    +    {
    +        ObjectAddress address;
    +
    +        address.classId = classId;
    +        address.objectId = objid;
    +        address.objectSubId = 0;
    +
    +        objtype = get_object_type(address);
    +        check_duplicate_objectname(objtype, nspOid,
    +                                   NameStr(*DatumGetName(name)), NIL);
    +    }

It would be much simpler to retain the old-style duplicate object check, and compared to doing extra cache lookups, it'd still be much cleaner in my view.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] [RFC] Fix div/mul crash and more undefined behavior
Xi Wang xi.w...@gmail.com writes:
> On 11/18/12 6:47 PM, Tom Lane wrote:
>> I was against this style of coding before, and I still am. For one
>> thing, it's just about certain to introduce conflicts against system
>> headers.
>
> I totally agree. I would be happy to rewrite the integer overflow
> checks without using these explicit constants, but it seems extremely
> tricky to do so.

I thought about this some more and realized that we can handle it by realizing that division by -1 is the same as negation, and so we can copy the method used in int4um. So the code would look like

    if (arg2 == -1)
    {
        result = -arg1;
        if (arg1 != 0 && SAMESIGN(result, arg1))
            ereport(ERROR, ...);
        PG_RETURN_INT32(result);
    }

(with rather more comments than this, of course). This looks faster than what's there now, as well as removing the need for explicit INT_MIN constants.

> Compared to (arg1 == INTn_MIN && arg2 == -1), the above check is not
> only more confusing and difficult to understand, but it also invokes
> undefined behavior (-INT_MIN overflow), which is dangerous: many C
> compilers will optimize away the check.

They'd better not, else they'll break many of our overflow checks. This is why we use -fwrapv with gcc, for example. Any other compiler with similar optimizations needs to be invoked with a similar switch.

> Since INTn_MIN and INTn_MAX are standard macros from the C library,
> can we assume that every C compiler should provide them in stdint.h?

Not every C compiler provides stdint.h, unfortunately --- otherwise I'd not be so resistant to depending on this.

        regards, tom lane
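[The crash case under discussion is easy to exercise from SQL. A sketch of the expected behavior with the negation-style check in place, not captured output:]

    -- INT_MIN / -1 cannot be represented in int4; with the fix this
    -- raises "integer out of range" instead of crashing the backend.
    SELECT (-2147483648)::int4 / (-1)::int4;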
Re: [HACKERS] gset updated patch
Karl O. Pinc k...@meme.com writes: Yes. I'm wrong. For some reason I thought you could use DO to make an anonymous code block that would act as a SETOF function, allowing RETURN NEXT expr (et al) to be used in the plpgsql code, allowing DO to return table results. (Or, perhaps, instead, be used in place of a table in a SELECT statement.) Oh well. My key for remembering about that point is that DO is a utility command, not a query. Now, the proposal I pushed last time we opened that very can of worms was to have inline functions rather than anonymous code blocks:

    WITH FUNCTION foo(integer) RETURNS bigint LANGUAGE SQL AS $$ SELECT $1 + 1; $$,

Not sure how much that relates to $topic, but still something that arises in my mind with enough presence that I need to write about it so that it stops calling for attention :) Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
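For a flavor of the "simple task" under discussion, this is the kind of psql session the feature enables, shown here with the \gset spelling that psql eventually adopted (the variable names are illustrative):

    -- store the columns of a one-row result into psql variables
    SELECT count(*) AS relcount, current_database() AS dbname
    FROM pg_class
    \gset

    -- and reuse them in later statements or backslash commands
    \echo Database :dbname contains :relcount relations.
    SELECT :relcount * 2 AS twice_as_many;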
Re: [HACKERS] Dumping an Extension's Script
On Thu, Nov 15, 2012 at 3:09 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Robert Haas robertmh...@gmail.com writes: Please find attached to this email an RFC patch implementing the basics of the pg_dump --extension-script option. After much discussion around the concept of an inline extension, we decided last year that a good first step would be pg_dump support for an extension's script. Do you have a link to the original thread? I have to confess I don't remember what the purpose of this was and, heh heh, there are no documentation changes in the patch itself either. My notes include those links to the original thread: http://archives.postgresql.org/message-id/3157.1327298...@sss.pgh.pa.us http://archives.postgresql.org/pgsql-hackers/2012-01/msg01311.php https://commitfest.postgresql.org/action/patch_view?id=746 I could of course work on documenting the changes prior to the reviewing, the thing is that I've been taking a different implementation route towards the pg_dump --extension-script idea we talked about, that I think is much simpler than anything else. So I'd like to know if that approach is deemed acceptable by the Guardians Of The Code before spending any more hours on this… It basically boils down to this hunk in dumpExtension():

	/* output "CREATE EXTENSION x WITH … AS $x$" */

	/*
	 * Have another archive for this extension: this allows us to simply
	 * walk the extension's dependencies and use the existing pg_dump code
	 * to get the object create statement to be added in the script.
	 */
	eout = CreateArchive(NULL, archNull, 0, archModeAppend);
	EH = (ArchiveHandle *) eout;

	/* grab existing connection and remote version information */
	EH->connection = ((ArchiveHandle *) fout)->connection;
	eout->remoteVersion = fout->remoteVersion;

	/* dump all objects for this extension, that have been sorted out in
	 * the right order following dependencies etc */
	...

	/* restore the eout Archive into the local buffer */
	for (te = EH->toc->next; te != EH->toc; te = te->next)
	{
		if (strlen(te->defn) > 0)
			appendPQExpBuffer(q, "%s", te->defn);
	}
	CloseArchive(eout);

	/* output "$x$;" */

What do you think? That approach seems likely to break things for the hoped-for parallel pg_dump feature, though I'm not sure exactly in what way. Beyond that, I think much of the appeal of the extension feature is that it dumps as CREATE EXTENSION hstore; and nothing more. That allows you to migrate a dump between systems with different but compatible versions of hstore and have things work as intended. I'm not opposed to the idea of being able to make extensions without files on disk work ... but I consider it a niche use case; the behavior we have right now works well for me and hopefully for others most of the time. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Fix div/mul crash and more undefined behavior
On 11/19/12 11:04 AM, Tom Lane wrote: I thought about this some more and realized that we can handle it by realizing that division by -1 is the same as negation, and so we can copy the method used in int4um. So the code would look like

	if (arg2 == -1)
	{
		result = -arg1;
		if (arg1 != 0 && SAMESIGN(result, arg1))
			ereport(ERROR, ...);
		PG_RETURN_INT32(result);
	}

(with rather more comments than this, of course). This looks faster than what's there now, as well as removing the need for use of explicit INT_MIN constants. Hah, I used exactly the same code in my initial patch. :) See below. They'd better not, else they'll break many of our overflow checks. This is why we use -fwrapv with gcc, for example. Any other compiler with similar optimizations needs to be invoked with a similar switch. Yes, that's the scary part. Even with -fwrapv in gcc (I tried 4.6/4.7/4.8), it still optimizes away my overflow check!

	if (arg1 < 0 && -arg1 < 0)
	{
		...
	}

Fortunately, it doesn't optimize away checks using SAMESIGN() for now. Who knows what could happen in the next version.. Clang's -fwrapv works as expected. PathScale and Open64 (and probably icc) don't support -fwrapv at all. Not sure if they have similar options. They do such optimizations, too. The reality is that C compilers are not friendly to postcondition checking; they consider signed integer overflow as undefined behavior, so they do whatever they want to do. Even workaround options like -fwrapv are often broken, not to mention that some compilers may not even have those options. That's why I am suggesting to avoid postcondition checking for signed integer overflow. Not every C compiler provides stdint.h, unfortunately --- otherwise I'd not be so resistant to depending on this. Sigh.. You are right. - xi -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
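Side by side, the two styles Xi contrasts look like this. INT_MIN comes from the C89 limits.h, which every compiler does ship (unlike stdint.h); report_overflow() is a hypothetical error helper:

    #include <limits.h>

    extern void report_overflow(void);	/* hypothetical error helper */

    /* Postcondition style: -arg1 has already overflowed by the time it is
     * tested, so a compiler exploiting undefined behavior may simply
     * delete the check. */
    static int
    negate_post(int arg1)
    {
        int		result = -arg1;

        if (arg1 < 0 && result < 0)
            report_overflow();
        return result;
    }

    /* Precondition style: decide from the input alone; no signed
     * operation here can overflow, so nothing is undefined. */
    static int
    negate_pre(int arg1)
    {
        if (arg1 == INT_MIN)		/* -INT_MIN is not representable */
            report_overflow();
        return -arg1;
    }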
Re: [HACKERS] ALTER command reworks
2012/11/19 Dimitri Fontaine dimi...@2ndquadrant.fr: Kohei KaiGai kai...@kaigai.gr.jp writes: OK, are you suggesting to add a generic comment such as "Generic function to change the name of a given object, for simple cases ...", rather than a list of OBJECT_* at the head of this function? Just something like

 * Most simple objects are covered by a generic function, the other
 * still need special processing.

Mainly I was surprised to see collation not supported here, but I didn't take enough time to understand why that is the case. I will do that later in the review process. The reason why collation is not supported is that it takes special name-duplication checks. The new collation name must have no collision on both the current database encoding and the "any" encoding. It might be an idea to have a simple rule similar to AlterObjectNamespace_internal(); that requires the caller to check namespace-duplication if the given object type has no catcache-id with a name-key. The pain point is that AlterObjectNamespace_internal is invoked not only by ExecAlterObjectSchemaStmt(), but also by AlterObjectNamespace_oid(). I should remember better about that, as the use case is extensions… It is the reason why I had to put get_object_type() to find out OBJECT_* from the supplied ObjectAddress, because this code path has no way to pass down the ObjectType from the caller. If we really want to do that, I think I would only do that in AlterObjectNamespace_oid and add an argument to AlterObjectNamespace_internal, that you already have in ExecAlterObjectSchemaStmt(). But really, what about just removing that part of your patch instead:

@@ -396,14 +614,23 @@ AlterObjectNamespace_internal(Relation rel, Oid objid, Oid nspOid)
 	 * Check for duplicate name (more friendly than unique-index failure).
 	 * Since this is just a friendliness check, we can just skip it in cases
 	 * where there isn't a suitable syscache available.
+	 *
+	 * XXX - the caller should check object-name duplication, if the supplied
+	 * object type need to take object arguments for identification, such as
+	 * functions.
 	 */
-	if (nameCacheId >= 0 &&
-		SearchSysCacheExists2(nameCacheId, name, ObjectIdGetDatum(nspOid)))
-		ereport(ERROR,
-				(errcode(ERRCODE_DUPLICATE_OBJECT),
-				 errmsg("%s already exists in schema \"%s\"",
-						getObjectDescriptionOids(classId, objid),
-						get_namespace_name(nspOid))));
+	if (get_object_catcache_name(classId) >= 0)
+	{
+		ObjectAddress address;
+
+		address.classId = classId;
+		address.objectId = objid;
+		address.objectSubId = 0;
+
+		objtype = get_object_type(address);
+		check_duplicate_objectname(objtype, nspOid,
+								   NameStr(*DatumGetName(name)), NIL);
+	}

It would be much simpler to retain the old-style duplicate object check, and compared to doing extra cache lookups, it'd still be much cleaner in my view. Now, I am inclined to follow the manner of AlterObjectNamespace_internal at AlterObjectRename also; that requires the caller to check name duplication in case no catcache entry is supported. That simplifies the logic to check name duplication, and allows eliminating get_object_type() here, even though RenameAggregate and RenameFunction have to remain. Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Thu, Nov 15, 2012 at 2:44 PM, Jeff Davis pg...@j-davis.com wrote: The issue about external utilities is a bigger problem than I realized at first. Originally, I thought that it was just a matter of code to associate the checksum with the data. However, an external utility will never see a torn page while the system is online (after recovery); but it *will* see an inconsistent view of the checksum and the data if they are issued in separate write() calls. So, the hazard of storing the checksum in a different place is not equivalent to the existing hazard of a torn page. I agree that the hazards are not equivalent, but I'm not sure I agree that an external utility will never see a torn page while the system is on-line. We have a bunch of code that essentially forces full_page_writes=on during a base backup even if it's normally off. I think that's necessary precisely because neither the 8kB write() nor the unknown-sized-read used by the external copy program are guaranteed to be atomic. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Simon Riggs wrote: This seems very similar to the REPLACE command we discussed earlier, except this is restricted to Mat Views. I don't remember that discussion -- do you have a reference? If we're going to have this, I would prefer a whole command, e.g. REPLACE matviewname REFRESH that would also allow REPLACE tablename AS query Same thing under the covers, just more widely applicable and thus more useful. An interesting thought. I would have thought that if we were going to allow changing the definition of an existing MV, we would be better off with CREATE OR REPLACE MATERIALIZED VIEW. Either way, if you allow the column types or the number of columns to be changed, you do tend to run into issues if there are other MVs, views, triggers, rules, etc., which depend on the MV, so I don't think it's material for an initial patch. But it is worth considering which way we might want to extend it. Either way, I don't much like overloading the use of LOAD, which already has a very different meaning. Well, it's hard to avoid creating new keywords without overloading the meaning of existing ones. Personally I didn't find LOAD MATERIALIZED VIEW matview_name; to be very easy to confuse with LOAD 'filename'; But that's a subjective thing. If too many people find that confusing, it may be worth creating a new keyword; but I wanted to see whether it was really necessary first. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Fix div/mul crash and more undefined behavior
Xi Wang xi.w...@gmail.com writes: The reality is that C compilers are not friendly to postcondition checking; they consider signed integer overflow as undefined behavior, so they do whatever they want to do. Even workaround options like -fwrapv are often broken, not to mention that they may not even have those options. I think it's probably past time that we stopped guessing about this sort of thing and added some regression test cases for it. I'm planning to add cases like this:

    -- check sane handling of INT_MIN overflow cases
    SELECT (-2147483648)::int4 * (-1)::int4;
    SELECT (-2147483648)::int4 / (-1)::int4;
    SELECT (-2147483648)::int4 % (-1)::int4;

regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dumping an Extension's Script
Robert Haas robertmh...@gmail.com writes: That approach seems likely to break things for the hoped-for parallel pg_dump feature, though I'm not sure exactly in what way. Will the parallel dump solve the dependencies and extension membership properties in parallel too? Beyond that, I think much of the appeal of the extension feature is that it dumps as CREATE EXTENSION hstore; and nothing more. That allows you to migrate a dump between systems with different but compatible versions of the hstore and have things work as intended. Yes. That's the only use case supported so far. The contrib/ use case. I'm not opposed to the idea of being able to make extensions without files on disk work ... but I consider it a niche use case; the behavior we have right now works well for me and hopefully for others most of the time. I hear you. I'm not doing that on my free time, it's not a hobby, I have customers that want it bad enough to be willing to sponsor my work here. I hope that helps you figuring about the use case being a niche or not. The current extension support has been targeted at a single use case, because that's how you bootstrap that kind of feature. We have request for extensions that will not include a part written in C. We've been around the topic last year, we spent much energy trying to come up with something easy enough to accept as a first step in that direction, and the conclusion at the time was that we want to be able to dump an extension's script. That's what my current patch is all about. More about use cases. Consider PL/Proxy. By the way, that should really really get in core and be called a FOREIGN FUNCTION, but let's get back on topic. So I have customers with between 8 and 256 plproxy partitions, that means each database upgrade has to reach that many databases. Now, I've built a automatic system that will fetch the PL function code from the staging database area, put them into files depending on the schema they live in, package those files into a single one that can be used by the CREATE EXTENSION command, automatically create an upgrade file to be able to ALTER EXTENSION … TO VERSION …, and create a bunch of debian packages out of that (a single debian source package that will build as many binary packages as we have extensions). Then, the system will push those packages to an internal repository, run apt-get update on all the database hosts, then connect to each partition and run the upgrade command. All of that could get simplified to getting the PL code into a single SQL command then running it on all the members of the cluster by using a plproxy RUN ON ALL command, now that it's a self-contained single SQL command. Of course that's only one use case, but that email is already only too long for what it does: rehashing a story we already ran last year. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
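The command side of that workflow rides on the stock extension commands; roughly (extension name hypothetical, versions illustrative):

    -- on each plproxy partition, once the new package is installed:
    CREATE EXTENSION my_pl_functions VERSION '1.0';    -- initial deployment
    ALTER EXTENSION my_pl_functions UPDATE TO '1.1';   -- later upgrades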
Re: [HACKERS] [v9.3] writable foreign tables
2012/11/19 Albe Laurenz laurenz.a...@wien.gv.at: Kohei KaiGai wrote: I am not so happy with GetForeignRelInfo: - The name seems ill-chosen from the FDW API side. I guess that you chose the name because the function is called from get_relation_info, but I think the name should be more descriptive for the FDW API. Something like PlanForeignRelRowid. Indeed, GetForeignRelInfo might give a misleading impression, as if this routine collects widespread information about the target relation. So, how about GetForeignRelWidth() instead? That would be better for the function as it is now. - I guess that every FDW that only needs rowid will do exactly the same as your fileGetForeignRelInfo. Why can't that be done in core? The function could pass an AttrNumber for the rowid to the FDW, and will receive a boolean return code depending on whether the FDW plans to use rowid or not. That would be more convenient for FDW authors. This design tries to kill two birds with one stone. It enables adding multiple pseudo-columns, not only rowid, and makes it possible to push down complex calculation of the target list into an external computing resource. For example, when the user gives the following query: SELECT ((c1 - c2) * (c2 - c3))^2 FROM ftable it contains a complex calculation in the target-list, and thus it also takes CPU cycles of the local process. If we can replace the ((c1 - c2) * (c2 - c3))^2 by a reference to a pseudo-column that also references the calculation result on the external node, it effectively off-loads CPU cycles. In this case, all we need to do is (1) acquire a slot for the pseudo-column at GetForeignRelInfo (2) replace TargetEntry::expr by a Var node that references this pseudo-column. It makes sense for performance optimization, so I don't want to restrict this handler to rowid only. I understand. But I think that you still can do that with the change that I suggest. I suggest that GetForeignRelInfo (or whatever the name ends up being) gets the AttrNumber of the proposed rowid column in addition to the parameters you need for what you want to do. Then nothing would keep you from defining those pseudo-columns. But all the setup necessary for the rowid column could be moved out of the FDW. So for the 99% of all FDWs which are only interested in the rowid, things would get much simpler, and they wouldn't all have to implement the same code. Did I make clear what I mean? Would that be difficult? All we have to do at get_relation_info() to deal with pseudo-columns (including alternatives to complex calculations, not only rowid) is just expansion of rel->max_attr. So, if an FDW is not interested in anything except rowid, it can just inform the caller "yeah, we need just one slot for a pseudo-column of rowid". Otherwise, it can return another value to acquire the slot for an arbitrary pseudo-column. I don't think it is a problematic design. However, I'm skeptical that 99% of FDWs won't support target-list push-down. At least, it was a much-desired feature when I had a talk at PGconf.EU last month. :-) So, if we rename it to GetForeignRelWidth, is it defined as follows?

    extern AttrNumber GetForeignRelWidth(PlannerInfo *root,
                                         RelOptInfo *baserel,
                                         Oid foreigntableid,
                                         bool inhparent,
                                         List *targetList);

Right now, inhparent makes no sense because foreign tables do not support table inheritance, but it seems to me we shall have this functionality in the near future. - I guess the order is dictated by planner steps, but it would be nice if GetForeignRelInfo were not the first function to be called during planning. 
That would make it easier for an FDW to support both 9.2 and 9.3 (fewer #ifdefs), because the FDW plan state will probably have to be created in the first API function. The baserel->fdw_private should be initialized to NULL, so it can serve as a marker for whether private data has already been constructed or not. Right, if that pointer is pre-initialized to NULL, that should work. Forget my quibble. In addition, I noticed my patch didn't update documentation stuff. I also added mention of the new handlers. I didn't get into documentation, comment and spelling issues since the patch was still called POC, but yes, eventually that would be necessary. Yours, Laurenz Albe Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
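Concretely, an FDW that wants only the rowid slot would implement the proposed handler along these lines. The signature follows KaiGai's in-thread proposal, not a released API, and the body is an illustrative assumption:

    /*
     * Reserve attribute slots past the real columns for pseudo-columns.
     * Returning baserel->max_attr unchanged would mean "no pseudo-columns".
     */
    static AttrNumber
    myGetForeignRelWidth(PlannerInfo *root, RelOptInfo *baserel,
                         Oid foreigntableid, bool inhparent,
                         List *targetList)
    {
        /* one extra slot, to carry the remote row identifier */
        return baserel->max_attr + 1;
    }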
Re: [HACKERS] Materialized views WIP patch
Albe Laurenz wrote: Kevin Grittner wrote: My take was more that MVs would often be refreshed by crontab, and that you would want to keep subsequent steps from running and generating potentially plausible but completely inaccurate results if the LMV failed. If one of these subsequent steps doesn't care if refresh failed once, it shouldn't be forced to fail. I imagine that for many applications yesterday's data can be good enough. Those that care should check the timestamp. It sounds like you and I are in agreement on this; I just didn't state it very precisely. If a LMV on a MV which already has data fails, the relisvalid would not prevent it from being used -- it would be stale, but still valid data from *some* point in time. The point is that if an MV is created WITH NO DATA or has been TRUNCATEd and there has not been a subsequent LMV, what it contains may not represent any state which was *ever* valid, or it may represent a state which would only have been valid hundreds of years in the past, had the system been computerized at that time. To me, that does not seem like the same thing as a simple stale state. I'm looking at whether there is some reasonable way to detect invalid data as well as capture age of data. Every solution I've thought of so far has at least one hard-to-solve race condition, but I have hopes that I can either solve that for one of the ideas, or come up with an idea which falls more gracefully under MVCC management. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Fix div/mul crash and more undefined behavior
On 2012-11-19 11:04:31 -0500, Tom Lane wrote: Xi Wang xi.w...@gmail.com writes: Since INTn_MIN and INTn_MAX are standard macros from the C library, can we assume that every C compiler should provide them in stdint.h? Not every C compiler provides stdint.h, unfortunately --- otherwise I'd not be so resistant to depending on this. We already have hardcoded values for INT64_MIN in numutils.c's pg_lltoa(). Given that I think we could just bite the apple and provide our own values if the platform doesn't provide them. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
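A minimal sketch of what "provide our own values" could look like as a header fragment. HAVE_STDINT_H as the configure-style guard is an assumption here; INT64CONST is the backend's existing 64-bit literal macro:

    #ifdef HAVE_STDINT_H
    #include <stdint.h>
    #endif

    /*
     * Fallback definitions for compilers whose headers lack the C99 macros.
     * Spelled as "-MAX - 1" because a literal like -9223372036854775808
     * would not itself fit in the signed type.
     */
    #ifndef INT32_MIN
    #define INT32_MIN	(-0x7FFFFFFF - 1)
    #endif
    #ifndef INT64_MIN
    #define INT64_MIN	(-INT64CONST(0x7FFFFFFFFFFFFFFF) - 1)
    #endif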
Re: [HACKERS] Enabling Checksums
On Mon, 2012-11-19 at 11:48 -0500, Robert Haas wrote: I agree that the hazards are not equivalent, but I'm not sure I agree that an external utility will never see a torn page while the system is on-line. We have a bunch of code that essentially forces full_page_writes=on during a base backup even if it's normally off. I think that's necessary precisely because neither the 8kB write() nor the unknown-sized-read used by the external copy program are guaranteed to be atomic. This seems like a standards question that we should be able to answer definitively: Is it possible for a reader to see a partial write if both use the same block size? Maybe the reason we need full page writes during base backup is because we don't know the block size of the reader, but if we did know that it was the same, it would be fine? If that is not true, then I'm concerned about replicating corruption, or backing up corrupt blocks over good ones. How do we prevent that? It seems like a pretty major hole if we can't, because it means the only safe replication is streaming replication; a base-backup is essentially unsafe. And it means that even an online background checking utility would be quite hard to do properly. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Maintenance announcement for trill.postgresql.org
Hi all! There will be planned hardware/software maintenance tomorrow, Tuesday (20th November 2012), starting at 16:30 CET - affecting some redundant services (ftp and www mirrors) as well as the following public hosts: * yridian.postgresql.org (www.postgresql.eu) * antos.postgresql.org (anoncvs.postgresql.org) * malur.postgresql.org (mailinglists) During this maintenance window we will be doing some software and hardware changes involving a number of reboots, and we expect a maximum outage of an hour. Stefan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Switching timeline over streaming replication
On 10.10.2012 17:54, Thom Brown wrote: Hmm... I get something different. When I promote standby B, standby C's log shows: LOG: walreceiver ended streaming and awaits new instructions LOG: re-handshaking at position 0/400 on tli 1 LOG: fetching timeline history file for timeline 2 from primary server LOG: walreceiver ended streaming and awaits new instructions LOG: new target timeline is 2 Then when I stop then start standby C I get: FATAL: timeline history was not contiguous LOG: startup process (PID 22986) exited with exit code 1 LOG: aborting startup due to startup process failure Found fixed this one. A paren was misplaced in tliOfPointInHistory() function.. On 16.11.2012 16:01, Amit Kapila wrote: The following problems are observed while testing of the patch. Defect-1: 1. start primary A 2. start standby B following A 3. start cascade standby C following B. 4. Promote standby B. 5. After successful time line switch in cascade standby C, stop C. 6. Restart C, startup is failing with the following error. LOG: database system was shut down in recovery at 2012-11-16 16:26:29 IST FATAL: requested timeline 2 does not contain minimum recovery point 0/30143A0 on timeline 1 LOG: startup process (PID 415) exited with exit code 1 LOG: aborting startup due to startup process failure The above defect is already discussed in the following link. http://archives.postgresql.org/message-id/00a801cda6f3$4aba27b0$e02e7710$@ka p...@huawei.com Fixed now, sorry for neglecting this earlier. The problem was that if the primary switched to a new timeline at position X, and the standby followed that switch, on restart it would set minRecoveryPoint to X, and the new Defect-2: 1. start primary A 2. start standby B following A 3. start cascade standby C following B with 'recovery_target_timeline' option in recovery.conf is disabled. 4. Promote standby B. 5. Cascade Standby C is not able to follow the new master B because of timeline difference. 6. Try to stop the cascade standby C (which is failing and the server is not stopping, observations are as WAL Receiver process is still running and clients are not allowing to connect). The defect-2 is happened only once in my test environment, I will try to reproduce it. Found it. When restarting the streaming, I reused the WALRCV_STARTING state. But if you then exited recovery, WalRcvRunning() would think that the walreceiver is stuck starting up, because it's been longer than 10 seconds since it was launched and it's still in WALRCV_STARTING state, so it put it into WALRCV_STOPPED state. And walreceiver didn't expect to be put into STOPPED state after having started up successfully already. I added a new explicit WALRCV_RESTARTING state to handle that. In addition to the above bug fixes, there's some small changes since last patch version: * I changed the LOG messages printed in various stages a bit, hopefully making it easier to follow what's happening. Feedback is welcome on when and how we should log, and whether some error messages need clarification. * 'ps' display is updated when the walreceiver enters and exits idle mode * Updated pg_controldata and pg_resetxlog to handle the new minRecoveryPointTLI field I added to the control file. * startup process wakes up walsenders at the end of recovery, so that cascading standbys are notified immediately when the timeline changes. That removes some of the delay in the process. 
- Heikki (attachment: streaming-tli-switch-7.patch.gz) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On 2012-11-19 09:22:45 -0800, Jeff Davis wrote: On Mon, 2012-11-19 at 11:48 -0500, Robert Haas wrote: I agree that the hazards are not equivalent, but I'm not sure I agree that an external utility will never see a torn page while the system is on-line. We have a bunch of code that essentially forces full_page_writes=on during a base backup even if it's normally off. I think that's necessary precisely because neither the 8kB write() nor the unknown-sized-read used by the external copy program are guaranteed to be atomic. This seems like a standards question that we should be able to answer definitively: Is it possible for a reader to see a partial write if both use the same block size? Yes, definitely. If that is not true, then I'm concerned about replicating corruption, or backing up corrupt blocks over good ones. How do we prevent that? It seems like a pretty major hole if we can't, because it means the only safe replication is streaming replication; a base-backup is essentially unsafe. And it means that even an online background checking utility would be quite hard to do properly. I am not sure I see the danger in the base backup case here? Why would we have corrupted backup blocks? While postgres is running we won't see such torn pages because its all done under proper locks... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch: add (PRE|POST)PROCESSOR options to COPY
On Thu, Nov 15, 2012 at 2:35 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Yeah. If we're going to do this at all, and I'm not convinced it's worth the work, I think it's definitely good to support a variant where we specify exactly the things that will be passed to exec(). There's just too many ways to accidentally shoot yourself in the foot otherwise. If we want to have an option that lets people shoot themselves in the foot, that's fine. But I think we'd be smart not to make that the only option. [ shrug... ] Once again, that will turn this from a ten-line patch into hundreds of lines (and some more, different, hundreds of lines for Windows I bet), with a corresponding growth in the opportunities for bugs, for a benefit that's at best debatable. The biggest problem this patch has had from the very beginning is overdesign, and this is more of the same. Let's please just define the feature as popen, not fopen, the given string and have done. I just don't agree with that. popen() is to security holes as cars are to alcohol-related fatalities. In each case, the first one doesn't directly cause the second one; but it's a pretty darn powerful enabler. Your proposed solution won't force people to write insecure applications; it'll just make it much more likely that they will do so ... after which, presumably, you'll tell them it's their own darn fault for using the attractive nuisance. The list of security vulnerabilities that are the result of insufficiently careful validation of strings passed to popen() is extremely long. If we give people a feature that can only be leveraged via popen(), the chances that someone will thereby open a security hole are indistinguishable from 1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
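The distinction is concrete: popen() hands its whole string to /bin/sh, while exec-style interfaces pass each argument through verbatim. A sketch, with an illustrative gunzip filter that is not from the patch (fork/pipe plumbing omitted):

    #include <stdio.h>
    #include <unistd.h>

    static void
    run_filter(const char *filename)
    {
        /*
         * popen(): the string goes through /bin/sh, so an attacker-supplied
         * name such as "/tmp/x; rm -rf /" runs a second command too.
         */
        char	cmd[1024];
        FILE   *fp;

        snprintf(cmd, sizeof(cmd), "gunzip -c %s", filename);
        fp = popen(cmd, "r");		/* DANGER: shell metacharacters are live */
        if (fp)
            pclose(fp);

        /*
         * exec-style alternative: the filename is a single argv slot; no
         * shell ever parses it, so metacharacters are just bytes.  (Call
         * this in a fork()ed child instead of after popen(), of course.)
         */
        char *const args[] = { "gunzip", "-c", (char *) filename, NULL };
        execv("/usr/bin/gunzip", args);
    }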
Re: [HACKERS] Materialized views WIP patch
Kevin Grittner kgri...@mail.com writes: Simon Riggs wrote: Either way, I don't much like overloading the use of LOAD, which already has a very different meaning. Well, it's hard to avoid creating new keywords without overloading the meaning of exsiting ones. FWIW, I'd much rather see us overload LOAD (which is seldom used) than REPLACE (which might in the future become a widely-used DML command). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch: add (PRE|POST)PROCESSOR options to COPY
Robert Haas robertmh...@gmail.com writes: On Thu, Nov 15, 2012 at 2:35 PM, Tom Lane t...@sss.pgh.pa.us wrote: The biggest problem this patch has had from the very beginning is overdesign, and this is more of the same. Let's please just define the feature as popen, not fopen, the given string and have done. ... If we give people a feature that can only be leveraged via popen(), the chances that someone will thereby open a security hole are indistinguishable from 1. You are absolutely right that this feature is a security risk, but it will be one whether it exposes popen() or only exec(). I do not believe that the incremental gain in security from disallowing shell notation is worth either the loss of functionality or the amount of added effort (and added bugs, some of which will be security issues in themselves) we'd need to write it that way. The correct response to the security risks is to (a) make it superuser-only and (b) document that it's a seriously bad idea to allow the argument string to come from any untrusted sources. Please note that we'd have to do these same things with an exec-based patch. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Kevin, I'm looking at whether there is some reasonable way to detect invalid data as well as capture age of data. Every solution I've thought of so far has at least one hard-to-solve race condition, but I have hopes that I can either solve that for one of the ideas, or come up with an idea which falls more gracefully under MVCC management. What's the race condition? I'd think that LOAD would take an exclusive lock on the matview involved. LOAD MATERIALIZED VIEW matview_name; to be very easy to confuse with LOAD 'filename'; But that's a subjective thing. If too many people find that confusing, it may be worth creating a new keyword; but I wanted to see whether it was really necessary first. I do not find them confusing. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On 11/19/12 9:57 AM, Josh Berkus wrote: Kevin, I'm looking at whether there is some reasonable way to detect invalid data as well as capture age of data. Every solution I've thought of so far has at least one hard-to-solve race condition, but I have hopes that I can either solve that for one of the ideas, or come up with an idea which falls more gracefully under MVCC management. What's the race condition? I'd think that LOAD would take an exclusive lock on the matview involved. BTW, another thought on the timestamp: while it would be better to have a lastrefresh timestamp in pg_class, the other option is to have an extra column in the matview (pg_last_update). While that would involve some redundant storage, it would neatly solve the issues around unlogged matviews; the timestamp and the data would vanish at the same time. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Do we need so many hint bits?
On Sun, Nov 18, 2012 at 10:43 PM, Jeff Davis pg...@j-davis.com wrote: On Sun, 2012-11-18 at 15:19 +0100, Andres Freund wrote: On Sunday, November 18, 2012 03:07:01 AM Jeff Davis wrote: Process A (process that clears a VM bit for a data page): 1. Acquires exclusive lock on data buffer 2. Acquires exclusive lock on VM buffer 3. clears VM bit 4. Releases VM buffer lock 5. Releases data buffer lock Well, but right this is a rather big difference. If vm pages get unconditionally locked all the time we will have a huge source of new contention as they are shared between so many heap pages. No, that is only for the process *clearing* the bit, and this already happens. I am not planning on introducing any new locks, aside from the buffer header lock when acquiring a pin. And I plan to keep those pins for long enough that those don't matter, either. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers Sorry if I am being a bit naive, but shouldn't a simple mutex work in the case when a process wants to change the VM bit in cache? A mutex would be cheaper than locks. Atri -- Regards, Atri *l'apprenant*
Re: [HACKERS] Do we need so many hint bits?
It's quite common to load a lot of data, and then do some reads for a while (setting hint bits and flushing them to disk), and then do a VACUUM a while later, setting PD_ALL_VISIBLE and writing all of the pages again. Also, if I remember correctly, Robert went to significant effort when making the VM crash-safe to keep the PD_ALL_VISIBLE and VM bits consistent. Maybe this was all discussed before? All of these hint bits will have a bit more of a performance impact after checksums are introduced (for those that use them in conjunction with large data loads), so I'm looking for some simple ways to mitigate those effects. What kind of worst-case tests could I construct to see if there are worrying performance effects to removing these hint bits? Regards, Jeff Davis I completely agree. In fact, that is the problem that we are trying to solve in our patch (https://commitfest.postgresql.org/action/patch_view?id=991). Essentially, we are trying to mitigate the expense of maintaining hint bits in the cases when the user loads a lot of data, does some operations such as SELECT, and deletes them all. We maintain a cache that can be used to fetch the commit status of XMAX or XMIN instead of hint bits. As the cache is a single frame, it has no issues in its replacement algorithm. Cache lookup is pretty cheap. I agree with the removal of PD_ALL_VISIBLE. AFAIK (pardon me if I am wrong, I have been trying to research while following this thread), PD_ALL_VISIBLE was really useful when VM bits were not crash-safe, and crashes could lead to redo setting the bit on the heap pages. Atri -- Regards, Atri *l'apprenant*
Re: [HACKERS] Enabling Checksums
On Mon, 2012-11-19 at 18:30 +0100, Andres Freund wrote: Yes, definitely. OK. I suppose that makes sense for large writes. If that is not true, then I'm concerned about replicating corruption, or backing up corrupt blocks over good ones. How do we prevent that? It seems like a pretty major hole if we can't, because it means the only safe replication is streaming replication; a base-backup is essentially unsafe. And it means that even an online background checking utility would be quite hard to do properly. I am not sure I see the danger in the base backup case here? Why would we have corrupted backup blocks? While postgres is running we won't see such torn pages because its all done under proper locks... Yes, the blocks written *after* the checkpoint might have a bad checksum that will be fixed during recovery. But the blocks written *before* the checkpoint should have a valid checksum, but if they don't, then recovery doesn't know about them. So, we can't verify the checksums in the base backup because it's expected that some blocks will fail the check, and they can be fixed during recovery. That gives us no protection for blocks that were truly corrupted and written long before the last checkpoint. I suppose if we could somehow differentiate the blocks, that might work. Maybe look at the LSN and only validate blocks written before the checkpoint? But of course, that's a problem because a corrupt block might have the wrong LSN (in fact, it's likely, because garbage is more likely to make the LSN too high than too low). Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] too much pgbench init output
On 19.11.2012 11:59, Jeevan Chalke wrote: Hi, I went through the discussion for this patch and here is my review: The main aim of this patch is to reduce the number of log lines. It was also suggested to use an option to provide the interval, but a few of us did not quite agree on that. So the final discussion ended at keeping a 5 sec interval between log lines. However, I see there are two types of users here: 1. Those who like these log lines, so that they can troubleshoot some slowness and all 2. Those who do not like these log lines. Who likes these lines / needs them for something useful? So keeping this in mind, I would rather go for an option which will control this. People falling in category one can set this option very low, whereas users falling under the second category can keep it high. So what option(s) would you expect? Something that tunes the interval length or something else? A switch that'd choose between the old and new behavior might be a good idea, but I'd strongly vote against automagic heuristics. It makes the behavior very difficult to predict and I really don't want to force the users to wonder whether the long delay is due to general slowness of the machine or some clever logic that causes long delays between log messages. That's why I chose a very simple approach with a constant time interval. It does what I was aiming for (less logging) and it's easy to predict. Sure, we could choose a different interval (or make it an option). However, assuming we settled on a 5 sec delay, here are a few comments on the patch attached:

Comments:
=========

The patch applies cleanly, with some whitespace errors. make and make install went smoothly too. make check was smooth. Rather, it should be smooth, since there are NO changes in any part of the code other than pgbench.c, and we do not have any test-case either. However, here are a few comments on the changes in pgbench.c: 1. Since the final discussion ended at a 5 second interval being good enough, the author used a global int variable for that. Given that it's just a constant, a #define would be a better choice. Good point. Although if we add an option to supply different values, a variable is a better match. 2. +/* let's not call the timing for each row, but only each 100 rows */ Why only 100 rows? Have you done any testing to come up with the number 100? To me it seems very low. It would be good to test with 1K or even 10K. On my machine (2.4 GHz Intel Core 2 Duo MacBook Pro, running Ubuntu in a VM with 4GB RAM, 1067 DDR3), approx 1M rows were inserted in 5 sec. So checking every 100 rows looks like overkill. Well, the 100 is clearly a magical constant. The goal was to choose a value large enough to amortize the getlocaltime() cost, but small enough to print info even in cases when the performance sucks for some reason. I've seen issues where the speed suddenly dropped to a fraction of the expected value, e.g. 100 rows/second, and in those cases you'd have to wait for a very long time to actually get the next log message (with the suggested 10k step). So 100 seems like a good compromise to me ... 3. Please indent the following block as per the indentation just above it

	/* used to track elapsed time and estimate of the remaining time */
	instr_time	start, diff;
	double		elapsed_sec, remaining_sec;
	int			log_interval = 1;

OK 4. +/* have ve reached the next interval? */ Do you mean "have WE reached..."? OK 5. While applying the patch, I got a few white-space errors. But I think every patch goes through pgindent, which might take care of this. OK Thanks Thanks for the review. 
I'll wait for a bit more discussion about the choices before submitting another version of the patch. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
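Pulling the quoted fragments together, the constant-interval logic under discussion amounts to roughly this inside pgbench's row-generation loop (a sketch; variable names such as j, scale and naccounts follow pgbench's initialization code, and the exact message wording is an assumption):

    #define LOG_STEP_SECONDS	5

    /* declared before the loop, as in the patch excerpts above */
    instr_time	start, diff;
    double		elapsed_sec, remaining_sec;
    int			log_interval = 1;

    /* inside the insert loop; j counts rows generated so far */
    if (j > 0 && j % 100 == 0)
    {
        INSTR_TIME_SET_CURRENT(diff);
        INSTR_TIME_SUBTRACT(diff, start);

        elapsed_sec = INSTR_TIME_GET_DOUBLE(diff);
        remaining_sec = (scale * naccounts - j) * elapsed_sec / j;

        /* have we reached the next 5-second interval? */
        if (elapsed_sec >= log_interval * LOG_STEP_SECONDS)
        {
            fprintf(stderr, "%d of %d tuples (%d%%) done (elapsed %.2f s, remaining %.2f s).\n",
                    j, scale * naccounts,
                    (int) ((int64) j * 100 / (scale * naccounts)),
                    elapsed_sec, remaining_sec);
            /* skip ahead so a stall does not emit a burst of lines */
            log_interval = (int) (elapsed_sec / LOG_STEP_SECONDS) + 1;
        }
    }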
Re: [HACKERS] Do we need so many hint bits?
On Tue, Nov 20, 2012 at 12:08 AM, Jeff Davis pg...@j-davis.com wrote: On Mon, 2012-11-19 at 23:50 +0530, Atri Sharma wrote: Sorry if I am being a bit naive, but shouldn't a simple mutex work in the case when a process wants to change the VM bit in cache? A mutex would be cheaper than locks. I thought mutexes are locks? The point is to avoid taking new locks (or mutexes) during a read of the VM bit, because there is concern that it could cause contention. If we lock the entire VM page, that represents many, many data pages, so it's worrisome. Regards, Jeff Davis My mistake... I thought we were more concerned about the cost of locking. I agree, locking many data pages simultaneously could be hazardous. This requires more thought. Maybe removing PD_ALL_VISIBLE isn't such a great idea after all... Atri -- Regards, Atri *l'apprenant*
Re: [HACKERS] Do we need so many hint bits?
On Mon, 2012-11-19 at 23:50 +0530, Atri Sharma wrote: Sorry If I am being a bit naive, but shouldnt a simple mutex work in the case when a process wants to change the VM bit in cache? Mutex would be cheaper than locks. I thought mutexes are locks? The point is to avoid taking new locks (or mutexes) during a read of the VM bit, because there is concern that it could cause contention. If we lock the entire VM page, that represents many, many data pages, so it's worrisome. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Do we need so many hint bits?
On Tue, 2012-11-20 at 00:13 +0530, Atri Sharma wrote: My mistake...I thought we were more concerned about the cost of locking. I agree, locking many data pages simultaneously could be hazardous. This requires more thought.Maybe removing PD_ALL_VISIBLE isnt such a great idea after all... As I said elsewhere in the thread, I'm not planning to introduce any additional locking. There is already precedent in IndexOnlyNext. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Do we need so many hint bits?
On 2012-11-19 10:46:37 -0800, Jeff Davis wrote: On Tue, 2012-11-20 at 00:13 +0530, Atri Sharma wrote: My mistake...I thought we were more concerned about the cost of locking. I agree, locking many data pages simultaneously could be hazardous. This requires more thought.Maybe removing PD_ALL_VISIBLE isnt such a great idea after all... As I said elsewhere in the thread, I'm not planning to introduce any additional locking. There is already precedent in IndexOnlyNext. Yea, but that precedent is only valid because IOS does check for a MVCC snapshots. Also it doesn't set anything. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Do we need so many hint bits?
Jeff Davis pg...@j-davis.com writes: As I said elsewhere in the thread, I'm not planning to introduce any additional locking. There is already precedent in IndexOnlyNext. Of course, that just begs the question of whether the code in IndexOnlyNext is correct. And after looking at it, it seems pretty broken, or at least the argument for its correctness is broken. That comment seems to consider only the case where the target tuple has just been inserted --- but what of the case where the target tuple is in process of being deleted? The deleter will not make a new index entry for that. Of course, there's a pretty long code path from marking a tuple deleted to committing the deletion, and it seems safe to assume that the deleter will hit a write barrier in that path. But I don't believe the argument here that the reader must hit a read barrier in between. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pg_upgrade speed for many tables
On Wed, Nov 14, 2012 at 3:55 PM, Bruce Momjian br...@momjian.us wrote: On Mon, Nov 12, 2012 at 10:29:39AM -0800, Jeff Janes wrote: Is turning off synchronous_commit enough? What about turning off fsync? I did some testing with the attached patch on a magnetic disk with no BBU that turns off fsync; With which file system? I wouldn't expect you to see a benefit with ext2 or ext3; it seems to be a peculiarity of ext4 that it inhibits group fsync of new file creations and instead does each one serially. Whether it is worth applying a fix that is only needed for that one file system, I don't know. The trade-offs are not all that clear to me yet. I got these results:

    tables   sync_com=off   fsync=off
    1            15.90        13.51
    1000         26.09        24.56
    2000         33.41        31.20
    4000         57.39        57.74
    8000        102.84       116.28
    16000       189.43       207.84

It shows fsync=off faster for < 4k tables, and slower for > 4k. Not sure why this is the case, but perhaps the buffering of the fsync is actually faster than doing a no-op fsync. synchronous_commit=off turns off not only the fsync at each commit, but also the write-to-kernel at each commit; so it is not surprising that it is faster at large scale. I would specify both synchronous_commit=off and fsync=off. When I'm doing a pg_upgrade with thousands of tables, the shutdown checkpoint after restoring the dump to the new cluster takes a very long time, as the writer drains its operation table by opening and individually fsync-ing thousands of files. This takes about 40 ms per file, which I assume is a combination of a slow laptop disk drive and a strange deal with ext4 which makes fsyncing a recently created file very slow. But even with a faster hdd, this would still be a problem if it works the same way, with every file needing 4 rotations to be fsynced, and this happening serially. Is this with the current code that does synchronous_commit=off? If not, can you test to see if this is still a problem? Yes, it is with synchronous_commit=off. (Or if it wasn't originally, it is now, with the same result.) Applying your fsync patch does solve the problem for me on ext4. Having the new cluster be on ext3 rather than ext4 also solves the problem, without the need for a patch; but it would be nice to be more friendly to ext4, which is popular even though not recommended. Anyway, the reason I think turning fsync off might be reasonable is that as soon as the new cluster is shut down, pg_upgrade starts overwriting most of those just-fsynced files with other files from the old cluster, and AFAICT makes no effort to fsync them. So until there is a system-wide sync after the pg_upgrade finishes, your new cluster is already in mortal danger anyway. pg_upgrade does a cluster shutdown before overwriting those files. Right. So as far as the cluster is concerned, those files have been fsynced. But then the next step is to go behind the cluster's back and replace those fsynced files with different files, which may or may not have been fsynced. This is what makes me think the new cluster is in mortal danger. Not only have the new files perhaps not been fsynced, but the cluster is not even aware of this fact, so you can start it up, and then shut it down, and it still won't bother to fsync them, because as far as it is concerned they already have been. Given that, how much extra danger would be added by having the new cluster schema restore run with fsync=off? In any event, I think the documentation should caution that the upgrade should not be deemed a success until after a system-wide sync has been done. 
Even if we use the link rather than copy method, are we sure that that is safe if the directories recording those links have not been fsynced? Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
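On that last question: on POSIX file systems a new hard link is only durable once the directory holding it has itself been fsynced, so the concern is real. A hypothetical helper showing the usual pattern (error handling trimmed):

    #include <fcntl.h>
    #include <unistd.h>

    /* Flush both a newly linked file and the directory entry pointing at it. */
    static void
    fsync_file_and_dir(const char *filepath, const char *dirpath)
    {
        int		fd;

        fd = open(filepath, O_RDONLY);
        if (fd >= 0)
        {
            fsync(fd);			/* the file's own data and metadata */
            close(fd);
        }

        fd = open(dirpath, O_RDONLY);
        if (fd >= 0)
        {
            fsync(fd);			/* the directory entry, i.e. the link itself */
            close(fd);
        }
    }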
Re: [HACKERS] Doc patch making firm recommendation for setting the value of commit_delay
I've added this to the open commitfest. I think that a formal review is probably required here. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [v9.3] Extra Daemons (Re: [HACKERS] elegant and effective way for running jobs inside a database)
Kohei KaiGai escribió: 2012/10/22 Alvaro Herrera alvhe...@2ndquadrant.com: Here's an updated version of this patch, which also works in an EXEC_BACKEND environment. (I haven't tested this at all on Windows, but I don't see anything that would create a portability problem there.) I also tried to check the latest patch briefly. Let me comment on several points randomly. Once a bgworker process crashes, postmaster tries to restart it about 60 seconds later. My preference is for this interval to be configurable via an initial registration parameter, including a "never restart again" option. Probably, some extensions don't want to restart again on an unexpected crash. I changed this. Stop is one other significant event for a bgworker, not only start-up. This interface has no way to specify when we want to stop a bgworker. Probably, we can offer two options. Currently, bgworkers are terminated simultaneously with other regular backends. In addition, I want an option to make sure a bgworker is terminated after all the regular backends exit. It is a critical issue for me, because I try to implement a parallel calculation server with bgworker, and thus it should not be terminated earlier than regular backends. I am not really sure about this. Each new behavior we want to propose requires careful attention, because we need to change postmaster's shutdown sequence in pmdie(), and also reaper() and PostmasterStateMachine(). After looking into it, I am hesitant to change this too much unless we can reach a very detailed agreement of exactly what we want to happen. Also note that there are two types of workers: those that require a database connection, and those that do not. The former are signalled via SignalSomeChildren(BACKEND_TYPE_BGWORKER); the latter are signalled via SignalUnconnectedWorker(). Would this distinction be enough for you? Delivering more precise shutdown signals is tricky; we would have to abandon SignalSomeChildren() and code something new to scan the workers list checking the stop time for each one. (Except in emergency situations, of course, in which we would continue to rely on SignalSomeChildren). How about moving the bgw_name field to the tail of the BackgroundWorker structure? It simplifies the logic in RegisterBackgroundWorker(), if it is located as:

    typedef struct BackgroundWorker
    {
        int					bgw_flags;
        BgWorkerStartTime	bgw_start_time;
        bgworker_main_type	bgw_main;
        void			   *bgw_main_arg;
        bgworker_sighdlr_type bgw_sighup;
        bgworker_sighdlr_type bgw_sigterm;
        char				bgw_name[1];	/* <== (*) */
    } BackgroundWorker;

This doesn't work; or rather, it works and it makes the RegisterBackgroundWorker() code somewhat nicer, but according to Larry Wall's Conservation of Cruft Principle, the end result is that the module code to register a new worker is a lot messier; it can no longer use a struct on the stack, but requires malloc() or similar. I don't see that this is a win overall. StartOneBackgroundWorker always scans the BackgroundWorkerList from the head. Couldn't we save the current position in a static variable? If someone tries to manage thousands of bgworkers, it makes a busy loop. :( Seems messy; we would have to get into the guts of slist_foreach (unroll the macro and make the iterator static). I prefer not to go that path, at least not for now. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
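Alvaro's stack-struct point is easiest to see from the caller's side. With bgw_name as an ordinary field, module code can register a worker like this (a sketch against the in-development API quoted in this thread, not a released interface; bgw_name is assumed here to be a plain char pointer, and the flag and start-time names follow the patch):

    static void
    mymodule_register_worker(void)
    {
        BackgroundWorker worker;

        /* a plain stack struct is enough; nothing here needs malloc() */
        worker.bgw_name = "my parallel calculation server";
        worker.bgw_flags = BGWORKER_SHMEM_ACCESS;
        worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
        worker.bgw_main = mymodule_main;
        worker.bgw_main_arg = NULL;
        worker.bgw_sighup = NULL;		/* use the default signal handlers */
        worker.bgw_sigterm = NULL;

        RegisterBackgroundWorker(&worker);
    }

With a flexible "char bgw_name[1]" tail instead, the caller would have to compute the struct size, malloc() it, and copy the name in, which is the cruft Alvaro is objecting to.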
Re: [HACKERS] Materialized views WIP patch
Tom Lane wrote: Kevin Grittner kgri...@mail.com writes: Josh Berkus wrote: What use would a temporary matview be? It would be essentially like a temporary table, with all the same persistence options. I'm not really sure how often it will be more useful than a temporary table before we have incremental maintenance of materialized views; once we have that, though, it seems likely that there could be reasonable use cases.

One of the principal attributes of a temp table is that its contents aren't (reliably) accessible from anywhere except the owning backend. Not sure where you're going to hide the incremental maintenance in that scenario.

The more I think about that, the less sensible temporary MVs seem. Unless I can figure out some reasonable use case, I'll disable that in the next version of the patch. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [v9.3] Extra Daemons (Re: [HACKERS] elegant and effective way for running jobs inside a database)
Alvaro Herrera alvhe...@2ndquadrant.com writes: Kohei KaiGai escribió: StartOneBackgroundWorker always scans the BackgroundWorkerList from the head. Couldn't it save the current position in a static variable? If someone tries to manage thousands of bgworkers, it becomes a busy loop. :( Seems messy; we would have to get into the guts of slist_foreach (unroll the macro and make the iterator static). I prefer not to go that path, at least not for now.

Thousands of bgworkers seems like a pretty unsupportable scenario anyway --- it'd presumably have most of the same problems as thousands of backends.

regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Mon, 2012-11-19 at 10:35 -0800, Jeff Davis wrote: Yes, the blocks written *after* the checkpoint might have a bad checksum that will be fixed during recovery. But the blocks written *before* the checkpoint should have a valid checksum, and if they don't, then recovery doesn't know about them.

So, we can't verify the checksums in the base backup, because it's expected that some blocks will fail the check, and they can be fixed during recovery. That gives us no protection for blocks that were truly corrupted and written long before the last checkpoint.

I suppose if we could somehow differentiate the blocks, that might work. Maybe look at the LSN and only validate blocks written before the checkpoint? But of course, that's a problem because a corrupt block might have the wrong LSN (in fact, it's likely, because garbage is more likely to make the LSN too high than too low).

It might be good enough here to simply retry the checksum verification if it fails for any block. Postgres shouldn't be issuing write()s for the same block very frequently, and they shouldn't take very long, so the chance of failing several times seems vanishingly small unless it's a real failure.

Through a suitably complex mechanism, I think we can be more sure. The external program could wait for a checkpoint (or force one manually), and then recalculate the checksum for that page. If the checksum is the same as last time, then we know the block is bad (because the checkpoint would have waited for any writes in progress). If the checksum does change, then we assume postgres must have modified it since the backup started, so we can assume that we have a full page image to fix it. (A checkpoint is a blunt tool here, because all we need to do is wait for the write() call to finish, but it suffices.)

That complexity is probably not required, and simply retrying a few times is probably much more practical. But it still bothers me a little to think that the external tool could falsely indicate a checksum failure, however remote that chance.

Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
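To make the retry idea concrete, an external verification tool could do something like the following sketch. The page layout, the placement of the stored checksum in the first two bytes, and the checksum function itself are stand-ins invented for illustration; nothing here is actual backup-verification code:

#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define BLCKSZ		8192
#define MAX_RETRIES	3

/* stand-in checksum; skips the first two bytes where we pretend it's stored */
static uint16_t
page_checksum(const char *page, uint32_t blkno)
{
	uint32_t	h = blkno;
	int			i;

	for (i = 2; i < BLCKSZ; i++)
		h = h * 31 + (unsigned char) page[i];
	return (uint16_t) (h ^ (h >> 16));
}

static uint16_t
page_stored_checksum(const char *page)
{
	uint16_t	c;

	memcpy(&c, page, sizeof(c));
	return c;
}

/*
 * Re-read and re-verify a block a few times before declaring it corrupt, so
 * that a write() concurrently in progress in the server is not reported as
 * a checksum failure.
 */
static int
verify_block(int fd, uint32_t blkno)
{
	char		page[BLCKSZ];
	int			attempt;

	for (attempt = 0; attempt < MAX_RETRIES; attempt++)
	{
		if (pread(fd, page, BLCKSZ, (off_t) blkno * BLCKSZ) != BLCKSZ)
			return -1;			/* I/O error, not a checksum failure */

		if (page_checksum(page, blkno) == page_stored_checksum(page))
			return 0;			/* verified OK */

		sleep(1);				/* give an in-progress write time to finish */
	}
	return 1;					/* failed repeatedly: likely real corruption */
}

The interesting part is only the loop in verify_block(): a false alarm now requires catching the same page mid-write several seconds in a row, which matches the "vanishingly small" argument above.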
Re: [HACKERS] WIP: index support for regexp search
Hi! New version of patch is attached. Changes are the following:

1) Right way to convert from pg_wchar to multibyte.
2) Optimization of producing the CNFA-like graph on trigrams (produce smaller, but equivalent, graphs in less time).
3) Comments and refactoring.

-- With best regards, Alexander Korotkov. trgm-regexp-0.2.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.3] OAT_POST_ALTER object access hooks
Kohei KaiGai wrote: Sorry, I missed the attached version. Please use this revision.

All those direct uses of object_access_hook make me think that the InvokeObjectAccessHook() macro we have is insufficient. Maybe we could have InvokeObjectAccessHookArg1() so that instead of

+	if (object_access_hook)
+	{
+		ObjectAccessPostCreate	pc_arg;
+
+		memset(&pc_arg, 0, sizeof(ObjectAccessPostCreate));
+		pc_arg.is_internal = is_internal;
+		(*object_access_hook)(OAT_POST_CREATE, AttrDefaultRelationId,
+							  RelationGetRelid(rel), attnum, (void *) &pc_arg);
+	}

we can have

InvokeObjectAccessHookWithArgs1(OAT_POST_CREATE, AttrDefaultRelationId,
								RelationGetRelid(rel), attnum,
								ObjectAccessPostCreate, is_internal, is_internal)

which would expand into the above. (Since ObjectAccessPostCreate has two members, we presumably need InvokeObjectAccessHookWithArgs2 as well. But that doesn't seem to be used anywhere, so maybe that struct member is not necessary?)

The second hunk to alter.c does not apply anymore; please rebase. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
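A sketch of how such a macro could be written, following the expansion described above; the do/while wrapper and the temporary-variable name are additions of mine, and the macro as a whole is only the proposal from this message, not existing code:

#define InvokeObjectAccessHookWithArgs1(access, classId, objectId, subId, \
										ArgType, member, value) \
	do { \
		if (object_access_hook) \
		{ \
			ArgType		_hook_arg; \
			memset(&_hook_arg, 0, sizeof(ArgType)); \
			_hook_arg.member = (value); \
			(*object_access_hook)((access), (classId), (objectId), \
								  (subId), (void *) &_hook_arg); \
		} \
	} while (0)

The call site shown above would then collapse to the single InvokeObjectAccessHookWithArgs1(...) line, with the struct type and member name passed as macro arguments.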
Re: [HACKERS] WIP: index support for regexp search
On Mon, November 19, 2012 22:58, Alexander Korotkov wrote: New version of patch is attached.

Hi Alexander,

I get some compile errors (CentOS 6.3, Linux 2.6.32-279.14.1.el6.x86_64 GNU/Linux, gcc (GCC) 4.7.2):

make contrib

trgm_regexp.c:73:2: error: unknown type name ‘TrgmStateKey’
make[1]: *** [trgm_regexp.o] Error 1
make: *** [all-pg_trgm-recurse] Error 2

trgm_regexp.c:73:2: error: unknown type name ‘TrgmStateKey’
make[1]: *** [trgm_regexp.o] Error 1
make: *** [install-pg_trgm-recurse] Error 2

Did I forget something?

Thanks, Erik Rijkers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: index support for regexp search
On 19.11.2012 22:58, Alexander Korotkov wrote: Hi! New version of patch is attached. Changes are the following: 1) Right way to convert from pg_wchar to multibyte. 2) Optimization of producing the CNFA-like graph on trigrams (produce smaller, but equivalent, graphs in less time). 3) Comments and refactoring.

Hi, thanks for the updated patch. I've done the initial review:

1) Patch applies fine to the master.

2) It's common to use upper-case names for macros, but trgm.h defines the macro iswordchr - I see it's moved from trgm_op.c, but maybe we could make it a bit more consistent?

3) I see there are two '#ifdef KEEPONLYALNUM' blocks right next to each other in trgm.h - maybe it'd be a good idea to join them?

4) The two new method prototypes at the end of trgm.h use different indentation than the rest (spaces only instead of tabs).

5) There are no regression tests / updated docs (yet).

6) It does not compile - I do get a bunch of errors like this:

trgm_regexp.c:73:2: error: expected specifier-qualifier-list before ‘TrgmStateKey’
trgm_regexp.c: In function ‘addKeys’:
trgm_regexp.c:291:24: error: ‘TrgmState’ has no member named ‘keys’
trgm_regexp.c:304:10: error: ‘TrgmState’ has no member named ‘keys’
...

It seems this is caused by the order of typedefs in trgm_regexp.c, namely TrgmState referencing TrgmStateKey before it's defined. Moving TrgmStateKey before TrgmState fixed the issue. (I'm using gcc-4.5.4, but I think it's not compiler-dependent.)

7) Once fixed, it seems to work:

CREATE EXTENSION pg_trgm;
CREATE TABLE TEST (val TEXT);
INSERT INTO test SELECT md5(i::text) FROM generate_series(1,100) s(i);
CREATE INDEX trgm_idx ON test USING gin (val gin_trgm_ops);
ANALYZE test;

EXPLAIN SELECT * FROM test WHERE val ~ '.*qqq.*';
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=16.77..385.16 rows=100 width=33)
   Recheck Cond: (val ~ '.*qqq.*'::text)
   ->  Bitmap Index Scan on trgm_idx  (cost=0.00..16.75 rows=100 width=0)
         Index Cond: (val ~ '.*qqq.*'::text)
(4 rows)

but I do get a bunch of NOTICE messages with debugging info (no matter if the GIN index is used or not, so it's somewhere in the common regexp code). But I guess that's due to WIP status.

regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: index support for regexp search
Some quick comments. On Tue, Nov 20, 2012 at 3:02 AM, Tomas Vondra t...@fuzzy.cz wrote:

6) It does not compile - I do get a bunch of errors like this

Fixed.

7) Once fixed, it seems to work:

CREATE EXTENSION pg_trgm;
CREATE TABLE TEST (val TEXT);
INSERT INTO test SELECT md5(i::text) FROM generate_series(1,100) s(i);
CREATE INDEX trgm_idx ON test USING gin (val gin_trgm_ops);
ANALYZE test;

EXPLAIN SELECT * FROM test WHERE val ~ '.*qqq.*';
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=16.77..385.16 rows=100 width=33)
   Recheck Cond: (val ~ '.*qqq.*'::text)
   ->  Bitmap Index Scan on trgm_idx  (cost=0.00..16.75 rows=100 width=0)
         Index Cond: (val ~ '.*qqq.*'::text)
(4 rows)

but I do get a bunch of NOTICE messages with debugging info (no matter if the GIN index is used or not, so it's somewhere in the common regexp code). But I guess that's due to WIP status.

It's due to the TRGM_REGEXP_DEBUG macro. I disabled it by default. But I think the pieces of code hidden by that macro could be useful for debugging even after the WIP status is gone. -- With best regards, Alexander Korotkov. trgm-regexp-0.3.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
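For readers following along, a compile-time debug switch of this sort is conventionally shaped like the sketch below; the TRGM_REGEXP_DEBUG name comes from the patch, but the wrapper macro and message are invented for illustration:

/* Uncomment to get verbose NOTICEs describing the trigram graph. */
/* #define TRGM_REGEXP_DEBUG */

#ifdef TRGM_REGEXP_DEBUG
#define TRGM_DEBUG(...) elog(NOTICE, __VA_ARGS__)
#else
#define TRGM_DEBUG(...) ((void) 0)
#endif

Disabling it by default keeps ordinary builds quiet while letting anyone reviewing the automaton construction rebuild with the define flipped on.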
Re: [HACKERS] MySQL search query is not executing in Postgres DB
On Tue, 2012-11-06 at 10:57 -0500, Robert Haas wrote: But, with the attached patch:

rhaas=# create function xyz(smallint) returns smallint as $$select $1$$ language sql;
CREATE FUNCTION
rhaas=# select xyz(5);
 xyz
-----
   5
(1 row)
rhaas=# create table abc (a int);
CREATE TABLE
rhaas=# select lpad(a, 5, '0') from abc;
 lpad
------
(0 rows)

I continue to be of the opinion that allowing this second case to work is not desirable. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v3
On Mon, Nov 19, 2012 at 5:50 PM, Andres Freund and...@2ndquadrant.com wrote: Hi Michael, On 2012-11-19 16:28:55 +0900, Michael Paquier wrote: I have been able to fetch your code (thanks Andres!) and run some of it. For the time being I am spending some time reading the code and understanding the whole set of features you are trying to implement inside core, even if I got some background from what you presented at PGCon and from the hackers ML.

Cool.

Btw, as a first approach, I tried to run the logical log receiver plugged into a postgres server, and I am not able to make it work. Well, I am using settings similar to yours.

# Run master
rm -r ~/bin/pgsql/master/
initdb -D ~/bin/pgsql/master/
echo "local replication $USER trust" >> ~/bin/pgsql/master/pg_hba.conf
postgres -D ~/bin/pgsql/master \
  -c wal_level=logical \
  -c max_wal_senders=10 \
  -c max_logical_slots=10 \
  -c wal_keep_segments=100 \
  -c log_line_prefix="[%p %x] "

# Logical log receiver
pg_receivellog -f $HOME/output.txt -d postgres -v

After launching some SQLs, the logical receiver is stuck just after sending INIT_LOGICAL_REPLICATION; please see the bt of the waiting process:

It's waiting till it sees an initial xl_running_xacts record. The easiest way to get one is to manually issue a checkpoint. Sorry, should have included that in the description. Otherwise you can wait till the next routine checkpoint comes around... I plan to cause more xl_running_xacts records to be logged in the future. I think the timing of those currently is non-optimal; you have the same problem here as in normal streaming replication as well :(

I am just looking at this patch and will provide some comments. By the way, you forgot the installation part of pg_receivellog, please see the patch attached.

That actually was somewhat intended; I thought people wouldn't like the name and I didn't want a binary that's going to be replaced anyway lying around ;)

OK, no problem. For sure this is going to happen; I was wondering myself if it could be possible to merge pg_receivexlog and pg_receivellog into a single utility with multiple modes :)

Btw, here are some extra comments based on my progress; hope it will be useful for other people playing around with your patches.

1) It is necessary to install the contrib module test_decoding on the server side or the test case will not work.

2) I obtained the following logs on the server:

LOG: forced to assume catalog changes for xid 1370 because it was running to early
WARNING: ABORT 1370

Actually I saw many warnings like this.

3) Assertion failure while running pgbench; I was just curious to see how it reacted when logical replication was put under a little bit of load:

TRAP: FailedAssertion("!(((xid) >= ((TransactionId) 3)) && ((snapstate->xmin_running) >= ((TransactionId) 3)))", File: "snapbuild.c", Line: 877)

=> pgbench -i postgres; pgbench -T 500 -c 8 postgres

4) As mentioned by Andres above, logical replication begins only once there is an xl_running_xacts record. I just enforced a checkpoint manually.

With all those things done, I have been able to set up the system; for example, these queries:

postgres=# create table ac (a int);
CREATE TABLE
postgres=# insert into ac values (1);
INSERT 0 1

created the expected output:

BEGIN 32135
COMMIT 32135
BEGIN 32136
table "ac": INSERT: a[int4]:1
COMMIT 32136

Now it is time to dig into the code... -- Michael Paquier http://michael.otacoo.com
Re: [HACKERS] logical changeset generation v3 - Source for Slony
On 12-11-18 11:07 AM, Andres Freund wrote: Hi Steve!

I think we should provide some glue code to do this, otherwise people will start replicating all the bugs I hacked into this... More seriously: I think we should have support code here; no user will want to learn the intricacies of feedback messages and such. Where would that live? No idea. libpglogicalrep.so?

I wholeheartedly agree. It should also be cleaned up a fair bit before others copy it, should we not go for having some client side library.

Imo the library could very roughly be something like:

state = SetupStreamingLLog(replication_slot, ...);
while ((message = StreamingLLogNextMessage(state)) != NULL)
{
    write(outfd, message->data, message->length);
    if (received_100_messages)
    {
        fsync(outfd);
        StreamingLLogConfirm(message);
    }
}

Although I guess that's not good enough because StreamingLLogNextMessage would be blocking, but that shouldn't be too hard to work around.

How about we pass a timeout value to StreamingLLogNextMessage(...) where it returns if no data is available after the timeout, to give the caller a chance to do something else?

This is basically the Slony 2.2 sl_log format minus a few columns we no longer need (txid, actionseq). command_args is a postgresql text array of column=value pairs. Ie. [{id=1},{name='steve'},{project='slony'}]

It seems to me that that makes escaping unnecessarily complicated, but given you already have all the code... ;)

When I look at the actual code/representation we picked, it is closer to {column1,value1,column2,value2,...}

I don't think our output plugin will be much more complicated than the test_decoding plugin.

Good. That's the idea ;). Are you ok with the interface as it is now or would you like to change something?

I'm going to think about this some more and maybe try to write an example plugin before I can say anything with confidence.

Yes. We will also need something like that. If you remember the first prototype we sent to the list, it included the concept of an 'origin_node' in the wal record. I think you actually reviewed that one ;) That was exactly aimed at something like this... Since then my thoughts about how the origin_id looks have changed a bit:

- origin id is internally still represented as an uint32/Oid
  - never visible outside of wal/system catalogs
- externally visible it gets
  - assigned a uuid
  - optionally assigned a user defined name
- user settable (permissions?) origin when executing sql:
  - SET change_origin_uuid = 'uuid';
  - SET change_origin_name = 'user-settable-name';
  - defaults to the local node
- decoding callbacks get passed the origin of a change
  - txn->{origin_uuid, origin_name, origin_internal?}
- the init decoding callback can setup an array of interesting origins, so the others don't even get the ReorderBuffer treatment

I have to thank the discussion on -hackers and a march through Prague with Marko here...

So would the uuid and optional name assignment be done in the output plugin or somewhere else? When/how does the uuid get generated, and where do we store it so the same uuid gets returned when postgres restarts? Slony today stores all this type of stuff in user-level tables and user-level functions (because it has no other choice). What is the connection between these values and the 'slot-id' in your proposal for the init arguments? Does the slot-id need to be the external uuid of the other end or is there no direct connection?
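A non-blocking variant of that loop, along the lines of the timeout suggestion above, might look like this sketch. Every function and type here is hypothetical; they follow the rough API idea from this thread, not any real library:

#include <unistd.h>

typedef struct LLogState LLogState;
typedef struct LLogMessage
{
    char   *data;
    int     length;
} LLogMessage;

/* hypothetical client-library API, per the proposal in this thread */
extern LLogState   *SetupStreamingLLog(const char *slot);
extern LLogMessage *StreamingLLogNextMessage(LLogState *state, int timeout_ms);
extern void         StreamingLLogConfirm(LLogMessage *msg);

static void
consume_stream(LLogState *state, int outfd)
{
    int unconfirmed = 0;

    for (;;)
    {
        /* returns NULL if nothing arrives within 100 ms */
        LLogMessage *msg = StreamingLLogNextMessage(state, 100);

        if (msg == NULL)
            continue;           /* timeout: the caller could do other work */

        write(outfd, msg->data, msg->length);

        if (++unconfirmed >= 100)
        {
            fsync(outfd);               /* make the batch durable ... */
            StreamingLLogConfirm(msg);  /* ... then confirm it upstream */
            unconfirmed = 0;
        }
    }
}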
Today slony allows us to replicate between two databases in the same postgresql cluster (I use this for testing all the time). Slony also allows two different 'slony clusters' to be set up in the same database (or so I'm told; I don't think I have ever tried this myself). Plugin functions that let me query the local database and then return the uuid and origin_name would work in this model.

+1 on being able to mark the 'change origin' in a SET command when the replication process is pushing data into the replica.

Exactly how we do this filtering is an open question; I think the output plugin will at a minimum need to know:

a) What the slony node id is of the node it is running on. This is easy to figure out if the output plugin is able/allowed to query its database. Will this be possible? I would expect to be able to query the database as it exists now (at plugin invocation time), not as it existed in the past when the WAL was generated. In addition to the node ID, I can see us wanting to be able to query other slony tables (sl_table, sl_set etc...)

Hm. There is no fundamental reason not to allow normal database access to the current database, but it won't be all that cheap, so doing it frequently is not a good idea. The reason it's not cheap is that you basically need to tear down the postgres internal caches if you switch the timestream in which you are working. Would go something like: TransactionContext =
[HACKERS] removing some dead "keep compiler quiet" code
Since ereport() can now communicate to the compiler whether it returns or not, a fair amount of "keep compiler quiet" code is dead. Since the method that ereport() uses is not dependent on any compiler-specific attributes, I think this code can just be removed. I propose the attached patch.

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index ceec6ff..b4fd50c 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -2438,11 +2438,6 @@ static bool is_valid_dblink_option(const PQconninfoOption *options,
 		return NULL;
 	}
-
-	/*
-	 * never reached, but keep compiler quiet
-	 */
-	return NULL;
 }
 
 /*
diff --git a/contrib/spi/moddatetime.c b/contrib/spi/moddatetime.c
index 2ec96540..1d10b7a 100644
--- a/contrib/spi/moddatetime.c
+++ b/contrib/spi/moddatetime.c
@@ -110,13 +110,10 @@
 								   ObjectIdGetDatum(InvalidOid),
 								   Int32GetDatum(-1));
 	else
-	{
 		ereport(ERROR,
 				(errcode(ERRCODE_TRIGGERED_ACTION_EXCEPTION),
 				 errmsg("attribute \"%s\" of \"%s\" must be type TIMESTAMP or TIMESTAMPTZ",
 						args[0], relname)));
-		newdt = (Datum) 0;		/* keep compiler quiet */
-	}
 
 /* 1 is the number of items in the arrays attnum and newdt.
 	attnum is the positional number of the field to be updated.
diff --git a/contrib/tsearch2/tsearch2.c b/contrib/tsearch2/tsearch2.c
index 968bd80..ee7a05b 100644
--- a/contrib/tsearch2/tsearch2.c
+++ b/contrib/tsearch2/tsearch2.c
@@ -54,8 +54,6 @@
 			 errmsg("function %s is no longer supported", \
 					format_procedure(fcinfo->flinfo->fn_oid)), \
 			 errhint("Switch to new tsearch functionality."))); \
-	/* keep compiler quiet */ \
-	PG_RETURN_NULL(); \
 } \
 PG_FUNCTION_INFO_V1(name)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1faf666..60bf08b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5365,7 +5365,6 @@ static bool read_backup_label(XLogRecPtr *checkPointLoc,
 			ereport(FATAL,
 					(errmsg("could not locate required checkpoint record"),
 					 errhint("If you are not restoring from a backup, try removing the file \"%s/backup_label\".", DataDir)));
-			wasShutdown = false;	/* keep compiler quiet */
 		}
 		/* set flag to delete it later */
 		haveBackupLabel = true;
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 79b71b3..255b928 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -76,7 +76,6 @@
 			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 			 errmsg("invalid fork name"),
 			 errhint("Valid fork names are \"main\", \"fsm\", and \"vm\".")));
-	return InvalidForkNumber;	/* keep compiler quiet */
 }
 
 /*
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index b9cfee2..4084c17 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -772,7 +772,6 @@ static bool stack_address_present_add_flags(const ObjectAddress *object,
 					(errcode(ERRCODE_DEPENDENT_OBJECTS_STILL_EXIST),
 					 errmsg("cannot drop %s because it is required by the database system",
 							getObjectDescription(object))));
-			subflags = 0;		/* keep compiler quiet */
 			break;
 		default:
 			elog(ERROR, "unrecognized dependency type '%c' for %s",
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index 8ac8373..2cf01d5 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -75,13 +75,10 @@
 	else if (TRIGGER_FIRED_BY_UPDATE(trigdata->tg_event))
 		new_row = trigdata->tg_newtuple;
 	else
-	{
 		ereport(ERROR,
 				(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
 				 errmsg("function \"%s\" must be fired for INSERT or UPDATE",
 						funcname)));
-		new_row = NULL;			/* keep compiler quiet */
-	}
 
 	/*
 	 * If the new_row is now dead (ie, inserted and then deleted within our
diff --git a/src/backend/commands/define.c b/src/backend/commands/define.c
index e0b0fc3..87051ac 100644
--- a/src/backend/commands/define.c
+++ b/src/backend/commands/define.c
@@ -106,7 +106,6 @@
 				 errmsg("%s requires a numeric value",
 						def->defname)));
 	}
-	return 0;					/* keep compiler quiet */
 }
 
 /*
@@ -161,7 +160,6 @@
 			(errcode(ERRCODE_SYNTAX_ERROR),
 			 errmsg("%s requires a Boolean value",
 					def->defname)));
-	return false;				/* keep compiler quiet */
 }
 
 /*
@@ -194,7 +192,6 @@
 				 errmsg("%s requires a numeric value",
 						def->defname)));
 	}
-	return 0;					/* keep compiler quiet */
 }
 
 /*
@@ -223,7 +220,6 @@
 				 errmsg("argument of %s must be a name",
 						def->defname)));
 	}
-	return NIL;					/* keep compiler quiet */
 }
 
 /*
@@ -253,7 +249,6 @@
 				 errmsg("argument of %s must be a type name",
 						def->defname)));
 	}
-	return NULL;				/* keep compiler quiet */
 }
 
 /*
@@ -298,7 +293,6 @@
 			(errcode(ERRCODE_SYNTAX_ERROR),
 			 errmsg("invalid argument for %s: \"%s\"",
 					def->defname, defGetString(def))));
-	return 0;					/* keep compiler quiet */
Re: [HACKERS] removing some dead "keep compiler quiet" code
Peter Eisentraut pete...@gmx.net writes: Since ereport() can now communicate to the compiler whether it returns or not, a fair amount of keep compiler quiet code is dead. Since the method that ereport() uses is not dependent on any compiler-specific attributes, I think this code can just be removed. I propose the attached patch. Meh. This will only work if the compiler understands that abort() doesn't return --- compilers that don't know that will start bleating about uninitialized variables again. And the ones that do know it will be able to throw away the dead code anyway. I doubt this is a good idea. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
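For anyone unfamiliar with the idiom under discussion, here is a small illustration (mine, not from the patch) of why those dummy statements exist: ereport(ERROR, ...) never returns, but a compiler that cannot prove that will warn about falling off the end of a non-void function:

static int
parse_on_off(const char *value)
{
	if (strcmp(value, "on") == 0)
		return 1;
	if (strcmp(value, "off") == 0)
		return 0;

	/* does not return -- but not every compiler knows that */
	ereport(ERROR,
			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
			 errmsg("invalid value: \"%s\"", value)));

	return 0;					/* keep compiler quiet */
}

Whether the final return can be dropped is exactly the point of contention: it depends on the compiler understanding that the ereport() call, and ultimately abort(), cannot return.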
Re: [HACKERS] too much pgbench init output
Hi, On Tue, Nov 20, 2012 at 12:08 AM, Tomas Vondra t...@fuzzy.cz wrote: On 19.11.2012 11:59, Jeevan Chalke wrote: Hi, I have gone through the discussion for this patch and here is my review:

The main aim of this patch is to reduce the number of log lines. It was also suggested to use an option to provide the interval, but a few of us did not quite agree on that. So the final discussion ended at keeping a 5 sec interval between each log line. However, I see there are two types of users here: 1. Those who like these log lines, so that they can troubleshoot slowness and the like. 2. Those who do not like these log lines.

Who likes these lines / needs them for something useful? No idea.

I fall in the second category. But from the discussion, I believe some people may need detailed (or a lot more) output. So keeping these in mind, I would rather go for an option which controls this. People falling in category one can set this option very low, whereas users falling under the second category can keep it high.

So what option(s) would you expect? Something that tunes the interval length or something else?

Interval length.

A switch that'd choose between the old and new behavior might be a good idea, but I'd strongly vote against automagic heuristics. It makes the behavior very difficult to predict and I really don't want to force the users to wonder whether the long delay is due to general slowness of the machine or some clever logic that causes long delays between log messages. That's why I chose a very simple approach with a constant time interval. It does what I was aiming for (less logging) and it's easy to predict. Sure, we could choose a different interval (or make it an option).

I am preferring an option for choosing an interval, say from 1 second to 10 seconds. BTW, what if we put one log message every 10% (or 5%) with the time taken (for the last 10% (or 5%), and cumulative) over 5 seconds? This would give only 10 (or 20) lines per pgbench initialisation. And since we are showing the time taken for each block, if any slowness happens, one can easily find the block by looking at the timings and troubleshoot it. Though 10% or 5% is again a debatable number, keeping it constant would eliminate the requirement for an option.

However, assuming we settled on a 5 sec delay, here are a few comments on the patch attached:

Comments:
=========
Patch gets applied cleanly with some whitespace errors. make and make install too went smooth. make check was smooth. Rather, it should be smooth since there are NO changes in other parts of the code, just pgbench.c, and we do not have any test-cases either. However, here are a few comments on the changes in pgbench.c:

1. Since the final discussion ended at a 5 seconds interval being good enough, the author used a global int variable for that. Given that it's just a constant, #define would be a better choice.

Good point. Although if we add an option to supply different values, a variable is a better match.

Exactly; if we end up with an option then it looks good. But in your current patch it was a constant, so #define should be preferred.

2. +/* let's not call the timing for each row, but only each 100 rows */
Why only 100 rows? Have you done any testing to come up with the number 100? To me it seems very low. It would be good to test with 1K or even 10K. On my machine (2.4 GHz Intel Core 2 Duo MacBook Pro, running Ubuntu in a VM with 4GB RAM, 1067 DDR3), approximately 1M rows were inserted in 5 seconds. So checking every 100 rows looks like overkill.

Well, the 100 is clearly a magical constant. The goal was to choose a value large enough to amortize the getlocaltime() cost, but small enough to print info even in cases when the performance sucks for some reason. I've seen issues where the speed suddenly dropped to a fraction of the expected value, e.g. 100 rows/second, and in those cases you'd have to wait for a very long time to actually get the next log message (with the suggested 10k step). So 100 seems like a good compromise to me ...

Hmm... inserting just 100 rows/second is really slow. Since you have already seen such behaviour, I have no objection to keeping it at 100.

3. Please indent the following block as per the indentation just above it:

/* used to track elapsed time and estimate of the remaining time */
instr_time	start, diff;
double		elapsed_sec, remaining_sec;
int			log_interval = 1;

OK

4. +/* have ve reached the next interval? */
Do you mean "have WE reached..."?

OK

5. While applying the patch, I got a few whitespace errors. But I think every patch goes through pgindent, which might take care of this.

OK

Thanks

Thanks for the review. I'll wait for a bit more discussion about the choices before submitting another version of the patch.

Sure

Tomas -- Sent via pgsql-hackers mailing list
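For reference, the constant-interval logging being discussed can be sketched as below, using PostgreSQL's instr_time machinery; the function and variable names are illustrative, not the patch's actual code:

#include "postgres_fe.h"
#include "portability/instr_time.h"

#define LOG_STEP_SECONDS 5		/* the constant interval under discussion */

/* called every 100 rows from the initialization loop */
static void
maybe_log_progress(int64 done, int64 total, instr_time start, int *log_interval)
{
	instr_time	now;
	double		elapsed_sec, remaining_sec;

	INSTR_TIME_SET_CURRENT(now);
	INSTR_TIME_SUBTRACT(now, start);
	elapsed_sec = INSTR_TIME_GET_DOUBLE(now);
	remaining_sec = (total - done) * elapsed_sec / done;

	/* log only when we have crossed into the next LOG_STEP_SECONDS window */
	if (elapsed_sec >= *log_interval * LOG_STEP_SECONDS)
	{
		fprintf(stderr,
				INT64_FORMAT " of " INT64_FORMAT " tuples (%d%%) done "
				"(elapsed %.2f s, remaining %.2f s).\n",
				done, total, (int) ((done * 100) / total),
				elapsed_sec, remaining_sec);
		(*log_interval)++;
	}
}

Because the trigger is elapsed time rather than row count, a sudden slowdown still produces a message within the next interval, which is the predictability being argued for.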
[HACKERS] cannot disable hashSetOp
hello

I looked into an issue with a query consuming too much memory:

 HashSetOp Except  (cost=0.00..297100.69 rows=594044 width=30)
   ->  Append  (cost=0.00..234950.32 rows=8286716 width=30)
         ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..168074.62 rows=5940431 width=29)
               ->  Seq Scan on ac  (cost=0.00..108670.31 rows=5940431 width=29)
         ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..66875.70 rows=2346285 width=32)
               ->  Seq Scan on artist_credit  (cost=0.00..43412.85 rows=2346285 width=32)

I was surprised, because I didn't find any way to disable hashing. Is there some way?

Pavel -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] the number of pending entries in GIN index with FASTUPDATE=on
Hello,

Agreed. Attached patch introduces the pgstatginindex() which now reports GIN version number, number of pages in the pending list and number of tuples in the pending list, as information about a GIN index.

It seems fine on the whole, and I have some suggestions.

1. This patch applies to current git head cleanly, but installation ends up with failure because pgstattuple--1.0--1.1.sql, which is added in the Makefile, is missing.

2. I feel somewhat uneasy with the sizes passed to palloc (too long), and with BuildTupleFromCStrings used instead of heap_form_tuple. But fixing that would lead to additional modification of existing similar code. You can leave that if you prefer to keep this patch smaller, but it looks preferable to me to construct the result tuple not via C strings, in some respects. (*1)

3. pgstatginindex shows only version, pending_pages, and pending_tuples. Why don't you show the other properties such as entry pages, data pages, entries, and total pages, as pgstatindex does?

regards, -- Kyotaro Horiguchi NTT Open Source Software Center

(*1) Sample

diff --git a/contrib/pgstattuple/pgstatindex.c b/contrib/pgstattuple/pgstatindex.c
index 8a2ae85..71c2023 100644
--- a/contrib/pgstattuple/pgstatindex.c
+++ b/contrib/pgstattuple/pgstatindex.c
@@ -29,2 +29,3 @@
+#include "access/htup_details.h"
 #include "access/gin_private.h"
@@ -39,3 +40,2 @@
-
 extern Datum pgstatindex(PG_FUNCTION_ARGS);
@@ -330,4 +330,5 @@ pgstatginindex(PG_FUNCTION_ARGS)
 	int			j;
-	char	   *values[3];
+	Datum		values[3];
 	Datum		result;
+	bool		nulls[3] = {false, false, false};
@@ -376,11 +377,6 @@ pgstatginindex(PG_FUNCTION_ARGS)
 	j = 0;
-	values[j] = palloc(32);
-	snprintf(values[j++], 32, "%d", stats.version);
-	values[j] = palloc(32);
-	snprintf(values[j++], 32, "%u", stats.pending_pages);
-	values[j] = palloc(64);
-	snprintf(values[j++], 64, INT64_FORMAT, stats.pending_tuples);
-
-	tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
-								   values);
+	values[j++] = Int32GetDatum(stats.version);
+	values[j++] = UInt32GetDatum(stats.pending_pages);
+	values[j++] = Int64GetDatumFast(stats.pending_tuples);
+	tuple = heap_form_tuple(tupleDesc, values, nulls);

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers