Re: [HACKERS][OT] somebody could explain this?
[snip] Floating point numbers are accurate but not precise. OK, now this one beats me... what's the difference between accurate and exact? I thought both mean something like correct, but precise refers to some action and accurate applies to a situation or description... I'm actually curious what it means. Merriam-Webster lists correct as a synonym for both. Cheers, Csaba.
Re: [HACKERS] somebody could explain this?
Cristian, I bet it's related to some rounding issue and the fact that floating point formats are approximate even for small integers. Probably 12 ends up being slightly less in floating point format (something like 11.999...), and the cast to integer truncates it. Not 100% sure though... read up on your API, I'm not a C programmer :-) HTH, Csaba. On Fri, 2005-11-04 at 17:16, Cristian Prieto wrote: Hello, I'm using PostgreSQL 8.0.4 in Fedora Core 3. Right now I'm learning a little about the postgresql internals and the way some kinds of SPs could be written in the C language; I found something really weird and I cannot explain this behavior to myself:

#include "postgres.h"
#include "fmgr.h"

PG_FUNCTION_INFO_V1(test);

Datum repeat_item(PG_FUNCTION_ARGS)
{
    int num_times;
    num_times = PG_GETARG_FLOAT8(0) * 100;
    PG_RETURN_INT32(num_times);
}

Inside psql this happens:

# create or replace function test(float) returns integer as 'test.so' language 'c' stable;
select test(0.1);   Returns 10
select test(0.11);  Returns 11
select test(0.12);  Returns 11
select test(0.13);  Returns 13
select test(0.14);  Returns 14
select test(0.15);  Returns 14

What is happening here?
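A minimal SQL-level illustration of the same effect (the values are just examples; float8 is an IEEE 754 double, so many decimal fractions are only approximated, and a C assignment of a double to an int truncates rather than rounds):

select 0.1::float8 + 0.2::float8 = 0.3::float8 as exact_match;   -- false
select trunc(0.29::float8 * 100) as truncated,
       round(0.29::float8 * 100) as rounded;                     -- these can differ (e.g. 28 vs 29)

Rounding before the integer conversion (round() here, or rint() on the C side) avoids the surprise.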
Re: [HACKERS] parameterized limit statements
On Mon, 2005-11-07 at 18:43, Tom Lane wrote: [snip] If it doesn't have a value for the parameter, it'll assume 10% of table rows, which is what it's done for a long time if the LIMIT isn't reducible to a constant. Is 10% a reasonable guess here? Here we use LIMIT in combination with prepared statements to get something like less than 1% of the table. There are no exceptions to that in our code... even if the limit amount is a parameter. Furthermore, the limit amount is always a small number, usually ~100, but never more than 1000. So in my case, we could live with a suboptimal plan when the percentage would be more than 10%, because then the table would be small enough not to matter that much. Wrongly guessing 10% for a huge table, on the other hand, has a huge impact... I think the best would be to guess 5%, but with a maximum of, say, 5000 rows. That could work well with both small and huge tables. Maybe those values could be made configurable... just ideas, not that I could implement this... [snip] Cheers, Csaba.
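For context, a minimal sketch of the kind of statement under discussion, where the planner cannot see the LIMIT value at planning time (the tasks table and its columns are hypothetical):

prepare recent_tasks(int) as
    select * from tasks order by created_at desc limit $1;
execute recent_tasks(100);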
Re: [HACKERS] someone working to add merge?
On Fri, 2005-11-11 at 18:15, Jaime Casanova wrote: On 11/11/05, Josh Berkus josh@agliodbs.com wrote: Jaime, so i suppose we can reuse much of the code, breaking the merge in 3 pieces... for now they are just thoughts, i will think more on this and try to implement it... comments? ideas? suggestions? Funny, we were just discussing this at OpenDBCon. Seems that you can't do a full implementation of MERGE without Predicate Locking (the ability to say lock this table against inserts or updates of any row with key=5). isn't that what select for update does? Select for update only works if the row is already there. If there's no row, you can't lock it. So you then want to insert it, but then it is possible that somebody inserted it before you, immediately after your update... so the solution would be more like:
- try the insert;
- if the insert fails, do an update;
You can already do that, but you have to place a savepoint before the insert, so you can continue your transaction even if the insert fails. Without knowledge of postgres internals, the simplest would be to be able to continue the transaction after a failed insert with the cheapest possible price to pay. This would mean wrapping up existing code, except for the continue-the-transaction-after-a-failed-insert part. All this might be completely bull*it of course, I don't know too much about postgres internals. [snip] Cheers, Csaba.
Re: [HACKERS] someone working to add merge?
OK, I'm relatively new on this list, and I might have missed a few discussions on this topic. I wonder if doing it this way would not be better than using a table lock:
- set a savepoint;
- insert the row;
- on error: roll back to the savepoint, then update the row;
- on success: release the savepoint;
This would provide less contention while paying the price for the savepoint. In low contention scenarios the table lock would be better, and I wonder which is better for high contention scenarios, the table lock or the savepoint version... Of course the table lock version is the future if predicate locking is going to be implemented later. Cheers, Csaba. On Fri, 2005-11-11 at 18:37, Peter Eisentraut wrote: Josh Berkus wrote: Funny, we were just discussing this at OpenDBCon. Seems that you can't do a full implementation of MERGE without Predicate Locking (the ability to say lock this table against inserts or updates of any row with key=5). However, Peter suggested that we could do a proof-of-concept implementation, working out syntax and trigger issues, based on a full table lock and do the hard work once it was proved to be feasible. Yes, I've started to work on this. Realizing that the current way to manually do an UPDATE-else-INSERT or DELETE-then-INSERT involves a table lock anyway, a MERGE implementation using a table lock would at least give some convenience benefit to users. (And possibly some performance, too, if the decision logic is currently run in the client.) A predicate locking implementation for MERGE might actually not be all that complicated, because you only need to look on pk = constant, not on arbitrary expressions. Nevertheless, I think it's best to write the MERGE command first and then optimize the locking.
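A minimal PL/pgSQL sketch of the insert-first idea proposed above; the EXCEPTION block uses an implicit savepoint, which is exactly the price being discussed. The table item(key int primary key, val int) is hypothetical:

create or replace function insert_or_update(k int, v int) returns void as $$
begin
    -- try the insert first
    insert into item (key, val) values (k, v);
exception when unique_violation then
    -- the row already exists (or a concurrent insert won); fall back to an update
    update item set val = v where key = k;
end;
$$ language plpgsql;

Under READ COMMITTED this covers the common case; a retry loop is safer under SERIALIZABLE isolation, or if concurrent deletes are possible.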
Re: [HACKERS] someone working to add merge?
On Fri, 2005-11-11 at 20:22, Bruno Wolff III wrote: On Fri, Nov 11, 2005 at 18:48:33 +0100, Csaba Nagy [EMAIL PROTECTED] wrote: OK, I'm relatively new on this list, and I might have missed a few discussions on this topic. I wonder if doing it this way would not be better than using a table lock: - set a savepoint; - insert the row; - on error: roll back to the savepoint, then update the row; - on success: release the savepoint; This would provide less contention while paying the price for the savepoint. In low contention scenarios the table lock would be better, and I wonder which is better for high contention scenarios, the table lock or the savepoint version... You may not be able to update the row after the insert fails. If there is an insert occurring in another transaction, the row may not be visible to the current transaction. In which case you can neither insert nor update the row. You need to wait for the other transaction to commit or roll back. Are you sure? From what I understand, the insert will only fail when the other transaction commits; it actively waits for the commit or rollback. Look at this:

session_1= create table test (col smallint primary key);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "test_pkey" for table "test"
CREATE TABLE
session_1= begin;
BEGIN
cnagy= insert into test values (1);
INSERT 165068987 1
session_2= begin;
BEGIN
session_2= insert into test values (1);
[session_2 is now waiting]
session_1= commit;
COMMIT
[session_2 wakes up]
ERROR: duplicate key violates unique constraint "test_pkey"

So it looks like predicate locking is already in place for primary key conditions... Cheers, Csaba.
Re: [HACKERS] someone working to add merge?
Well, from my point of view it is a special case of predicate locking supported well by existing code, in this case the unique index (you said that, I'm not familiar with the code). I don't see why this cannot be capitalized on to implement a subset of what predicate locking is, based on the mechanism already existing. I guess most of the users who need some kind of merge, replace, or insert-or-update feature (or other reasons to block inserts/updates on a specific row) would be happy for now with the restriction that the condition must be backed by a unique index. So basically the only thing I'm trying to say is that a partial implementation which might be easy to implement (I might be wrong here), has no big performance penalties and covers a very valid problem is better than chasing the complete solution, which is either too complex to implement easily or easy to implement but with performance disadvantages. The only thing is to label it correctly so people don't expect something it isn't. Of course all this is hand-waving, I can't really help even if I wanted to, my C skills are less than mediocre. Cheers, Csaba. On Tue, 2005-11-15 at 12:14, Alvaro Herrera wrote: Csaba Nagy wrote: session_1= create table test (col smallint primary key); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "test_pkey" for table "test" CREATE TABLE session_1= begin; BEGIN cnagy= insert into test values (1); INSERT 165068987 1 session_2= begin; BEGIN session_2= insert into test values (1); [session_2 is now waiting] This only happens because of the unique index. There's no predicate locking involved. The btree code goes to some lengths to make this work; it would probably be simple to modify this to support MERGE or REPLACE in the limited cases where there's a UNIQUE index. Tom has already said this twice (on this thread only; he has already said it before, IIRC.)
Re: [HACKERS] MERGE vs REPLACE
Well, from my point of view it is more than delete and insert. That I can do right now with existing infrastructure. The problem I try to solve is something along these lines: a bunch of clients try to update a count, and ONE of them must initialize the count if it does not exist... this can't be done with the current infrastructure without race conditions. Our current solution is to initialize all the possible counts beforehand, but that's suboptimal as only a few of them will actually have data coming in later... And of course the import problem... we can have multiple concurrent imports, which must insert just once per some unique criterion, and update if the record is already there, and all this in a batch. This is also not possible without race conditions or aggressive locking. So for me the atomic, consistent and without-performance-penalties update_or_insert_based_on_unique_criteria does have a value, and that's coming exactly from the locking of the unique index which eliminates the race condition from this operation. I don't care about syntax sugar, just about things I could do more efficiently if this mechanism were in place... Cheers, Csaba. On Wed, 2005-11-16 at 20:33, Martijn van Oosterhout wrote: On Wed, Nov 16, 2005 at 11:06:15AM -0800, Dann Corbit wrote: -----Original Message----- From: [EMAIL PROTECTED] [mailto:pgsql-hackers- [EMAIL PROTECTED] On Behalf Of Simon Riggs Conceptually, a MERGE statement is just a long string of INSERTs and UPDATEs in the same transaction and I think we should treat it as such. Merge could also be considered as a long string of deletes and inserts. I guess that deleting those records that already exist and then inserting all of the records is faster because it could be done with a single join to perform the delete and then a single batch insert. And for us it makes no difference because in MVCC, UPDATE == DELETE + INSERT. IMHO it's just a nice construct to specify UPDATEs and INSERTs in the same statement. Have a nice day,
Re: [HACKERS] MERGE vs REPLACE
Yes, these algorithms are clear to me, but they don't work for batch updates in postgres without savepoints before each row insert/update, which is not good for performance (not to mention that on older postgres versions without savepoint support it won't work at all). If there is a way with no race condition and no performance penalty, that would be something new and useful. I just guess that MERGE would provide that. Cheers, Csaba. On Thu, 2005-11-17 at 12:34, Zeugswetter Andreas DCP SD wrote: The problem I try to solve is something along these lines: a bunch of clients try to update a count, and ONE of them must initialize the count if it does not exist... this can't be done with the current infrastructure without race conditions. The solution without merge, but with a unique key, in other DBs is:

    update
    if no rows updated: insert
    if duplicate key: update
    if no rows updated: goto insert

Note that the counter updates need to be of the form set x = x + ? where key = y. Do you see a potential race condition with this? In pg you also need a savepoint before the insert for this to work. Depending on the ratio of inserts vs updates we also start with the insert when the insert succeeds more than 50% of the time (I would use a higher percentage with pg though):

    insert
    if duplicate key: update
    if no rows updated: goto insert

Andreas
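A hedged PL/pgSQL sketch of the update-first loop quoted above, applied to the counter case mentioned earlier; the counts(key int primary key, n int) table is hypothetical, and each pass through the EXCEPTION block costs a subtransaction, i.e. exactly the savepoint price being discussed:

create or replace function add_to_count(k int, delta int) returns void as $$
begin
    loop
        -- try the update first (the common case once the counter exists)
        update counts set n = n + delta where key = k;
        if found then
            return;
        end if;
        begin
            insert into counts (key, n) values (k, delta);
            return;
        exception when unique_violation then
            -- a concurrent session created the counter first; loop and retry the update
        end;
    end loop;
end;
$$ language plpgsql;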
Re: [HACKERS] MERGE vs REPLACE
OK, in this case I don't care about either MERGE or REPLACE, but about an UPSERT which does the locking :-) Cheers, Csaba. On Thu, 2005-11-17 at 13:32, Martijn van Oosterhout wrote: On Thu, Nov 17, 2005 at 12:52:53PM +0100, Csaba Nagy wrote: Yes, these algorithms are clear to me, but they don't work for batch updates in postgres without savepoints before each row insert/update, which is not good for performance (not to mention that on older postgres versions without savepoint support it won't work at all). If there is a way with no race condition and no performance penalty, that would be something new and useful. I just guess that MERGE would provide that. Well, then you guess wrong. This isn't what MERGE is for. MERGE is just a neat way of specifying the UPDATE and INSERT cases in the same statement. It doesn't remove the possibility of duplicate inserts and thus primary key violations. If someone wants to make extensions to MERGE so that it can avoid the race condition and avoid the duplicate key violations, that's fine. But be aware that this is outside of the spec. It may be a useful addition, but perhaps we should consider MERGE and REPLACE as completely separate targets. MERGE has a whole join construction with subqueries that would be a pain to make work in a way that is truly serialisable. REPLACE deals with only one row and tries to solve the race for that case only. Much easier to consider them separately, no? I guess what's really irritating is that this clearly exposes the case listed in the docs of why SERIALIZABLE isn't serializable in all cases. If we could solve that for MERGE, we could probably solve it in the general case too. Have a nice day,
Re: [HACKERS][OT] Doubt
http://acronymfinder.com/ Cheers, Csaba. On Fri, 2005-11-25 at 19:24, Gustavo Tonini wrote: What is ISTM? Sorry, Gustavo.
Re: [pgsql-www] [HACKERS] Upcoming PG re-releases
Maybe mausoleum would be an even better name :-D Cheers, Csaba. On Thu, 2005-12-01 at 11:35, Euler Taveira de Oliveira wrote: --- Richard Huxton dev@archonet.com escreveu: If it's practical to keep them, I'd like to suggest doing so. If it's not practical, could we have a where_to_find_old_versions.txt file and open a project on sourceforge to keep them? What about a museum.postgresql.org to keep the old releases? Euler Taveira de Oliveira euler[at]yahoo_com_br
Re: [HACKERS] generalizing the planner knobs
On Thu, 2005-12-01 at 22:01, Tom Lane wrote: Greg Stark [EMAIL PROTECTED] writes: On the other hand the type I would prefer to see are hints that feed directly into filling in information the planner lacks. This only requires that the user understand his own data and still lets the planner pick the best plan based on the provided information. This would avoid some issues, but it still is vulnerable to the problem that the hint you put in your code today will fail to track changes in your data tomorrow. Tom, I have to disagree here. At least in our application, we must provide for an acceptable worst case scenario, and sometimes a slightly wrong estimate can lead to a plan which is very fast 99% of the time but completely wrong in 1% of the cases. Sometimes the percentage is 50/50. I've had this situation with some limit plans where the planner chose the wrong index. The problem there was that the planner estimated that the query would return 20 rows, but it returned fewer, which resulted in a complete scan of the index... as opposed to the much smaller scan that would have resulted from scanning the other index, as that one would have provided an end condition orders of magnitude sooner. Now the statistics will always be only an estimation, and +/- a few rows can really make a big difference in some situations. In this particular situation the planner's index choice would have been faster for all cases where 20 rows really were returned, but I forced it to always choose the other plan (by adding the proper ORDER BY) because I can't risk a bad result in any of the cases. In this particular case I was able to force the planner to choose a specific plan, but that might not always be possible, so I guess it really would make sense to be able to tell the planner how selective some conditions are. And yes, sometimes I would like to freeze a specific safe plan for a specific query, even if it is not optimal. So for me the hint mechanism is good for telling the server that I'm not interested in the BEST plan, which risks getting very bad on occasion, but in a good-enough plan which is safe. And as for selectivity changing over time, the hints will change along with it. In most situations where selectivity changes, the SQL has to change too, sometimes even the complete workflow. I find that if changed hints help on some occasions, then having them would mean less maintenance than the code rewriting that would otherwise be involved... and I'm completely sure the server can't compensate for changes in the dynamics of the data all the time. And it definitely can't keep up with highly dynamic data, where the statistics change constantly in big tables... Our application for example has a kind of batch processing, where we insert smaller or larger batches of data into a HUGE table (~200 million rows), and then that data is immediately used for different operations and then reports, and furthermore it is heavily updated. I can't think of any reasonable statistics target and ANALYZE strategy which could satisfy both small batches and large batches without running ANALYZE permanently with high statistics targets on the key fields... and even that would not be specific enough when LIMIT 20 is involved. For queries involving this table I really would like to freeze plans, as any misplanning has bad consequences. Cheers, Csaba.
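A hedged sketch of the ORDER BY trick mentioned above, steering the planner toward a particular index; the events table and its two indexes (on (customer_id) and on (customer_id, created_at)) are hypothetical:

select *
  from events
 where customer_id = 42
 order by customer_id, created_at   -- matches the composite index, so the planner tends to pick it
 limit 20;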
Re: [HACKERS] generalizing the planner knobs
[snip] I want to be sure my existing queries keep using the plans they've been using until I allow them to change. I don't want to sit down and type select count(*) from users and have it not work correctly (i.e., use a sequential scan) because the system is so single-mindedly tuned for the OLTP application. Now this is exactly what I've had in mind... it would be nice to freeze a plan for some of the queries, and let the planner choose the best for all the rest. I think some other databases have something like an optimizer plan stability feature, providing stored outlines of query plans. This is maybe too much, but specifying that for a certain query I definitely want to use one index and not the other would be nice... On another note, it might be interesting to have some kind of prepare analyze, where the planner is allowed to go and get some more detailed estimation from the actual table data based on the hard-coded parameter values, and produce more detailed statistics for the parameterized values, so it can then produce tailored plans for the actual parameter values on each execution... I wonder if this makes any sense. This way we could have some very detailed statistics directly supporting the queries we actually use. I would call this kind of prepare for the most used/problematic queries from time to time, and the planner should decide what statistics it needs to support it and go and get it... Cheers, Csaba.
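For reference, a usage sketch of the coarse per-session planner knobs that exist today and that the thread talks about generalizing (the users table comes from the quoted example):

set enable_seqscan = off;                 -- discourage sequential scans for this session
explain select count(*) from users;
reset enable_seqscan;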
Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing
On Wed, 2005-12-07 at 19:36, Greg Stark wrote: [snip] We periodically ran into problems with load spikes or other performance problems causing things to get very slow and stay slow for a while. Letting things settle out usually worked but occasionally we had to restart the whole system to clear out the queue of requests. Just as a personal opinion: I would love a REINDEX which does not block reads/writes, even if writes will be more expensive while it's running. There's always a period of time I can schedule the REINDEX so there's very low write activity, but it is impossible to find a time slot when there's NO write activity... and I think most of the real world applications are like this. I think it's very rare that an application is constantly getting high load, but most of them are constantly getting SOME important activity which makes downtime hard to schedule. Now if the slowdown of writes is not more than the acceptable service level, then it is a very workable solution to schedule the REINDEX on a not so busy time slot. Cheers, Csaba.
Re: [HACKERS] Reducing relation locking overhead
On Thu, 2005-12-08 at 16:05, Tom Lane wrote: [SNIP] There isn't any way for it to do that (ie, be sure everyone else has adjusted to the new state of affairs), short of acquiring some sort of short-term exclusive lock on the table, which is a really bad idea. The pending lock would block other incoming requests on the table until all the current users exited their transactions. But it is an acceptable compromise to lock the table until all current transactions are over... the alternative for reindexing a big table is usually to schedule downtime, which is even worse... REINDEX is usually used to fix a big table's index bloat, and that won't fly without downtime, or with this short-term full-table lock in a low-traffic time slot. For my usage patterns I would vote for the table lock if it is just a means of blocking new transactions until the running ones finish. I'll just make sure there are no long-running ones when I issue the REINDEX... Cheers, Csaba.
Re: [HACKERS] Tab completion of SET TRANSACTION ISOLATION
Is there any chance for psql to open a new session if it's inside a transaction and use that to do whatever querying is needed? Just something like the control connection on ftp (the analogy is not very good). That could cause other surprises though (it could fail, for example, due to too many connections open), and I have no idea about psql internals, so it might be completely against its philosophy... Cheers, Csaba. On Tue, 2006-01-31 at 15:29, Tom Lane wrote: Peter Eisentraut [EMAIL PROTECTED] writes: Some time ago, the tab completion code for the SET command was changed to read the list of available settings from the pg_settings table. This means that by the time you're done completing SET TRANSACTION ISOLATION, you've already sent a query and the command will be disallowed. It's not a major issue, but I figured I'd mention it since it confused me a while ago. If someone has an ingenious plan for working around this, let me know. Hm, that's a bit nasty. The only plan I can think of involves reading the list of available variable names in advance and keeping it around. However, I'm not sure I want psql issuing such a query at connection startup whether or not the info will ever be used :-( We also have the ability to check the current in-transaction status, so one possibility is to read the variable list only if not within a transaction (and we didn't do it already in the current session). Making the behavior of tab completion be state-dependent may seem like a non-starter, but really it is anyway --- anything involving a query will stop working in a failed transaction. regards, tom lane
[HACKERS] streamlined standby process
Hi all, After spending some time (~2 days to fully(?) understand the process) setting up a standby machine using WAL shipping, and experimenting to be fairly sure it is working reliably, I started thinking about how I would like it to be done. My dream process of setting up a standby would be:
- start up the postmaster on an empty directory (except maybe the postgresql.conf and pg_hba.conf to be used after switching to active mode), with a build-standby option;
- the postmaster would start in a special standby mode, where it connects to the primary server using a normal connection (which in turn must be configured, e.g. on the command line, just as with psql);
- the standby would issue a STANDBY command to the primary (possibly implemented at a low level, i.e. it doesn't have to be a SQL command but could be at the protocol level);
- the primary would mark the start of the backup, just as today with the 'pg_start_backup' function;
- the primary would start to stream on the connection all the files of the data base, interleaved with all the committed WAL records just as they commit (of course buffered somehow so it won't affect performance too much);
- the standby would save the received files and store the WAL records for later replay;
- when all the db files were transmitted, the primary would mark the end of the backup, just as today with 'pg_stop_backup', and signal this to the standby;
- the standby can now start replaying the WAL records;
- when all the saved WAL records up to the 'pg_stop_backup' mark were replayed, the standby is operational in the sense that it can be switched on in normal operation mode;
- in the meantime the server keeps sending the WAL records as they become available, and the standby replays them as they arrive. This would make sure the standby is always as up to date as possible;
- bringing the standby up would be a matter of losing the connection to the master server, or some kind of local signalling;
- the standby then would continue to work as a normal postmaster, using the configuration file which must exist in the standby directory when starting the process;
This process would only require the user to know where he wants the standby and where the master is... no more scripting and worrying about partial WAL files. And it could provide some degree of automatic failover if relying on the connection failure is a good enough sign of the primary server being down, but of course this is not mandatory. Is this scenario viable? I realize it needs a lot of coding in the postmaster, and it must somehow get all WAL records as they commit. I'm mostly ignorant regarding the postgres code structure, so I have no idea how much work that would involve... so can you gurus please comment? I really have no experience with C, but I would start doing some coding if it's deemed viable. Thanks, Csaba.
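For reference, the part of today's manual procedure that this proposal would automate is the bracketing of the base copy with the backup functions; a minimal sketch (the label is arbitrary, and the file copy itself happens outside SQL):

select pg_start_backup('standby_build');   -- on the primary, before copying the data directory
-- ... copy the cluster's files to the standby machine here ...
select pg_stop_backup();                    -- afterwards, so the needed WAL range is known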
Re: [HACKERS] [GENERAL] Logging statements and parameter values
Simon, For me the usage pattern would be: log all params, bind-time values, on the same log line as log_min_duration entries. That's what I need to know which are the non-performant queries, and it also helps on occasion to identify application problems. In any case all your plans sound very good, can't wait to have it working :-) Thanks, Csaba. On Mon, 2006-02-06 at 12:43, Simon Riggs wrote: On Mon, 2006-01-30 at 17:19 -0500, Bruce Momjian wrote: Ted Powell wrote: On Mon, Jan 30, 2006 at 04:31:29PM -0500, Bruce Momjian wrote: I assume it is this TODO: * Allow protocol-level BIND parameter values to be logged --- Ted Powell wrote: Our development group needs to have the option of logging all SQL statements including substituted parameter values. [...] That's it! (I should have thought to look in the TODO.) Has any design work been done on this? No. I am with Simon Riggs today at my house and I asked him, hoping he can get it done for 8.2. I don't think it is very hard. Some more detailed thoughts:
1. Do we want to log parameters at Bind time or at Execution time? Bind is easier and more correct, but might look a little strange in the log since the parameters would be logged before the execution appears. IMHO Bind time is more correct. That would mean we have a separate line for logged parameters, e.g. parameters: p1=111 p2=hshssh p3=47000.5
2. Should we save them until end of execution, so we can output them on the same line as log_min_duration_statement queries? Sounds easier but the meaning might be more confused.
3. Do we want to log parameters that are used for planning, but no others? Sometimes yes, sometimes no, I think.
Sounds like we need:
- a log_parameters GUC with settings of: none, plan and all.
- output log messages at Bind time on a separate log line, which would replace the existing statement: [protocol] BIND message with (portalname) parameters: p1=111 p2=hshssh p3=47000.5
- portalname would be blank if we aren't using named portals
While we're discussing logging, I also want to be able to set log_min_duration_statement on a user by user basis (i.e. for individual applications). We set this to superuser-only for valid security reasons, but I'd like to have the ability for the superuser to relax that restriction for short periods, or even permanently on development servers. That sounds like another GUC: log_security = on which would enforce SUSET/USERSET control (and would need to be a SIGHUP parameter). Best Regards, Simon Riggs
[HACKERS] streamlined standby procedure
Hi all, I decided to start implementing a streamlined WAL-shipping-based standby building procedure. My aim is fairly simple: to be able to build a standby as automatically as possible. The ultimate simplicity would be for me:
- install postgres on the standby machine;
- create a directory for the data base files, containing postgresql.conf, pg_hba.conf, and a standby.conf file;
- start up the postmaster with a --build-standby option;
All the rest should be done automatically by postgres. The procedure should be something similar to the one available today if you do it manually. The main difference would be that the standby postmaster should connect to the primary server, and get all table data and the WAL record stream through normal data base connections... To facilitate this process, I thought: why not expose the WAL files through a system view? Something along the lines of:

pg_wal (
    name text,
    walrecords blob,
    iscurrent boolean
)

Then anybody interested in the WAL record stream could easily find out which is the current WAL record, and get any of the existing WAL records by streaming the blob. Closed WAL files would be streamed completely, and the current WAL file could be streamed in real time as it is created... this would facilitate an always-as-up-to-date-as-possible standby, as it could get the WAL records in real time. To make it possible to reliably get closed WAL records, a WAL subscription system could be created, where a subscriber (the standby) could signal which is the oldest WAL file it has not gotten yet. The primary machine would keep all the WAL files extending back to the oldest subscribed one. Then each time the subscriber finishes processing a WAL file, it can signal its interest in the next one. This could be implemented by a table like:

pg_wal_subscription (
    subscriber text,
    name text
)

The subscribers would insert a record in this table, and update it to the next WAL file after they processed one. The subscriber names should be unique across subscribers; this should be managed by the admin who sets up the subscribers. When a subscriber is not interested anymore, it can delete its subscription record. That could be done by the DBA too if things go haywire... To build a standby based on log shipping it is necessary to transfer all the data base files too. That could also be done by exposing them through some view, which in turn might take advantage of knowledge of the table structure to compress the data to be transferred. The main idea is to do all transfers through normal DB connections, so the only configuration to be done is to point the standby to the master machine... So, all this said, I'm not too familiar with either C programming or the postgres sources, but I'm willing to learn. And the project as a whole seems a bit too much to do in one piece, so my first aim is to expose the WAL records in a system view. I would really appreciate any comments you have... Thanks, Csaba.
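To make the proposal concrete, a purely hypothetical sketch of how a standby might use the proposed subscription table (none of this exists today; the subscriber name and segment names are made up):

-- subscribe, starting from the current segment reported by the proposed pg_wal view
insert into pg_wal_subscription (subscriber, name)
values ('standby_1', '000000010000000000000042');

-- after replaying that segment, move the subscription forward
update pg_wal_subscription
   set name = '000000010000000000000043'
 where subscriber = 'standby_1';

-- when the standby is retired, drop the subscription so segments can be recycled
delete from pg_wal_subscription where subscriber = 'standby_1';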
Re: [HACKERS] streamlined standby procedure
You obviously did not read further down :-) I was proposing a subscription system, where the slave can specify the oldest WAL file it is interested in, and keep that up to date as it processes them. That could of course cause trouble if a slave dies and doesn't update the subscription, but that's not any different than the current setup if the archive_command starts failing constantly because the archive site is down. In either case human intervention is needed. The DB connection based approach has the advantage that you don't have to restart the server if your slave location changes, and you can have multiple slaves at the same time if you like, e.g. if you want to smoothly move the slave over to another machine. Cheers, Csaba. On Tue, 2006-02-07 at 16:18, Tom Lane wrote: Csaba Nagy [EMAIL PROTECTED] writes: The procedure should be something similar to the one available today if you do it manually. The main difference would be that the standby postmaster should connect to the primary server, and get all table data and the WAL record stream through normal data base connections... This is pie-in-the-sky really. A design like that would mean that the master could *never* recycle WAL files, because it could never know when some slave would pop up demanding a copy of ancient history. regards, tom lane
Re: [HACKERS] streamlined standby procedure
On Tue, 2006-02-07 at 16:45, Tom Lane wrote: Csaba Nagy [EMAIL PROTECTED] writes: You obviously did not read further down :-) I was proposing a subscription system, where the slave can specify the oldest WAL file it is interested in, and keep that up to date as it processes them. And how is that system view going to handle subscriptions? Not THAT system view will handle the subscription... there would be this view which exposes the WAL files, which would do exactly that, expose the existing WAL files, and only those which exist. Of course it must place some kind of lock on the WAL file it currently streams so it is not recycled, but other than that this view should not be concerned with subscription issues. The subscription system would be a table in which you can insert (subscriber_id, oldest_WAL_file_name_i'm_interested_in) tuples. When recycling WAL files, this table will be consulted and only WAL files older than the oldest entry in the subscription table are allowed to be recycled. Slaves will update their subscription line after processing each WAL file, setting it to the next WAL file name they need. So the oldest WAL to be kept will actually be in sync with what the slaves really need. OK, now I start to see what you mean, i.e. if there's no subscription then all WAL files are immediately recycled, and the view can only show one entry, the current WAL. But actually that's OK, you still can see what's the current WAL file, and can subscribe starting with it. Cheers, Csaba.
Re: [HACKERS] streamlined standby procedure
On Tue, 2006-02-07 at 16:58, Andrew Rawnsley wrote: IMHO the #1 priority in the current PITR/WAL shipping system is to make the standby able to tolerate being shut down and restarted, i.e. actually having a true standby mode and not the current method of doing it only on startup. This fits nicely with what I would like to achieve, and it might actually be a better start. While it is a trivial thing to fool postgres into staying in startup/restore mode with a restore_command that blocks until more files are available, if the machine needs to be shut down for whatever reason you have to go back to the last image and replay to the present, which isn't always convenient. Nor are you able to shut down the standby, copy it to a second instance to use for testing/development/whatever, and restart the standby. Why would you shut down the standby to copy it? It would be nicer to be able to build a standby of the standby ;-) Even nicer would be a standby which allows read-only access, but I guess that's a tough call. Cheers, Csaba.
Re: [HACKERS] sql row constructor...works!
Well, I've tested it a bit:

db=# select version();
 version
---------
 PostgreSQL 8.1.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.5 (Debian 1:3.3.5-13)
(1 row)

db=# select (1,3) > (2,3);
 ?column?
----------
 f
(1 row)

db=# select (3,3) > (2,3);
 ?column?
----------
 f
(1 row)

db=# select (3,4) > (2,3);
 ?column?
----------
 t
(1 row)

It seems to me that (x,y) > (a,b) means (x > a AND y > b)... which is not exactly what you wanted... or am I missing something? Cheers, Csaba. On Wed, 2006-02-08 at 03:35, Merlin Moncure wrote: On Feb 8, 2006, at 11:17, Merlin Moncure wrote: The proper SQL construct without a row constructor is: select * from t where a >= a1 and (a > a1 or b >= b1) and (a > a1 or b > b1 or c > c1) order by a,b,c limit 1 (no offset necessary). Confused yet? This is interesting! Could you also provide the equivalent *with* a row constructor? (or did I miss that somewhere?) select * from t where (a,b,c) > (a1, b1, c1) order by a,b,c limit 1; [plus full usage of key on t(a,b,c)]
Re: [HACKERS] streamlined standby procedure
What do you find difficult about the current method? That's got to be the first discussion point. The main problem I have is the complexity of the setup. It involves a lot of additional scripting which you have to get right for it to be actually reliable. The documentation gives a rough idea of how to do it, but it is quite some work to make it work, and you can't really tell whether it is reliable... Another issue is that unless you got the archive_command right on the master server from the beginning, you will have to restart the server once you decide to build your standby... the archive_command is a start-up time parameter. This could of course be alleviated by always using a stub script as the archive command, letting it do nothing if you don't have a standby, and then modifying it to start archiving to the right place once you start building one, or if you want to move it to another machine. But this is also not documented, and you have to figure it out for yourself. And responding to Tom's other post regarding a real standby mode, where you could stop the standby and then later resume it still in standby mode: I would actually have a good use for it tonight :-) We will migrate our application to a new version, which involves some changes in the data base. Now it would be nice to stop the standby BEFORE doing these changes, and if the migration fails for some reason, start up the standby and use it with our old application version. But if the migration succeeds, I want to start up the standby still as a standby, and make it resume standby operation... rebuilding it will take half a day at least. So a standby which can be isolated for a while would actually be useful. OK, now that I'm thinking a bit more about this, I could achieve this by fiddling with the restore_command so it stops delivering the logs for a while. But again it is not straightforward. The whole point of this is that starting up a standby should be as simple as pointing the standby machine to the primary server, without shell scripting gimmicks (which are OS specific and therefore hard to document in a generic way), without the need to fiddle with the primary's configuration (see archive_command), without the need to restart the primary if the archive command was not right in the first place. And to make it easy to start up one more standby if needed, or isolate it for a while when doing some risky work on the primary. It's about user friendliness and flexibility. It's not that it can't do the work right now, but it's really hard to do it... Cheers, Csaba.
Re: [HACKERS] pg_hba.conf alternative
I think this was discussed many times on this list, and the main conclusion was: if you don't trust your DB machine's admin, any security measure against him will be only illusory. The sysadmin can in any case access the data; you can only make this harder, you can't prevent it. So you'd better get admins whom you trust... On Wed, 2006-02-08 at 13:34, Q Beukes wrote: Hello, Is there not some other alternative to pg_hba.conf? I have the problem where the system administrators at our company obviously have access to the whole filesystem, and our database records need to be hidden even from them. With pg_hba.conf that is not possible, as they just change all the conf lines to trust auth and voila, they have access to the database without passwords. Is there a more secure alternative to this? The perfect scenario would be to deny everyone, including root, access to a database without a password. regards, Quintin Beukes
Re: [HACKERS] streamlined standby procedure
OK, this is news to me; I recall that when last looking at the configuration docs it was a start-up time parameter, but I might be wrong. [looking up the docs] OK, citing the 8.1 online docs: 17.5.3. Archiving archive_command (string) The shell command to execute to archive a completed segment of the WAL file series. If this is an empty string (the default), WAL archiving is disabled. Any %p in the string is replaced by the absolute path of the file to archive, and any %f is replaced by the file name only. Use %% to embed an actual % character in the command. For more information see Section 23.3.1. This option can only be set at server start or in the postgresql.conf file. It is important for the command to return a zero exit status if and only if it succeeds. Examples: archive_command = 'cp %p /mnt/server/archivedir/%f' archive_command = 'copy %p /mnt/server/archivedir/%f' # Windows It's at least confusing... it does say or in the postgresql.conf file too, but I must have overlooked that... and the word only is really confusing there. [looking at: http://www.powerpostgresql.com/Downloads/annotated_conf_80.html] OK, this is what confused me. The annotated conf file states it's a startup-time parameter. Well, good to know it's not... Actually, my needs for PITR/standby building are mostly solved by now, but it's sure not an easy ride, and I really wonder if there is any readily available script bundle to do it for a Windows server... Maybe a standby-building tutorial is all that is needed... Cheers, Csaba. Much of your difficulty seems to come from your thinking that this parameter requires a restart. It doesn't - check it out. The script need not be complex, you only need to put a wait loop in the restore script so that it waits for the next log file. Best Regards, Simon Riggs
Re: [HACKERS] fsutil ideas
On Fri, 2006-02-24 at 19:12, Rod Taylor wrote: On Fri, 2006-02-24 at 12:48 -0500, Tom Lane wrote: Rod Taylor [EMAIL PROTECTED] writes: I watch for table bloat but I haven't figured out a nice way of tracking down the postgresql process with the oldest transaction running short of patching PostgreSQL to report the XID for a connection in pg_stat_activity. But I'm afraid that a long running transaction with many short queries will not even show up in pg_stat_activity. So that's not a completely reliable way of catching long running transactions... but it's true that most of the time a long running query is the problem, and that is catchable. Cheers, Csaba.
Re: [HACKERS] Automatic free space map filling
What bothers me about the TODO item is that if we have to sequentially scan indexes, are we really gaining much by not having to sequentially scan the heap? If the heap is large enough to gain from a bitmap, the index is going to be large too. Is disabling per-index cleanout for expression indexes the answer? I guess you're saying that a full index scan should only be done when the index is a functional one, and an index lookup used for safe indexes? That would be a huge win for most of my vacuum-problematic tables, as I don't have any functional indexes. But I guess a full index scan would still be faster if the percentage of pages changed is more than some threshold. On the other hand it would allow very frequent vacuuming even for huge tables, so that situation should not occur. Autovacuum thresholds could be lowered drastically in that case... Anyway, as I remember, if you have a 20gig table, a vacuum / sequential scan is painful, but if we have to sequentially scan all the indexes, that is probably just as painful. If we can't make headway there and we can't clean out indexes without a sequential index scan, I think we should just remove the TODO item and give up on improving vacuum performance. From my POV, there must be a way to speed up vacuums on huge tables with a small percentage of to-be-vacuumed tuples... a 200-million-row table with frequent updates of the _same_ record is causing me some pain right now. I would like to have that table vacuumed as often as possible, but right now it only works to do it once per week due to load problems, with the long-running transaction preventing vacuuming of other tables. Cheers, Csaba.
Re: [HACKERS] Automatic free space map filling
Are you running 8.1? If so, you can use autovacuum and set per-table thresholds (read: vacuum aggressively) and per-table cost delay settings so that the performance impact is minimal. If you have tried 8.1 autovacuum and found it unhelpful, I would be curious to find out why. Yes, I'm running 8.1, and I've set up per-table autovacuum settings :-) And I lowered the general thresholds too. Generally autovacuum is very useful from my POV, and in particular the per-table settings are. But the problem I have is not the performance impact of the vacuum itself, but the impact of the long running transaction of vacuuming big tables. I do have big tables which are frequently updated, and small tables which are basically queue tables, so each inserted row will be updated a few times and then deleted. Those queue tables tend to accumulate huge unvacuumable dead space during any long running transaction, and a vacuum on the big tables is such a long running transaction. And I have a few of them, and one in particular is very busy (a task table, all activities go through that one). Now when the queue tables reach 1000 times their normal size in dead space, I get performance problems. So tweaking vacuum cost delay doesn't buy me anything, as vacuum per se is not the performance problem; its long run time on big tables is. Cheers, Csaba.
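A minimal sketch of the per-table settings mentioned above, assuming the 8.1 pg_autovacuum system catalog (the queue_table name and the threshold values are made up, and inserting into the catalog requires superuser rights):

insert into pg_autovacuum
    (vacrelid, enabled, vac_base_thresh, vac_scale_factor,
     anl_base_thresh, anl_scale_factor, vac_cost_delay, vac_cost_limit)
values
    ('queue_table'::regclass, true, 200, 0.05, 100, 0.02, 10, 200);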
Re: [HACKERS] PostgreSQL Anniversary Summit, Call for Contributions
Wow, this is a great deal of burden that for sure I didn't have to do last time :-( Not sure why, maybe the laws changed or something. It is crystal clear that I have to do it this time however. I think you're overreacting, guys... I would first go to the nearest Canadian embassy and try to get the visa. I bet in most cases they will just issue it without any invitation letter and the like... if not, only then worry about it ;-) I'm also a citizen of one of the countries (Romania) which require visas for most of the world (or used to - the situation is relaxing in this respect), and I never had any problems getting one. Or maybe it changed after 9/11? Cheers, Csaba.
Re: [HACKERS] Automatic free space map filling
Ewe. How expensive is scanning an index compared to the heap? Does anyone have figures on that in terms of I/O and time? See this post for an example: http://archives.postgresql.org/pgsql-performance/2006-02/msg00416.php For my 200-million-row table, scanning the pk index took ~4 hours. And then there are some more indexes... So if the index has to be scanned completely, that's still too much. Cheers, Csaba.
Re: [HACKERS] Automatic free space map filling
I got the impression that Csaba is looking more for multiple simultaneous vacuums than for partial vacuum. So he rather needs Hannu Krosing's patch for simultaneous vacuum ... Well, I guess that would be a good solution to the queue table problem. The problem is that I can't deploy that patch on our production systems without being fairly sure it won't corrupt any data... and I can't rely on non-production testing either. Basically I'm waiting to see Tom saying it will fly :-) Cheers, Csaba.
Re: [HACKERS] Where Can I Find The Code Segment For WAL Control?
Charlie, I'm currently working on a similar solution (it's true I'm only in the thinking phase). I don't have too much time to spend on it (~a few hours per week, on the train while commuting), so it's not progressing too fast... Nevertheless, I would like to do a first proof-of-concept version in a contrib module with minimal coding involved. I don't have time right now to explain all the details of how I would like to do it, but in essence I want to:
- create a few functions which expose the WAL files and the data base files as streams;
- use the archive_command framework to control when the WAL files can be recycled, in combination with a WAL subscription system to make sure the standby won't be missing WAL files (with safeguards so the primary system will not run out of WAL file system space if the standby can't keep up with the WAL stream);
- create a standby manager program which only needs to know how to access the primary server in order to create the standby (by connecting to it through normal data base connections and using the above mentioned functions to stream the files);
These are just the main ideas, the devil is in the details. As for answering your question, a possible starting point would be to take a look at: src/backend/postmaster/pgarch.c This one deals with archiving WAL files, so it could give you some insight into how you can deal with the WAL files. Or just search the source tree for 'WAL'... I guess there are quite a few places in the code where WAL is involved. Next week I'll post a more detailed design (some charts I did to better understand the idea, and some more text to add details), if anybody is interested in co-working on it; otherwise I'll just keep working at my own pace until it gets done... Cheers, Csaba. On Fri, 2006-03-10 at 07:34, 王宝兵 wrote: I am now trying to develop PG to support real-time backup. My architecture is somehow similar to the Database Mirroring technology of SQL Server 2005. The server end of the system consists of two DB servers: one is the Principal server, the other is the Mirror server. Whenever the Principal flushes its log buffer to the local log file, it must send the content of its buffer to the Mirror simultaneously. After the Mirror receives the buffer, it writes to its own log file and sends a response message to the Principal, then the Mirror redoes/undoes its log. This way, we can guarantee that the database instances of the two servers are identical. But now I encounter a problem. I don't know where the functions to control the log buffer are. Which code segment may I refer to? I have just joined the PG project a short time ago, and I will appreciate your help very much! Look forward to you all! Charlie Wang
Re: [HACKERS] Where Can I Find The Code Segment For WAL Control?
[Please use reply to all so the list is CC-d] Charlie, I guess what you're after is to make sure the WAL buffers are shipped to the standby at the same time as they are committed to disk. In any other case your desire to have the standby EXACTLY in sync with the primary server is not going to work. But that would mean that the communication to the standby becomes a performance bottleneck, as all transactions are only finished after their WAL records are synced to disk. So if you want your standby completely in sync with your primary, you will want the transactions to finish only after their WAL records are pushed to the standby too... and then if the communication to the standby fails, all your transactions will wait on it, possibly causing the primary to stop working properly. So now you have another point of failure, and instead of making the setup safer, you make it less safe. What I want to say is that it is likely not feasible to keep the standby completely in sync. In practice it is enough to keep the standby NEARLY in sync with the primary server. That means you will ship the WAL records asynchronously, i.e. after they are written to the disk, and in a separate thread. What I'm after is to have a thread which starts streaming the current WAL file, and keeps streaming it as it grows. I'm not completely sure how I'll implement that, but I guess it will need to loop and transfer whatever records are available, and then sleep a few seconds when it reaches the end. It must be prepared to stumble upon partially written WAL records, and sleep on those too. On the standby end, the current partial WAL will not be used unless the standby is fired up... So I'm after a solution which makes sure the standby is as up to date as possible, with a few seconds of allowed gap in normal operation, and possibly more if the communication channel has bandwidth problems and the server is very busy. Usually if the server crashes, there are worse problems than the few seconds' or minutes' worth of lost transactions. To name one, if the server crashes you will have for sure at least a few minutes of downtime. At least for our application, downtime in a busy period is actually worse than the lost data (which we can recover from other logs)... Cheers, Csaba. On Sun, 2006-03-12 at 02:50, 王宝兵 wrote: Csaba: Firstly I must thank you for your help. Some of our designs are identical, except the following: - create a standby manager program which only needs to know how to access the primary server in order to create the standby (by connecting to it through normal data base connections and using the above mentioned functions to stream the files); In my opinion, if we create a standby manager program and run it as a daemon process, it will check the state of the WAL files of the Principal every few seconds. But there is a risk of data loss. For instance, suppose the Principal has flushed its log buffer to disk and the dirty data are also flushed immediately, but the standby manager program is in its sleep interval, and then the Principal fails. In this situation the Principal has updated its database but the log segment hasn't been sent to the Mirror, because the time for the standby manager program to check the WAL files hasn't come yet. And then these data are lost. I think this situation will happen quite probably in a big corporation and it's very serious. Perhaps I have misunderstood your opinion. If so, I apologize.
Charlie Wang
Re: [HACKERS] Automatically setting work_mem
My own thoughts about the problems with our work_mem arrangement are that the real problem is the rule that we can allocate work_mem per sort or hash operation; this makes the actual total memory use per backend pretty unpredictable for nontrivial queries. I don't know how to fix this though. The planner needs to know the work_mem that will be used for any one of these operations in order to estimate costs, so simply trying to divide up work_mem among the operations of a completed plan tree is not going to improve matters. I know this is not exactly to the point of what is discussed in this thread, and that it would need some serious work, but how about a mechanism to allow plans some flexibility at run-time ? What I mean is not to make all the decisions at plan time, but to include some branches in the plan, and execute one branch or the other depending on actual parameter values, current statistics, current memory available, ... (name here other run-time resources). This would make it a lot more feasible to cache query plans long-term. E.g. you wouldn't have to worry too much about changing statistics if at runtime you can check them again... and you could put in decision points based on current memory resources. Of course there still must be a balance between the number of decision points (which ultimately means the size of the plan) and robustness against changing conditions, i.e. branches should only go in for conditions likely to change. Is this completely infeasible with the current postgres architecture ? I have no idea how the planning/runtime works internally. It's worth a look at how Apache Derby does query planning, where a planned query is actually a compiled Java class, i.e. the executable byte code which will run to fetch the results, created and compiled by the planner... interesting approach, allows for lots of flexibility at run-time, but probably won't work with C :-) Cheers, Csaba. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] 8.2 planning features
- Postgres intrinsic log-shipping replication (we have one to contribute) Are you saying you have a working WAL-shipping based portable (meaning it works well on all platforms) replication already done ? Cause I was looking into implementing just this one :-) Do you have some details on how it works ? Cheers, Csaba. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Summary of coverity bugs
I wanted to make a link to the cvsweb interface but my XSLT skills weren't up to that apparently. The following XSL fragment will do the trick for the cvsweb link:
<tr><th align="left">File</th><td>
  <xsl:element name="a">
    <xsl:attribute name="href">http://developer.postgresql.org/cvsweb.cgi/<xsl:value-of select="file"/></xsl:attribute>
    <xsl:value-of select="file"/>
  </xsl:element>
</td></tr>
Cheers, Csaba. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] error-free disabling of individual child partition
ALTER TABLE childN ALTER INHERITS DROP (parent); ALTER TABLE childN ALTER INHERITS ADD (parent); Wouldn't it be possible to allow the ADD/DROP to happen in the same statement, like: ALTER TABLE childN ALTER INHERITS DROP crt_parent ADD new_parent; or: ALTER TABLE childN ALTER INHERITS DROP crt_parent, ALTER INHERITS ADD new_parent; That would still make it one statement, but more explicit. And it would eliminate the need for parentheses (I assume they were needed for supplying more than one table to inherit/disinherit). Cheers, Csaba. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] vacuum, performance, and MVCC
[...] There has to be a more linear way of handling this scenario. So vacuum the table often. Good advice, except if the table is huge :-) Here we have for example some tables which are frequently updated but contain 100 million rows. Vacuuming that takes hours. And the dead row candidates are the ones which are updated again and again and looked up frequently... A good solution would be a new type of vacuum which does not need to do a full table scan but can clean the pending dead rows without that... I guess then I could vacuum really frequently those tables. Cheers, Csaba. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] vacuum, performance, and MVCC
Good advice, except if the table is huge :-) ... Then the table shouldn't be designed to be huge. That represents a design error. [snip] This demonstrates that archival material and active data should be kept separately. They have different access patterns; kludging them into the same table turns out badly. Well, then please help me find a better design cause I can't see one... what we have here is a big membership table of email lists. When there's a sendout then the memberships of the affected group are heavily read/updated, otherwise they are idle. None of the memberships is archive data, they are all active data... the only problem is that they are so many. Is it so hard to believe that 100 million rows is all active data, but only used in bursts once per week (that's an example, some groups are more active, others less) ? Cheers, Csaba. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] vacuum, performance, and MVCC
I suppose you have a table memberships (user_id, group_id) or something like it ; it should have as few columns as possible ; then try regularly clustering on group_id (maybe once a week) so that all the records for a particular group are close together. Getting the members of a group to send them an email should be faster (less random seeks). It is like this, plus some more bookkeeping data which must be there... we could split the table for smaller records or for updatable/stable fields, but at the end of the day it doesn't make much sense, usually all the data is needed and I wonder if several big but narrow tables instead of one big and wide one make sense... Regular clustering is out of the question as it would render the system unusable for hours. There's no 0 activity hour we could use for such stuff. There's always something happening, only the overall load is smaller at night... Thanks, Csaba. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] vacuum, performance, and MVCC
Let me ask a question, you have this hundred million row table. OK, how much of that table is read/write? Would it be possible to divide the table into two (or more) tables where one is basically static, only infrequent inserts and deletes, and the other is highly updated? Well, all of it is read/write... some of the data might be updated less frequently, but there's no way for me to know which part of the data that is. Logically it is just the same type of data... so unless I find a way to continuously move the data back and forth between an archive table and the live table, based on how active the groups are, I can't imagine any other way of partitioning it. And that would also mean quite a big load given the pretty high dynamics of the groups. The big thing in performance is the amount of disk I/O, if you have a smaller active table with only a single index, then you may be able to cut your disk I/O time really down. The smaller the row size, the more rows fit into a block. The fewer blocks the less disk I/O. The less disk I/O the better the performance. I agree, but it is quite hard to achieve that when the data set is both big AND the partitioning criteria are highly dynamic. Not to mention that deleting from that table is also a PITA performance-wise, so I wonder how well the continuous back and forth between the active and inactive table would do. Also, and anyone listening correct me if I'm wrong, you NEED to vacuum frequently because the indexes grow and vacuuming them doesn't remove everything, sometimes a REINDEX or a drop/recreate is the only way to get performance back. So if you wait too long between vacuums, your indexes grow and spread across more disk blocks than they should and thus use more disk I/O to search and/or shared memory to cache. This is nice in theory, but kills performance. I vacuum the big tables only overnight, otherwise the server is sluggish. Cheers, Csaba. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] vacuum, performance, and MVCC
You said the activity comes in bursts per group, so the obvious partitioning would be per group. If you have too many groups to have one partition per group you could try to find some modulo or other rule to spread them into separate partitions. This could be a solution... but then I'm not sure how well queries would do which need the first 10 records based on some criteria which do not include the group id. I guess limit queries across the union of the partitions don't work too well for now, and we do have such queries. I'm pretty sure we could work this out, but it would need some big refactoring of our current code which is not that simple... and it must work well with Oracle too, as we do have systems on Oracle. Cheers, Csaba. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
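For what it's worth, the modulo-style partitioning suggested above would look something like this with inheritance (the table and column names are made up to match the membership example, and I haven't actually tested how well constraint exclusion copes with modulo checks):

CREATE TABLE memberships (user_id bigint, group_id bigint, status integer);
CREATE TABLE memberships_p0 (CHECK (group_id % 4 = 0)) INHERITS (memberships);
CREATE TABLE memberships_p1 (CHECK (group_id % 4 = 1)) INHERITS (memberships);
CREATE TABLE memberships_p2 (CHECK (group_id % 4 = 2)) INHERITS (memberships);
CREATE TABLE memberships_p3 (CHECK (group_id % 4 = 3)) INHERITS (memberships);
-- queries go against the parent; the extra modulo clause is only there to give
-- constraint exclusion something it can prove, so that only one child is scanned
SET constraint_exclusion = on;
SELECT * FROM memberships WHERE group_id = 42 AND group_id % 4 = 42 % 4;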
Re: [HACKERS] vacuum, performance, and MVCC
Well, the only thing left is to cluster the database. There are a couple ways to do this, one, switch to a platform that supports clustering, or create an API to wrap multiple databases. If your queries are simple and limited, you could create an HTTP/XML service that wraps a number of postgresql databases, issues a query across all databases, merges multiple query sets, and returns one homogeneous stream. Our queries are not simple nor limited :-) We have a big variety of users, with a big variety of needs and talented user story writers who have imaginative minds... so the same code must work in quite a few ways, and some of the resulting queries are dynamically created. It's a tough job to optimize all the queries we have. Inserts would be handled by hash to machine weighted by number of records on each machine. Updates and deletes would have two keys, machine and ID. Such a setup might work for us but I fear it would be a major PITA to make it reliable and to maintain it. It's not like I can say let's allot a month of work to trying out a clustering solution when I'm not sure it will work out in the end. We still have enough features to develop, the DB is something to solve part of the problem, not to keep us busy... the Oracle systems were there first, the application works more or less fine on them (with occasional need to optimize here and there). Supporting Postgres was a side-project to see if it works, and it works decently, so we deployed some systems on it. Both of the DBs have their quirks, and I can voice here the ones I don't like in Postgres... and then some developer might do something about it or not, and I find that OK. If my mind didn't refuse so categorically to learn C-style programming (tried that and gave up), I would probably scratch my own itches. I did it for smaller-scale ones, like truncating timestamp(0) instead of rounding so that it is consistent with what Oracle does, but that was just a one-file modification... I simply don't find it fun to browse C code, compared to how easy it is to understand the Java code I work with here. So unless somebody ports Postgres to Java, I'll further need to voice my itches here in the hope that they'll be solved by others... sorry for the long rant. It sounds like you have a big problem and you need a big solution. Well, Postgres does a decent job as it is. The problem is that under peak load it sometimes gets bogged down and the usual things like vacuum will not help immediately. I think a few more features like the dead space map for quick vacuum and even something like the original post's proposition would make postgres fly under heavy load too... Cheers, Csaba. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] vacuum, performance, and MVCC
These sorts of solutions, IMHO, don't show how good PostgreSQL is, but show where it is very lacking. We all know Postgres is lacking; some of us try to improve it (some with more success than others). People who know the current limitations but like the capabilities try to find workarounds to the problems. What surprises me is that, if you have such a low opinion of Postgres, you still use it. Alvaro, I understand your reaction, you're on the developer side... but please try to understand us mortals who can't write good C code too: some of us like what postgres already offers, but we would also like to not need those workarounds. And the only thing we can do is make big noise so somebody from the other side (sour side) will notice it and at one point do something about it... the noise here by no means means that we have a low opinion of postgres. On the contrary, we appreciate postgres and its developers enough to ask for what we would like to see, and we are sure there are some gifted developers out there who can program those features we scream about... Cheers, Csaba. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] vacuum, performance, and MVCC
First of all, I base my assumptions on what I recall to have read on this list, as I haven't tried partitioning using inheritance yet. It's not trivial to set up and I didn't have the time to play with it yet. So I wouldn't know for sure that it won't work fine with our application, and that will only change when I get a few days to experiment. The experimentation will include the migration of existing data to the partitioned schema, which will probably be the most difficult part of it due to the size of the tables which need partitioning... You would query the parent (no union). Do you need order by's ? Without order by it is currently no problem. It's clear to me that partitioning by inheritance is transparent to the application, what worries me is that our application likely has a few queries which will be equivalent to a union when planning, and I fear bad performance there. And I need order by on all queries with limit. The few exceptions where I wouldn't need order by are when I want to delete/update chunk-wise, but that's not supported right now... another feature I made noise about ;-) [snip] Cheers, Csaba. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] vacuum, performance, and MVCC
Usually it gets really bad if you *don't* run vacuum continuously, maybe hoping to do it in slower times at night. For a high-update db you have to run it continuously, maybe having some 5-15 sec pauses between runs. And how much I/O does this take? Surprisingly it's mostly WAL traffic, the heap/index pages themselves are often not yet synced to disk by the time of vacuum, so no additional traffic there. If you had made 5 updates per page and then vacuum it, then you make effectively 1 extra WAL write meaning 20% increase in WAL traffic. Does this also hold for read traffic ? I thought vacuum does a full table scan... for big tables a full table scan always badly influences the performance of the box. If the full table scan could be avoided, then I wouldn't mind running vacuum in a loop... In fact I think it would make sense to replace the whole current vacuum stuff with a background thread which does that continuously using a dead space map. That could be a heap sorted by tuple deletion time, always cleaned up, up to the oldest running transaction's start time... there would be no need for any other autovacuum then. Cheers, Csaba. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] vacuum row?
On Sun, 2006-06-25 at 05:29, Alvaro Herrera wrote: Mark Woodward wrote: I originally suggested a methodology for preserving MVCC and everyone is confusing it as update in place, this is not what I intended. It doesn't make sense, but maybe vacuuming a page would. Naturally, it would need to wholly scan all the indexes to clean'em up, so it's probably not a good idea in general. But a version of vacuum which does normal index scans when vacuuming just a small percentage of a huge table would make sense, wouldn't it ? So you don't need to make full scans of the vacuumed indexes but can look up the entries based on the vacuumed keys. There were discussions about this I think, and the objection was that an index scan might miss index entries, in particular when badly behaved user-defined functions are involved. Cheers, Csaba. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Table clustering idea
I think one of the issues might have been: how will you handle other indexes on the table when you can no longer point them at an item (since items will need to move to maintain an IOT). I guess you shouldn't allow any other indexes. That's a perfectly acceptable compromise I think... it would still be very useful for big and narrow tables which would benefit from being clustered. The other concern is how you would do sequential scans on the table if items are allowed to move ? I think some other DBs have a facility to make a fast index scan which is essentially a sequential scan of the index file, something like that would be needed here too. Cheers, Csaba. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Three weeks left until feature freeze
Does our CVS setup cater for segregated rights like this? Or would that be done on a trust basis? No, I don't believe you can do this with CVS at all. We'd need something like SVN/WebDAV to be able to grant write access just to specific parts of the tree to different people. It is possible using CVS, by carefully managing file system permissions and assigning different permissions to the OS users of the different committers. I guess it's also possible using commit scripts... but I don't think it's worth the effort as long as there is a regular backup of the CVS tree... Cheers, Csaba. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Three weeks left until feature freeze
On Thu, 2006-07-13 at 15:29, Stephen Frost wrote: It's not the PostgreSQL project's problem, that's true, but it certainly becomes an issue for distributions. Java as a PL ends up being a pretty odd case.. If there isn't anything in the PL code itself which forces a dependency beyond gcj then it might be possible to distribute it. Also allowing the PL to use a different JVM shouldn't be a problem so long as nothing is distributed which depends on the alternate JVM. The GPL is all about distribution and so I'm not sure that it would actually be a problem for an end-user to use Sun's JVM with GPL'd Java code. Now I'm completely confused... what GPL code ? Is PL/Java licensed under the GPL ? Or what GPL code are you talking about ? The PL/Java code is likely only dependent on the JVM specification, which does not put any restriction on how you must license your code, so PL/Java can be licensed in any way the author wants, including BSD. The distribution part is also no problem as I see it, as only the build tools are not BSD, and they are available for free (including the Sun JDK) and they don't restrict what the license of the code you compile has to be. This can only be a problem for purists like GPL zealots or perhaps Debian, otherwise it's not that hard to download and install the Sun JDK on a build machine... you don't need to distribute the JDK, only the runtime JVM, which you actually can do (including, again, the Sun runtime). So again I can't see problems from the packager point of view... except purists might put a separate PL/Java module in some non-free repository given the dependency on some non-free runtime... Cheers, Csaba. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Three weeks left until feature freeze
On Thu, 2006-07-13 at 17:03, Tom Lane wrote: [...] I don't know what other people who do core development feel about that --- but I dislike the idea that when someone changes such an API, the buildfarm will go all red because there's only one person with the ability to fix PL/Java. But the alternative is that nothing goes red, and the PL stays broken until somebody notices it, which might be too late to easily know which change broke it. Wouldn't it be possible to separate the red/green lights for the core and for the PLs ? So the core stays green and the PLs go red... and stay red until the PL maintainer fixes things. And I don't believe there's only one man around who knows good Java... once PL/Java gets into core I'm pretty sure there will be a lot of people using it and caring about it. Cheers, Csaba. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Progress bar updates
We already have EXPLAIN ANALYZE. Perhaps the right way to do this is something that provides similar output. I could see something that looks like EXPLAIN for the parts that have not yet executed, something reasonable to show progress of the currently active part of the plan (current time, rows, loops), and EXPLAIN ANALYZE output for the parts which have been completed. Now this is something that would really help testing a system, by dynamically seeing the plans of queries which run too long. That combined with the ability to see the values of bind parameters would be a useful debug aid. Cheers, Csaba. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Transaction Speed and real time database
[snip] Suppose that everybody tells me that Postgres is too slow for real time databases; then I will have my hands full trying to resolve these problems with postgres, don't you think? I think you didn't understand correctly: postgres is not slow, it is just not suitable for real RT applications because of a few reasons, which in fact make other data bases also not suitable for this purpose. The main concern is that a RT application usually needs predictable response times, possibly with guarantees for upper bounds of response times... and most data bases which are transactional and offer concurrent access won't give you such guarantees, due to locking issues. The question is whether your application is really RT in the proper sense of the word, or just an OLTP application which needs to be fast but won't cause a nuclear explosion if one response in 100 is slower than expected... in that case postgres might be good for you. Cheers, Csaba. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Transaction Speed and real time database
[please use reply to all, otherwise you'll have what you just had: the guy you write to goes home for the weekend and all the rest of the people on the list who would answer you won't know there is something to answer...] On Fri, 2006-07-21 at 13:39, moises wrote: Sorry if I can't explain myself clearly. I want to use an extension of postgres (because I'm very clear that postgres is not a real time database). I want to modify some modules of postgres for this purpose, for example, add some time constraints to transactions, I know that is very hard work. Our applications are 100% real time work, controlling industrial processes, not OLAP or OLTP applications. My question is how fast is postgres in transactions actually? Because maybe we don't need to work on looking for more speed, just constraints and planners. I have to admit RT is for me just a vague idea I still remember from some courses, but I'm not sure if RT and transactional are compatible at all... do you really need a transactional data base for RT applications ? Cause postgres is just that, an excellent transactional DB, which BTW is fast enough for transactional work (where you expect that sometimes operations fail due to others working in parallel). I'm not sure that in industrial RT applications you could afford failures due to concurrency... and in that case you would be better off using something non-transactional - but this is just my limited understanding of the problem. Others on this list might know better... Cheers, Csaba. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Transaction Speed and real time database
[snip] OTOH, one has to be very careful to not mix terms here. In industrial (production floor) applications, the term 'real time database' refers to something completely different than a relational, transactional DB. But relational and transactional are orthogonal, they don't imply/require each other... most of the roadblocks you mentioned (including vacuum) are part of postgres' transactional design and a non-transactional DB won't have that overhead. Your input reinforces my thinking that the transactionality of the DB is the real roadblock... which means postgres will never really be RT in the proper sense of the word. Because of the features of a full-fledged relational database engine, engineers often wish they had one of those instead ;-). Usually, we solve this with some sort of streaming 'frontend', which buffers the data flow (usually to disk) until it's inserted into the database. This lowers the real-time requirement to 'has to be fast enough on average'. We have several of these types of applications in production at various customers, some for 6+ years continuously (using PostgreSQL 7.0!). This sounds like the most reasonable approach :-) Cheers, Csaba. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Better name/syntax for online index creation
Semantically, NOLOCK (or whatever word) seems most closely associated with CREATE INDEX, so having it as third word would make sense, and it would be quite easy to parse for psql. Strictly speaking, however, it would have to be NOLOCKLY in that case. :-) In this case CREATE INDEX CONCURRENTLY ... sounds better to me, although the whole feature sounds nice whatever you finally call it ;-) Cheers, Csaba. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Forcing current WAL file to be archived
OK, offset added to TODO item. What would the offset give us? The last offset could be remembered by the external program, and then it only has to transfer from the last offset to the new one. It allows incremental streaming of the WAL files... of course the external program will be a lot more complex than the current shell scripts which can be used for WAL archiving... The problem I see with this is whether the normal postgres WAL archiving won't conflict with this streaming ? And if yes, wouldn't it be better to have a separate mechanism for the stream based archiving ? I mean, what will happen if postgres successfully archives the WAL file and then moves it out of the way before the streaming process finishes with it? The streaming process will have a problem... A few months ago I spent some time thinking about a solution where a WAL based standby could be built using only normal data base connections to the master server, and one of the ideas was to create a WAL subscription mechanism where the standby subscribes to receive WAL files, and updates its subscription status with the last processed WAL file after each processed file. The master can then recycle the WAL files only after they were confirmed by all current subscriptions... and to avoid excessive WAL file bloat if a slave goes offline, the subscription could be canceled automatically if it gets too far behind. If this mechanism is in place, it would also be nice if the slave could ask for the WAL records to be streamed on a normal data base connection. The function which would do it could be smart enough to stream the current WAL file too, up to the current offset, and then wait for new records. The slave would invoke the function for each WAL file it needs to transfer, and when finished it would update its subscription status and continue with the next one. The streaming function should not update the subscription status, as this way the slave can ask for the file again if something goes wrong with the transfer. The third thing needed to create a facility for one-connection standby building is to be able to stream the OS files of the DB through a DB connection - I guess that can be done with a relatively simple C function... With all these things in place, a program could be written which would run on the standby machine and completely automatically set up the standby, only needing a simple connection string to the master... Cheers, Csaba. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
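To make the subscription idea a bit more concrete, the interface could look something along these lines (all the functions below are invented for illustration, none of this exists):

-- hypothetical subscription interface, names invented
SELECT pg_wal_subscribe('standby1');        -- register; the primary stops recycling unconfirmed WAL files
SELECT pg_wal_stream('standby1', '000000010000000000000042');
                                            -- stream one WAL file, including the currently written one
                                            -- up to its current end
SELECT pg_wal_confirm('standby1', '000000010000000000000042');
                                            -- confirm it was processed, so the primary may recycle it
SELECT pg_wal_unsubscribe('standby1');      -- cancel the subscription (would also happen automatically
                                            -- if the standby falls too far behind)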
Re: [HACKERS] Forcing current WAL file to be archived
The problem I see with this is whether the normal postgres WAL archiving won't conflict with this streaming ? You are not forced to use it if your shell scripts do conflict. What I envisioned was that the current WAL archiving shell script would just do some CRC check over the WALs already shipped, or, as we currently use rsync to do the actual shipping, this is what happens automatically. Hmm, that sounds pretty smart... the archive process rsyncing over the file which was previously streamed... I guess this will mean very little overhead (in fact it only means the WAL archive is read once more than absolutely necessary, plus the CPU power to compute the CRCs). And if yes, wouldn't it be better to have a separate mechanism for the stream based archiving ? why separate ? I'm a great believer in doing the minimum useful change, at least in systems used in production. We already have a working solution for full file shipping, so why not just augment it with streaming the currently-written-to file. That's fine by me, I also have a working script, so I'm also not very motivated to do anything more complicated... but 6 months ago I would have been really glad to have a stand-alone program which I could install alongside postgres on the slave, point it to the master, and get a working WAL shipping based stand-by. Instead I spent a few days setting up our standby scripts and testing them under load... never being certain it really works and won't break exactly when I need it most... I mean, what will happen if postgres successfully archives the WAL file and then moves it out of the way before the streaming process finishes with it? The streaming process will have a problem... This should not happen. Your streaming process should be smart enough to guarantee that. OK, true, the streaming script should always stream only the current file. If the last offset was from a previous WAL, it can be safely reset to 0, and the new WAL streamed from the beginning. So the streaming script needs to remember the last WAL and offset, not just the offset. Cheers, Csaba. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree
[snip] (In fact, it's trivial to see how user-defined functions that are mislabeled immutable could make this fail.) So retail vacuum without any cross-check that you got all the index tuples is a scary proposition IMHO. Wouldn't it work to restrict that kind of vacuum to only tables which have no indexes using user defined functions ? That would mean a very small restriction I guess, probably 99.9% of the indexes won't use user defined functions... I actually wonder if such a vacuum would be useful for my scenario, where I have some pretty big tables, and update a relatively small percentage of them. Would it be faster to run such a vacuum compared to the current one ? One example would be a ~100 million row table where I have 1-4 million updates per day. Could I run vacuum multiple times a day for this table and expect that individual runs are relatively fast ? Cheers, Csaba. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] GUC with units, details
[snip] Forcing people to use a specific casing scheme is just going to lead to confusion and user frustration. If there's not a very solid *functional* argument for it, we shouldn't do it. Wanting to enforce a convention that people rarely use isn't a good reason. I guess nobody will force people to use the units at all. But if you implement a new feature, history shows that it will stay like that forever. So if in 5 years everybody uses the ISO stuff and postgres wants to do the same, then the users you don't want to confuse now will be forced to change their config files or be completely confused. Or it will be as with everything else, an early arbitrary decision sets everything in stone. And I do find all these ambiguous meanings of K, G, etc. confusing, and I think ISO is the right way to clear out the confusion, at the cost of some inconvenience until the users get used to it. For postgres that would mean no user resistance anyway, as the possibility of specifying the unit is new, so whoever knows about it must have read the docs first, and the docs must specify the units you can use. Just my 2c. Cheers, Csaba. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] GUC with units, details
On Thu, 2006-07-27 at 17:57, Peter Eisentraut wrote: Florian G. Pflug wrote: Rounding up would have the advantage that you could just specify 0 in the config file, and have postgres use the smallest value possible. In most algebras, dividing zero by something is still zero, so there'd be no need to round anything. I think he was referring to silently applying the minimum allowed if the value is less than that... a lot of the settings have a minimum allowable value. The question is whether this can qualify as rounding :-) Cheers, Csaba. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [CORE] Attack against postgresql.org ...
On Fri, 2006-07-28 at 17:37, Tom Lane wrote: Marc G. Fournier [EMAIL PROTECTED] writes: The attacking IP, from the logs, appears to be 87.230.6.96 ... Perhaps a complaint to their ISP is in order --- RIPE suggests [EMAIL PROTECTED] That looks 1 level too high, the immediate source seems to be http://www.ehost.pl/onas.php They could probably act faster and more at the source... down on the page from the link above you can find [EMAIL PROTECTED] for complaints. Cheers, Csaba. $ nslookup 87.230.6.96 Server: 192.168.1.4 Address:192.168.1.4#53 Non-authoritative answer: 96.6.230.87.in-addr.arpaname = vpsdws.xip.pl. Authoritative answers can be found from: 6.230.87.in-addr.arpa nameserver = dns.hosteurope.de. 6.230.87.in-addr.arpa nameserver = dns2.hosteurope.de. dns.hosteurope.de internet address = 80.237.128.156 dns2.hosteurope.de internet address = 80.237.129.61 $ whois xip.pl [Querying whois.dns.pl] [whois.dns.pl] % This is the NASK WHOIS Server. % This server provides information only for PL domains. % For more info please see http://www.dns.pl/english/whois.html Domain object: domain: xip.pl registrant's handle: dinz5du40 (CORPORATE) nservers: ns1.ehost.pl.[80.237.184.22] ns2.ehost.pl.[83.149.119.142] created:2003.10.06 last modified: 2005.09.19 registrar: Dinfo Systemy Internetowe ul. Mostowa 5 43-300 Bielsko-Biala Polska/Poland +48.33 8225471 [EMAIL PROTECTED] option: the domain name has not option Subscribers Contact object: company: eHost s.c. organization: eHost.pl street: Cichockiego 13/6 city: 24-100 Pulawy location: PL handle: dinz5du40 phone: +48.50253 last modified: 2004.11.03 registrar: Dinfo Systemy Internetowe ul. Mostowa 5 43-300 Bielsko-Biala Polska/Poland +48.33 8225471 [EMAIL PROTECTED] Technical Contact: company: eHost s.c. organization: eHost.pl street: Cichockiego 13/6 city: 24-100 Pulawy location: PL handle: dinz5du40 phone: +48.50253 last modified: 2004.11.03 registrar: Dinfo Systemy Internetowe ul. Mostowa 5 43-300 Bielsko-Biala Polska/Poland +48.33 8225471 [EMAIL PROTECTED] ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Connection limit and Superuser
On Mon, 2006-07-31 at 15:00, Martijn van Oosterhout wrote: On Mon, Jul 31, 2006 at 08:47:38AM -0400, Rod Taylor wrote: It appears that the superuser does not have connection limit enforcement. I think this should be changed. So if some admin process goes awry and uses up all the connection slots, how does the admin get in to see what's happening? If there's a limit you're not really superuser, are you? I thought there is a limit for super-users too... citation from: http://www.postgresql.org/docs/8.1/static/runtime-config-connection.html#RUNTIME-CONFIG-CONNECTION-SETTINGS max_connections (integer) Determines the maximum number of concurrent connections to the database server. The default is typically 100, but may be less if your kernel settings will not support it (as determined during initdb). This parameter can only be set at server start. Increasing this parameter may cause PostgreSQL to request more System V shared memory or semaphores than your operating system's default configuration allows. See Section 16.4.1 for information on how to adjust those parameters, if necessary. superuser_reserved_connections (integer) Determines the number of connection slots that are reserved for connections by PostgreSQL superusers. At most max_connections connections can ever be active simultaneously. Whenever the number of active concurrent connections is at least max_connections minus superuser_reserved_connections, new connections will be accepted only for superusers. The default value is 2. The value must be less than the value of max_connections. This parameter can only be set at server start. Cheers, Csaba. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Connection limit and Superuser
Nevermind, I realized now that you're talking about a different setting. I thought there is a limit for super-users too... citation from: http://www.postgresql.org/docs/8.1/static/runtime-config-connection.html#RUNTIME-CONFIG-CONNECTION-SETTINGS Cheers, Csaba. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] pg_terminate_backend
What I'm looking for is some concentrated testing. The fact that some people once in a while SIGTERM a backend doesn't give me any confidence in it. Now wait a minute, is there some risk of lockup if I kill a backend ? Cause I do that relatively often (say 20 times a day, when some web users time out but their query keeps running). Should I rather not do it ? Thanks, Csaba. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] pg_terminate_backend
You didn't answer the original question: is killing SIGTERM a backend known/suspected to be dangerous ? And if yes, what's the risk (pointers to discussions would be nice too). statement_timeout is your friend. I know, but unfortunately I can't use it. I did try to use statement_timeout and it worked out quite badly (due to our usage scenario). Some of the web requests which time out on the web should still go through... and we have activities which should not observe statement timeout at all, i.e. they must finish however long that takes. I know it would be possible to use a different user with its own statement timeout for those requests, but that means we would have to rewrite a lot of code, which is not possible immediately, and our admins would resist adding even more configuration (additional users = additional connection pool + caches and all to be configured). We could also fix the queries so no timeout happens in the first place, but that will take us even more time. Cheers, Csaba. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
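Just for reference, the per-user statement_timeout setup mentioned above would be something like this (the user names are of course made up):

-- the web user gets a timeout, the batch user does not
ALTER USER web_user SET statement_timeout = 30000;   -- in milliseconds
ALTER USER batch_user SET statement_timeout = 0;     -- 0 disables the timeout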
Re: [HACKERS] pg_terminate_backend
On Thu, 2006-08-03 at 18:10, Csaba Nagy wrote: You didn't answer the original question: is killing SIGTERM a backend ^^^ Nevermind, I don't do that. I do 'kill backend_pid' without specifying the signal, and I'm sufficiently unfamiliar with the unix signal names to have confused them. Is a plain kill still dangerous ? Thanks, Csaba. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] pg_terminate_backend
Stuck? You have not shown us a case where SIGTERM rather than SIGINT is necessary or appropriate. It seems to me the above is assuming the existence of unknown backend bugs, exactly the same thing you think I shouldn't be assuming ... I do know a case where a plain kill will seem to be stuck: on vacuum of a big table. I guess when it starts an index's cleanup scan it will insist on finishing it before stopping. I'm not sure if that's the cause, but I have seen delays of 30 minutes for killing a vacuum... it's true that it always did die in the end... but it's also true that I have 'kill -9'-ed it before because I thought it was stuck. Cheers, Csaba. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] pg_terminate_backend
man kill says the default is SIGTERM. OK, so that means I do use it... is it known to be dangerous ? I thought till now that it is safe to use. What about select pg_cancel_backend() ? Thanks, Csaba. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
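For the record, what I actually do is roughly this (the pid is just an example, taken from pg_stat_activity):

-- find the backend running the timed-out query
SELECT procpid, current_query FROM pg_stat_activity;
-- cancel just the running query (sends SIGINT under the hood)
SELECT pg_cancel_backend(12345);
-- or, from the shell, the plain kill I was talking about (SIGTERM by default):
--   kill 12345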
Re: [HACKERS] PostgreSQL performance enhancement when query
Constantin, What binding are you using ? We here use Java+JDBC, and we were able to get stable query plans by forcing server side prepared statements (using PGStatement#setPrepareThreshold with 1 as the threshold), where the query is prepared without knowing the parameter values. This can backfire too, but for our purposes it was the right thing (probably sacrificing some performance, but getting a stable system). The plans in this case are made to work with guessed mean values for the estimates, and that usually results in a stable plan, so once you got it right it will stay like that. Cheers, Csaba. On Mon, 2006-08-07 at 22:02, Constantin Teodorescu wrote: Hello all, hope you remember me, some years ago I designed PgAccess, the Tcl/Tk visual interface to PostgreSQL. Though you haven't received any news from me, I continued working with PostgreSQL, being involved in very big projects in Romania. Right now, the national identification of the cows, sheep, goats and pigs in Romania runs on PostgreSQL on a very big database. Once again, I have to thank you all for keeping up the maintenance and improvement of PostgreSQL. My message to all of you is related to this big project (a government sustained project) and some performance issues. Very few words about the database: approx. 60 tables, 30 of them containing 10 million to 50 million records, the whole database is approx. 40 GB in size! In order to get good performance, the database is operated on a dual XEON IBM x235 server with 6 GB RAM, the database with tables and indexes carefully distributed on 6 different SCSI disks, in different tablespaces, in such a manner as to allow parallelizing reads and HDD head movements on different devices when joining those big tables. We have tuned every possible parameter in the config file, we have reorganized queries, analyzing explains in order to get the best results for all big queries, and we succeeded most of the time. But we have encountered some problems. Due to constant updates and inserts into the database, its size is growing continuously. Of course we are doing the needed maintenance DAILY: vacuums, analyzes and backups. Due to permanent changes in database size and statistics there are queries that sometimes change their execution plan, badly choosing another plan and executing those queries in 2-3 minutes instead of 10 seconds (the usual execution time) once the query plan has switched. We have made every effort in changing subselects and the query text in order to force the use of some indexes, continuously watching the explain results. Yesterday we faced such a problem with a query that switched the query plan to a very bad one, almost putting the whole system down. The only way we succeeded in making it work again was by setting ENABLE_MERGE_JOIN to OFF. For the moment it works, but in our opinion this is NOT the best approach to guide the planner to a better query-plan variant. Our suggestion would be: extending the EXPLAIN and SELECT commands like this: EXPLAIN VARIANTS SELECT .. (and so on) that will display the different query plans analyzed by the planner and their estimated time values, not just the best guess. Assuming that EXPLAIN VARIANTS will show 3 or 4 different query plans, the database manager will be able to experiment, to test, and to decide by himself what is THE BEST PLAN FOR ME, instead of letting the postgresql planner do that.
Doing this, we would then be able to clearly specify in the SELECT statement the version of the query plan to be used in execution, like in the following example: SELECT (very big and complex query) ... USING PLAN 3; The way of specifying the desired plan could of course be different. I realise that it would probably be better if the query planner guessed the right and optimal plan. I agree that this can be done by tweaking parameters and costs BUT THIS CAN TAKE A LOT OF TIME, much more than a couple of tests on the real database. An experienced database admin can detect the appropriate plan much more easily and force the executor to select the one that he desires. In our opinion, this would be the simplest and the most non-intrusive method of manually choosing another query plan, rather than indirectly setting various parameters ON or OFF, which could badly affect other queries. First of all, it's assumed that the query planner HAS ALREADY evaluated different variants and it decides to use one based upon the statistics information of the involved tables and the costs for various types of access. Unfortunately, due to the very difficult adjustment of those costs and the timings of the HDD performance and IO transfer speeds, PostgreSQL sometimes chooses a wrong plan. If we had the power of choosing and experimenting with different plans with SELECT USING PLAN that-one, we could select the
Re: [HACKERS] PostgreSQL performance enhancement when query
On Tue, 2006-08-08 at 12:36, Constantin Teodorescu wrote: We have tried PGStatement#setPrepareThreshold with 1 as the threshold but it's not a good solution. Actually it's worse. Considering that you have 5 different query plans, you are selecting approximately a random one of them, not taking into account the statistics. Wrong, you'll select _the same_ plan, that's what matters. If it's not the plan you wanted, you have to rewrite the query and try again, but once you got the plan you wanted, you'll pretty much always get the same plan. So you only need to test until you get the right query to trigger the right plan... but of course this requires that your queries are constructed so as to always be OK with that plan, regardless of the parameter values. Usually this means a suboptimal plan, but stable execution times. If you need to give hints to the DB based on the parameter values and choose different plans for different parameter values, then you basically do the job of the planner in your application, and I guess sooner or later you'll make wrong choices too. Some hinting mechanism would be good for cases where the developer really knows better how the data is laid out (e.g. forcing the use of a specific access method for one table in a complex join), but forcing a complete plan is probably not good. Even the hinting is only a workaround for the planner fixes which cannot make it into the stable version... On the daydreaming part, how about a 2-phase planner ? Modus operandi: Phase 1: compile and cache a plan decision tree: - collect all reasonable plans without taking into account the parameter values; - check the parameter bounds where each plan is the fastest; - compile a decision tree which, based on the parameter values, chooses one plan or the other; - cache this plan decision tree; - there's no need to cache plans which will always lose to some other plan no matter what parameter values you give (to limit the size of the decision tree); Phase 2: run the decision tree to choose the best cached plan for the parameter values. You could use variables coming from the statistics system in the decision tree so it doesn't have to be recalculated too often on statistics changes. With a system like this, you could at system startup build the decision tree for all your frequently used queries and have fast planning at runtime which is optimized for the parameter values (take the decision tree from the cache, run it with the current parameters). Or just store the whole thing in a system table... or tweak the decision tree manually... This is actually not addressing the plan stability issue, but if manual tweaking were allowed, it would... Cheers, Csaba. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] An Idea for planner hints
On Tue, 2006-08-08 at 22:14, Tom Lane wrote: So some kind of override for statistical guesses doesn't seem completely silly to me. But it needs to be declarative information that's stored somewhere out of view of the actual SQL queries. IMHO anyway. The real problem is that sometimes there's no way to get a better plan without some code change in the planner. And given the postgres release policy, that might be as far as 1 year away for a normal user... of course it's open source, you can patch, but would I trust a patch which is not tested by the community ? So mostly I can't wait for code changes, and then a generic tool to fix _now_ the one bad query which brings my system down would be nice. This is why hints would be nice, to quick-fix immediate problems. Of course they can and would be abused, as anything else. On the planner improvements part, would it be possible to save statistics about join criteria between tables ? I'm not sure where that would belong, but I guess it would be possible to have a special kind of ANALYZE which analyzes multiple tables and their correlations... this way the user would not need to hard-code the statistics hints, but the system could generate them. Cheers, Csaba. ---(end of broadcast)--- TIP 6: explain analyze is your friend
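Just to illustrate what I mean by cross-table statistics, the syntax could be something like this (completely invented, nothing like it exists today):

-- hypothetical: gather join selectivity statistics for a frequently used join
ANALYZE CORRELATION (memberships JOIN groups ON memberships.group_id = groups.id);

The planner could then use the stored join selectivity instead of multiplying per-column guesses.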
Re: [HACKERS] V3 protocol vs INSERT/UPDATE RETURNING
1. Define it as a feature not a bug. People do occasionally ask for UPDATE foo ... LIMIT 1 after all. But this is a pretty klugy way of getting that, and the arguments that say allowing LIMIT on updating queries would be a bad idea haven't lost their force. Being one of those who were asking for an UPDATE/DELETE with limit, I would be very glad if this were implemented... it would be a big help for batch-processing data in an OLTP environment (no long running queries allowed). I still don't see why nondeterminism would generally be a bad thing when there are applications which don't care about that... Cheers, Csaba. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
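In the meantime the batch processing has to be emulated with a subquery, something like the following (table and column names made up); it has the same nondeterminism, which is perfectly fine for this kind of job:

-- delete/update in chunks of 1000, repeated until no rows are affected
DELETE FROM pending_jobs
 WHERE job_id IN (SELECT job_id FROM pending_jobs LIMIT 1000);

UPDATE memberships SET status = 1
 WHERE membership_id IN (
       SELECT membership_id FROM memberships WHERE status = 0 LIMIT 1000);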
Re: [HACKERS] Prepared statements considered harmful
- Redefine prepared to mean parsed rather than parsed and planned. How about prepared meaning really prepared... in the sense of parsed, all sensible plans analyzed, and a meta-plan saved which, based on current statistics and parameter values, chooses one of the considered (and cached) plans ? That would be immune both to statistics changes and to parameter value changes, within certain limits. It would also be a lot more complex than a simple plan... Cheers, Csaba. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Prepared statements considered harmful
On Thu, 2006-08-31 at 14:32, Peter Eisentraut wrote: Am Donnerstag, 31. August 2006 14:11 schrieb Csaba Nagy: How about prepared means really prepared... in the sense of parsed, analyzed all sensible plans, and save a meta-plan which based on current statistics and parameter values chooses one of the considered (and cached) plans ? I don't think this could solve one particularly frequent problem which is that pattern matching queries don't get along with prepared plans if the search pattern isn't known at planning time. Why not ? I specifically said you would prepare a few sensible plans based on statistics/expected variations of the statistics, and parameter value ranges which would trigger different plans. So for the like query case you could save 2 plans, one for the indexable case, one for the not indexable case. Then at runtime you choose the proper one based on the pattern value. The meta-plan I mentioned would be a collection of plans with rules to choose the right one at run time based on parameter values and perhaps the current statistics. This of course would need a lot more preparation time than just prepare one plan, but that's why you want to do it upfront and then cache the results. A central plan repository mentioned in other posts would fit nicely here... and you could use prepared plans for non-parameterized queries too by simply considering the constants as parameters, to increase the chances for a prepared plan reuse - this of course for complex enough queries. Cheers, Csaba. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Prepared statements considered harmful
On Thu, 2006-08-31 at 15:19, Peter Eisentraut wrote: OK, why don't you work out an example. Let's look at this query: SELECT * FROM t1 WHERE a LIKE $1; What two plans would you prepare? if substring($1 from 1 for 1) != '%' then use plan 1 (see below); else use plan 2 (see below); end if; Save both plans from below with the meta-plan from above, and call it a prepared plan.
cnagy=# create table t1 (a text);
CREATE TABLE
cnagy=# insert into t1 select round(1000 * random()) from generate_series(1,1);
INSERT 0 1
cnagy=# create index idx_t1_a on t1 (a);
CREATE INDEX
cnagy=# analyze verbose t1;
INFO:  analyzing "public.t1"
INFO:  "t1": scanned 55 of 55 pages, containing 1 live rows and 0 dead rows; 3000 rows in sample, 1 estimated total rows
ANALYZE
cnagy=# explain select a from t1 where a like '121%';
                               QUERY PLAN
------------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=2.06..27.63 rows=10 width=10)
   Filter: (a ~~ '121%'::text)
   ->  Bitmap Index Scan on idx_t1_a  (cost=0.00..2.06 rows=10 width=0)
         Index Cond: ((a >= '121'::text) AND (a < '122'::text))
(4 rows)
cnagy=# explain select a from t1 where a like '%121';
                      QUERY PLAN
------------------------------------------------------
 Seq Scan on t1  (cost=0.00..180.00 rows=80 width=10)
   Filter: (a ~~ '%121'::text)
(2 rows)
Cheers, Csaba. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Prepared statements considered harmful
On Thu, 2006-08-31 at 15:49, Peter Eisentraut wrote: Note that plan 1 can only be created if you know the actual value for $1. Why would that be so ? The plan can contain functions of $1 (both constants in plan 1 are a function of $1). Cheers, Csaba ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Prepared statements considered harmful
Neither of these problems have anything to do with statistics getting stale. ... and the second one would benefit from a meta-plan facility which puts some meta-plan nodes on top of specific plans to dispatch based on parameter values at runtime. Incidentally, the dispatch could check the statistics assumptions too. If you don't need to do the planning for each execution, you could afford to check the assumptions for each execution instead... Cheers, Csaba. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] FE/BE protocol vs. parameterized queries
Although I don't have a clear opinion myself, I sometimes read on this list that people are using prepared statements to get safe, stable plans, i.e. plans that don't depend on the specific parameter input. I definitely want the possibility of getting stable plans. That's only possible if the planner does NOT take into account any parameter values. If the statistics get out of date faster than it is practical to run analyze, but the plans would stay stable, it's better not to have parameter values taken into account. If you change that, I don't think they will be happy at all. I suggest leaving it as-is for 8.2. I think the user (i.e. the driver) should be able to tell the backend whether it wants planning at the first bind, or right at prepare. That would be nice. We would probably use all 3 forms:
- unnamed statement: plan using the actual parameter values, as if they were constants;
- named statement: plan based on the first set of parameter values;
- named statement: prepare a generic plan without considering parameter values.
Cheers, Csaba. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
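For reference, a small libpq sketch of the two facilities that already exist: a one-shot parameterized query (unnamed statement) and a named prepared statement planned once and reused. The explicit choice between "plan at first bind" and "generic plan at prepare" is the proposal above, not an existing switch; the connection string and the table are made up.

#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=test");   /* made-up connection string */
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }

    const char *values[1] = { "121%" };

    /* one-shot parameterized execution (unnamed statement) */
    PGresult *r1 = PQexecParams(conn,
                                "SELECT a FROM t1 WHERE a LIKE $1",
                                1, NULL, values, NULL, NULL, 0);
    printf("unnamed: %s, %d rows\n",
           PQresStatus(PQresultStatus(r1)), PQntuples(r1));
    PQclear(r1);

    /* named prepared statement, planned once and then reused */
    PGresult *p = PQprepare(conn, "q1", "SELECT a FROM t1 WHERE a LIKE $1", 1, NULL);
    PQclear(p);
    PGresult *r2 = PQexecPrepared(conn, "q1", 1, values, NULL, NULL, 0);
    printf("named:   %s, %d rows\n",
           PQresStatus(PQresultStatus(r2)), PQntuples(r2));
    PQclear(r2);

    PQfinish(conn);
    return 0;
}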
Re: [HACKERS] Opinion wanted on UUID/GUID datatype output formats.
select format_uuid(mypk,'format2') from tbluuid; and then get: 6b13c5a1-afb4-dcf5-ce8f-8b4656b6c93c How about instead of fixed formats, you allow a format string using the various parts of the GUID, a la the time formatting functions ? Then everybody can format it as they want. Just an idea. Cheers, Csaba. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
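A rough sketch of that idea in plain C, in the spirit of strftime(): walk a template and substitute the parts of the UUID. The placeholder letters and the function itself are invented for this example; nothing like it exists in the server.

#include <stdio.h>
#include <string.h>

/* format a 16-byte UUID according to a template:
 *   %t  first 8 bytes, dashed   %c  clock_seq bytes   %n  node bytes
 * any other character is copied verbatim */
static void format_uuid(const unsigned char u[16], const char *fmt, char *out, size_t cap)
{
    size_t pos = 0;
    for (; *fmt && pos + 1 < cap; fmt++)
    {
        if (fmt[0] == '%' && fmt[1] == 't')
        {
            pos += snprintf(out + pos, cap - pos, "%02x%02x%02x%02x-%02x%02x-%02x%02x",
                            u[0], u[1], u[2], u[3], u[4], u[5], u[6], u[7]);
            fmt++;
        }
        else if (fmt[0] == '%' && fmt[1] == 'c')
        {
            pos += snprintf(out + pos, cap - pos, "%02x%02x", u[8], u[9]);
            fmt++;
        }
        else if (fmt[0] == '%' && fmt[1] == 'n')
        {
            pos += snprintf(out + pos, cap - pos, "%02x%02x%02x%02x%02x%02x",
                            u[10], u[11], u[12], u[13], u[14], u[15]);
            fmt++;
        }
        else
            out[pos++] = *fmt;
    }
    out[pos < cap ? pos : cap - 1] = '\0';
}

int main(void)
{
    unsigned char u[16] = { 0x6b,0x13,0xc5,0xa1,0xaf,0xb4,0xdc,0xf5,
                            0xce,0x8f,0x8b,0x46,0x56,0xb6,0xc9,0x3c };
    char buf[64];
    format_uuid(u, "%t-%c-%n", buf, sizeof(buf));
    printf("%s\n", buf);   /* 6b13c5a1-afb4-dcf5-ce8f-8b4656b6c93c */
    return 0;
}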
Re: [PATCHES] [HACKERS] Incrementally Updated Backup
True, but running several dozen instances on a single machine will require a lot more memory (or, conversely, each individual database gets a lot less memory to use). Of course, this is all hand-waving right now... it'd be interesting to see which approach was actually better. I'm running 4 WAL-logging standby clusters on a single machine. While the load on the master servers occasionally goes up to 60, the load on the standby machine has never climbed above 5. Of course when the master servers are all loaded, the standby falls behind with the recovery... but eventually it catches up again. I would be very surprised if it fell less far behind in the one-instance-per-machine scenario. Cheers, Csaba. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] DROP FUNCTION IF EXISTS
Not being an expert, but to me it looks like the client_encoding being set to UTF8 but the data being sent is something other than UTF8. I've seen this happen on Linux when connecting with PuTTY from Windows (and then psql from the linux machine) and having the wrong encoding set in PuTTY. I'd double and triple-check the client-side stuff first ;-) I have seen another strange occurrence of such errors... I'm using Linux with UTF8 client encoding, and psql gives me such errors: dbval=# select 1; ERROR: column 1 does not exist The full story is that I typed 'ü' (u-umlaut if it won't render correctly) and backspace before the '1'. I guess the backspace deletes byte-wise and will thus fail to properly delete multi-byte characters. I have no idea if this is a problem of psql or some other problem, and it was not annoying enough to report it... BTW, the 'space' before the 1 is something I was not able to copy-paste from the psql command line into the mail, so I'm pretty sure it's a byte code which is invalid UTF8. Cheers, Csaba. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
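A small illustration of the guess above, assuming standard UTF-8: 'ü' is the two bytes 0xC3 0xBC, so removing only the last byte leaves a dangling lead byte 0xC3 behind, which is invalid UTF-8 when followed by '1'. A character-aware backspace also has to strip continuation bytes (10xxxxxx, i.e. 0x80..0xBF).

#include <stdio.h>
#include <string.h>

/* remove one *character* from the end: strip continuation bytes, then the lead byte */
static void backspace_char(char *s)
{
    size_t len = strlen(s);
    while (len > 0 && ((unsigned char) s[len - 1] & 0xC0) == 0x80)
        len--;                      /* continuation byte */
    if (len > 0)
        len--;                      /* lead byte (or plain ASCII byte) */
    s[len] = '\0';
}

/* remove one *byte* from the end, as a naive line editor might */
static void backspace_byte(char *s)
{
    size_t len = strlen(s);
    if (len > 0)
        s[len - 1] = '\0';
}

static void dump(const char *label, const char *s)
{
    printf("%-11s", label);
    for (const unsigned char *p = (const unsigned char *) s; *p; p++)
        printf(" %02x", *p);
    printf("\n");
}

int main(void)
{
    char a[16] = "\xC3\xBC", b[16] = "\xC3\xBC";   /* both hold "ü" */
    backspace_char(a);  strcat(a, "1");
    backspace_byte(b);  strcat(b, "1");
    dump("char-wise:", a);   /* 31    -> valid, just "1"        */
    dump("byte-wise:", b);   /* c3 31 -> invalid UTF-8 sequence */
    return 0;
}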
Re: [HACKERS] Block B-Tree concept
And we're back to routine REINDEX I guess :-(. This doesn't seem like a satisfactory answer. If the reindex works online, it could be a satisfactory solution. Cheers, Csaba. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Block B-Tree concept
I think you build a whole new index named something like .temp-reindex and then as the last step of the second transaction delete the old index and rename the new index. That would require getting an exclusive lock on the table. Just out of curiosity, creating a new index concurrently (or online, whatever you call it) doesn't require taking an exclusive lock on the table ? I thought it would, at least briefly at the end of the operation; after all, it's modifying the table... Cheers, Csaba. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [OT][HACKERS] DROP FUNCTION IF EXISTS
Backspace deletes character-wise, as long as you have LANG set correctly. Check LANG and the LC_* environment variables. OK, you're right: $ echo $LANG [EMAIL PROTECTED] # show client_encoding ; client_encoding - UTF8 (1 row) But then I wonder why the client encoding is set to UTF-8 ? I did not fiddle at all with this AFAIK, and I guess psql is the one setting this... OTOH, I'm accessing the DB box via ssh, and my local box has: cnagy echo $LANG en_US.UTF-8 So it might be some strange interaction between my local locale, the server's locale and ssh... BTW, I tried whether the same problem happens on the server command line (via ssh), and it does (while, for completeness, it works fine on my local box): [EMAIL PROTECTED]:~$ date -sh: date: command not found ^ this is the result of 'ü + backspace' So I guess this is not postgres related, and therefore off topic for this list... Cheers, Csaba. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] pg_dump exclusion switches and functions/types
[Snip explanations] Comments? Would it be reasonable to include one more switch: 'include dependencies' ? That would work like this:
* first consider all the objects to be included (possibly limited by the include switches);
* if dependencies are included, add all dependent objects, plus non-schema objects (which arguably can be considered dependencies of the whole database);
* remove all objects targeted by exclude switches;
This way you won't have any dependency on the ordering, and you could consider all non-schema objects as dependencies, so they will only be included if dependencies are to be included. Excluding dependencies would be the default if any switches are specified, including them otherwise (not sure how well this follows the principle of least surprise, but it would be backwards compatible). The scenario I most care about is to be able to make a complete database dump (including non-schema objects) while excluding only a few tables. If I understood your explanations correctly, this would not be easily possible with the current implementation. Note that I have a patch (kludge ?) on the 8.1 pg_dump which does exactly this; it would be a pity if I needed to patch the 8.2 one again to do that... Cheers, Csaba. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
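A toy model in plain C of the selection order proposed above, with made-up object names and a made-up dependency table; it only illustrates the ordering (include, then dependencies and non-schema objects, then exclude), not how pg_dump is actually structured.

#include <stdio.h>
#include <string.h>

#define NOBJ 5

static const char *objects[NOBJ] = {
    "table:orders", "table:audit_log", "sequence:orders_id_seq",
    "function:audit_fn", "blob:large objects"
};
/* deps[i] names one object that objects[i] depends on ("" = none) */
static const char *deps[NOBJ] = {
    "sequence:orders_id_seq", "", "", "", ""
};
static int selected[NOBJ];

static int find(const char *name)
{
    for (int i = 0; i < NOBJ; i++)
        if (strcmp(objects[i], name) == 0)
            return i;
    return -1;
}

int main(void)
{
    const char *include[] = { "table:orders", "table:audit_log" };
    const char *exclude[] = { "table:audit_log" };
    int with_dependencies = 1;
    int k;

    /* 1. start from the include switches (or everything, if none were given) */
    for (size_t i = 0; i < sizeof(include) / sizeof(include[0]); i++)
        if ((k = find(include[i])) >= 0)
            selected[k] = 1;

    /* 2. add dependencies, plus non-schema objects (blobs) as dependencies of the dump */
    if (with_dependencies)
        for (int i = 0; i < NOBJ; i++)
        {
            if (selected[i] && deps[i][0] && (k = find(deps[i])) >= 0)
                selected[k] = 1;
            if (strncmp(objects[i], "blob:", 5) == 0)
                selected[i] = 1;
        }

    /* 3. apply the exclude switches last, so ordering does not matter */
    for (size_t i = 0; i < sizeof(exclude) / sizeof(exclude[0]); i++)
        if ((k = find(exclude[i])) >= 0)
            selected[k] = 0;

    for (int i = 0; i < NOBJ; i++)
        if (selected[i])
            printf("dump: %s\n", objects[i]);
    return 0;
}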
Re: [HACKERS] pg_dump exclusion switches and functions/types
We are two months past feature freeze ... adding entirely new features to pg_dump is *not* on the table for 8.2. Ok, clear. The scenario I most care about is to be able to make a complete database dump (including non-schema objects) while excluding only a few tables. Isn't this the same as Kris' complaint? Why do you need additional dependency analysis to do the above? Well, I obviously didn't fully understand the feature as it is implemented. Now, is what I want (see above) possible with the new feature, or if I exclude some tables do I implicitly lose some other things from the dump which would normally be there ? This is my only concern... Cheers, Csaba. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] pg_dump exclusion switches and functions/types
On Mon, 2006-10-09 at 16:24, Tom Lane wrote: I think we've agreed that if you use some exclusion switches, but not any inclusion switches, then only the specific objects matching your switches are excluded. CVS HEAD gets this wrong, but I'm going to work on it today. Cool, that makes it cover my use case and some more. Thanks, Csaba. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Index Tuning Features
The above process can be performed without tool support, but it's clear that further automation will help greatly here. I foresee that the development of both the server side and the tools will take more than one release. Discussion of tool support can begin once we have agreed server-side capability. If it came to automated tools, wouldn't it fit in this discussion to give some performance requirement limits to the RECOMMEND tool ? In a workload not all queries are real-time or high priority, and a lesser-impact index can sometimes help enough to meet the requirements, compared to a high-impact index which would make the query fly. Example: inserting into a table must be real-time, while reporting can be taken offline... So it would be nice to have a recommendation tool which can take into account the performance requirements of the individual queries, possibly making the right compromises to meet all requirements for all queries. Cheers, Csaba. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Synchronous Log Shipping Replication
On Tue, 2008-09-09 at 20:59 +0200, Zeugswetter Andreas OSB sIT wrote: All in all a useful streamer seems like a lot of work. I mentioned some time ago an alternative idea of having the slave connect through a normal SQL connection and call a function which streams the WAL file from the point requested by the slave... wouldn't that be feasible ? All the connection handling would already be there; only the streaming function would need to be implemented. It could even use SSL connections if needed. Then you would have one normal backend per slave, and they would access either the files directly or possibly some shared area where the WAL is buffered for this purpose... the streaming function could also take care of signaling the up-to-dateness of the slaves in the case of synchronous replication. There could also be some system table infrastructure to track the slaves. There could also be functions to stream the files of the DB through normal backends, so a slave could be bootstrapped all the way from copying the files over a simple postgres backend connection... that would make for the easiest possible setup of a slave: configure a connection to the master, and hit run... and last but not least the same interface could be used by a PITR repository client for archiving the WAL stream and occasional file system snapshots. Cheers, Csaba. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
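A sketch of what the slave side of that idea could look like with an ordinary libpq connection. The set-returning function pg_stream_wal(start_location), the connection string, the column layout and the output path are all invented for the sketch; no such function exists in the server.

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("host=master dbname=postgres user=standby");
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connect: %s", PQerrorMessage(conn));
        return 1;
    }

    /* ask for binary results so the bytea chunks arrive unescaped */
    const char *params[1] = { "0/12345678" };          /* restart location, made up */
    PGresult *res = PQexecParams(conn,
        "SELECT chunk FROM pg_stream_wal($1)",         /* hypothetical function */
        1, NULL, params, NULL, NULL, 1 /* binary */);

    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        fprintf(stderr, "stream: %s", PQerrorMessage(conn));
    }
    else
    {
        /* append the received WAL chunks to a local file (path made up) */
        FILE *wal = fopen("pg_xlog/restore.partial", "ab");
        for (int i = 0; wal && i < PQntuples(res); i++)
            fwrite(PQgetvalue(res, i, 0), 1, PQgetlength(res, i, 0), wal);
        if (wal)
            fclose(wal);
    }

    PQclear(res);
    PQfinish(conn);
    return 0;
}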
Re: [HACKERS] Transaction Snapshots and Hot Standby
On Thu, 2008-09-11 at 15:23 +0300, Heikki Linnakangas wrote: I'd imagine that even if applying the WAL on the slave is blocked, it's still streamed from the master to the slave, and in case of failover the slave will fast-forward before starting up as the new master. Which raises the question: what happens to a query that is running on the slave at the moment the slave leaves recovery mode and starts up ? Should the running queries be canceled if they are blocking the application of WAL, to allow start-up, or should they be left to finish ? Cheers, Csaba. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transaction Snapshots and Hot Standby
On Thu, 2008-09-11 at 15:42 +0300, Heikki Linnakangas wrote: One problem with this, BTW, is that if there's a continuous stream of medium-length transactions in the slave, each new snapshot taken will prevent progress in the WAL replay, so the WAL replay will advance in baby steps, and can fall behind indefinitely. Why would it fall behind indefinitely ? It should only fall behind to the blocking horizon, which should be the start of the longest currently running transaction... and that should be continually advancing and never too far in the past if only medium-length transactions are involved. Isn't normal WAL recovery also doing baby steps, one WAL record at a time ? ;-) Cheers, Csaba. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transaction Snapshots and Hot Standby
On Thu, 2008-09-11 at 16:19 +0300, Heikki Linnakangas wrote: Well, yes, but you can fall behind indefinitely that way. Imagine that each transaction on the slave lasts, say 10 minutes, with a new transaction starting every 5 minutes. On the master, there's a table that's being vacuumed (or HOT-updated) frequently, say after each transaction for simplicity. What can happen is that every transaction that finishes on the slave will only let the WAL replay advance by one XID before blocking on the snapshot of the next slave transaction. The WAL replay will advance at a rate of 0.2 TPM, while the master is generating 1.0 TPM. Aha, now I see where I was mistaken... I thought in terms of time and not transaction IDs. So the time distance between the slave transactions does not matter at all, only the distance in recovered XIDs matters for the blocking horizon... and if the WAL recovery is blocked, the blocking horizon is stalled as well, so the next transaction on the slave will in fact require the same blocking horizon as all currently running ones. Now I got it... and that means in fact that if you have continuously overlapping small transactions, the blocking horizon could even be blocked forever, as there'll always be a query running, and the new queries will always have the snapshot of the currently running ones because WAL recovery is stalled... or at least that's what I understand from the whole thing... Cheers, Csaba. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
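A tiny simulation in plain C of the arithmetic in the quoted example (the rates and schedule are taken from the example itself, nothing else): the master generates one vacuum-able XID per minute, the slave starts a 10-minute transaction every 5 minutes, and replay may only advance by one XID each time a slave transaction finishes, so the replayed position falls further and further behind.

#include <stdio.h>

int main(void)
{
    int master_xid = 0;     /* XIDs generated on the master (1 per minute) */
    int replayed_xid = 0;   /* XIDs the slave has been able to apply */

    for (int minute = 1; minute <= 60; minute++)
    {
        master_xid++;                       /* master: 1.0 "TPM" */

        /* slave transactions start at minute 0, 5, 10, ... and run for 10
         * minutes; after the warm-up one finishes every 5 minutes, letting
         * replay advance by exactly one XID */
        if (minute >= 10 && minute % 5 == 0)
            replayed_xid++;                 /* slave replay: 0.2 "TPM" */

        if (minute % 15 == 0)
            printf("minute %2d: master at xid %2d, replay at xid %2d, lag %2d\n",
                   minute, master_xid, replayed_xid, master_xid - replayed_xid);
    }
    return 0;
}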
Re: [HACKERS] Transaction Snapshots and Hot Standby
On Thu, 2008-09-11 at 15:33 +0200, Dimitri Fontaine wrote: What would forbid the slave to choose to replay all currently lagging WALs each time it's given the choice to advance a little? Well, now that I think I understand what Heikki meant, I also think the problem is that there's no choice at all to advance, because the new queries will simply have the same snapshot as the currently running ones as long as WAL replay is blocked... further blocking the WAL replay. When saying this I suppose that the snapshot is in fact based on the last recovered XID, and not on any slave-local XID. In that case, once WAL recovery is blocked, the snapshot is stalled too, further blocking WAL recovery, and so on... Cheers, Csaba. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transaction Snapshots and Hot Standby
On Fri, 2008-09-12 at 09:38 +0100, Simon Riggs wrote: If you request a block, we check to see whether there is a lookaside copy of it prior to the tuple removals. We then redirect the block request to a viewpoint relation's block. Each viewpoint gets a separate relfilenode. We do the switcheroo while holding cleanup lock on block. Wouldn't it make sense to also have a hint bit on the pages which are copied away ? Then instead of looking up a hash table, you would first check that bit, and if it is not set you wouldn't look up the hash table at all. Then when you clean up the lookaside copies you clear those bits too... That would probably perform somewhat better for reading than always looking up a potentially big hash table, and the cost of setting the hint is probably a lot less than the cost of copying away the page in the first place. Resetting the hint bit might be a bit more expensive. Cheers, Csaba. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
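A sketch in plain C of the check order being suggested: a per-page flag (the name PD_HAS_LOOKASIDE is invented) is set when a page gets a lookaside copy, and readers only pay for the hash probe when the flag is set. This is illustrative code, not PostgreSQL buffer-manager code.

#include <stdbool.h>
#include <stdio.h>

#define PD_HAS_LOOKASIDE 0x0001          /* hypothetical page flag bit */

typedef struct PageHeader { unsigned short pd_flags; } PageHeader;

/* stand-in for a real hash table keyed by (relation, block) */
static bool lookaside_hash_lookup(int blkno, int *lookaside_blkno)
{
    if (blkno == 42) { *lookaside_blkno = 7; return true; }   /* demo entry */
    return false;
}

/* resolve which block a reader should actually fetch */
static int resolve_block(const PageHeader *page, int blkno)
{
    int lookaside;

    /* cheap per-page check first: most pages never had a lookaside copy */
    if ((page->pd_flags & PD_HAS_LOOKASIDE) == 0)
        return blkno;

    /* only now pay for the hash probe */
    if (lookaside_hash_lookup(blkno, &lookaside))
        return lookaside;
    return blkno;
}

int main(void)
{
    PageHeader plain = { 0 };
    PageHeader copied = { PD_HAS_LOOKASIDE };
    printf("plain page 10  -> read block %d\n", resolve_block(&plain, 10));
    printf("copied page 42 -> read block %d\n", resolve_block(&copied, 42));
    return 0;
}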
Re: [HACKERS] Transaction Snapshots and Hot Standby
On Fri, 2008-09-12 at 12:31 +0100, Richard Huxton wrote: There was a suggestion (Simon - from you?) of a transaction voluntarily restricting itself to a set of tables. While thinking about how easy it would be for the DBA to specify the set of tables a single query is accessing, at first I thought it should be straightforward enough to look at the query itself for that. Then I thought: what about views, rules, triggers, user functions etc. ? All of those have the potential to access more than you see in the query itself. And then the actually interesting question: what will the slave do with views, rules and triggers ? I guess triggers are out of the question to be executed; what about rules ? Probably they must be ignored as well... user functions will probably get errors if they try to update something... Views should probably work correctly. So in any case the functionality available for querying slaves would be less than on the primary. This is probably good enough for most purposes... Cheers, Csaba. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transaction Snapshots and Hot Standby
I think that enabling long-running queries this way is both low-hanging fruit (or at least medium-height-hanging ;) ) and also consistent with the PostgreSQL philosophy of not replicating effort. As an example we trust the OS's file system cache and don't try to write our own. I again have questions (unfortunately questions are usually all I have):
* how will the buffers keep 2 different versions of the same page ?
* how will you handle the creation of snapshots ? I guess there's no portable and universal API for that (just guessing), or is there some POSIX thing which is supported or not by the specific FS ? So if the FS does not support it, you skip the snapshot step ? And if there's no universal API, will it be handled by plugins providing a specified API for snapshotting the FS ?
I hope my continuous questioning is not too annoying... Cheers, Csaba. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transaction Snapshots and Hot Standby
On Fri, 2008-09-12 at 15:08 +0300, Hannu Krosing wrote: * how will the buffers keep 2 different versions of the same page ? As the FS snapshot is mounted as a different directory, it will have its own buffer pages. Lack of knowledge about this shows my ignorance about the implementation of the page buffers... * how will you handle the creation of snapshots ? probably an external command, possibly shell script. similar to current archive_command for wal copying maybe 'create_fs_snapshot_command' and 'destroy_fs_snapshot_command' [snip] Yes, the simplest one being an external command. As FS snapshots are supposed to happen not too often, at least not every second, just having external commands may be enough. You could restrict the creation of snapshots to some minimum amount of time between them, and maybe also restrict the maximum number of concurrent snapshots possible. Then if the time limit (as calculated from the last open snapshot) is currently not met, any new query could reuse that last snapshot. The time intervals do not need to be evenly distributed, BTW; the limit could be a function of the already opened snapshots, like increasing the minimum interval exponentially with the number of already open snapshots. That would help map more long-running queries onto just a few snapshots. I hope my continuous questioning is not too annoying... On the contrary, much appreciated. :) Ok, then I'll continue :-) I would like to see this feature succeed, but there's a slim chance I'll ever code well in C... Cheers, Csaba. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
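A sketch in plain C of that snapshot-throttling idea: a new query reuses the last filesystem snapshot unless the minimum interval has passed, and the minimum interval grows exponentially with the number of snapshots already open. The base interval, the cap and the function names are all invented for the sketch.

#include <stdio.h>
#include <time.h>

#define BASE_INTERVAL_SECS 60          /* minimum gap with one open snapshot */
#define MAX_OPEN_SNAPSHOTS 8

static time_t last_snapshot_time = 0;
static int open_snapshots = 0;

/* exponential back-off: 60s, 120s, 240s, ... with the number of open snapshots */
static long current_min_interval(void)
{
    long interval = BASE_INTERVAL_SECS;
    for (int i = 1; i < open_snapshots; i++)
        interval *= 2;
    return interval;
}

/* decide whether a long-running query gets a fresh snapshot or reuses the last one */
static const char *snapshot_for_new_query(time_t now)
{
    if (open_snapshots == 0 ||
        (open_snapshots < MAX_OPEN_SNAPSHOTS &&
         now - last_snapshot_time >= current_min_interval()))
    {
        last_snapshot_time = now;
        open_snapshots++;
        return "create new filesystem snapshot";
    }
    return "reuse last filesystem snapshot";
}

int main(void)
{
    time_t t0 = time(NULL);
    long arrivals[] = { 0, 30, 90, 150, 400 };   /* query arrival offsets in seconds */
    for (int i = 0; i < 5; i++)
        printf("query at +%3lds: %s\n",
               arrivals[i], snapshot_for_new_query(t0 + arrivals[i]));
    return 0;
}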