Re: [HACKERS] Implicit casts with generic arrays
> For example in 8.2 this is mapped to array_prepend:
>
> regression=# select 'x'::text || array['aa','bb','cc'];
>    ?column?
> --------------
>  {x,aa,bb,cc}
> (1 row)
>
> but with the experimental code you get textcat:
>
> catany=# select 'x'::text || array['aa','bb','cc'];
>   ?column?
> -------------
>  x{aa,bb,cc}
> (1 row)

This is what I would have expected || to give, and not what 8.2 does. So disregarding the rest of the argument, I think that array_[pre|ap]pend should have other operators.

Andreas

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Implicit casts with generic arrays
> For example in 8.2 this is mapped to array_prepend:
>
> regression=# select 'x'::text || array['aa','bb','cc'];
>    ?column?
> --------------
>  {x,aa,bb,cc}
> (1 row)
>
> but with the experimental code you get textcat:
>
> catany=# select 'x'::text || array['aa','bb','cc'];
>   ?column?
> -------------
>  x{aa,bb,cc}
> (1 row)
>
> This is what I would have expected || to give, and not what 8.2 does. So
> disregarding the rest of the argument, I think that array_[pre|ap]pend
> should have other operators.

I think the current behaviour is more intuitive and practical. The result x{aa,bb,cc} is nonsense; array concatenation should have higher priority than text concatenation.

Pavel
Re: [HACKERS] TOAST usage setting
> > No, you misunderstood. Bruce was suggesting changing the target to 512.
> > That means if a row is wider than ~2k, toaster will try to toast until
> > the base row is ~512 bytes. I would not do that part for 8.3.
>
> OK, what do you suggest for 8.3? Attached is my suggestion
> to use 512 and a 4k chunk size, which I think means that 2.7k
> is the worst value, with a loss of around 25%.

Oh, so I misunderstood you also. You are suggesting two changes:

	TOAST_TUPLES_PER_PAGE		16
	EXTERN_TUPLES_PER_PAGE		2

While I agree that 2 might be a good compromise with low risk for now, I think that toasting all rows down to ~512 bytes is too narrowly targeted at not reading wider columns.

When suggesting a new target, I interpreted you like so:

	#define TOAST_TUPLES_PER_PAGE	4
	#define TOAST_TUPLE_TARGET	\
		MAXALIGN_DOWN((BLCKSZ - \
			MAXALIGN(sizeof(PageHeaderData) + (TOAST_TUPLES_PER_PAGE-1) * sizeof(ItemIdData))) \
			/ 16)

So we would only toast rows wider than 2k, but once toasting, toast the base row down to 512.

My suggestion would be to leave TOAST_TUPLES_PER_PAGE as is, because anything else would need extensive performance testing:

	#define TOAST_TUPLES_PER_PAGE	4

My next suggestion would be to leave EXTERN_TUPLES_PER_PAGE as is, but: split data wider than a page into page-sized chunks as long as they fill whole pages, and split the rest with EXTERN_TUPLES_PER_PAGE (4) as now. This would not waste more space than currently, but would improve performance for very wide columns.

I can try to do a patch if you think that is a good idea; I can't do a lot of testing, though.

Andreas
[HACKERS] Controlling Load Distributed Checkpoints
I'm again looking at the way the GUC variables work in the load distributed checkpoints patch. We've discussed them a lot already, but I don't think they're quite right yet.

Write phase
-----------

I like the way the write phase is controlled in general. Writes are throttled so that we spend the specified percentage of the checkpoint interval doing the writes. But we always write at a specified minimum rate to avoid spreading out the writes unnecessarily when there's little work to do.

The original patch uses bgwriter_all_max_pages to set the minimum rate. I think we should have a separate variable, checkpoint_write_min_rate, in KB/s, instead.

Nap phase
---------

This is trickier. The purpose of the sleep between writes and fsyncs is to give the OS a chance to flush the pages to disk at its own pace, hopefully limiting the effect on concurrent activity. The sleep shouldn't last too long, because any concurrent activity can be dirtying and writing more pages, and we might end up fsyncing more than necessary, which is bad for performance. The optimal delay depends on many factors, but I believe it's somewhere between 0-30 seconds on any reasonable system.

In the current patch, the duration of the sleep between the write and sync phases is controlled as a percentage of the checkpoint interval. Given that the optimal delay is in the range of seconds, and checkpoint_timeout can be up to 60 minutes, the useful values of that percentage would be very small, like 0.5% or even less. Furthermore, the optimal value doesn't depend that much on the checkpoint interval; it's more dependent on your OS and memory configuration. We should therefore give the delay as a number of seconds instead of as a percentage of the checkpoint interval.

Sync phase
----------

This is also tricky. As with the nap phase, we don't want to spend too much time fsyncing, because concurrent activity will write more dirty pages and we might just end up doing more work. And we don't know how much work an fsync performs.
The patch uses the file size as a measure of that, but as we discussed, that doesn't necessarily have anything to do with reality: fsyncing a 1GB file with one dirty block isn't any more expensive than fsyncing a file with a single block.

Another problem is the granularity of an fsync. If we fsync a 1GB file that's full of dirty pages, we can't limit the effect on other activity. The best we can do is to sleep between fsyncs, but sleeping more than a few seconds is hardly going to be useful, no matter how bad an I/O storm each fsync causes. Because of the above, I'm thinking we should ditch the checkpoint_sync_percentage variable, in favor of:

	checkpoint_fsync_period   # duration of the fsync phase, in seconds
	checkpoint_fsync_delay    # max. sleep between fsyncs, in milliseconds

In all phases, the normal bgwriter activities are performed: lru-cleaning and switching xlog segments if archive_timeout expires. If a new checkpoint request arrives while the previous one is still in progress, we skip all the delays and finish the previous checkpoint as soon as possible.

GUC summary and suggested default values
----------------------------------------

	checkpoint_write_percent = 50     # % of checkpoint interval to spread out writes
	checkpoint_write_min_rate = 1000  # minimum I/O rate to write dirty buffers at checkpoint (KB/s)
	checkpoint_nap_duration = 2       # delay between write and sync phase, in seconds
	checkpoint_fsync_period = 30      # duration of the sync phase, in seconds
	checkpoint_fsync_delay = 500      # max. delay between fsyncs, in milliseconds

I don't like adding that many GUC variables, but I don't really see a way to tune them automatically. Maybe we could just hard-code the last one, it doesn't seem that critical, but that still leaves us 4 variables. Thoughts?

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
[HACKERS] [RFC] GSoC Work on readonly queries done so far
Hi,

This is a description of the GSoC work I've done so far on readonly support for PITR slaves. I'm looking for any kind of comments on this - I want to make sure that I work in a direction that the community approves.

Work done so far:
-----------------
.) Added a new GUC operational_mode, which can be set to either readwrite or readonly. If it is set to readwrite (the default), postgres behaves as usual. All the following changes are only in effect if operational_mode is set to readonly.
.) Created a macro ASSUME_OPMODE_READWRITE that does elog(ERROR) if postgres is not in readwrite mode. This macro protects the following functions to make sure that no writes occur in readonly mode: SimpleLruWritePage, SlruPhysicalWritePage, EndPrepare, FinishPreparedTransaction, XLogInsert, XLogWrite, ShutdownXLog, CreateCheckpoint, MarkBufferDirty.
.) All transactions are set to readonly mode (an implicit SET TRANSACTION READ ONLY), and are not allowed to do SET TRANSACTION READ WRITE.
.) Don't start autovacuum and bgwriter. Instead of bgwriter, bgreplay is started, and it takes over the role that bgwriter plays in the shutdown process.
.) Transactions are assigned a dummy xid ReadOnlyTransactionId, which is considered to be "later" than any other xid.
.) A global ReadOnlySnapshot is maintained in shared memory. This is copied into backend-local memory by GetReadonlySnapshotData (which replaces GetSnapshotData in readonly mode).
.) Crash recovery is not performed in readonly mode - instead, postgres PANICs, and tells the DBA to restart in readwrite mode. Archive recovery of course *will* be allowed, but I'm not that far yet.

Open Problems:
--------------
.) Protecting MarkBufferDirty with ASSUME_OPMODE_READWRITE is troublesome, because callers usually call MarkBufferDirty from within a critical section, and thus elog(ERROR) is turned into elog(PANIC). This e.g. happens with my patch if you call nextval() in readonly mode.
Does anyone see a better solution than adding checks to all callers that are not otherwise protected from being called in readonly mode?

.) Since the slave needs to track a snapshot in shared memory, it cannot resize that snapshot to accommodate however many concurrent transactions might have been running on the master. My current plan is to detect if that global snapshot overflows, and to lock out readonly queries on the slave (and therefore remove the need of keeping the snapshot current) until the number of active xids on the master has dropped below max_connections on the slave. A warning will be written to the postgres log suggesting that the DBA increase the max_connections value on the slave.

Please tell me what you think about this approach, and especially whether you see any problems that I overlooked.

greetings, Florian Pflug
Re: [HACKERS] Controlling Load Distributed Checkpoints
"Heikki Linnakangas" <[EMAIL PROTECTED]> writes: > GUC summary and suggested default values > > checkpoint_write_percent = 50 # % of checkpoint interval to > spread out writes > checkpoint_write_min_rate = 1000 # minimum I/O rate to write dirty > buffers at checkpoint (KB/s) I don't understand why this is a min_rate rather than a max_rate. > checkpoint_nap_duration = 2 # delay between write and sync phase, > in seconds Not a comment on the choice of guc parameters, but don't we expect useful values of this to be much closer to 30 than 0? I understand it might not be exactly 30. Actually, it's not so much whether there's any write traffic to the data files during the nap that matters, it's whether there's more traffic during the nap than during the 30s or so prior to the nap. As long as it's a steady-state condition it shouldn't matter how long we wait, should it? > checkpoint_fsync_period = 30 # duration of the sync phase, in seconds > checkpoint_fsync_delay = 500 # max. delay between fsyncs -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Implicit casts with generic arrays
Joe Conway <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> In the long run maybe we should choose some other name for the
>> array_append and array_prepend operators to avoid the confusion with
>> concatenation. It seems to me that "concatenation" normally implies
>> "stringing together similar objects", which these two operators
>> definitely don't do, and so you could argue that || was a bad name
>> for them from the get-go.

> Originally I saw this situation as requiring the concatenation
> operator per SQL 2003:

Maybe I am missing something, but the only such construct I see in SQL2003 is concatenation of arrays of equal rank. There is nothing corresponding to array_prepend or array_append.

I do have a plan B if people don't want to rename the operators, though. It looks to me like we could eliminate the conflict if we invented a new polymorphic pseudotype called "anynonarray" or some such, which would act like anyelement *except* it would not match an array. Then, declaring the capturing operators as text||anynonarray and anynonarray||text would prevent them from matching any case where either side was known to be an array type. But they would (I think) still win out in cases such as scalar || 'unknown literal'. The end result would be that concatenations involving a known-array value would be array concatenation, but you could force them to be text concatenation, if that's what you wanted, by explicitly casting the array value(s) to text.

I was a bit hesitant to propose this since I couldn't immediately think of any other use-case for such a pseudotype. It's not a huge amount of added code (cf. anyenum) but it's definitely a visible wart on the type system. Comments?

regards, tom lane
Re: [HACKERS] Implicit casts with generic arrays
Tom Lane wrote:
> I do have a plan B if people don't want to rename the operators, though.
> It looks to me like we could eliminate the conflict if we invented a new
> polymorphic pseudotype called "anynonarray" or some such, which would
> act like anyelement *except* it would not match an array. Then,
> declaring the capturing operators as text||anynonarray and
> anynonarray||text would prevent them from matching any case where either
> side was known to be an array type. But they would (I think) still win
> out in cases such as scalar || 'unknown literal'. The end result would
> be that concatenations involving a known-array value would be array
> concatenation, but you could force them to be text concatenation, if
> that's what you wanted, by explicitly casting the array value(s) to text.
>
> I was a bit hesitant to propose this since I couldn't immediately think
> of any other use-case for such a pseudotype. It's not a huge amount of
> added code (cf. anyenum) but it's definitely a visible wart on the type
> system. Comments?

On the contrary, I would think that it fits nicely to "close the loop" on the anyarray/anyelement feature set.

-- Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] TOAST usage setting
Zeugswetter Andreas ADI SD wrote:
> > > No, you misunderstood. Bruce was suggesting changing the target to 512.
> > > That means if a row is wider than ~2k, toaster will try to toast until
> > > the base row is ~512 bytes. I would not do that part for 8.3.
> >
> > OK, what do you suggest for 8.3? Attached is my suggestion
> > to use 512 and a 4k chunk size, which I think means that 2.7k
> > is the worst value, with a loss of around 25%.
>
> Oh, so I misunderstood you also. You are suggesting two changes:
> 	TOAST_TUPLES_PER_PAGE		16
> 	EXTERN_TUPLES_PER_PAGE		2

Right.

> While I agree that 2 might be a good compromise with low risk for now,
> I think that toasting all rows down to ~512 bytes is too narrowly
> targeted at not reading wider columns.

Well, it is summarized here:

	http://momjian.us/expire/TOAST/SUMMARY.html

It made non-TOAST access 2x faster, but TOAST 7x slower, and that seemed like a good compromise.

> When suggesting a new target, I interpreted you like so:
> 	#define TOAST_TUPLES_PER_PAGE	4
> 	#define TOAST_TUPLE_TARGET	\
> 		MAXALIGN_DOWN((BLCKSZ - \
> 			MAXALIGN(sizeof(PageHeaderData) + (TOAST_TUPLES_PER_PAGE-1) * sizeof(ItemIdData))) \
> 			/ 16)
> So we would only toast rows wider than 2k, but once toasting, toast the
> base row down to 512.

That is certainly not my intent, and I don't see how you would get the 2k number from that macro. I think you are looking at 8.2 and not CVS HEAD. CVS HEAD has:

	#define TOAST_TUPLE_TARGET	TOAST_TUPLE_THRESHOLD

> My suggestion would be to leave TOAST_TUPLES_PER_PAGE as is, because all
> else would need extensive performance testing.
> 	#define TOAST_TUPLES_PER_PAGE	4
>
> My next suggestion would be to leave EXTERN_TUPLES_PER_PAGE as is, but:
> split data wider than a page into page-sized chunks as long as they fill
> whole pages, and split the rest with EXTERN_TUPLES_PER_PAGE (4) as now.
> This would not waste more space than currently, but would improve
> performance for very wide columns.
>
> I can try to do a patch if you think that is a good idea; can't do a lot
> of testing though.

None of this splitting is going to happen for 8.3. The question is what changes we can make for 8.3, if any.

-- Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] Controlling Load Distributed Checkpoints
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> GUC summary and suggested default values
>
> checkpoint_write_percent = 50     # % of checkpoint interval to spread out writes
> checkpoint_write_min_rate = 1000  # minimum I/O rate to write dirty buffers at checkpoint (KB/s)
> checkpoint_nap_duration = 2       # delay between write and sync phase, in seconds
> checkpoint_fsync_period = 30      # duration of the sync phase, in seconds
> checkpoint_fsync_delay = 500      # max. delay between fsyncs
>
> I don't like adding that many GUC variables, but I don't really see a
> way to tune them automatically.

If we don't know how to tune them, how will the users know? Having to add that many variables to control one feature says to me that we don't understand the feature. Perhaps what we need is to think about how it can auto-tune itself.

regards, tom lane
Re: [HACKERS] Implicit casts with generic arrays
Alvaro Herrera <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> I do have a plan B if people don't want to rename the operators, though. >> It looks to me like we could eliminate the conflict if we invented a new >> polymorphic pseudotype called "anynonarray" or some such, which would >> act like anyelement *except* it would not match an array. >> ... >> I was a bit hesitant to propose this since I couldn't immediately think >> of any other use-case for such a pseudotype. It's not a huge amount of >> added code (cf. anyenum) but it's definitely a visible wart on the type >> system. Comments? > On the contrary, I would think that it fits nicely to "close the loop" > on the anyarray/anyelement feature set. OK, I'll go code this up and verify that it behaves like I think it will... regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] TOAST usage setting
"Bruce Momjian" <[EMAIL PROTECTED]> writes: > Well, it is summarized here: > > http://momjian.us/expire/TOAST/SUMMARY.html > > It made non-TOAST access 2x faster, but TOAST 7x slower, and that seemed > like a good compromise. Is this still testing with all data fitting in RAM? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] TOAST usage setting
> > While I agree that 2 might be a good compromise with low risk for
> > now, I think that toasting all rows down to ~512 bytes is too narrowly
> > targeted at not reading wider columns.
>
> Well, it is summarized here:
>
> 	http://momjian.us/expire/TOAST/SUMMARY.html
>
> It made non-TOAST access 2x faster, but TOAST 7x slower, and
> that seemed like a good compromise.

Yes, my argument was that I expect that in the up-to-1-2k range, more use cases will suffer the 7x slowdown than see the speedup. But the ratio certainly is hard to judge, and you may well be right. E.g. for me, TOAST_TUPLES_PER_PAGE 8 would be ok; I have base row widths of ~700 in 2 tables that would suffer if further toasted, but none in the 1k - 2k range. I wonder whether this threshold isn't rather absolute, in which case the 32k-pagesize users should probably use 32.

> > When suggesting a new target, I interpreted you like so:
> > 	#define TOAST_TUPLES_PER_PAGE	4
> > 	#define TOAST_TUPLE_TARGET	\
> > 		MAXALIGN_DOWN((BLCKSZ - \
> > 			MAXALIGN(sizeof(PageHeaderData) + (TOAST_TUPLES_PER_PAGE-1) * sizeof(ItemIdData))) \
> > 			/ 16)
> > So we would only toast rows wider than 2k, but once toasting, toast
> > the base row down to 512.
>
> That is certainly not my intent, and I don't see how you
> would get the 2k number from that macro. I think you are

The ~2k comes from TOAST_TUPLE_THRESHOLD.

> looking at 8.2 and not CVS HEAD. CVS HEAD has:
>
> 	#define TOAST_TUPLE_TARGET	TOAST_TUPLE_THRESHOLD

Nope, I meant what I said. To only change the target, you would substitute the above macro for TOAST_TUPLE_TARGET. But I also don't see how this would be good.

Andreas
[HACKERS] elog.c logic bug?
I have just been staring for some time at the logic in src/backend/utils/error/elog.c:send_message_to_server_log(), which contains this fragment near the end:

    /* Write to stderr, if enabled */
    if ((Log_destination & LOG_DESTINATION_STDERR) ||
        whereToSendOutput == DestDebug)
    {
#ifdef WIN32
        /*
         * In a win32 service environment, there is no usable stderr. Capture
         * anything going there and write it to the eventlog instead.
         *
         * If stderr redirection is active, it's ok to write to stderr because
         * that's really a pipe to the syslogger process. Unless we're in the
         * postmaster, and the syslogger process isn't started yet.
         */
        if ((!Redirect_stderr || am_syslogger ||
             (!IsUnderPostmaster && SysLoggerPID == 0)) &&
            pgwin32_is_service())
            write_eventlog(edata->elevel, buf.data);
        else
#endif
            fprintf(stderr, "%s", buf.data);
    }

    /* If in the syslogger process, try to write messages direct to file */
    if (am_syslogger)
        write_syslogger_file(buf.data, buf.len);

ISTM that this is a bug - the last statement should be inside the STDERR block above, the last part of which would then read:

        /* If in the syslogger process, try to write messages direct to file */
        if (am_syslogger)
            write_syslogger_file(buf.data, buf.len);
        else
            fprintf(stderr, "%s", buf.data);

If not, I have missed something - why would the syslogger be trying to write to its output (possibly for the second time) regardless of what Log_destination is set to?

cheers, andrew
Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far
Florian G. Pflug wrote:
> Work done so far:
> .) Don't start autovacuum and bgwriter.

Do table stats used by the planner get replicated on a PITR slave? I assume so, but if not, you would need autovac to do analyzes.
Re: [HACKERS] elog.c logic bug?
Andrew Dunstan <[EMAIL PROTECTED]> writes: > If not I have missed something - why would the syslogger be trying to > write to its output (possibly for the second time) regardless of what > Log_destination is set to? You're mistaken: within the syslogger process, stderr doesn't point to the same place as the target file (it's normally the same as the original postmaster stderr). The reason the code is set up to try to write both stderr and the target file is to maximize the chance that an internally generated error in syslogger will get reported *someplace*. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far
On Wed, 2007-06-06 at 16:11 +0200, Florian G. Pflug wrote: >.) Since the slaves needs to track an Snapshot in shared memory, it cannot > resize that snapshot to accomodate however many concurrent transactions > might have been running on the master. My current plan is to detect if > that global snapshot overflows, and to lock out readonly queries on the > slave (and therefore remove the need of keeping the snapshot current) > until the number of active xids on the master has dropped below > max_connections on the slave. A warning will be written to the postgres > log that suggest that the DBA increases the max_connections value on > the slave. > If we did lock the slave while waiting for transactions to complete on the master, we'd need to document some stronger warnings against idle transactions so that administrators could notice and correct the problem. Are you referring to the size of the xip array being a problem? Would it help to tie the size of the xip array to max_connections? I understand that max_connections might be greater on the master, but maybe something similar? Regards, Jeff Davis ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far
Matthew T. O'Connor wrote:
> Florian G. Pflug wrote:
>> Work done so far:
>> .) Don't start autovacuum and bgwriter.
>
> Do table stats used by the planner get replicated on a PITR slave? I
> assume so, but if not, you would need autovac to do analyzes.

Yes - everything that gets wal-logged on the master gets replicated to the slave. In my design, it isn't possible to do "analyze" on the slave, because all datafiles are strictly readonly (well, with the small exception of hint-bit updates, actually).

greetings, Florian Pflug
Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far
Jeff Davis wrote:
> On Wed, 2007-06-06 at 16:11 +0200, Florian G. Pflug wrote:
>> .) Since the slave needs to track a snapshot in shared memory, it cannot
>> resize that snapshot to accommodate however many concurrent transactions
>> might have been running on the master. My current plan is to detect if
>> that global snapshot overflows, and to lock out readonly queries on the
>> slave (and therefore remove the need of keeping the snapshot current)
>> until the number of active xids on the master has dropped below
>> max_connections on the slave. A warning will be written to the postgres
>> log suggesting that the DBA increase the max_connections value on
>> the slave.
>
> If we did lock the slave while waiting for transactions to complete on
> the master, we'd need to document some stronger warnings against idle
> transactions so that administrators could notice and correct the problem.

It's not exactly locking until they complete on the master; it's locking the slave until we reach a position in the wal on the slave with fewer than max_connections concurrent transactions. But yes, I agree, this will need to be documented.

> Are you referring to the size of the xip array being a problem? Would it
> help to tie the size of the xip array to max_connections? I understand
> that max_connections might be greater on the master, but maybe something
> similar?

That's what I currently do - the xip array on the slave is sized to hold max_connections entries (actually, it's max_connections + max_prepared_xacts, I think). The problem occurs exactly if those values are set too small on the slave - and since shared mem objects are not resizeable, I don't see how the slave can handle an xip overflow gracefully other than by not publishing the information in shared memory as long as it doesn't fit there.
On further thought - maybe locking out transactions isn't even necessary - they would just continue to see the old global snapshot, so time wouldn't advance for them until the number of concurrent transactions decreases again.

greetings, Florian Pflug
Re: [HACKERS] Implicit casts with generic arrays
Tom Lane wrote:
> Joe Conway <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> In the long run maybe we should choose some other name for the
>>> array_append and array_prepend operators to avoid the confusion with
>>> concatenation. It seems to me that "concatenation" normally implies
>>> "stringing together similar objects", which these two operators
>>> definitely don't do, and so you could argue that || was a bad name
>>> for them from the get-go.
>
>> Originally I saw this situation as requiring the concatenation
>> operator per SQL 2003:
>
> Maybe I am missing something, but the only such construct I see in
> SQL2003 is concatenation of arrays of equal rank. There is nothing
> corresponding to array_prepend or array_append.

Well, I've never claimed to be particularly good at interpreting the SQL spec, but as an example, SQL:2003 has

	<array value expression> ::= <array concatenation> | <array primary>
	<array concatenation> ::= <array value expression 1> <concatenation operator> <array primary>

Doesn't this mean that array concatenation should include things like ARRAY[1,2,3] || 42?

> I do have a plan B if people don't want to rename the operators, though.
> It looks to me like we could eliminate the conflict if we invented a new
> polymorphic pseudotype called "anynonarray" or some such, which would
> act like anyelement *except* it would not match an array. Then,
> declaring the capturing operators as text||anynonarray and
> anynonarray||text would prevent them from matching any case where either
> side was known to be an array type. But they would (I think) still win
> out in cases such as scalar || 'unknown literal'. The end result would
> be that concatenations involving a known-array value would be array
> concatenation, but you could force them to be text concatenation, if
> that's what you wanted, by explicitly casting the array value(s) to text.

That sounds reasonable to me.

Joe
Re: [HACKERS] Controlling Load Distributed Checkpoints
On Wed, 6 Jun 2007, Tom Lane wrote:

> If we don't know how to tune them, how will the users know?

I can tell you a good starting set for them on a Linux system, but you first have to let me know how much memory is in the OS buffer cache, the typical I/O rate the disks can support, how many buffers are expected to be written out by BGW/other backends at heaviest load, and the current setting for /proc/sys/vm/dirty_background_ratio. It's not a coincidence that there are patches applied to 8.3 or in the queue to measure all of the Postgres internals involved in that computation; I've been picking away at the edges of this problem. Getting this sort of tuning right takes that level of information about the underlying system.

If there's a way to internally auto-tune the values this patch operates on (which I haven't found despite months of trying), it would be in the form of some sort of measurement/feedback loop based on how fast data is being written out. There really are way too many things involved to try and tune it based on anything else; the underlying OS/hardware mechanisms that determine how this will go are complicated enough that it might as well be a black box for most people.

One of the things I've been fiddling with the design of is a testing program that simulates database activity at checkpoint time under load. I think running some tests like that is the most straightforward way to generate useful values for these tunables; it's much harder to try and determine them from within the backends because there's so much going on to keep track of. I view the LDC mechanism as being in the same state right now as the background writer: there are a lot of complicated knobs to tweak, they all do *something* useful for someone, and eliminating them will require a data-collection process across a much wider sample of data than can be collected quickly.
If I had to make a guess how this will end up, I'd expect there to be more knobs in LDC than everyone would like for the 8.3 release, along with fairly verbose logging of what is happening at checkpoint time (that's why I've been nudging development in that area, along with making logs easier to aggregate). Collect up enough of that information, then you're in a position to talk about useful automatic tuning--right around the 8.4 timeframe I suspect. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far
On Wed, 2007-06-06 at 19:25 +0200, Florian G. Pflug wrote: > Thats what I currently do - the xip array on the slave is sized to > hold max_connections entries (Actually, it's max_connections + > max_prepared_xacts I think). The problem occurs exactly if those > values are set too small on the slave - and since shared mem > objects are not resizeable, I don't see how the slave can handle > an xip overflow gracefully other than by not publishing the > information in shared memory as long as it doesn't fit there. > That seems like a very minor issue then. It's not unreasonable to expect that the PITR slave is configured very similarly to the master. You may even want to require it (if there are other reasons, too). > On a further thinking - maybe locking out transactions isn't even > necessary - they would just continue to see the old global snapshot, > so time wouldn't advance for them until the number of concurrent > transactions decreases again. That sounds better than locking out all reads. Perhaps still a warning in the logs though. If the situation you describe happens, the administrator probably needs to know about it. Regards, Jeff Davis ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
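The overflow handling Florian describes can be sketched as follows. This
is a toy Python model only -- the function name and shape are invented,
and the real snapshot lives in shared memory on a C backend -- but it
captures the rule under discussion: when the master's xip list no longer
fits in the slave's fixed-size array, withdraw the snapshot (and warn)
rather than truncate it.

```python
def publish_snapshot(master_xips, slave_capacity):
    """Return the snapshot to expose to readonly backends, or None
    (meaning: lock out readers and log a warning) when the master's
    xip list does not fit in the slave's fixed-size array.

    slave_capacity corresponds to max_connections + max_prepared_xacts
    on the slave; both names here are illustrative."""
    if len(master_xips) > slave_capacity:
        return None  # overflow: stop publishing until load drops
    return sorted(master_xips)
```

Backends that find no published snapshot would simply keep using the one
they already copied, which is the "time stands still for them" behavior
mentioned above.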
Re: [HACKERS] Controlling Load Distributed Checkpoints
On Wed, 6 Jun 2007, Heikki Linnakangas wrote:

> The original patch uses bgwriter_all_max_pages to set the minimum rate.
> I think we should have a separate variable, checkpoint_write_min_rate,
> in KB/s, instead.

Completely agreed. There shouldn't be any coupling with the background
writer parameters, which may be set for a completely different set of
priorities than the checkpoint has. I have to look at this code again to
see why it's a min_rate instead of a max, that seems a little weird.

> Nap phase: We should therefore give the delay as a number of seconds
> instead of as a percentage of checkpoint interval.

Again, the setting here should be completely decoupled from another GUC
like the interval. My main complaint with the original form of this
patch was how much it tried to synchronize the process with the
interval; since I don't even have a system where that value is set to
something, because it's all segment based instead, that whole idea was
incompatible. The original patch tried to spread the load out as evenly
as possible over the time available. I much prefer thinking in terms of
getting it done as quickly as possible while trying to bound the I/O
storm.

> And we don't know how much work an fsync performs. The patch uses the
> file size as a measure of that, but as we discussed that doesn't
> necessarily have anything to do with reality. fsyncing a 1GB file with
> one dirty block isn't any more expensive than fsyncing a file with a
> single block.

On top of that, if you have a system with a write cache, the time an
fsync takes can greatly depend on how full it is at the time, which
there is no way to measure or even model easily. Is there any way to
track how many dirty blocks went into each file during the checkpoint
write? That's your best bet for guessing how long the fsync will take.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
   http://www.postgresql.org/docs/faq
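Greg's dirty-block-counting idea could be modeled like this. A toy
Python sketch, not backend code: the class and method names are
invented, and in reality this bookkeeping would live in the C checkpoint
write loop. It only shows the estimator he proposes -- count the blocks
actually written into each file, then use those counts as a rough proxy
for per-file fsync cost.

```python
from collections import defaultdict

class CheckpointWriteTracker:
    """Track how many dirty blocks the checkpoint writes into each file,
    as a stand-in for estimating fsync cost by file size."""

    def __init__(self):
        self.dirty_blocks = defaultdict(int)

    def note_write(self, filename, block_no):
        # called once per block flushed during the checkpoint write phase
        self.dirty_blocks[filename] += 1

    def fsync_order(self):
        # fsync the files with the fewest dirty blocks first, so the
        # expensive fsyncs are at least known about in advance
        return sorted(self.dirty_blocks, key=self.dirty_blocks.get)
```

Whether cheapest-first is the right ordering is itself a tuning
question; the point is only that block counts are measurable where
"how full is the write cache" is not.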
Re: [HACKERS] elog.c logic bug?
Tom Lane wrote:
> Andrew Dunstan <[EMAIL PROTECTED]> writes:
>> If not I have missed something - why would the syslogger be trying to
>> write to its output (possibly for the second time) regardless of what
>> Log_destination is set to?
> You're mistaken: within the syslogger process, stderr doesn't point to
> the same place as the target file (it's normally the same as the
> original postmaster stderr). The reason the code is set up to try to
> write both stderr and the target file is to maximize the chance that
> an internally generated error in syslogger will get reported
> *someplace*.

OK, thanks, I'll try to make that a bit clearer in a comment.

cheers

andrew

---(end of broadcast)---
TIP 4: Have you searched our list archives?
   http://archives.postgresql.org
Re: [HACKERS] msvc, build and install with cygwin in the PATH
Magnus Hagander wrote:
> Hannes Eder wrote:
>> Is it worth doing this the "Perl-way" and using File::Find? If so, I
>> can work an a patch for that.
> It's certainly cleaner that way, but I don't find it a major issue.
> But I'd rather see that fix than the other one.

Here we go. See attached patch. Your comments are welcome.

Hannes.

*** ..\pgsql-cvshead\src\tools\msvc\Install.pm	Mo Mai 14 16:36:10 2007
--- src\tools\msvc\Install.pm	Mi Jun  6 20:39:47 2007
***************
*** 10,15 ****
--- 10,18 ----
  use Carp;
  use File::Basename;
  use File::Copy;
+ use File::Find;
+ use File::Glob;
+ use File::Spec;
  use Exporter;
  our (@ISA,@EXPORT_OK);
***************
*** 99,104 ****
--- 102,142 ----
      print "\n";
  }

+ sub FindFiles
+ {
+     my $spec = shift;
+     my $nonrecursive = shift;
+     my $pat = basename($spec);
+     my $dir = dirname($spec);
+
+     if ($dir eq '') { $dir = '.'; }
+
+     -d $dir || croak "Could not list directory $dir: $!\n";
+
+     if ($nonrecursive)
+     {
+         return glob($spec);
+     }
+
+     # borrowed from File::DosGlob
+     # escape regex metachars but not glob chars
+     $pat =~ s:([].+^\-\${}[|]):\\$1:g;
+     # and convert DOS-style wildcards to regex
+     $pat =~ s/\*/.*/g;
+     $pat =~ s/\?/.?/g;
+
+     $pat = '^' . $pat . '\z';
+
+     my @res;
+     find(
+         {
+             wanted => sub { /$pat/s && push (@res, File::Spec->canonpath($File::Find::name)); }
+         },
+         $dir
+     );
+     return @res;
+ }
+
  sub CopySetOfFiles
  {
      my $what = shift;
***************
*** 106,126 ****
      my $target = shift;
      my $silent = shift;
      my $norecurse = shift;
-     my $D;
-     my $subdirs = $norecurse?'':'/s';

      print "Copying $what" unless ($silent);
!     open($D, "dir /b $subdirs $spec |") || croak "Could not list $spec\n";
!     while (<$D>)
      {
-         chomp;
          next if /regress/; # Skip temporary install in regression subdir
!         my $tgt = $target . basename($_);
          print ".";
!         my $src = $norecurse?(dirname($spec) . '/' . $_):$_;
!         copy($src, $tgt) || croak "Could not copy $src: $!\n";
      }
!     close($D);
      print "\n";
  }
--- 144,161 ----
      my $target = shift;
      my $silent = shift;
      my $norecurse = shift;

      print "Copying $what" unless ($silent);
!
!     foreach (FindFiles($spec, $norecurse))
      {
          next if /regress/; # Skip temporary install in regression subdir
!         my $src = $_;
!         my $tgt = $target . basename($src);
          print ".";
!         copy($src, $tgt) || croak "Could not copy $src to $tgt: $!\n";
      }
!
      print "\n";
  }
***************
*** 371,395 ****
      my $target = shift;
      my $nlspath = shift;
-     my $D;

      print "Installing NLS files...";
      EnsureDirectories($target, "share/locale");
!     open($D,"dir /b /s nls.mk|") || croak "Could not list nls.mk\n";
!     while (<$D>)
      {
-         chomp;
          s/nls.mk/po/;
          my $dir = $_;
          next unless ($dir =~ /([^\\]+)\\po$/);
          my $prgm = $1;
          $prgm = 'postgres' if ($prgm eq 'backend');
-         my $E;
-         open($E,"dir /b $dir\\*.po|") || croak "Could not list contents of $_\n";
!         while (<$E>)
          {
-             chomp;
              my $lang;
              next unless /^(.*)\.po/;
              $lang = $1;
--- 406,425 ----
      my $target = shift;
      my $nlspath = shift;

      print "Installing NLS files...";
      EnsureDirectories($target, "share/locale");
!
!     foreach (FindFiles("nls.mk"))
      {
          s/nls.mk/po/;
          my $dir = $_;
          next unless ($dir =~ /([^\\]+)\\po$/);
          my $prgm = $1;
          $prgm = 'postgres' if ($prgm eq 'backend');
!         foreach (FindFiles("$dir\\*.po", 1))
          {
              my $lang;
              next unless /^(.*)\.po/;
              $lang = $1;
***************
*** 401,409 ****
              && croak("Could not run msgfmt on $dir\\$_");
          print ".";
      }
-     close($E);
  }
! close($D);
  print "\n";
  }
--- 431,438 ----
              && croak("Could not run msgfmt on $dir\\$_");
          print ".";
      }
  }
!
  print "\n";
  }

---(end of broadcast)---
TIP 4: Have you searched our list archives?
   http://archives.postgresql.org
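The core of the FindFiles() helper in the patch above is the
DOS-glob-to-regex conversion borrowed from File::DosGlob. For
illustration, the same transformation in Python (the real code is the
Perl in the patch; only this sketch's function names are invented):

```python
import re

def dos_glob_to_regex(pat):
    # escape regex metacharacters, but not the glob wildcards * and ?
    # (same character class as the Perl: ] . + ^ - $ { } [ |)
    pat = re.sub(r'([\].+^\-\${}\[|])', r'\\\1', pat)
    # convert DOS-style wildcards to regex
    pat = pat.replace('*', '.*').replace('?', '.?')
    return '^' + pat + r'\Z'

def matches(pat, name):
    return re.search(dos_glob_to_regex(pat), name) is not None
```

Note the Perl converts `?` to `.?` (zero or one character) rather than
the usual `.` -- the sketch reproduces that faithfully.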
Re: [HACKERS] Implicit casts with generic arrays
Joe Conway <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> Maybe I am missing something, but the only such construct I see in >> SQL2003 is concatenation of arrays of equal rank. There is nothing >> corresponding to array_prepend or array_append. > Well, I've never claimed to be particularly good at interpreting the SQL > spec, but as an example... > ::= > || > ::= > ::= > ::= > ::= > ::= > > Doesn't this mean that array concatenation should include things like: > || No, because the first syntax rule for that is 1) The declared type of shall be an array type. However, assuming that the anynonarray idea works out, we can do that and not worry about touching the array operators. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far
On Wed, 2007-06-06 at 12:17 -0400, Matthew T. O'Connor wrote: > Florian G. Pflug wrote: > > Work done so far: > > - > > .) Don't start autovacuum and bgwriter. > > Do table stats used by the planner get replicated on a PITR slave? I > assume so, but if not, you would need autovac to do analyzes. The replication is an exact block-level replication of the master. We can't write very much at all on the slave. So if a query runs slow because of lack of stats you'd need to run ANALYZE on the master, which would then propagate the stats to the slave which could then use them. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Implicit casts with generic arrays
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> It looks to me like we could eliminate the conflict if we invented a
>> new polymorphic pseudotype called "anynonarray" or some such, which
>> would act like anyelement *except* it would not match an array.
> ...
> On the contrary, I would think that it fits nicely to "close the loop"
> on the anyarray/anyelement feature set.

OK, I hacked this together and it seems to behave at least as reasonably
as 8.2 does. "8.3" here means HEAD + anynonarray + capturing concat
operators. I used integer as an example of a type for which 8.2 has an
implicit cast to text, and point as an example of a type for which it
doesn't:

	Expression			8.3		8.2

	text || text			text concat	text concat
	text || 'unknown'		text concat	text concat
	text || text[]			array concat	array concat
	text || non-text array		error		error
	text || non-text scalar		text concat	text concat [1]

	integer || integer		error		text concat
	integer || 'unknown'		text concat	text concat
	integer || integer[]		array concat	array concat
	integer || non-integer array	error		error
	integer || non-integer scalar	error		text concat [1]

	point || point			error		error
	point || 'unknown'		text concat	'array value must start ...'
	point || point[]		array concat	array concat
	point || non-point array	error		error
	point || non-point scalar	error		error

	text[] || text[]		array concat	array concat
	text[] || 'unknown'		error		error
	text[] || non-text array	error		error
	text[] || non-text scalar	error		error

[1] for types for which 8.2 has an implicit cast to text, else it fails.
These are: bigint smallint integer oid real double precision numeric
date time without time zone time with time zone timestamp without time
zone timestamp with time zone interval

(I was interested to find that there were cases where 8.2 would come out
with the dreaded "array value must start with "{" or dimension
information" error.)
I think that the above chart is pretty defensible; the only cases that
fail now where they worked before are concatenations where neither side
is either text or an unadorned string literal. Frankly, I think this:

catany=# select 3 || 0.4;
ERROR:  operator does not exist: integer || numeric

is way preferable to this:

regression=# select 3 || 0.4;
 ?column?
----------
 30.4
(1 row)

which is what 8.2 does --- if you want text concatenation you should
make at least *some* effort to signal that, like casting one side to
text or at least quoting it. Run-together concatenations like

catany=# select 'sin(' || 2 || ')';
 ?column?
----------
 sin(2)
(1 row)

will work as long as at least one of the first two concatenated items is
textual or an unadorned string literal.

Barring objections I'll clean this up and commit it.

			regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly
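The 8.3 column of the chart above can be recast as a small lookup
function, which makes the rules easy to check mechanically. This is a
summary of the chart only, not of the real operator-resolution code;
"unknown" stands for an unadorned string literal, and types are modeled
as plain strings with arrays spelled `type[]`.

```python
def concat_83(left, right):
    """Resolve left || right per the 8.3 column of the chart."""
    def is_array(t):
        return t.endswith('[]')

    def is_textual(t):
        # text, or an unadorned string literal
        return t in ('text', 'unknown')

    if is_array(left) and is_array(right):
        # array || array: only for matching element types
        return 'array concat' if left == right else 'error'
    if is_array(left) or is_array(right):
        arr = left if is_array(left) else right
        elem = right if is_array(left) else left
        # scalar || array (array_prepend/array_append): only when the
        # scalar matches the array's element type
        return 'array concat' if arr == elem + '[]' else 'error'
    # scalar || scalar: textcat when at least one side is textual
    return 'text concat' if is_textual(left) or is_textual(right) else 'error'
```

This reproduces, for instance, that `integer || integer` now errors
while `integer || 'unknown'` still concatenates as text.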
Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far
Simon Riggs wrote: > On Wed, 2007-06-06 at 12:17 -0400, Matthew T. O'Connor wrote: > > Florian G. Pflug wrote: > > > Work done so far: > > > - > > > .) Don't start autovacuum and bgwriter. > > > > Do table stats used by the planner get replicated on a PITR slave? I > > assume so, but if not, you would need autovac to do analyzes. > > The replication is an exact block-level replication of the master. We > can't write very much at all on the slave. Hmm, something to keep in mind is forcing cache invals when the master causes them (for example relation cache, catalog caches and plan caches). -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far
On Wed, 2007-06-06 at 16:11 +0200, Florian G. Pflug wrote: > .) Added a new GUC operational_mode, which can be set to either > readwrite or readonly. If it is set to readwrite (the default), > postgres behaves as usual. All the following changes are only > in effect if operational_mode is set to readonly. Do we need this? We are already InArchiveRecovery. > .) Created a macro ASSUME_OPMODE_READWRITE that does elog(ERROR) > if postgre is not in readwrite mode. This macro protects the > following functions to make sure that no writes occur in > readonly mode. > SimpleLruWritePage, SLruPhysicalWritePage > EndPrepare, FinishPreparedTransaction > XLogInsert, XLogWrite, ShutdownXLog > CreateCheckpoint > MarkBufferDirty. These are Asserts? > .) All transactions are set to readonly mode (An implicit > SET TRANSACTION READONLY), and are not allowed to do > SET TRANSACTION READWRITE. OK > .) Don't start autovacuum and bgwriter. Instead of bgwriter, bgreplay > is started, and it takes over that role that bgwriter play in the > shutdown process. Autovacuum -> understood. What does bgreplay do? Why not just start bgwriter earlier and disable some of its other functionality while InRecovery? > .) Transactions are assigned a dummy xid ReadOnlyTransactionId, that > is considered to be "later" than any other xid. So you are bumping FirstNormalTransactionId up by one for this? You're assuming then that we will "freeze" replay while we run a query? Otherwise doing this will mean the snapshot changes as a query executes. > .) A global ReadOnlySnapshot is maintained in shared memory. This is > copied into backend local memory by GetReadonlySnapshotData (which > replaces GetSnapshotData in readonly mode). >.) Crash recovery is not performed in readonly mode - instead, postgres > PANICs, and tells the DBA to restart in readwrite mode. Archive > recovery of course *will* be allowed, but I'm not that far yet. This is the very heart of the matter. 
This isn't just a technical issue, it goes to the heart of the use case for this feature. Can we recover while running queries? If not, how much time will we spend in replay mode v query mode? Will we be able to run long running queries *and* maintain a reasonable time to recover? Is this a mechanism for providing HA and additional query capacity, or is it just a mechanism for additional query capacity only? Those are open questions to which I don't have any answers yet myself. Will we switch back and forth between replay and query mode. Do we connect to the master, or to the slave? If we connect to the slave will we accept new queries when in replay mode and pause them before we switch back to query mode. > Open Problems: > -- >.) Protecting MarkBufferDirty with ASSUME_OPMODE_READWRITE is troublesome, > because callers usually call MarkBufferDirty from within a critical > section, and thus elog(ERRROR) is turned into elog(PANIC). This e.g. > happens with my patch if you call nextval() in readonly mode. > Does anyone see a better solution then adding checks into > all callers that are not otherwise protected from being called > in readonly mode? Do we need to do this at all? >.) Since the slaves needs to track an Snapshot in shared memory, it cannot > resize that snapshot to accomodate however many concurrent transactions > might have been running on the master. My current plan is to detect if > that global snapshot overflows, and to lock out readonly queries on the > slave (and therefore remove the need of keeping the snapshot current) > until the number of active xids on the master has dropped below > max_connections on the slave. A warning will be written to the postgres > log that suggest that the DBA increases the max_connections value on > the slave. Sized according to max_connections on the master? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
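The "dummy xid that is considered later than any other xid" idea in the
proposal above can be modeled minimally. This is a toy Python sketch --
real xids are 32-bit values with modulo-2^31 comparison, and the names
here are invented -- but it shows the visibility consequence: a readonly
backend's sentinel xid never makes its own work visible, so it sees
exactly what the published snapshot allows.

```python
READONLY_XID = float('inf')   # sentinel: sorts after every real xid

def xid_precedes(a, b):
    # stand-in for TransactionIdPrecedes, without wraparound handling
    return a < b

def tuple_visible(tuple_xmin, snapshot_xmax, snapshot_xip):
    # a tuple is visible iff its inserting transaction is before the
    # snapshot's xmax and was not in-progress when the snapshot was taken
    return (xid_precedes(tuple_xmin, snapshot_xmax)
            and tuple_xmin not in snapshot_xip)
```

Simon's question stands, though: if the snapshot in shared memory keeps
advancing while a query runs, this simple model only works if replay is
frozen (or the snapshot pinned) for the query's duration.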
Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far
On Wed, 2007-06-06 at 17:14 -0400, Alvaro Herrera wrote: > Simon Riggs wrote: > > On Wed, 2007-06-06 at 12:17 -0400, Matthew T. O'Connor wrote: > > > Florian G. Pflug wrote: > > > > Work done so far: > > > > - > > > > .) Don't start autovacuum and bgwriter. > > > > > > Do table stats used by the planner get replicated on a PITR slave? I > > > assume so, but if not, you would need autovac to do analyzes. > > > > The replication is an exact block-level replication of the master. We > > can't write very much at all on the slave. > > Hmm, something to keep in mind is forcing cache invals when the master > causes them (for example relation cache, catalog caches and plan > caches). Many things will need to work radically differently. Best we think of this as Research rather than Development. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far
Alvaro Herrera wrote:
> Simon Riggs wrote:
>> On Wed, 2007-06-06 at 12:17 -0400, Matthew T. O'Connor wrote:
>>> Florian G. Pflug wrote:
>>>> Work done so far:
>>>> .) Don't start autovacuum and bgwriter.
>>> Do table stats used by the planner get replicated on a PITR slave? I
>>> assume so, but if not, you would need autovac to do analyzes.
>> The replication is an exact block-level replication of the master. We
>> can't write very much at all on the slave.
> Hmm, something to keep in mind is forcing cache invals when the master
> causes them (for example relation cache, catalog caches and plan
> caches).

Perhaps if you are a PITR master and you have active readonly slaves
then there should be a WAL record to note plan invalidations, etc?

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] How do we create the releases?
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

--On Tuesday, June 05, 2007 10:28:58 +0300 Devrim GÜNDÜZ
<[EMAIL PROTECTED]> wrote:

> Hi Marc,
>
> Is there a written procedure about creating tarballs? I'd like to
> start working on 8.3 RPMs and I want to know what I should do to
> create a tarball.

Just a script ... relevant bits:

/usr/bin/cvs -d /cvsroot -q export -rREL7_3_19 pgsql
cd pgsql
./configure
cd doc/src
gmake postgres.tar.gz
mv postgres.tar.gz ..
gmake postgres.tar.gz
mv postgres.tar.gz ..
gmake man.tar.gz
mv man.tar.gz ..
cd sgml
gmake HISTORY INSTALL
mv -f HISTORY INSTALL ../../..
cd ../../..
gmake split-dist=yes dist
gmake maintainer-clean

- Marc

Marc G. Fournier    Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]    MSN . [EMAIL PROTECTED]
Yahoo . yscrappy    Skype: hub.org    ICQ . 7615664

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (FreeBSD)

iD8DBQFGZzuZ4QvfyHIvDvMRAij1AKDWTwj6DZWthxvuEIZE6P1PoGDHcgCg2tV/
IR2CdNIokfs363FGifAklUY=
=tcIx
-----END PGP SIGNATURE-----

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Composite index planner issues Was: Re: [HACKERS] Constraint exclusion oddity with composite index
Tom Lane wrote:
> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>>>> Assume the following:
>>>> index on: (id, adate)
>>>> constraint CHECK(adate > '01-01-2007' AND adate < '04-01-2007');
>>>> The planner will not use the index listed above.
>>> For what?
>> select adate from parent where adate = '01-25-2007'
> That's unsurprising. Searching with only a lower-order index column
> value seldom wins, 'cause you've got to scan the entire index. The
> constraint is irrelevant to this.

I guess where I got confused is:

http://www.postgresql.org/docs/8.1/static/indexes-multicolumn.html

And explicitly:

  A multicolumn B-tree index can be used with query conditions that
  involve any subset of the index's columns, but the index is most
  efficient when there are constraints on the leading (leftmost)
  columns.

Considering the paragraph from the documentation above, should we change
the documentation?

Sincerely,

Joshua D. Drake

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at
    http://www.postgresql.org/about/donate
Re: Composite index planner issues Was: Re: [HACKERS] Constraint exclusion oddity with composite index
"Joshua D. Drake" <[EMAIL PROTECTED]> writes: >> I guess where I got confused is: >> >> http://www.postgresql.org/docs/8.1/static/indexes-multicolumn.html >> >> And explicitly: >> >> A multicolumn B-tree index can be used with query conditions that >> involve any subset of the index's columns, but the index is most >> efficient when there are constraints on the leading (leftmost) columns. > Considering the paragraph from the documentation above, should we change > the documentation? That statement seems perfectly accurate to me. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: Composite index planner issues Was: Re: [HACKERS] Constraint exclusion oddity with composite index
Tom Lane wrote:
> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>> I guess where I got confused is:
>>
>> http://www.postgresql.org/docs/8.1/static/indexes-multicolumn.html
>>
>> And explicitly:
>>
>> A multicolumn B-tree index can be used with query conditions that
>> involve any subset of the index's columns, but the index is most
>> efficient when there are constraints on the leading (leftmost)
>> columns.
>>
>> Considering the paragraph from the documentation above, should we
>> change the documentation?
> That statement seems perfectly accurate to me.

O.k. then perhaps I am being dense, but that statement says to me that
the planner should be able to use the right element of a composite index
but that it will not always do so.

Considering an index of a,b if I search for b I would expect that the
planner could use the index. Assuming of course that the planner would
use the same index if it was just b. Further, I would expect a smaller
chance of it using b if the index was a,c,b but that it "might" still
use it. Is that not the case? Should I expect that even in the simplest
of cases that we will not use an index unless it is *the* leftmost
element?

Sincerely,

Joshua D. Drake

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: Composite index planner issues Was: Re: [HACKERS] Constraint exclusion oddity with composite index
"Joshua D. Drake" <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> That statement seems perfectly accurate to me. > Considering an index of a,b if I search for b I would expect that the > planner could use the index. It can. Whether it will think that's a good idea is another question entirely, and one that seems a bit beyond the scope of the discussion you're mentioning. Try forcing the issue with enable_seqscan, and see what sort of estimated and actual costs you get ... regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far
On Wed, 2007-06-06 at 22:36 +0100, Simon Riggs wrote: > > .) Transactions are assigned a dummy xid ReadOnlyTransactionId, that > > is considered to be "later" than any other xid. > > So you are bumping FirstNormalTransactionId up by one for this? > > You're assuming then that we will "freeze" replay while we run a query? > Otherwise doing this will mean the snapshot changes as a query executes. Is it possible to put a normal xmax for the snapshot? It wouldn't be a real transaction on the slave, and also the master will use that ID for a real transaction itself. However, I don't see a real problem on the slave because it would only be used for the purpose of the snapshot we need at that moment. Regards, Jeff Davis ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[HACKERS] Vacuuming anything zeroes shared table stats
Is vacuuming any table supposed to zero the statistics for all shared
tables? Doesn't that have implications for autovacuum? The example below
is in 8.2.4 but I'm seeing similar behavior in 8.1.9 and 8.3devel.
Additionally, in 8.3devel doing anything that queries or modifies a
shared table seems to zero the statistics for all shared tables.

test=> select relname, seq_scan, idx_scan, n_tup_ins, n_tup_upd, n_tup_del
test->   from pg_stat_all_tables
test->  where relid in (select oid from pg_class where relisshared)
test->  order by relname;
     relname      | seq_scan | idx_scan | n_tup_ins | n_tup_upd | n_tup_del
------------------+----------+----------+-----------+-----------+-----------
 pg_auth_members  |       25 |        3 |         1 |         0 |         1
 pg_authid        |        7 |       40 |         0 |         0 |         0
 pg_database      |        2 |        7 |         0 |         0 |         0
 pg_pltemplate    |        2 |        0 |         0 |         0 |         0
 pg_shdepend      |        0 |        4 |         2 |         0 |         2
 pg_shdescription |        2 |        0 |         0 |         0 |         0
 pg_tablespace    |        2 |        0 |         0 |         0 |         0
 pg_toast_1260    |        1 |        0 |         0 |         0 |         0
 pg_toast_1262    |        1 |        0 |         0 |         0 |         0
 pg_toast_2396    |        1 |        0 |         0 |         0 |         0
(10 rows)

test=> vacuum foo;
VACUUM
test=> select relname, seq_scan, idx_scan, n_tup_ins, n_tup_upd, n_tup_del
test->   from pg_stat_all_tables
test->  where relid in (select oid from pg_class where relisshared)
test->  order by relname;
     relname      | seq_scan | idx_scan | n_tup_ins | n_tup_upd | n_tup_del
------------------+----------+----------+-----------+-----------+-----------
 pg_auth_members  |        0 |        0 |         0 |         0 |         0
 pg_authid        |        0 |        0 |         0 |         0 |         0
 pg_database      |        1 |        0 |         0 |         0 |         0
 pg_pltemplate    |        0 |        0 |         0 |         0 |         0
 pg_shdepend      |        0 |        0 |         0 |         0 |         0
 pg_shdescription |        0 |        0 |         0 |         0 |         0
 pg_tablespace    |        0 |        0 |         0 |         0 |         0
 pg_toast_1260    |        0 |        0 |         0 |         0 |         0
 pg_toast_1262    |        0 |        0 |         0 |         0 |         0
 pg_toast_2396    |        0 |        0 |         0 |         0 |         0
(10 rows)

--
Michael Fuhr

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
   http://www.postgresql.org/docs/faq
Re: [HACKERS] Vacuuming anything zeroes shared table stats
Michael Fuhr wrote: > Is vacuuming any table supposed to zero the statistics for all > shared tables? Huh, certainly not. In any case, I think the problem may be related to the fact that stats for shared tables are kept in a separate hash from regular tables. I'll investigate the issue tomorrow -- thanks for reporting. -- Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4 "The Gord often wonders why people threaten never to come back after they've been told never to return" (www.actsofgord.com) ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [COMMITTERS] pgsql: Create a GUC parametertemp_tablespacesthat allows selection of
On 6/4/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> Perhaps a reasonable compromise could work like this: at the first
> point in a transaction where a temp file is created, choose a random
> list element, and thereafter advance cyclically for the duration of
> that transaction. This ensures within-transaction spread-out while
> still having some randomness across backends.
>
> The reason I'm thinking per-transaction is that we could tie this to
> setting up a cached list of tablespace OIDs, which would avoid the
> overhead of repeat parsing and tablespace validity checking. We had
> rejected using a long-lived cache because of the problem of
> tablespaces getting dropped, but I think one that lasts only across a
> transaction would be OK. And the reason I'm thinking a cache is
> important is that if you really want to get any win from this idea,
> you need to spread the temp files across tablespaces *per file*, which
> is not the way it works now.

ok. are you doing this? or can i prepare a patch that implements this?
i guess we can allocate the memory for the list in TopTransactionContext.

--
regards,
Jaime Casanova

"Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs and the universe trying to
produce bigger and better idiots. So far, the universe is winning."
                                                       Richard Cook

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match
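Tom's quoted proposal -- random start at the first temp file of a
transaction, then cyclic advance -- can be sketched like this. A toy
Python model, not a patch: the class name is invented, and in the
backend this would be a cached OID list in TopTransactionContext plus a
couple of static variables.

```python
import random

class TempTablespaceChooser:
    """Per-transaction chooser: random first pick, then round-robin."""

    def __init__(self, tablespaces):
        # cached once per transaction, so dropped-tablespace staleness
        # is bounded by transaction lifetime
        self.tablespaces = list(tablespaces)
        self.next_index = None

    def next_tablespace(self):
        if self.next_index is None:
            # first temp file in this transaction: random start, which
            # gives the cross-backend randomness Tom mentions
            self.next_index = random.randrange(len(self.tablespaces))
        ts = self.tablespaces[self.next_index]
        # advance cyclically for subsequent temp files
        self.next_index = (self.next_index + 1) % len(self.tablespaces)
        return ts
```

Within one transaction every tablespace gets used before any repeats,
which is the "within-transaction spread-out" property.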
Re: [HACKERS] To all the pgsql developers..Have a look at the operators proposed by me in my researc
On Sat, Jun 02, 2007 at 01:37:19PM +, Tasneem Memon wrote:
> We can make the system ask the user as to what membership degree s/he
> wants to get the values, but we don't want to make the system
> interactive, where a user gives a membership degree value of his/her
> choice. These operators are supposed to work just like the other
> operators in SQL.. you just put them in the query and get a result. I
> have put 0.8 because all the case studies I have made for the NEAR,
> 0.8 seems to be the best choice.. 0.9 narrows the range.. 0.75 or 0.7
> gets those values also that are irrelevant.. However, these values
> will no more seem to be irrelevant when we haven't got any values till
> the md 0.8, so the operator fetches them when they are the NEARest.

While having them function just like any other operator is good, it
seems like you're making quite a bit of an assumption for the user;
namely that you know what their data looks like better than they might.
Is it not possible that someone would come along with a dataset that
looks different enough from your test cases so that the values you
picked wouldn't work?

--
Jim Nasby    [EMAIL PROTECTED]
EnterpriseDB    http://enterprisedb.com    512.569.9461 (cell)

pgpqZoDw3txQR.pgp
Description: PGP signature
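Jim's objection can be made concrete with a sketch. Everything here is
hypothetical -- the membership function, the fallback rule, and the
names are invented for illustration, not taken from the proposal -- but
it shows that the 0.8 cutoff is just one parameter choice, and a
different dataset (or a parameterizable operator) might call for
another.

```python
def near(candidates, target, membership, cutoff=0.8):
    """Return candidates whose membership degree w.r.t. target meets
    the cutoff; if none do, fall back to the single NEARest one,
    mirroring the 'fetch them when they are the NEARest' behavior."""
    scored = [(membership(c, target), c) for c in candidates]
    hits = [c for md, c in scored if md >= cutoff]
    if hits:
        return hits
    # no candidate reaches the cutoff: return the best available
    return [max(scored, key=lambda pair: pair[0])[1]]
```

With a hardcoded `cutoff=0.8` the behavior is fixed; exposing it as a
parameter (or a GUC) would let users whose data is scaled differently
pick their own threshold.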
[HACKERS] is_array_type vs type_is_array
Was there some change in functionality reason for renaming is_array_type
to type_is_array? It broke compilation of fulldisjunctions, which I
build and run regression tests on in my sandbox to keep it from getting
too horribly broken with respect to current HEAD. I got it to build and
pass its regression tests by adding this:

+ #if !defined(is_array_type) && defined(type_is_array)
+ #define is_array_type(x) type_is_array(x)
+ #endif

to the beginning of the one file which uses is_array_type. Is this
reasonable to send back to the fulldisjunctions maintainer, or is there
some subtle change that prompted the name change to make uses of this
function immediately apparent?

--
Ducharme's Axiom: If you view your problem closely enough you will
recognize yourself as part of the problem.

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at
    http://www.postgresql.org/about/donate
Re: [HACKERS] is_array_type vs type_is_array
Jeremy Drake <[EMAIL PROTECTED]> writes: > Was there some change in functionality reason for renaming is_array_type > to type_is_array? Just to sync style with type_is_enum ... there were more of the latter than the former. > It broke compilation of fulldisjunctions, Sorry, but we change internal APIs every day, and twice on Sundays. Deal with it. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] is_array_type vs type_is_array
On Thu, 7 Jun 2007, Tom Lane wrote: > Jeremy Drake <[EMAIL PROTECTED]> writes: > > Was there some change in functionality reason for renaming is_array_type > > to type_is_array? > > Just to sync style with type_is_enum ... there were more of the latter > than the former. OK, so it is safe to just #define one to the other, right? > > It broke compilation of fulldisjunctions, > > Sorry, but we change internal APIs every day, and twice on Sundays. > Deal with it. This is why I build fulldisjunctions in my sandbox, because when I decided to try it out one time, I could not get it to compile. After much effort getting it happy with HEAD and sending the changes back to the maintainer of fulldisjunctions, I thought a good thing for me to contribute is to make sure it continues to compile cleanly against HEAD and send patches when it breaks. I just wanted to make sure that the functionality of this function did not change in some way that I did not see before sending a patch to the maintainer of fulldisjunctions. "Deal with it" was not the response I was expecting, as that is exactly what I am trying to do ;) -- It is generally agreed that "Hello" is an appropriate greeting because if you entered a room and said "Goodbye," it could confuse a lot of people. -- Dolph Sharp, "I'm O.K., You're Not So Hot" ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org