Re: [HACKERS] Implicit casts with generic arrays

2007-06-06 Thread Zeugswetter Andreas ADI SD

 For example in 8.2 this is mapped to array_prepend:
 
 regression=# select 'x'::text || array['aa','bb','cc'];
     ?column?
  --------------
   {x,aa,bb,cc}
 (1 row)
 
 but with the experimental code you get textcat:
 
 catany=# select 'x'::text || array['aa','bb','cc'];
    ?column?
 -------------
  x{aa,bb,cc}
 (1 row)

This is what I would have expected || to give, and not what 8.2 does.
So, disregarding the rest of the argument, I think that array_[pre|ap]pend
should use other operator names.
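
For reference, either behavior can be requested explicitly by calling the
underlying functions by name (both exist in a stock installation); a
minimal sketch:

    -- unambiguous element-onto-array push, whatever || resolves to:
    SELECT array_prepend('x'::text, ARRAY['aa','bb','cc']);  -- {x,aa,bb,cc}

    -- unambiguous text concatenation (textcat underlies text || text),
    -- writing out the array's text form by hand:
    SELECT textcat('x', '{aa,bb,cc}');                       -- x{aa,bb,cc}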

Andreas

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Implicit casts with generic arrays

2007-06-06 Thread Pavel Stehule


 For example in 8.2 this is mapped to array_prepend:

 regression=# select 'x'::text || array['aa','bb','cc'];
     ?column?
  --------------
   {x,aa,bb,cc}
 (1 row)

 but with the experimental code you get textcat:

 catany=# select 'x'::text || array['aa','bb','cc'];
    ?column?
 -------------
  x{aa,bb,cc}
 (1 row)

This is what I would have expected || to give, and not what 8.2 does.
So, disregarding the rest of the argument, I think that array_[pre|ap]pend
should use other operator names.

Andreas



I think the current behavior is more intuitive and practical. The result
x{aa,bb,cc} is nonsense. Array concatenation has to have higher priority
than text concatenation.

Pavel

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] TOAST usage setting

2007-06-06 Thread Zeugswetter Andreas ADI SD

  No, you misunderstood. Bruce was suggesting changing the target to 512.
  That means if a row is wider than ~2k, the toaster will try to toast
  until the base row is ~512 bytes. I would not do that part for 8.3.
 
 OK, what do you suggest for 8.3?  Attached is my suggestion
 to use 512 and a 4k chunk size, which I think means that 2.7k
 is the worst value, with a loss of around 25%.

Oh, so I misunderstood you also. You are suggesting two changes:
TOAST_TUPLES_PER_PAGE   16
EXTERN_TUPLES_PER_PAGE  2

While I agree that 2 might be a good compromise with low risk for now,
I think that toasting all rows down to ~512 bytes is too narrowly
targeted at not reading wider columns.

When suggesting a new target, I interpreted you like so:
#define TOAST_TUPLES_PER_PAGE   4
#define TOAST_TUPLE_TARGET  \
    MAXALIGN_DOWN((BLCKSZ - \
                   MAXALIGN(sizeof(PageHeaderData) + \
                            (TOAST_TUPLES_PER_PAGE-1) * sizeof(ItemIdData))) \
                  / 16)
So we would only toast rows wider than 2k, but once toasting, toast the
base row down to 512.  
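
As a back-of-the-envelope check of those numbers with the default 8K block
size (a sketch only - it treats the page-header overhead as roughly 32 bytes
instead of expanding the macros exactly):

    SELECT (8192 - 32) / 4  AS toast_tuple_threshold,  -- 2040: rows wider get toasted
           (8192 - 32) / 16 AS toast_tuple_target;     -- 510: and are toasted down to this

which matches the ~2k threshold and ~512 target being discussed.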

My suggestion would be to leave TOAST_TUPLES_PER_PAGE as is, because all
else would need extensive performance testing.
#define TOAST_TUPLES_PER_PAGE   4

My next suggestion would be to leave EXTERN_TUPLES_PER_PAGE as is, but:
split data wider than a page into page-sized chunks as long as they fill
whole pages, then split the rest with EXTERN_TUPLES_PER_PAGE (4) as now.
This would not waste more space than currently, but improve performance
for very wide columns.

I can try to do a patch if you think that is a good idea, can't do a lot
of testing though.

Andreas

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[HACKERS] Controlling Load Distributed Checkpoints

2007-06-06 Thread Heikki Linnakangas
I'm again looking at the way the GUC variables work in the load 
distributed checkpoints patch. We've discussed them a lot already, but I 
don't think they're quite right yet.


Write-phase
---
I like the way the write-phase is controlled in general. Writes are 
throttled so that we spend the specified percentage of checkpoint 
interval doing the writes. But we always write at a specified minimum 
rate to avoid spreading out the writes unnecessarily when there's little 
work to do.


The original patch uses bgwriter_all_max_pages to set the minimum rate. 
I think we should have a separate variable, checkpoint_write_min_rate, 
in KB/s, instead.


Nap phase
-
This is trickier. The purpose of the sleep between writes and fsyncs is 
to give the OS a chance to flush the pages to disk at its own pace, 
hopefully limiting the effect on concurrent activity. The sleep 
shouldn't last too long, because any concurrent activity can be dirtying 
and writing more pages, and we might end up fsyncing more than necessary, 
which is bad for performance. The optimal delay depends on many factors, 
but I believe it's somewhere between 0-30 seconds on any reasonable system.


In the current patch, the duration of the sleep between the write and 
sync phases is controlled as a percentage of checkpoint interval. Given 
that the optimal delay is in the range of seconds, and 
checkpoint_timeout can be up to 60 minutes, the useful values of that 
percentage would be very small, like 0.5% or even less. Furthermore, the 
optimal value doesn't depend that much on the checkpoint interval; it's 
more dependent on your OS and memory configuration.


We should therefore give the delay as a number of seconds instead of as 
a percentage of checkpoint interval.


Sync phase
--
This is also tricky. As with the nap phase, we don't want to spend too 
much time fsyncing, because concurrent activity will write more dirty 
pages and we might just end up doing more work.


And we don't know how much work an fsync performs. The patch uses the 
file size as a measure of that, but as we discussed that doesn't 
necessarily have anything to do with reality. fsyncing a 1GB file with 
one dirty block isn't any more expensive than fsyncing a file with a 
single block.


Another problem is the granularity of an fsync. If we fsync a 1GB file 
that's full of dirty pages, we can't limit the effect on other activity. 
The best we can do is to sleep between fsyncs, but sleeping more than a 
few seconds is hardly going to be useful, no matter how bad an I/O storm 
each fsync causes.


Because of the above, I'm thinking we should ditch the 
checkpoint_sync_percentage variable, in favor of:

checkpoint_fsync_period # duration of the fsync phase, in seconds
checkpoint_fsync_delay  # max. sleep between fsyncs, in milliseconds


In all phases, the normal bgwriter activities are performed: 
lru-cleaning and switching xlog segments if archive_timeout expires. If 
a new checkpoint request arrives while the previous one is still in 
progress, we skip all the delays and finish the previous checkpoint as 
soon as possible.



GUC summary and suggested default values

checkpoint_write_percent = 50     # % of checkpoint interval to spread out writes
checkpoint_write_min_rate = 1000  # minimum I/O rate to write dirty buffers
                                  # at checkpoint (KB/s)
checkpoint_nap_duration = 2       # delay between write and sync phase, in seconds

checkpoint_fsync_period = 30      # duration of the sync phase, in seconds
checkpoint_fsync_delay = 500      # max. delay between fsyncs, in milliseconds

I don't like adding that many GUC variables, but I don't really see a 
way to tune them automatically. Maybe we could just hard-code the last 
one, it doesn't seem that critical, but that still leaves us 4 variables.


Thoughts?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Controlling Load Distributed Checkpoints

2007-06-06 Thread Gregory Stark
Heikki Linnakangas [EMAIL PROTECTED] writes:

 GUC summary and suggested default values
 
 checkpoint_write_percent = 50 # % of checkpoint interval to 
 spread out writes
 checkpoint_write_min_rate = 1000  # minimum I/O rate to write dirty
 buffers at checkpoint (KB/s)

I don't understand why this is a min_rate rather than a max_rate.


 checkpoint_nap_duration = 2   # delay between write and sync phase, 
 in seconds

Not a comment on the choice of guc parameters, but don't we expect useful
values of this to be much closer to 30 than 0? I understand it might not be
exactly 30.

Actually, it's not so much whether there's any write traffic to the data files
during the nap that matters, it's whether there's more traffic during the nap
than during the 30s or so prior to the nap. As long as it's a steady-state
condition it shouldn't matter how long we wait, should it?

 checkpoint_fsync_period = 30  # duration of the sync phase, in seconds
 checkpoint_fsync_delay = 500  # max. delay between fsyncs

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Implicit casts with generic arrays

2007-06-06 Thread Tom Lane
Joe Conway [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 In the long run maybe we should choose some other name for the
 array_append and array_prepend operators to avoid the confusion with
 concatenation.  It seems to me that concatenation normally implies
 stringing together similar objects, which these two operators
 definitely don't do, and so you could argue that || was a bad name
 for them from the get-go.

 Originally I saw this situation as requiring the concatenation 
 operator per SQL 2003:

Maybe I am missing something, but the only such construct I see in
SQL2003 is concatenation of arrays of equal rank.  There is nothing
corresponding to array_prepend or array_append.

I do have a plan B if people don't want to rename the operators, though.
It looks to me like we could eliminate the conflict if we invented a new
polymorphic pseudotype called anynonarray or some such, which would
act like anyelement *except* it would not match an array.  Then,
declaring the capturing operators as text||anynonarray and
anynonarray||text would prevent them from matching any case where either
side was known to be an array type.  But they would (I think) still win
out in cases such as scalar || 'unknown literal'.  The end result would
be that concatenations involving a known-array value would be array
concatenation, but you could force them to be text concatenation, if
that's what you wanted, by explicitly casting the array value(s) to text.

I was a bit hesitant to propose this since I couldn't immediately think
of any other use-case for such a pseudotype.  It's not a huge amount of
added code (cf. anyenum) but it's definitely a visible wart on the type
system.  Comments?
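
To make the shape of plan B concrete, the capturing operators might look
roughly like this (a sketch only - the function names and SQL-level
definitions here are hypothetical, not committed code):

    -- capture text || non-array scalar; the cast makes the inner ||
    -- plain text concatenation
    CREATE FUNCTION textanycat(text, anynonarray) RETURNS text
        AS 'select $1 || $2::text' LANGUAGE sql;
    CREATE OPERATOR || (leftarg = text, rightarg = anynonarray,
                        procedure = textanycat);

    -- and the mirror image for non-array scalar || text
    CREATE FUNCTION anytextcat(anynonarray, text) RETURNS text
        AS 'select $1::text || $2' LANGUAGE sql;
    CREATE OPERATOR || (leftarg = anynonarray, rightarg = text,
                        procedure = anytextcat);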

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Implicit casts with generic arrays

2007-06-06 Thread Alvaro Herrera
Tom Lane wrote:

 I do have a plan B if people don't want to rename the operators, though.
 It looks to me like we could eliminate the conflict if we invented a new
 polymorphic pseudotype called anynonarray or some such, which would
 act like anyelement *except* it would not match an array.  Then,
 declaring the capturing operators as text||anynonarray and
 anynonarray||text would prevent them from matching any case where either
 side was known to be an array type.  But they would (I think) still win
 out in cases such as scalar || 'unknown literal'.  The end result would
 be that concatenations involving a known-array value would be array
 concatenation, but you could force them to be text concatenation, if
 that's what you wanted, by explicitly casting the array value(s) to text.
 
 I was a bit hesitant to propose this since I couldn't immediately think
 of any other use-case for such a pseudotype.  It's not a huge amount of
 added code (cf. anyenum) but it's definitely a visible wart on the type
 system.  Comments?

On the contrary, I would think that it fits nicely to close the loop
on the anyarray/anyelement feature set.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] TOAST usage setting

2007-06-06 Thread Bruce Momjian
Zeugswetter Andreas ADI SD wrote:
 
   No, you misunderstood. Bruce was suggesting changing the target to 512.
   That means if a row is wider than ~2k, the toaster will try to toast
   until the base row is ~512 bytes. I would not do that part for 8.3.
  
  OK, what do you suggest for 8.3?  Attached is my suggestion
  to use 512 and a 4k chunk size, which I think means that 2.7k
  is the worst value, with a loss of around 25%.
 
 Oh, so I misunderstood you also. You are suggesting two changes:
 TOAST_TUPLES_PER_PAGE   16
 EXTERN_TUPLES_PER_PAGE  2

Right.

 While I agree that 2 might be a good compromise with low risk for now,
 I think that toasting all rows down to ~512 bytes is too narrowly
 targeted at not reading wider columns.

Well, it is summarized here:

http://momjian.us/expire/TOAST/SUMMARY.html

It made non-TOAST access 2x faster, but TOAST 7x slower, and that seemed
like a good compromise.

 When suggesting a new target, I interpreted you like so:
 #define TOAST_TUPLES_PER_PAGE   4
 #define TOAST_TUPLE_TARGET  \
     MAXALIGN_DOWN((BLCKSZ - \
                    MAXALIGN(sizeof(PageHeaderData) + \
                             (TOAST_TUPLES_PER_PAGE-1) * sizeof(ItemIdData))) \
                   / 16)
 So we would only toast rows wider than 2k, but once toasting, toast the
 base row down to 512.  

That is certainly not my intent, and I don't see how you would get the
2k number from that macro.  I think you are looking at 8.2 and not CVS
HEAD.  CVS HEAD has:

#define TOAST_TUPLE_TARGET  TOAST_TUPLE_THRESHOLD

 My suggestion would be to leave TOAST_TUPLES_PER_PAGE as is, because all
 else would need extensive performance testing.
 #define TOAST_TUPLES_PER_PAGE 4
 
 My next suggestion would be to leave EXTERN_TUPLES_PER_PAGE as is, but:
 Split data wider than a page into page sized chunks as long as they fill
 whole pages.
 Split the rest with EXTERN_TUPLES_PER_PAGE (4) as now.
 This would not waste more space than currently, but improve performance
 for very wide columns.
 
 I can try to do a patch if you think that is a good idea, can't do a lot
 of testing though.

None of this splitting is going to happen for 8.3.  The question is what
changes we can make for 8.3, if any.


-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Controlling Load Distributed Checkpoints

2007-06-06 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 GUC summary and suggested default values
 
 checkpoint_write_percent = 50 # % of checkpoint interval to 
 spread out 
 writes
 checkpoint_write_min_rate = 1000  # minimum I/O rate to write dirty 
 buffers at checkpoint (KB/s)
 checkpoint_nap_duration = 2   # delay between write and sync phase, 
 in 
 seconds
 checkpoint_fsync_period = 30  # duration of the sync phase, in seconds
 checkpoint_fsync_delay = 500  # max. delay between fsyncs

 I don't like adding that many GUC variables, but I don't really see a 
 way to tune them automatically.

If we don't know how to tune them, how will the users know?  Having to
add that many variables to control one feature says to me that we don't
understand the feature.

Perhaps what we need is to think about how it can auto-tune itself.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Implicit casts with generic arrays

2007-06-06 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 I do have a plan B if people don't want to rename the operators, though.
 It looks to me like we could eliminate the conflict if we invented a new
 polymorphic pseudotype called anynonarray or some such, which would
 act like anyelement *except* it would not match an array.
 ...
 I was a bit hesitant to propose this since I couldn't immediately think
 of any other use-case for such a pseudotype.  It's not a huge amount of
 added code (cf. anyenum) but it's definitely a visible wart on the type
 system.  Comments?

 On the contrary, I would think that it fits nicely to close the loop
 on the anyarray/anyelement feature set.

OK, I'll go code this up and verify that it behaves like I think it will...

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] TOAST usage setting

2007-06-06 Thread Gregory Stark
Bruce Momjian [EMAIL PROTECTED] writes:

 Well, it is summarized here:

   http://momjian.us/expire/TOAST/SUMMARY.html

 It made non-TOAST access 2x faster, but TOAST 7x slower, and that seemed
 like a good compromise.

Is this still testing with all data fitting in RAM?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] TOAST usage setting

2007-06-06 Thread Zeugswetter Andreas ADI SD

  While I agree that 2 might be a good compromise with low risk for
  now, I think that toasting all rows down to ~512 bytes is too
  narrowly targeted at not reading wider columns.
 
 Well, it is summarized here:
 
   http://momjian.us/expire/TOAST/SUMMARY.html
 
 It made non-TOAST access 2x faster, but TOAST 7x slower, and 
 that seemed like a good compromise.

Yes, my argument was that I expect that in the range up to 1-2k more use
cases will suffer the 7x slowdown than see the speedup. But the ratio
certainly is hard to judge, and you may well be right.

E.g. for me TOAST_TUPLES_PER_PAGE 8 would be ok; I have base row widths
of ~700 in 2 tables that would suffer if further toasted, but none in
the 1k - 2k range.

I wonder whether this threshold isn't rather absolute, in which case the
32k-pagesize users should probably use 32.

  When suggesting a new target, I interpreted you like so:
  #define TOAST_TUPLES_PER_PAGE   4
  #define TOAST_TUPLE_TARGET  \
      MAXALIGN_DOWN((BLCKSZ - \
                     MAXALIGN(sizeof(PageHeaderData) + \
                              (TOAST_TUPLES_PER_PAGE-1) * sizeof(ItemIdData))) \
                    / 16)
  So we would only toast rows wider than 2k, but once toasting, toast 
  the base row down to 512.
 
 That is certainly not my intent, and I don't see how you 
 would get the 2k number from that macro.  I think you are

The ~2k comes from TOAST_TUPLE_THRESHOLD.

 looking at 8.2 and not CVS HEAD.  CVS HEAD has:
 
   #define TOAST_TUPLE_TARGET  TOAST_TUPLE_THRESHOLD

Nope, I meant what I said. To change only the target, you would replace
the above macro for TOAST_TUPLE_TARGET.
But I also don't see how this would be good.

Andreas

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] elog.c logic bug?

2007-06-06 Thread Andrew Dunstan


I have just been staring for some time at the logic in 
src/backend/utils/error/elog.c:send_message_to_server_log(), which 
contains this fragment near the end:




    /* Write to stderr, if enabled */
    if ((Log_destination & LOG_DESTINATION_STDERR) || whereToSendOutput == DestDebug)
    {
#ifdef WIN32

        /*
         * In a win32 service environment, there is no usable stderr. Capture
         * anything going there and write it to the eventlog instead.
         *
         * If stderr redirection is active, it's ok to write to stderr because
         * that's really a pipe to the syslogger process. Unless we're in the
         * postmaster, and the syslogger process isn't started yet.
         */
        if ((!Redirect_stderr || am_syslogger ||
             (!IsUnderPostmaster && SysLoggerPID == 0)) && pgwin32_is_service())
            write_eventlog(edata->elevel, buf.data);
        else
#endif
            fprintf(stderr, "%s", buf.data);
    }

    /* If in the syslogger process, try to write messages direct to file */
    if (am_syslogger)
        write_syslogger_file(buf.data, buf.len);

ISTM that this is a bug - the last statement should be inside the STDERR 
block above, the last part of which would then read:


    /* If in the syslogger process, try to write messages direct to file */
    if (am_syslogger)
        write_syslogger_file(buf.data, buf.len);
    else
        fprintf(stderr, "%s", buf.data);

If not I have missed something - why would the syslogger be trying to 
write to its output (possibly for the second time) regardless of what 
Log_destination is set to?


cheers

andrew






---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far

2007-06-06 Thread Matthew T. O'Connor

Florian G. Pflug wrote:

Work done so far:
-
 .) Don't start autovacuum and bgwriter. 


Do table stats used by the planner get replicated on a PITR slave?  I 
assume so, but if not, you would need autovac to do analyzes.


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] elog.c logic bug?

2007-06-06 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 If not I have missed something - why would the syslogger be trying to 
 write to its output (possibly for the second time) regardless of what 
 Log_destination is set to?

You're mistaken: within the syslogger process, stderr doesn't point to
the same place as the target file (it's normally the same as the
original postmaster stderr).  The reason the code is set up to try to
write both stderr and the target file is to maximize the chance that an
internally generated error in syslogger will get reported *someplace*.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far

2007-06-06 Thread Jeff Davis
On Wed, 2007-06-06 at 16:11 +0200, Florian G. Pflug wrote:
.) Since the slave needs to track a snapshot in shared memory, it cannot
   resize that snapshot to accommodate however many concurrent transactions
   might have been running on the master. My current plan is to detect if
   that global snapshot overflows, and to lock out readonly queries on the
   slave (and therefore remove the need of keeping the snapshot current)
   until the number of active xids on the master has dropped below
   max_connections on the slave. A warning will be written to the postgres
   log suggesting that the DBA increase the max_connections value on
   the slave.
 

If we did lock the slave while waiting for transactions to complete on
the master, we'd need to document some stronger warnings against idle
transactions so that administrators could notice and correct the
problem.

Are you referring to the size of the xip array being a problem? Would it
help to tie the size of the xip array to max_connections? I understand
that max_connections might be greater on the master, but maybe something
similar? 

Regards,
Jeff Davis



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far

2007-06-06 Thread Florian G. Pflug

Matthew T. O'Connor wrote:

Florian G. Pflug wrote:

Work done so far:
-
 .) Don't start autovacuum and bgwriter. 


Do table stats used by the planner get replicated on a PITR slave?  I 
assume so, but if not, you would need autovac to do analyzes.


Yes - everything that gets WAL-logged on the master gets replicated to
the slave. In my design, it isn't possible to do ANALYZE on the slave,
because all datafiles are strictly readonly (well, with the small
exception of hint-bit updates, actually).

greetings, Florian Pflug



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far

2007-06-06 Thread Florian G. Pflug

Jeff Davis wrote:

On Wed, 2007-06-06 at 16:11 +0200, Florian G. Pflug wrote:

   .) Since the slave needs to track a snapshot in shared memory, it cannot
  resize that snapshot to accommodate however many concurrent transactions
  might have been running on the master. My current plan is to detect if
  that global snapshot overflows, and to lock out readonly queries on the
  slave (and therefore remove the need of keeping the snapshot current)
  until the number of active xids on the master has dropped below
  max_connections on the slave. A warning will be written to the postgres
  log suggesting that the DBA increase the max_connections value on
  the slave.


If we did lock the slave while waiting for transactions to complete on
the master, we'd need to document some stronger warnings against idle
transactions so that administrators could notice and correct the
problem.


It's not exactly locking until it completes on the master, it's locking
the slave until we reach a position in the WAL on the slave with less
than max_connections concurrent transactions. But yes, I agree, this
will need to be documented.


Are you referring to the size of the xip array being a problem? Would it
help to tie the size of the xip array to max_connections? I understand
that max_connections might be greater on the master, but maybe something
similar? 


That's what I currently do - the xip array on the slave is sized to
hold max_connections entries (actually, it's max_connections +
max_prepared_xacts, I think). The problem occurs exactly if those
values are set too small on the slave - and since shared mem
objects are not resizeable, I don't see how the slave can handle
an xip overflow gracefully other than by not publishing the
information in shared memory as long as it doesn't fit there.

On further thought - maybe locking out transactions isn't even
necessary - they would just continue to see the old global snapshot,
so time wouldn't advance for them until the number of concurrent
transactions decreases again.

greetings, Florian Pflug

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Implicit casts with generic arrays

2007-06-06 Thread Joe Conway

Tom Lane wrote:

Joe Conway [EMAIL PROTECTED] writes:

Tom Lane wrote:

In the long run maybe we should choose some other name for the
array_append and array_prepend operators to avoid the confusion with
concatenation.  It seems to me that concatenation normally implies
stringing together similar objects, which these two operators
definitely don't do, and so you could argue that || was a bad name
for them from the get-go.


Originally I saw this situation as as requiring the concatenation 
operator per SQL 2003:


Maybe I am missing something, but the only such construct I see in
SQL2003 is concatenation of arrays of equal rank.  There is nothing
corresponding to array_prepend or array_append.


Well, I've never claimed to be particularly good at interpreting the SQL 
spec, but as an example...


<array concatenation> ::=
   <array value expression 1> || <array primary>
<array primary> ::=
<value expression primary> ::=
<nonparenthesized value expression primary> ::=
<unsigned value specification> ::=
<unsigned literal> ::=
 <unsigned numeric literal>

Doesn't this mean that <array concatenation> should include things like:

   <array value expression> || <unsigned numeric literal>

e.g.

  ARRAY[1,2,3] || 42
?


I do have a plan B if people don't want to rename the operators, though.
It looks to me like we could eliminate the conflict if we invented a new
polymorphic pseudotype called anynonarray or some such, which would
act like anyelement *except* it would not match an array.  Then,
declaring the capturing operators as text||anynonarray and
anynonarray||text would prevent them from matching any case where either
side was known to be an array type.  But they would (I think) still win
out in cases such as scalar || 'unknown literal'.  The end result would
be that concatenations involving a known-array value would be array
concatenation, but you could force them to be text concatenation, if
that's what you wanted, by explicitly casting the array value(s) to text.


That sounds reasonable to me.

Joe


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Controlling Load Distributed Checkpoints

2007-06-06 Thread Greg Smith

On Wed, 6 Jun 2007, Tom Lane wrote:


If we don't know how to tune them, how will the users know?


I can tell you a good starting set for them on a Linux system, but you 
first have to let me know how much memory is in the OS buffer cache, the 
typical I/O rate the disks can support, how many buffers are expected to 
be written out by BGW/other backends at heaviest load, and the current 
setting for /proc/sys/vm/dirty_background_ratio.  It's not a coincidence 
that there are patches applied to 8.3 or in the queue to measure all of 
the Postgres internals involved in that computation; I've been picking 
away at the edges of this problem.


Getting this sort of tuning right takes that level of information about 
the underlying system.  If there's a way to internally auto-tune the 
values this patch operates on (which I haven't found despite months of 
trying), it would be in the form of some sort of measurement/feedback loop 
based on how fast data is being written out.  There really are way too 
many things involved to try and tune it based on anything else; the 
underlying OS/hardware mechanisms that determine how this will go are 
complicated enough that it might as well be a black box for most people.


One of the things I've been fiddling with the design of is a testing 
program that simulates database activity at checkpoint time under load. 
I think running some tests like that is the most straightforward way to 
generate useful values for these tunables; it's much harder to try and 
determine them from within the backends because there's so much going on 
to keep track of.


I view the LDC mechanism as being in the same state right now as the 
background writer:  there are a lot of complicated knobs to tweak, they 
all do *something* useful for someone, and eliminating them will require a 
data-collection process across a much wider sample of data than can be 
collected quickly.  If I had to make a guess how this will end up, I'd 
expect there to be more knobs in LDC than everyone would like for the 8.3 
release, along with fairly verbose logging of what is happening at 
checkpoint time (that's why I've been nudging development in that area, 
along with making logs easier to aggregate).  Collect up enough of that 
information, then you're in a position to talk about useful automatic 
tuning--right around the 8.4 timeframe I suspect.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far

2007-06-06 Thread Jeff Davis
On Wed, 2007-06-06 at 19:25 +0200, Florian G. Pflug wrote:
 That's what I currently do - the xip array on the slave is sized to
 hold max_connections entries (actually, it's max_connections +
 max_prepared_xacts, I think). The problem occurs exactly if those
 values are set too small on the slave - and since shared mem
 objects are not resizeable, I don't see how the slave can handle
 an xip overflow gracefully other than by not publishing the
 information in shared memory as long as it doesn't fit there.
 

That seems like a very minor issue then. It's not unreasonable to expect
that the PITR slave is configured very similarly to the master. You may
even want to require it (if there are other reasons, too).

 On further thought - maybe locking out transactions isn't even
 necessary - they would just continue to see the old global snapshot,
 so time wouldn't advance for them until the number of concurrent
 transactions decreases again.

That sounds better than locking out all reads. Perhaps still a warning
in the logs though. If the situation you describe happens, the
administrator probably needs to know about it.

Regards,
Jeff Davis



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Controlling Load Distributed Checkpoints

2007-06-06 Thread Greg Smith

On Wed, 6 Jun 2007, Heikki Linnakangas wrote:

The original patch uses bgwriter_all_max_pages to set the minimum rate. I 
think we should have a separate variable, checkpoint_write_min_rate, in KB/s, 
instead.


Completely agreed.  There shouldn't be any coupling with the background 
writer parameters, which may be set for a completely different set of 
priorities than the checkpoint has.  I have to look at this code again to 
see why it's a min_rate instead of a max, that seems a little weird.


Nap phase:  We should therefore give the delay as a number of seconds 
instead of as a percentage of checkpoint interval.


Again, the setting here should be completely decoupled from another GUC 
like the interval.  My main complaint with the original form of this patch 
was how much it tried to synchronize the process with the interval; since I 
don't even have a system where that value is set to something, because 
it's all segment based instead, that whole idea was incompatible.


The original patch tried to spread the load out as evenly as possible over 
the time available.  I much prefer thinking in terms of getting it done as 
quickly as possible while trying to bound the I/O storm.


And we don't know how much work an fsync performs. The patch uses the file 
size as a measure of that, but as we discussed that doesn't necessarily have 
anything to do with reality. fsyncing a 1GB file with one dirty block isn't 
any more expensive than fsyncing a file with a single block.


On top of that, if you have a system with a write cache, the time an fsync 
takes can greatly depend on how full it is at the time, which there is no 
way to measure or even model easily.


Is there any way to track how many dirty blocks went into each file during 
the checkpoint write?  That's your best bet for guessing how long the 
fsync will take.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] elog.c logic bug?

2007-06-06 Thread Andrew Dunstan



Tom Lane wrote:

Andrew Dunstan [EMAIL PROTECTED] writes:
  
If not I have missed something - why would the syslogger be trying to 
write to its output (possibly for the second time) regardless of what 
Log_destination is set to?



You're mistaken: within the syslogger process, stderr doesn't point to
the same place as the target file (it's normally the same as the
original postmaster stderr).  The reason the code is set up to try to
write both stderr and the target file is to maximize the chance that an
internally generated error in syslogger will get reported *someplace*.


  


OK, thanks, I'll try to make that a bit clearer in a comment.

cheers

andrew


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] msvc, build and install with cygwin in the PATH

2007-06-06 Thread Hannes Eder

Magnus Hagander wrote:
 Hannes Eder wrote:
  Is it worth doing this the Perl-way and using File::Find? If so, I
  can work on a patch for that.

 It's certainly cleaner that way, but I don't find it a major issue.
 But I'd rather see that fix than the other one.

Here we go. See attached patch. Your comments are welcome.

Hannes.

*** ..\pgsql-cvshead\src\tools\msvc\Install.pm  Mo Mai 14 16:36:10 2007
--- src\tools\msvc\Install.pm   Mi Jun  6 20:39:47 2007
***
*** 10,15 
--- 10,18 
  use Carp;
  use File::Basename;
  use File::Copy;
+ use File::Find;
+ use File::Glob;
+ use File::Spec;
  
  use Exporter;
  our (@ISA,@EXPORT_OK);
***
*** 99,104 
--- 102,142 
  print "\n";
  }
  
+ sub FindFiles
+ {
+     my $spec = shift;
+     my $nonrecursive = shift;
+     my $pat = basename($spec);
+     my $dir = dirname($spec);
+ 
+     if ($dir eq '') { $dir = '.'; }
+ 
+     -d $dir || croak "Could not list directory $dir: $!\n";
+ 
+     if ($nonrecursive)
+     {
+         return glob($spec);
+     }
+ 
+     # borrowed from File::DosGlob
+     # escape regex metachars but not glob chars
+     $pat =~ s:([].+^\-\${}[|]):\\$1:g;
+     # and convert DOS-style wildcards to regex
+     $pat =~ s/\*/.*/g;
+     $pat =~ s/\?/.?/g;
+ 
+     $pat = '^' . $pat . '\z';
+ 
+     my @res;
+     find(
+         {
+             wanted => sub { /$pat/s && push(@res, File::Spec->canonpath($File::Find::name)); }
+         },
+         $dir
+     );
+     return @res;
+ }
+ 
  sub CopySetOfFiles
  {
  my $what = shift;
***
*** 106,126 
  my $target = shift;
  my $silent = shift;
  my $norecurse = shift;
- my $D;
  
- my $subdirs = $norecurse?'':'/s';
  print "Copying $what" unless ($silent);
! open($D, "dir /b $subdirs $spec |") || croak "Could not list $spec\n";
! while (<$D>)
  {
- chomp;
  next if /regress/; # Skip temporary install in regression subdir
! my $tgt = $target . basename($_);
  print ".";
! my $src = $norecurse?(dirname($spec) . '/' . $_):$_;
! copy($src, $tgt) || croak "Could not copy $src: $!\n";
  }
! close($D);
  print "\n";
  }
  
--- 144,161 
  my $target = shift;
  my $silent = shift;
  my $norecurse = shift;
  
  print "Copying $what" unless ($silent);
! 
! foreach (FindFiles($spec, $norecurse))
  {
  next if /regress/; # Skip temporary install in regression subdir
! my $src = $_;
! my $tgt = $target . basename($src);
  print ".";
! copy($src, $tgt) || croak "Could not copy $src to $tgt: $!\n";
  }
! 
  print "\n";
  }
  
***
*** 371,395 
  {
  my $target = shift;
  my $nlspath = shift;
- my $D;
  
  print "Installing NLS files...";
  EnsureDirectories($target, "share/locale");
! open($D, "dir /b /s nls.mk|") || croak "Could not list nls.mk\n";
! while (<$D>)
  {
- chomp;
  s/nls.mk/po/;
  my $dir = $_;
  next unless ($dir =~ /([^\\]+)\\po$/);
  my $prgm = $1;
  $prgm = 'postgres' if ($prgm eq 'backend');
- my $E;
- open($E, "dir /b $dir\\*.po|") || croak "Could not list contents of $_\n";
  
! while (<$E>)
  {
- chomp;
  my $lang;
  next unless /^(.*)\.po/;
  $lang = $1;
--- 406,425 
  {
  my $target = shift;
  my $nlspath = shift;
  
  print "Installing NLS files...";
  EnsureDirectories($target, "share/locale");
! 
! foreach (FindFiles("nls.mk"))
  {
  s/nls.mk/po/;
  my $dir = $_;
  next unless ($dir =~ /([^\\]+)\\po$/);
  my $prgm = $1;
  $prgm = 'postgres' if ($prgm eq 'backend');
  
! foreach (FindFiles("$dir\\*.po", 1))
  {
  my $lang;
  next unless /^(.*)\.po/;
  $lang = $1;
***
*** 401,409 
   croak("Could not run msgfmt on $dir\\$_");
  print ".";
  }
- close($E);
  }
! close($D);
  print "\n";
  }
  
--- 431,438 
   croak("Could not run msgfmt on $dir\\$_");
  print ".";
  }
  }
! 
  print "\n";
  }
  

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Implicit casts with generic arrays

2007-06-06 Thread Tom Lane
Joe Conway [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Maybe I am missing something, but the only such construct I see in
 SQL2003 is concatenation of arrays of equal rank.  There is nothing
 corresponding to array_prepend or array_append.

 Well, I've never claimed to be particularly good at interpreting the SQL 
 spec, but as an example...

 <array concatenation> ::=
 <array value expression 1> || <array primary>
 <array primary> ::=
 <value expression primary> ::=
 <nonparenthesized value expression primary> ::=
 <unsigned value specification> ::=
 <unsigned literal> ::=
  <unsigned numeric literal>

 Doesn't this mean that <array concatenation> should include things like:
 <array value expression> || <unsigned numeric literal>

No, because the first syntax rule for that is

 1) The declared type of <value expression primary> shall be an
array type.

However, assuming that the anynonarray idea works out, we can do that
and not worry about touching the array operators.
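
(For comparison, the PostgreSQL extension under discussion accepts a
non-array operand directly, which has no counterpart in the spec's
<array concatenation>; e.g.:

    SELECT ARRAY[1,2,3] || 42;   -- array_append, yields {1,2,3,42}

so only the equal-rank form is spec-mandated.)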

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far

2007-06-06 Thread Simon Riggs
On Wed, 2007-06-06 at 12:17 -0400, Matthew T. O'Connor wrote:
 Florian G. Pflug wrote:
  Work done so far:
  -
   .) Don't start autovacuum and bgwriter. 
 
 Do table stats used by the planner get replicated on a PITR slave?  I 
 assume so, but if not, you would need autovac to do analyzes.

The replication is an exact block-level replication of the master. We
can't write very much at all on the slave.

So if a query runs slow because of lack of stats you'd need to run
ANALYZE on the master, which would then propagate the stats to the slave
which could then use them. 

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Implicit casts with generic arrays

2007-06-06 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 It looks to me like we could eliminate the conflict if we invented a new
 polymorphic pseudotype called anynonarray or some such, which would
 act like anyelement *except* it would not match an array.
 ...
 On the contrary, I would think that it fits nicely to close the loop
 on the anyarray/anyelement feature set.

OK, I hacked this together and it seems to behave at least as reasonably
as 8.2 does.  8.3 here means HEAD + anynonarray + capturing concat
operators.  I used integer as an example of a type for which 8.2 has an
implicit cast to text, and point as an example of a type for which it
doesn't:

Expression                     8.3             8.2
--------------------------------------------------------------------
text || text                   text concat     text concat
text || 'unknown'              text concat     text concat
text || text[]                 array concat    array concat
text || non-text array         error           error
text || non-text scalar        text concat     text concat [1]

integer || integer             error           text concat
integer || 'unknown'           text concat     text concat
integer || integer[]           array concat    array concat
integer || non-integer array   error           error
integer || non-integer scalar  error           text concat [1]

point || point                 error           error
point || 'unknown'             text concat     'array value must start ...'
point || point[]               array concat    array concat
point || non-point array       error           error
point || non-point scalar      error           error

text[] || text[]               array concat    array concat
text[] || 'unknown'            error           error
text[] || non-text array       error           error
text[] || non-text scalar      error           error

[1] for types for which 8.2 has an implicit cast to text, else it fails.
These are:
 bigint
 smallint
 integer
 oid
 real
 double precision
 numeric
 date
 time without time zone
 time with time zone
 timestamp without time zone
 timestamp with time zone
 interval

(I was interested to find that there were cases where 8.2 would come out
with the dreaded 'array value must start with "{" or dimension
information' error.)

I think that the above chart is pretty defensible; the only cases that
fail now where they worked before are concatenations where neither side
is either text or an unadorned string literal.  Frankly, I think this:

catany=# select 3 || 0.4;
ERROR:  operator does not exist: integer || numeric

is way preferable to this:

regression=# select 3 || 0.4;
 ?column?
----------
 30.4
(1 row)

which is what 8.2 does --- if you want text concatenation you should
make at least *some* effort to signal that, like casting one side to
text or at least quoting it.  Run-together concatenations like

catany=# select 'sin(' || 2 || ')';
 ?column?
----------
 sin(2)
(1 row)

will work as long as at least one of the first two concatenated items is
textual or an unadorned string literal.
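
Conversely, a run-together concatenation whose first two items are both
non-textual now fails up front, consistent with the chart above (sketch):

catany=# select 1 || 2 || 'x';
ERROR:  operator does not exist: integer || integer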

Barring objections I'll clean this up and commit it.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far

2007-06-06 Thread Alvaro Herrera
Simon Riggs wrote:
 On Wed, 2007-06-06 at 12:17 -0400, Matthew T. O'Connor wrote:
  Florian G. Pflug wrote:
   Work done so far:
   -
.) Don't start autovacuum and bgwriter. 
  
  Do table stats used by the planner get replicated on a PITR slave?  I 
  assume so, but if not, you would need autovac to do analyzes.
 
 The replication is an exact block-level replication of the master. We
 can't write very much at all on the slave.

Hmm, something to keep in mind is forcing cache invals when the master
causes them (for example relation cache, catalog caches and plan
caches).

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far

2007-06-06 Thread Simon Riggs
On Wed, 2007-06-06 at 16:11 +0200, Florian G. Pflug wrote:

 .) Added a new GUC operational_mode, which can be set to either
 readwrite or readonly. If it is set to readwrite (the default),
 postgres behaves as usual. All the following changes are only
 in effect if operational_mode is set to readonly.

Do we need this? We are already InArchiveRecovery.

 .) Created a macro ASSUME_OPMODE_READWRITE that does elog(ERROR)
 if postgres is not in readwrite mode. This macro protects the
 following functions to make sure that no writes occur in
 readonly mode.
   SimpleLruWritePage, SLruPhysicalWritePage
   EndPrepare, FinishPreparedTransaction
   XLogInsert, XLogWrite, ShutdownXLog
   CreateCheckpoint
   MarkBufferDirty.

These are Asserts?

   .) All transactions are set to readonly mode (An implicit
  SET TRANSACTION READONLY), and are not allowed to do
  SET TRANSACTION READWRITE.

OK

   .) Don't start autovacuum and bgwriter. Instead of bgwriter, bgreplay
  is started, and it takes over the role that bgwriter plays in the
  shutdown process.

Autovacuum - understood.

What does bgreplay do? Why not just start bgwriter earlier and disable
some of its other functionality while InRecovery?

   .) Transactions are assigned a dummy xid ReadOnlyTransactionId, that
  is considered to be later than any other xid.

So you are bumping FirstNormalTransactionId up by one for this?

You're assuming then that we will freeze replay while we run a query?
Otherwise doing this will mean the snapshot changes as a query executes.

   .) A global ReadOnlySnapshot is maintained in shared memory. This is
  copied into backend local memory by GetReadonlySnapshotData (which
  replaces GetSnapshotData in readonly mode).
.) Crash recovery is not performed in readonly mode - instead, postgres
   PANICs, and tells the DBA to restart in readwrite mode. Archive
   recovery of course *will* be allowed, but I'm not that far yet.

This is the very heart of the matter. This isn't just a technical issue,
it goes to the heart of the use case for this feature. Can we recover
while running queries? If not, how much time will we spend in replay
mode v query mode? Will we be able to run long running queries *and*
maintain a reasonable time to recover? Is this a mechanism for providing
HA and additional query capacity, or is it just a mechanism for
additional query capacity only? Those are open questions to which I
don't have any answers yet myself.

Will we switch back and forth between replay and query mode?
Do we connect to the master, or to the slave?
If we connect to the slave, will we accept new queries when in replay
mode and pause them before we switch back to query mode?

 Open Problems:
 --
.) Protecting MarkBufferDirty with ASSUME_OPMODE_READWRITE is troublesome,
   because callers usually call MarkBufferDirty from within a critical
   section, and thus elog(ERROR) is turned into elog(PANIC). This e.g.
   happens with my patch if you call nextval() in readonly mode.
   Does anyone see a better solution then adding checks into
   all callers that are not otherwise protected from being called
   in readonly mode?

Do we need to do this at all?

.) Since the slave needs to track a snapshot in shared memory, it cannot
   resize that snapshot to accommodate however many concurrent transactions
   might have been running on the master. My current plan is to detect if
   that global snapshot overflows, and to lock out readonly queries on the
   slave (and therefore remove the need of keeping the snapshot current)
   until the number of active xids on the master has dropped below
   max_connections on the slave. A warning will be written to the postgres
   log suggesting that the DBA increase the max_connections value on
   the slave.

Sized according to max_connections on the master?

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far

2007-06-06 Thread Simon Riggs
On Wed, 2007-06-06 at 17:14 -0400, Alvaro Herrera wrote:
 Simon Riggs wrote:
  On Wed, 2007-06-06 at 12:17 -0400, Matthew T. O'Connor wrote:
   Florian G. Pflug wrote:
Work done so far:
-
 .) Don't start autovacuum and bgwriter. 
   
   Do table stats used by the planner get replicated on a PITR slave?  I 
   assume so, but if not, you would need autovac to do analyzes.
  
  The replication is an exact block-level replication of the master. We
  can't write very much at all on the slave.
 
 Hmm, something to keep in mind is forcing cache invals when the master
 causes them (for example relation cache, catalog caches and plan
 caches).

Many things will need to work radically differently.

Best we think of this as Research rather than Development.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far

2007-06-06 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

Simon Riggs wrote:

On Wed, 2007-06-06 at 12:17 -0400, Matthew T. O'Connor wrote:

Florian G. Pflug wrote:

Work done so far:
-
 .) Don't start autovacuum and bgwriter. 
Do table stats used by the planner get replicated on a PITR slave?  I 
assume so, but if not, you would need autovac to do analyzes.

The replication is an exact block-level replication of the master. We
can't write very much at all on the slave.


Hmm, something to keep in mind is forcing cache invals when the master
causes them (for example relation cache, catalog caches and plan
caches).



Perhaps if you are a PITR master and you have active readonly slaves 
then there should be a WAL record to note plan invalidations, etc.?


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] How do we create the releases?

2007-06-06 Thread Marc G. Fournier
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1



- --On Tuesday, June 05, 2007 10:28:58 +0300 Devrim GÜNDÜZ 
[EMAIL PROTECTED] wrote:


 Hi Marc,

 Is there a written procedure about creating tarballs? I'd like to start
 working on 8.3 RPMs and I want to know what I should do to create a
 tarball.

Just a script ... relevant bits:

/usr/bin/cvs -d /cvsroot -q export -rREL7_3_19 pgsql
cd pgsql
./configure
cd doc/src
gmake postgres.tar.gz
mv postgres.tar.gz ..
gmake man.tar.gz
mv man.tar.gz ..
cd sgml
gmake HISTORY INSTALL
mv -f HISTORY INSTALL ../../..
cd ../../..
gmake split-dist=yes dist
gmake maintainer-clean

- 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (FreeBSD)

iD8DBQFGZzuZ4QvfyHIvDvMRAij1AKDWTwj6DZWthxvuEIZE6P1PoGDHcgCg2tV/
IR2CdNIokfs363FGifAklUY=
=tcIx
-END PGP SIGNATURE-


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Composite index planner issues Was: Re: [HACKERS] Constraint exclusion oddity with composite index

2007-06-06 Thread Joshua D. Drake

Joshua D. Drake wrote:

Tom Lane wrote:

Joshua D. Drake [EMAIL PROTECTED] writes:

Tom Lane wrote:

Joshua D. Drake [EMAIL PROTECTED] writes:

Assume the following:
index on: (id, adate)
constraint CHECK(adate > '01-01-2007' AND adate < '04-01-2007');
The planner will not use the index listed above.

For what?



select adate from parent where adate = '01-25-2007'


That's unsurprising.  Searching with only a lower-order index column
value seldom wins, 'cause you've got to scan the entire index.  The
constraint is irrelevant to this.


I guess where I got confused is:

http://www.postgresql.org/docs/8.1/static/indexes-multicolumn.html

And explicitly:

A multicolumn B-tree index can be used with query conditions that 
involve any subset of the index's columns, but the index is most 
efficient when there are constraints on the leading (leftmost) columns.


Considering the paragraph from the documentation above, should we change 
the documentation?


Joshua D. Drake




Sincerely,

Joshua D. Drake




regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster







---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: Composite index planner issues Was: Re: [HACKERS] Constraint exclusion oddity with composite index

2007-06-06 Thread Tom Lane
Joshua D. Drake [EMAIL PROTECTED] writes:
 I guess where I got confused is:
 
 http://www.postgresql.org/docs/8.1/static/indexes-multicolumn.html
 
 And explicitly:
 
 A multicolumn B-tree index can be used with query conditions that 
 involve any subset of the index's columns, but the index is most 
 efficient when there are constraints on the leading (leftmost) columns.

 Considering the paragraph from the documentation above, should we change 
 the documentation?

That statement seems perfectly accurate to me.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: Composite index planner issues Was: Re: [HACKERS] Constraint exclusion oddity with composite index

2007-06-06 Thread Joshua D. Drake

Tom Lane wrote:

Joshua D. Drake [EMAIL PROTECTED] writes:

I guess where I got confused is:

http://www.postgresql.org/docs/8.1/static/indexes-multicolumn.html

And explicitly:

A multicolumn B-tree index can be used with query conditions that 
involve any subset of the index's columns, but the index is most 
efficient when there are constraints on the leading (leftmost) columns.


Considering the paragraph from the documentation above, should we change 
the documentation?


That statement seems perfectly accurate to me.


OK, then perhaps I am being dense, but that statement says to me that 
the planner should be able to use a non-leading element of a composite 
index, but that it will not always do so.


Considering an index on (a,b), if I search for b I would expect that the 
planner could use the index - assuming, of course, that the planner would 
use the same index if it were just (b).


Further, I would expect a smaller chance of it using b if the index were 
(a,c,b), but that it might still use it.


Is that not the case? Should I expect that even in the simplest of cases 
that we will not use an index unless it is *the* leftmost element?


Sincerely,

Joshua D. Drake






regards, tom lane




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: Composite index planner issues Was: Re: [HACKERS] Constraint exclusion oddity with composite index

2007-06-06 Thread Tom Lane
Joshua D. Drake [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 That statement seems perfectly accurate to me.

 Considering an index of a,b if I search for b I would expect that the 
 planner could use the index.

It can.  Whether it will think that's a good idea is another question
entirely, and one that seems a bit beyond the scope of the discussion
you're mentioning.

Try forcing the issue with enable_seqscan, and see what sort of
estimated and actual costs you get ...
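
For instance (a sketch using the table from upthread; plans and costs will
of course vary):

    set enable_seqscan = off;
    explain analyze select adate from parent where adate = '01-25-2007';
    -- then compare against the default plan:
    set enable_seqscan = on;
    explain analyze select adate from parent where adate = '01-25-2007';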

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far

2007-06-06 Thread Jeff Davis
On Wed, 2007-06-06 at 22:36 +0100, Simon Riggs wrote:
.) Transactions are assigned a dummy xid ReadOnlyTransactionId, that
   is considered to be later than any other xid.
 
 So you are bumping FirstNormalTransactionId up by one for this?
 
 You're assuming then that we will freeze replay while we run a query?
 Otherwise doing this will mean the snapshot changes as a query executes.

Is it possible to put a normal xmax for the snapshot?

It wouldn't be a real transaction on the slave, and the master will
use that ID for a real transaction itself. However, I don't see a real
problem on the slave, because the ID would only be used for the
snapshot we need at that moment.
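
A loose sketch of the dummy-xid ordering being discussed, in simplified C
(the type and names are stand-ins, not backend code, and real xid
comparison has wraparound logic that this ignores):

typedef unsigned int TransactionId;   /* stand-in for the backend typedef */
#define ReadOnlyTransactionId ((TransactionId) 0xFFFFFFFFU)

/* Does xid a precede xid b?  The dummy xid sorts after every normal xid. */
static int
xid_precedes_sketch(TransactionId a, TransactionId b)
{
    if (a == ReadOnlyTransactionId)
        return 0;                     /* the dummy is never earlier */
    if (b == ReadOnlyTransactionId)
        return 1;                     /* every normal xid precedes it */
    return a < b;                     /* real code must handle wraparound */
}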

Regards,
Jeff Davis 


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[HACKERS] Vacuuming anything zeroes shared table stats

2007-06-06 Thread Michael Fuhr
Is vacuuming any table supposed to zero the statistics for all
shared tables?  Doesn't that have implications for autovacuum?  The
example below is in 8.2.4 but I'm seeing similar behavior in 8.1.9
and 8.3devel.  Additionally, in 8.3devel doing anything that queries
or modifies a shared table seems to zero the statistics for all
shared tables.


test=> select relname, seq_scan, idx_scan, n_tup_ins, n_tup_upd, n_tup_del
test-> from pg_stat_all_tables
test-> where relid in (select oid from pg_class where relisshared)
test-> order by relname;
 relname  | seq_scan | idx_scan | n_tup_ins | n_tup_upd | n_tup_del 
--+--+--+---+---+---
 pg_auth_members  |   25 |3 | 1 | 0 | 1
 pg_authid|7 |   40 | 0 | 0 | 0
 pg_database  |2 |7 | 0 | 0 | 0
 pg_pltemplate|2 |0 | 0 | 0 | 0
 pg_shdepend  |0 |4 | 2 | 0 | 2
 pg_shdescription |2 |0 | 0 | 0 | 0
 pg_tablespace|2 |0 | 0 | 0 | 0
 pg_toast_1260|1 |0 | 0 | 0 | 0
 pg_toast_1262|1 |0 | 0 | 0 | 0
 pg_toast_2396|1 |0 | 0 | 0 | 0
(10 rows)

test=> vacuum foo;
VACUUM

test=> select relname, seq_scan, idx_scan, n_tup_ins, n_tup_upd, n_tup_del
test-> from pg_stat_all_tables
test-> where relid in (select oid from pg_class where relisshared)
test-> order by relname;
 relname  | seq_scan | idx_scan | n_tup_ins | n_tup_upd | n_tup_del 
--+--+--+---+---+---
 pg_auth_members  |0 |0 | 0 | 0 | 0
 pg_authid|0 |0 | 0 | 0 | 0
 pg_database  |1 |0 | 0 | 0 | 0
 pg_pltemplate|0 |0 | 0 | 0 | 0
 pg_shdepend  |0 |0 | 0 | 0 | 0
 pg_shdescription |0 |0 | 0 | 0 | 0
 pg_tablespace|0 |0 | 0 | 0 | 0
 pg_toast_1260|0 |0 | 0 | 0 | 0
 pg_toast_1262|0 |0 | 0 | 0 | 0
 pg_toast_2396|0 |0 | 0 | 0 | 0
(10 rows)

-- 
Michael Fuhr

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Vacuuming anything zeroes shared table stats

2007-06-06 Thread Alvaro Herrera
Michael Fuhr wrote:
 Is vacuuming any table supposed to zero the statistics for all
 shared tables?

Huh, certainly not.  In any case, I think the problem may be related to
the fact that stats for shared tables are kept in a separate hash from
regular tables.

I'll investigate the issue tomorrow -- thanks for reporting.

-- 
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
The Gord often wonders why people threaten never to come back after they've
been told never to return (www.actsofgord.com)

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [COMMITTERS] pgsql: Create a GUC parametertemp_tablespacesthat allows selection of

2007-06-06 Thread Jaime Casanova

On 6/4/07, Tom Lane [EMAIL PROTECTED] wrote:

Perhaps a reasonable compromise could work like this: at the first point
in a transaction where a temp file is created, choose a random list
element, and thereafter advance cyclically for the duration of that
transaction.  This ensures within-transaction spread-out while still
having some randomness across backends.

The reason I'm thinking per-transaction is that we could tie this to
setting up a cached list of tablespace OIDs, which would avoid the
overhead of repeat parsing and tablespace validity checking.  We had
rejected using a long-lived cache because of the problem of tablespaces
getting dropped, but I think one that lasts only across a transaction
would be OK.

And the reason I'm thinking a cache is important is that if you really
want to get any win from this idea, you need to spread the temp files
across tablespaces *per file*, which is not the way it works now.


OK. Are you doing this, or can I prepare a patch that implements it?
I guess we can allocate the memory for the list in TopTransactionContext.
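
A minimal sketch of the cycling scheme, assuming the names and caching
details here are hypothetical and not the eventual patch:

#include <stdlib.h>

typedef unsigned int Oid;             /* stand-in for the backend typedef */
#define InvalidOid ((Oid) 0)

/* Cached for the duration of a transaction (e.g. allocated in
 * TopTransactionContext) and rebuilt from temp_tablespaces next time. */
static Oid *tempTableSpaces = NULL;
static int  numTempTableSpaces = 0;
static int  nextTempTableSpace = -1;  /* -1: none chosen yet this xact */

/* Pick the tablespace for the next temp file: random first choice,
 * then advance cyclically for the rest of the transaction. */
static Oid
next_temp_tablespace_sketch(void)
{
    if (numTempTableSpaces <= 0)
        return InvalidOid;            /* fall back to the database default */
    if (nextTempTableSpace < 0)
        nextTempTableSpace = rand() % numTempTableSpaces;
    else
        nextTempTableSpace = (nextTempTableSpace + 1) % numTempTableSpaces;
    return tempTableSpaces[nextTempTableSpace];
}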

--
regards,
Jaime Casanova

Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs and the universe trying
to produce bigger and better idiots.
So far, the universe is winning.
  Richard Cook

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] To all the pgsql developers..Have a look at the operators proposed by me in my researc

2007-06-06 Thread Jim C. Nasby
On Sat, Jun 02, 2007 at 01:37:19PM +, Tasneem Memon wrote:
 We can make the system ask the user what membership degree s/he wants the 
 values fetched at, but we don't want to make the system interactive, where a 
 user gives a membership degree value of his/her choice. These operators are 
 supposed to work just like the other operators in SQL: you just put them in 
 the query and get a result. I have put 0.8 because in all the case studies I 
 have made for NEAR, 0.8 seems to be the best choice: 0.9 narrows the 
 range, while 0.75 or 0.7 also fetches values that are irrelevant. However, 
 those values no longer seem irrelevant when we haven't got any 
 values up to the md 0.8, so the operator fetches them when they are the 
 NEARest. 
  
While having them function just like any other operator is good, it
seems like you're making quite an assumption on the user's behalf,
namely that you know what their data looks like better than they do.
Is it not possible that someone would come along with a dataset that
looks different enough from your test cases that the values you
picked wouldn't work?
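
For instance, a rough sketch of exposing the cutoff rather than hard-wiring
it (the near() function, its linear membership formula, and the products
table are all hypothetical illustrations, not the proposed operator):

create function near(val numeric, target numeric, cutoff numeric)
returns boolean as $$
    -- hypothetical linear membership degree: 1 at the target, falling to 0
    select 1.0 - least(abs($1 - $2) / nullif(abs($2), 0), 1.0) >= $3;
$$ language sql immutable;

select * from products where near(price, 100, 0.8);  -- the proposed default
select * from products where near(price, 100, 0.7);  -- dataset-specific choice

That would let someone whose data doesn't look like the test cases pick a
cutoff that fits.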
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




[HACKERS] is_array_type vs type_is_array

2007-06-06 Thread Jeremy Drake
Was there some change in functionality that prompted renaming is_array_type
to type_is_array?  It broke compilation of fulldisjunctions, which I build
and run regression tests on in my sandbox to keep it from getting too horribly
broken with respect to current HEAD.  I got it to build and pass its
regression tests by adding this:

+ #if !defined(is_array_type) && defined(type_is_array)
+ #define is_array_type(x) type_is_array(x)
+ #endif

to the beginning of the one file which uses is_array_type.  Is this
reasonable to send back to the fulldisjunctions maintainer, or was there
some subtle behavioral change behind the rename, meant to make uses of this
function immediately apparent?


-- 
Ducharme's Axiom:
If you view your problem closely enough you will recognize
yourself as part of the problem.

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] is_array_type vs type_is_array

2007-06-06 Thread Tom Lane
Jeremy Drake [EMAIL PROTECTED] writes:
 Was there some change in functionality reason for renaming is_array_type
 to type_is_array?

Just to sync style with type_is_enum ... there were more of the latter
than the former.

 It broke compilation of fulldisjunctions,

Sorry, but we change internal APIs every day, and twice on Sundays.
Deal with it.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match