Re: [HACKERS] Implicit casts with generic arrays
> For example in 8.2 this is mapped to array_prepend:
>
> regression=# select 'x'::text || array['aa','bb','cc'];
>    ?column?
> --------------
>  {x,aa,bb,cc}
> (1 row)
>
> but with the experimental code you get textcat:
>
> catany=# select 'x'::text || array['aa','bb','cc'];
>   ?column?
> -------------
>  x{aa,bb,cc}
> (1 row)

This is what I would have expected || to give, and not what 8.2 does. So disregarding the rest of the argument, I think that array_[pre|ap]pend should have other operators.

Andreas

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Implicit casts with generic arrays
> For example in 8.2 this is mapped to array_prepend:
>
> regression=# select 'x'::text || array['aa','bb','cc'];
>    ?column?
> --------------
>  {x,aa,bb,cc}
> (1 row)
>
> but with the experimental code you get textcat:
>
> catany=# select 'x'::text || array['aa','bb','cc'];
>   ?column?
> -------------
>  x{aa,bb,cc}
> (1 row)
>
> This is what I would have expected || to give, and not what 8.2 does. So
> disregarding the rest of the argument, I think that array_[pre|ap]pend
> should have other operators.

I think the current behaviour is more intuitive and practical. The result x{aa,bb,cc} is nonsense; array concatenation should have higher priority than text concatenation.

Pavel
Re: [HACKERS] TOAST usage setting
> > No, you misunderstood. Bruce was suggesting changing the target to 512.
> > That means if a row is wider than ~2k, toaster will try to toast until
> > the base row is ~512 bytes. I would not do that part for 8.3.
>
> OK, what do you suggest for 8.3? Attached is my suggestion
> to use 512 and a 4k chunk size, which I think means that 2.7k
> is the worst value, with a loss of around 25%.

Oh, so I misunderstood you also. You are suggesting two changes:

	TOAST_TUPLES_PER_PAGE		16
	EXTERN_TUPLES_PER_PAGE		2

While I agree that 2 might be a good compromise with low risk for now, I think that toasting all rows down to ~512 bytes is too narrowly targeted at not reading wider columns.

When suggesting a new target, I interpreted you like so:

	#define TOAST_TUPLES_PER_PAGE	4
	#define TOAST_TUPLE_TARGET	\
		MAXALIGN_DOWN((BLCKSZ - \
			MAXALIGN(sizeof(PageHeaderData) + (TOAST_TUPLES_PER_PAGE-1) * sizeof(ItemIdData))) \
			/ 16)

So we would only toast rows wider than 2k, but once toasting, toast the base row down to 512.

My suggestion would be to leave TOAST_TUPLES_PER_PAGE as is, because anything else would need extensive performance testing:

	#define TOAST_TUPLES_PER_PAGE	4

My next suggestion would be to leave EXTERN_TUPLES_PER_PAGE as is, but: split data wider than a page into page-sized chunks as long as they fill whole pages, and split the rest with EXTERN_TUPLES_PER_PAGE (4) as now. This would not waste more space than currently, but would improve performance for very wide columns.

I can try to do a patch if you think that is a good idea; I can't do a lot of testing, though.

Andreas
[HACKERS] Controlling Load Distributed Checkpoints
I'm again looking at the way the GUC variables work in the load distributed checkpoints patch. We've discussed them a lot already, but I don't think they're quite right yet.

Write phase
-----------

I like the way the write phase is controlled in general. Writes are throttled so that we spend the specified percentage of the checkpoint interval doing the writes. But we always write at a specified minimum rate to avoid spreading out the writes unnecessarily when there's little work to do.

The original patch uses bgwriter_all_max_pages to set the minimum rate. I think we should have a separate variable, checkpoint_write_min_rate, in KB/s, instead.

Nap phase
---------

This is trickier. The purpose of the sleep between writes and fsyncs is to give the OS a chance to flush the pages to disk at its own pace, hopefully limiting the effect on concurrent activity. The sleep shouldn't last too long, because any concurrent activity can be dirtying and writing more pages, and we might end up fsyncing more than necessary, which is bad for performance. The optimal delay depends on many factors, but I believe it's somewhere between 0-30 seconds on any reasonable system.

In the current patch, the duration of the sleep between the write and sync phases is controlled as a percentage of the checkpoint interval. Given that the optimal delay is in the range of seconds, and checkpoint_timeout can be up to 60 minutes, the useful values of that percentage would be very small, like 0.5% or even less. Furthermore, the optimal value doesn't depend that much on the checkpoint interval; it's more dependent on your OS and memory configuration. We should therefore give the delay as a number of seconds instead of as a percentage of the checkpoint interval.

Sync phase
----------

This is also tricky. As with the nap phase, we don't want to spend too much time fsyncing, because concurrent activity will write more dirty pages and we might just end up doing more work. And we don't know how much work an fsync performs.
The patch uses the file size as a measure of that, but as we discussed, that doesn't necessarily have anything to do with reality: fsyncing a 1GB file with one dirty block isn't any more expensive than fsyncing a file with a single block.

Another problem is the granularity of an fsync. If we fsync a 1GB file that's full of dirty pages, we can't limit the effect on other activity. The best we can do is to sleep between fsyncs, but sleeping more than a few seconds is hardly going to be useful, no matter how bad an I/O storm each fsync causes. Because of the above, I'm thinking we should ditch the checkpoint_sync_percentage variable, in favor of:

	checkpoint_fsync_period   # duration of the fsync phase, in seconds
	checkpoint_fsync_delay    # max. sleep between fsyncs, in milliseconds

In all phases, the normal bgwriter activities are performed: lru-cleaning and switching xlog segments if archive_timeout expires. If a new checkpoint request arrives while the previous one is still in progress, we skip all the delays and finish the previous checkpoint as soon as possible.

GUC summary and suggested default values
----------------------------------------

	checkpoint_write_percent = 50     # % of checkpoint interval to spread out writes
	checkpoint_write_min_rate = 1000  # minimum I/O rate to write dirty buffers at checkpoint (KB/s)
	checkpoint_nap_duration = 2       # delay between write and sync phase, in seconds
	checkpoint_fsync_period = 30      # duration of the sync phase, in seconds
	checkpoint_fsync_delay = 500      # max. delay between fsyncs, in milliseconds

I don't like adding that many GUC variables, but I don't really see a way to tune them automatically. Maybe we could just hard-code the last one, it doesn't seem that critical, but that still leaves us 4 variables. Thoughts?

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
[HACKERS] [RFC] GSoC Work on readonly queries done so far
Hi,

This is a description of the GSoC work I've done so far on readonly support for PITR slaves. I'm looking for any kind of comments on this - I want to make sure that I work in a direction that the community approves.

Work done so far:
-----------------
.) Added a new GUC operational_mode, which can be set to either readwrite or readonly. If it is set to readwrite (the default), postgres behaves as usual. All the following changes are only in effect if operational_mode is set to readonly.
.) Created a macro ASSUME_OPMODE_READWRITE that does elog(ERROR) if postgres is not in readwrite mode. This macro protects the following functions to make sure that no writes occur in readonly mode: SimpleLruWritePage, SlruPhysicalWritePage, EndPrepare, FinishPreparedTransaction, XLogInsert, XLogWrite, ShutdownXLog, CreateCheckpoint, MarkBufferDirty.
.) All transactions are set to readonly mode (an implicit SET TRANSACTION READ ONLY), and are not allowed to do SET TRANSACTION READ WRITE.
.) Don't start autovacuum and bgwriter. Instead of bgwriter, bgreplay is started, and it takes over the role that bgwriter plays in the shutdown process.
.) Transactions are assigned a dummy xid ReadOnlyTransactionId, which is considered to be "later" than any other xid.
.) A global ReadOnlySnapshot is maintained in shared memory. This is copied into backend-local memory by GetReadonlySnapshotData (which replaces GetSnapshotData in readonly mode).
.) Crash recovery is not performed in readonly mode - instead, postgres PANICs, and tells the DBA to restart in readwrite mode. Archive recovery of course *will* be allowed, but I'm not that far yet.

Open Problems:
--------------
.) Protecting MarkBufferDirty with ASSUME_OPMODE_READWRITE is troublesome, because callers usually call MarkBufferDirty from within a critical section, and thus elog(ERROR) is turned into elog(PANIC). This e.g. happens with my patch if you call nextval() in readonly mode.
Does anyone see a better solution than adding checks to all callers that are not otherwise protected from being called in readonly mode?

.) Since the slave needs to track a snapshot in shared memory, it cannot resize that snapshot to accommodate however many concurrent transactions might have been running on the master. My current plan is to detect if that global snapshot overflows, and to lock out readonly queries on the slave (and therefore remove the need of keeping the snapshot current) until the number of active xids on the master has dropped below max_connections on the slave. A warning will be written to the postgres log suggesting that the DBA increase the max_connections value on the slave.

Please tell me what you think about this approach, and especially whether you see any problems that I overlooked.

greetings, Florian Pflug
Re: [HACKERS] Controlling Load Distributed Checkpoints
"Heikki Linnakangas" <[EMAIL PROTECTED]> writes: > GUC summary and suggested default values > > checkpoint_write_percent = 50 # % of checkpoint interval to > spread out writes > checkpoint_write_min_rate = 1000 # minimum I/O rate to write dirty > buffers at checkpoint (KB/s) I don't understand why this is a min_rate rather than a max_rate. > checkpoint_nap_duration = 2 # delay between write and sync phase, > in seconds Not a comment on the choice of guc parameters, but don't we expect useful values of this to be much closer to 30 than 0? I understand it might not be exactly 30. Actually, it's not so much whether there's any write traffic to the data files during the nap that matters, it's whether there's more traffic during the nap than during the 30s or so prior to the nap. As long as it's a steady-state condition it shouldn't matter how long we wait, should it? > checkpoint_fsync_period = 30 # duration of the sync phase, in seconds > checkpoint_fsync_delay = 500 # max. delay between fsyncs -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Implicit casts with generic arrays
Joe Conway <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> In the long run maybe we should choose some other name for the
>> array_append and array_prepend operators to avoid the confusion with
>> concatenation. It seems to me that "concatenation" normally implies
>> "stringing together similar objects", which these two operators
>> definitely don't do, and so you could argue that || was a bad name
>> for them from the get-go.

> Originally I saw this situation as requiring the concatenation
> operator per SQL 2003:

Maybe I am missing something, but the only such construct I see in SQL2003 is concatenation of arrays of equal rank. There is nothing corresponding to array_prepend or array_append.

I do have a plan B if people don't want to rename the operators, though. It looks to me like we could eliminate the conflict if we invented a new polymorphic pseudotype called "anynonarray" or some such, which would act like anyelement *except* it would not match an array. Then, declaring the capturing operators as text||anynonarray and anynonarray||text would prevent them from matching any case where either side was known to be an array type. But they would (I think) still win out in cases such as scalar || 'unknown literal'. The end result would be that concatenations involving a known-array value would be array concatenation, but you could force them to be text concatenation, if that's what you wanted, by explicitly casting the array value(s) to text.

I was a bit hesitant to propose this since I couldn't immediately think of any other use-case for such a pseudotype. It's not a huge amount of added code (cf. anyenum) but it's definitely a visible wart on the type system. Comments?

regards, tom lane
Re: [HACKERS] Implicit casts with generic arrays
Tom Lane wrote:
> I do have a plan B if people don't want to rename the operators, though.
> It looks to me like we could eliminate the conflict if we invented a new
> polymorphic pseudotype called "anynonarray" or some such, which would
> act like anyelement *except* it would not match an array. Then,
> declaring the capturing operators as text||anynonarray and
> anynonarray||text would prevent them from matching any case where either
> side was known to be an array type. But they would (I think) still win
> out in cases such as scalar || 'unknown literal'. The end result would
> be that concatenations involving a known-array value would be array
> concatenation, but you could force them to be text concatenation, if
> that's what you wanted, by explicitly casting the array value(s) to text.
>
> I was a bit hesitant to propose this since I couldn't immediately think
> of any other use-case for such a pseudotype. It's not a huge amount of
> added code (cf. anyenum) but it's definitely a visible wart on the type
> system. Comments?

On the contrary, I would think that it fits nicely to "close the loop" on the anyarray/anyelement feature set.

-- Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] TOAST usage setting
Zeugswetter Andreas ADI SD wrote:
> > > No, you misunderstood. Bruce was suggesting changing the target to 512.
> > > That means if a row is wider than ~2k, toaster will try to toast until
> > > the base row is ~512 bytes. I would not do that part for 8.3.
> >
> > OK, what do you suggest for 8.3? Attached is my suggestion
> > to use 512 and a 4k chunk size, which I think means that 2.7k
> > is the worst value, with a loss of around 25%.
>
> Oh, so I misunderstood you also. You are suggesting two changes:
> 	TOAST_TUPLES_PER_PAGE		16
> 	EXTERN_TUPLES_PER_PAGE		2

Right.

> While I agree that 2 might be a good compromise with low risk for now,
> I think that toasting all rows down to ~512 bytes is too narrowly
> targeted at not reading wider columns.

Well, it is summarized here:

	http://momjian.us/expire/TOAST/SUMMARY.html

It made non-TOAST access 2x faster, but TOAST 7x slower, and that seemed like a good compromise.

> When suggesting a new target, I interpreted you like so:
> 	#define TOAST_TUPLES_PER_PAGE	4
> 	#define TOAST_TUPLE_TARGET	\
> 		MAXALIGN_DOWN((BLCKSZ - \
> 			MAXALIGN(sizeof(PageHeaderData) + (TOAST_TUPLES_PER_PAGE-1) * sizeof(ItemIdData))) \
> 			/ 16)
> So we would only toast rows wider than 2k, but once toasting, toast the
> base row down to 512.

That is certainly not my intent, and I don't see how you would get the 2k number from that macro. I think you are looking at 8.2 and not CVS HEAD. CVS HEAD has:

	#define TOAST_TUPLE_TARGET	TOAST_TUPLE_THRESHOLD

> My suggestion would be to leave TOAST_TUPLES_PER_PAGE as is, because all
> else would need extensive performance testing.
> 	#define TOAST_TUPLES_PER_PAGE	4
>
> My next suggestion would be to leave EXTERN_TUPLES_PER_PAGE as is, but:
> split data wider than a page into page-sized chunks as long as they fill
> whole pages, and split the rest with EXTERN_TUPLES_PER_PAGE (4) as now.
> This would not waste more space than currently, but would improve
> performance for very wide columns.
>
> I can try to do a patch if you think that is a good idea; can't do a lot
> of testing though.

None of this splitting is going to happen for 8.3. The question is what changes we can make for 8.3, if any.

-- Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] Controlling Load Distributed Checkpoints
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> GUC summary and suggested default values
>
> checkpoint_write_percent = 50     # % of checkpoint interval to spread out writes
> checkpoint_write_min_rate = 1000  # minimum I/O rate to write dirty buffers at checkpoint (KB/s)
> checkpoint_nap_duration = 2       # delay between write and sync phase, in seconds
> checkpoint_fsync_period = 30      # duration of the sync phase, in seconds
> checkpoint_fsync_delay = 500      # max. delay between fsyncs
>
> I don't like adding that many GUC variables, but I don't really see a
> way to tune them automatically.

If we don't know how to tune them, how will the users know? Having to add that many variables to control one feature says to me that we don't understand the feature. Perhaps what we need is to think about how it can auto-tune itself.

regards, tom lane
Re: [HACKERS] Implicit casts with generic arrays
Alvaro Herrera <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> I do have a plan B if people don't want to rename the operators, though. >> It looks to me like we could eliminate the conflict if we invented a new >> polymorphic pseudotype called "anynonarray" or some such, which would >> act like anyelement *except* it would not match an array. >> ... >> I was a bit hesitant to propose this since I couldn't immediately think >> of any other use-case for such a pseudotype. It's not a huge amount of >> added code (cf. anyenum) but it's definitely a visible wart on the type >> system. Comments? > On the contrary, I would think that it fits nicely to "close the loop" > on the anyarray/anyelement feature set. OK, I'll go code this up and verify that it behaves like I think it will... regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] TOAST usage setting
"Bruce Momjian" <[EMAIL PROTECTED]> writes: > Well, it is summarized here: > > http://momjian.us/expire/TOAST/SUMMARY.html > > It made non-TOAST access 2x faster, but TOAST 7x slower, and that seemed > like a good compromise. Is this still testing with all data fitting in RAM? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] TOAST usage setting
> > While I agree that 2 might be a good compromise with low risk for
> > now, I think that toasting all rows down to ~512 bytes is too narrowly
> > targeted at not reading wider columns.
>
> Well, it is summarized here:
>
> 	http://momjian.us/expire/TOAST/SUMMARY.html
>
> It made non-TOAST access 2x faster, but TOAST 7x slower, and
> that seemed like a good compromise.

Yes, my argument was that I expect that in the up-to-1-2k range, more use cases will suffer the 7x slowdown than see the speedup. But the ratio certainly is hard to judge, and you may well be right. E.g. for me, TOAST_TUPLES_PER_PAGE 8 would be ok; I have base row widths of ~700 in 2 tables that would suffer if further toasted, but none in the 1k - 2k range. I wonder whether this threshold isn't rather absolute, in which case the 32k-pagesize users should probably use 32.

> > When suggesting a new target, I interpreted you like so:
> > 	#define TOAST_TUPLES_PER_PAGE	4
> > 	#define TOAST_TUPLE_TARGET	\
> > 		MAXALIGN_DOWN((BLCKSZ - \
> > 			MAXALIGN(sizeof(PageHeaderData) + (TOAST_TUPLES_PER_PAGE-1) * sizeof(ItemIdData))) \
> > 			/ 16)
> > So we would only toast rows wider than 2k, but once toasting, toast
> > the base row down to 512.
>
> That is certainly not my intent, and I don't see how you
> would get the 2k number from that macro. I think you are

The ~2k comes from TOAST_TUPLE_THRESHOLD.

> looking at 8.2 and not CVS HEAD. CVS HEAD has:
>
> 	#define TOAST_TUPLE_TARGET	TOAST_TUPLE_THRESHOLD

Nope, I meant what I said. To only change the target, you would substitute the above macro for TOAST_TUPLE_TARGET. But I also don't see how this would be good.

Andreas
[HACKERS] elog.c logic bug?
I have just been staring for some time at the logic in src/backend/utils/error/elog.c:send_message_to_server_log(), which contains this fragment near the end:

    /* Write to stderr, if enabled */
    if ((Log_destination & LOG_DESTINATION_STDERR) ||
        whereToSendOutput == DestDebug)
    {
#ifdef WIN32
        /*
         * In a win32 service environment, there is no usable stderr. Capture
         * anything going there and write it to the eventlog instead.
         *
         * If stderr redirection is active, it's ok to write to stderr because
         * that's really a pipe to the syslogger process. Unless we're in the
         * postmaster, and the syslogger process isn't started yet.
         */
        if ((!Redirect_stderr || am_syslogger ||
             (!IsUnderPostmaster && SysLoggerPID == 0)) &&
            pgwin32_is_service())
            write_eventlog(edata->elevel, buf.data);
        else
#endif
            fprintf(stderr, "%s", buf.data);
    }

    /* If in the syslogger process, try to write messages direct to file */
    if (am_syslogger)
        write_syslogger_file(buf.data, buf.len);

ISTM that this is a bug - the last statement should be inside the STDERR block above, the last part of which would then read:

        /* If in the syslogger process, try to write messages direct to file */
        if (am_syslogger)
            write_syslogger_file(buf.data, buf.len);
        else
            fprintf(stderr, "%s", buf.data);

If not, I have missed something - why would the syslogger be trying to write to its output (possibly for the second time) regardless of what Log_destination is set to?

cheers, andrew
Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far
Florian G. Pflug wrote:
> Work done so far:
> .) Don't start autovacuum and bgwriter.

Do table stats used by the planner get replicated on a PITR slave? I assume so, but if not, you would need autovac to do analyzes.
Re: [HACKERS] elog.c logic bug?
Andrew Dunstan <[EMAIL PROTECTED]> writes: > If not I have missed something - why would the syslogger be trying to > write to its output (possibly for the second time) regardless of what > Log_destination is set to? You're mistaken: within the syslogger process, stderr doesn't point to the same place as the target file (it's normally the same as the original postmaster stderr). The reason the code is set up to try to write both stderr and the target file is to maximize the chance that an internally generated error in syslogger will get reported *someplace*. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far
On Wed, 2007-06-06 at 16:11 +0200, Florian G. Pflug wrote: >.) Since the slaves needs to track an Snapshot in shared memory, it cannot > resize that snapshot to accomodate however many concurrent transactions > might have been running on the master. My current plan is to detect if > that global snapshot overflows, and to lock out readonly queries on the > slave (and therefore remove the need of keeping the snapshot current) > until the number of active xids on the master has dropped below > max_connections on the slave. A warning will be written to the postgres > log that suggest that the DBA increases the max_connections value on > the slave. > If we did lock the slave while waiting for transactions to complete on the master, we'd need to document some stronger warnings against idle transactions so that administrators could notice and correct the problem. Are you referring to the size of the xip array being a problem? Would it help to tie the size of the xip array to max_connections? I understand that max_connections might be greater on the master, but maybe something similar? Regards, Jeff Davis ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far
Matthew T. O'Connor wrote:
> Florian G. Pflug wrote:
>> Work done so far:
>> .) Don't start autovacuum and bgwriter.
>
> Do table stats used by the planner get replicated on a PITR slave? I
> assume so, but if not, you would need autovac to do analyzes.

Yes - everything that gets wal-logged on the master gets replicated to the slave. In my design, it isn't possible to do "analyze" on the slave, because all datafiles are strictly readonly (well, with the small exception of hint-bit updates, actually).

greetings, Florian Pflug
Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far
Jeff Davis wrote:
> On Wed, 2007-06-06 at 16:11 +0200, Florian G. Pflug wrote:
>> .) Since the slave needs to track a snapshot in shared memory, it cannot
>> resize that snapshot to accommodate however many concurrent transactions
>> might have been running on the master. My current plan is to detect if
>> that global snapshot overflows, and to lock out readonly queries on the
>> slave (and therefore remove the need of keeping the snapshot current)
>> until the number of active xids on the master has dropped below
>> max_connections on the slave. A warning will be written to the postgres
>> log suggesting that the DBA increase the max_connections value on
>> the slave.
>
> If we did lock the slave while waiting for transactions to complete on
> the master, we'd need to document some stronger warnings against idle
> transactions so that administrators could notice and correct the problem.

It's not exactly locking until they complete on the master; it's locking the slave until we reach a position in the wal on the slave with fewer than max_connections concurrent transactions. But yes, I agree, this will need to be documented.

> Are you referring to the size of the xip array being a problem? Would it
> help to tie the size of the xip array to max_connections? I understand
> that max_connections might be greater on the master, but maybe something
> similar?

That's what I currently do - the xip array on the slave is sized to hold max_connections entries (actually, it's max_connections + max_prepared_xacts, I think). The problem occurs exactly if those values are set too small on the slave - and since shared mem objects are not resizeable, I don't see how the slave can handle an xip overflow gracefully other than by not publishing the information in shared memory as long as it doesn't fit there.
On further thought - maybe locking out transactions isn't even necessary - they would just continue to see the old global snapshot, so time wouldn't advance for them until the number of concurrent transactions decreases again.

greetings, Florian Pflug
Re: [HACKERS] Implicit casts with generic arrays
Tom Lane wrote:
> Joe Conway <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> In the long run maybe we should choose some other name for the
>>> array_append and array_prepend operators to avoid the confusion with
>>> concatenation. It seems to me that "concatenation" normally implies
>>> "stringing together similar objects", which these two operators
>>> definitely don't do, and so you could argue that || was a bad name
>>> for them from the get-go.
>
>> Originally I saw this situation as requiring the concatenation
>> operator per SQL 2003:
>
> Maybe I am missing something, but the only such construct I see in
> SQL2003 is concatenation of arrays of equal rank. There is nothing
> corresponding to array_prepend or array_append.

Well, I've never claimed to be particularly good at interpreting the SQL spec, but as an example, SQL:2003 has

	<array value expression> ::= <array concatenation> | <array primary>
	<array concatenation> ::= <array value expression 1> <concatenation operator> <array primary>

Doesn't this mean that array concatenation should include things like ARRAY[1,2,3] || 42?

> I do have a plan B if people don't want to rename the operators, though.
> It looks to me like we could eliminate the conflict if we invented a new
> polymorphic pseudotype called "anynonarray" or some such, which would
> act like anyelement *except* it would not match an array. Then,
> declaring the capturing operators as text||anynonarray and
> anynonarray||text would prevent them from matching any case where either
> side was known to be an array type. But they would (I think) still win
> out in cases such as scalar || 'unknown literal'. The end result would
> be that concatenations involving a known-array value would be array
> concatenation, but you could force them to be text concatenation, if
> that's what you wanted, by explicitly casting the array value(s) to text.

That sounds reasonable to me.

Joe
Re: [HACKERS] Controlling Load Distributed Checkpoints
On Wed, 6 Jun 2007, Tom Lane wrote:

> If we don't know how to tune them, how will the users know?

I can tell you a good starting set for them on a Linux system, but you first have to let me know how much memory is in the OS buffer cache, the typical I/O rate the disks can support, how many buffers are expected to be written out by BGW/other backends at heaviest load, and the current setting for /proc/sys/vm/dirty_background_ratio. It's not a coincidence that there are patches applied to 8.3 or in the queue to measure all of the Postgres internals involved in that computation; I've been picking away at the edges of this problem. Getting this sort of tuning right takes that level of information about the underlying system.

If there's a way to internally auto-tune the values this patch operates on (which I haven't found despite months of trying), it would be in the form of some sort of measurement/feedback loop based on how fast data is being written out. There really are way too many things involved to try and tune it based on anything else; the underlying OS/hardware mechanisms that determine how this will go are complicated enough that it might as well be a black box for most people.

One of the things I've been fiddling with the design of is a testing program that simulates database activity at checkpoint time under load. I think running some tests like that is the most straightforward way to generate useful values for these tunables; it's much harder to try and determine them from within the backends because there's so much going on to keep track of. I view the LDC mechanism as being in the same state right now as the background writer: there are a lot of complicated knobs to tweak, they all do *something* useful for someone, and eliminating them will require a data-collection process across a much wider sample of data than can be collected quickly.
If I had to make a guess how this will end up, I'd expect there to be more knobs in LDC than everyone would like for the 8.3 release, along with fairly verbose logging of what is happening at checkpoint time (that's why I've been nudging development in that area, along with making logs easier to aggregate). Collect up enough of that information, then you're in a position to talk about useful automatic tuning--right around the 8.4 timeframe I suspect. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far
On Wed, 2007-06-06 at 19:25 +0200, Florian G. Pflug wrote: > Thats what I currently do - the xip array on the slave is sized to > hold max_connections entries (Actually, it's max_connections + > max_prepared_xacts I think). The problem occurs exactly if those > values are set too small on the slave - and since shared mem > objects are not resizeable, I don't see how the slave can handle > an xip overflow gracefully other than by not publishing the > information in shared memory as long as it doesn't fit there. > That seems like a very minor issue then. It's not unreasonable to expect that the PITR slave is configured very similarly to the master. You may even want to require it (if there are other reasons, too). > On a further thinking - maybe locking out transactions isn't even > necessary - they would just continue to see the old global snapshot, > so time wouldn't advance for them until the number of concurrent > transactions decreases again. That sounds better than locking out all reads. Perhaps still a warning in the logs though. If the situation you describe happens, the administrator probably needs to know about it. Regards, Jeff Davis ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
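The overflow handling Florian describes can be sketched as follows. This
is a toy Python model only -- the function name and shape are invented,
and the real snapshot lives in shared memory on a C backend -- but it
captures the rule under discussion: when the master's xip list no longer
fits in the slave's fixed-size array, withdraw the snapshot (and warn)
rather than truncate it.

```python
def publish_snapshot(master_xips, slave_capacity):
    """Return the snapshot to expose to readonly backends, or None
    (meaning: lock out readers and log a warning) when the master's
    xip list does not fit in the slave's fixed-size array.

    slave_capacity corresponds to max_connections + max_prepared_xacts
    on the slave; both names here are illustrative."""
    if len(master_xips) > slave_capacity:
        return None  # overflow: stop publishing until load drops
    return sorted(master_xips)
```

Backends that find no published snapshot would simply keep using the one
they already copied, which is the "time stands still for them" behavior
mentioned above.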
Re: [HACKERS] Controlling Load Distributed Checkpoints
On Wed, 6 Jun 2007, Heikki Linnakangas wrote:

> The original patch uses bgwriter_all_max_pages to set the minimum rate.
> I think we should have a separate variable, checkpoint_write_min_rate,
> in KB/s, instead.

Completely agreed. There shouldn't be any coupling with the background
writer parameters, which may be set for a completely different set of
priorities than the checkpoint has. I have to look at this code again to
see why it's a min_rate instead of a max, that seems a little weird.

> Nap phase: We should therefore give the delay as a number of seconds
> instead of as a percentage of checkpoint interval.

Again, the setting here should be completely decoupled from another GUC
like the interval. My main complaint with the original form of this
patch was how much it tried to synchronize the process with the
interval; since I don't even have a system where that value is set to
something, because it's all segment based instead, that whole idea was
incompatible. The original patch tried to spread the load out as evenly
as possible over the time available. I much prefer thinking in terms of
getting it done as quickly as possible while trying to bound the I/O
storm.

> And we don't know how much work an fsync performs. The patch uses the
> file size as a measure of that, but as we discussed that doesn't
> necessarily have anything to do with reality. fsyncing a 1GB file with
> one dirty block isn't any more expensive than fsyncing a file with a
> single block.

On top of that, if you have a system with a write cache, the time an
fsync takes can greatly depend on how full it is at the time, which
there is no way to measure or even model easily. Is there any way to
track how many dirty blocks went into each file during the checkpoint
write? That's your best bet for guessing how long the fsync will take.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
   http://www.postgresql.org/docs/faq
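Greg's dirty-block-counting idea could be modeled like this. A toy
Python sketch, not backend code: the class and method names are
invented, and in reality this bookkeeping would live in the C checkpoint
write loop. It only shows the estimator he proposes -- count the blocks
actually written into each file, then use those counts as a rough proxy
for per-file fsync cost.

```python
from collections import defaultdict

class CheckpointWriteTracker:
    """Track how many dirty blocks the checkpoint writes into each file,
    as a stand-in for estimating fsync cost by file size."""

    def __init__(self):
        self.dirty_blocks = defaultdict(int)

    def note_write(self, filename, block_no):
        # called once per block flushed during the checkpoint write phase
        self.dirty_blocks[filename] += 1

    def fsync_order(self):
        # fsync the files with the fewest dirty blocks first, so the
        # expensive fsyncs are at least known about in advance
        return sorted(self.dirty_blocks, key=self.dirty_blocks.get)
```

Whether cheapest-first is the right ordering is itself a tuning
question; the point is only that block counts are measurable where
"how full is the write cache" is not.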
Re: [HACKERS] elog.c logic bug?
Tom Lane wrote:
> Andrew Dunstan <[EMAIL PROTECTED]> writes:
>> If not I have missed something - why would the syslogger be trying to
>> write to its output (possibly for the second time) regardless of what
>> Log_destination is set to?
> You're mistaken: within the syslogger process, stderr doesn't point to
> the same place as the target file (it's normally the same as the
> original postmaster stderr). The reason the code is set up to try to
> write both stderr and the target file is to maximize the chance that
> an internally generated error in syslogger will get reported
> *someplace*.

OK, thanks, I'll try to make that a bit clearer in a comment.

cheers

andrew

---(end of broadcast)---
TIP 4: Have you searched our list archives?
   http://archives.postgresql.org
Re: [HACKERS] msvc, build and install with cygwin in the PATH
Magnus Hagander wrote:
> Hannes Eder wrote:
>> Is it worth doing this the "Perl-way" and using File::Find? If so, I
>> can work an a patch for that.
> It's certainly cleaner that way, but I don't find it a major issue.
> But I'd rather see that fix than the other one.

Here we go. See attached patch. Your comments are welcome.

Hannes.

*** ..\pgsql-cvshead\src\tools\msvc\Install.pm	Mo Mai 14 16:36:10 2007
--- src\tools\msvc\Install.pm	Mi Jun  6 20:39:47 2007
***************
*** 10,15 ****
--- 10,18 ----
  use Carp;
  use File::Basename;
  use File::Copy;
+ use File::Find;
+ use File::Glob;
+ use File::Spec;
  use Exporter;
  our (@ISA,@EXPORT_OK);
***************
*** 99,104 ****
--- 102,142 ----
      print "\n";
  }

+ sub FindFiles
+ {
+     my $spec = shift;
+     my $nonrecursive = shift;
+     my $pat = basename($spec);
+     my $dir = dirname($spec);
+
+     if ($dir eq '') { $dir = '.'; }
+
+     -d $dir || croak "Could not list directory $dir: $!\n";
+
+     if ($nonrecursive)
+     {
+         return glob($spec);
+     }
+
+     # borrowed from File::DosGlob
+     # escape regex metachars but not glob chars
+     $pat =~ s:([].+^\-\${}[|]):\\$1:g;
+     # and convert DOS-style wildcards to regex
+     $pat =~ s/\*/.*/g;
+     $pat =~ s/\?/.?/g;
+
+     $pat = '^' . $pat . '\z';
+
+     my @res;
+     find(
+         {
+             wanted => sub { /$pat/s && push (@res, File::Spec->canonpath($File::Find::name)); }
+         },
+         $dir
+     );
+     return @res;
+ }
+
  sub CopySetOfFiles
  {
      my $what = shift;
***************
*** 106,126 ****
      my $target = shift;
      my $silent = shift;
      my $norecurse = shift;
-     my $D;
-     my $subdirs = $norecurse?'':'/s';

      print "Copying $what" unless ($silent);
!     open($D, "dir /b $subdirs $spec |") || croak "Could not list $spec\n";
!     while (<$D>)
      {
-         chomp;
          next if /regress/; # Skip temporary install in regression subdir
!         my $tgt = $target . basename($_);
          print ".";
!         my $src = $norecurse?(dirname($spec) . '/' . $_):$_;
!         copy($src, $tgt) || croak "Could not copy $src: $!\n";
      }
!     close($D);
      print "\n";
  }
--- 144,161 ----
      my $target = shift;
      my $silent = shift;
      my $norecurse = shift;

      print "Copying $what" unless ($silent);
!
!     foreach (FindFiles($spec, $norecurse))
      {
          next if /regress/; # Skip temporary install in regression subdir
!         my $src = $_;
!         my $tgt = $target . basename($src);
          print ".";
!         copy($src, $tgt) || croak "Could not copy $src to $tgt: $!\n";
      }
!
      print "\n";
  }
***************
*** 371,395 ****
      my $target = shift;
      my $nlspath = shift;
-     my $D;

      print "Installing NLS files...";
      EnsureDirectories($target, "share/locale");
!     open($D,"dir /b /s nls.mk|") || croak "Could not list nls.mk\n";
!     while (<$D>)
      {
-         chomp;
          s/nls.mk/po/;
          my $dir = $_;
          next unless ($dir =~ /([^\\]+)\\po$/);
          my $prgm = $1;
          $prgm = 'postgres' if ($prgm eq 'backend');
-         my $E;
-         open($E,"dir /b $dir\\*.po|") || croak "Could not list contents of $_\n";
!         while (<$E>)
          {
-             chomp;
              my $lang;
              next unless /^(.*)\.po/;
              $lang = $1;
--- 406,425 ----
      my $target = shift;
      my $nlspath = shift;

      print "Installing NLS files...";
      EnsureDirectories($target, "share/locale");
!
!     foreach (FindFiles("nls.mk"))
      {
          s/nls.mk/po/;
          my $dir = $_;
          next unless ($dir =~ /([^\\]+)\\po$/);
          my $prgm = $1;
          $prgm = 'postgres' if ($prgm eq 'backend');
!         foreach (FindFiles("$dir\\*.po", 1))
          {
              my $lang;
              next unless /^(.*)\.po/;
              $lang = $1;
***************
*** 401,409 ****
              && croak("Could not run msgfmt on $dir\\$_");
          print ".";
      }
-     close($E);
  }
! close($D);
  print "\n";
  }
--- 431,438 ----
              && croak("Could not run msgfmt on $dir\\$_");
          print ".";
      }
  }
!
  print "\n";
  }

---(end of broadcast)---
TIP 4: Have you searched our list archives?
   http://archives.postgresql.org
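The core of the FindFiles() helper in the patch above is the
DOS-glob-to-regex conversion borrowed from File::DosGlob. For
illustration, the same transformation in Python (the real code is the
Perl in the patch; only this sketch's function names are invented):

```python
import re

def dos_glob_to_regex(pat):
    # escape regex metacharacters, but not the glob wildcards * and ?
    # (same character class as the Perl: ] . + ^ - $ { } [ |)
    pat = re.sub(r'([\].+^\-\${}\[|])', r'\\\1', pat)
    # convert DOS-style wildcards to regex
    pat = pat.replace('*', '.*').replace('?', '.?')
    return '^' + pat + r'\Z'

def matches(pat, name):
    return re.search(dos_glob_to_regex(pat), name) is not None
```

Note the Perl converts `?` to `.?` (zero or one character) rather than
the usual `.` -- the sketch reproduces that faithfully.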
Re: [HACKERS] Implicit casts with generic arrays
Joe Conway <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> Maybe I am missing something, but the only such construct I see in >> SQL2003 is concatenation of arrays of equal rank. There is nothing >> corresponding to array_prepend or array_append. > Well, I've never claimed to be particularly good at interpreting the SQL > spec, but as an example... > ::= > || > ::= > ::= > ::= > ::= > ::= > > Doesn't this mean that array concatenation should include things like: > || No, because the first syntax rule for that is 1) The declared type of shall be an array type. However, assuming that the anynonarray idea works out, we can do that and not worry about touching the array operators. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far
On Wed, 2007-06-06 at 12:17 -0400, Matthew T. O'Connor wrote: > Florian G. Pflug wrote: > > Work done so far: > > - > > .) Don't start autovacuum and bgwriter. > > Do table stats used by the planner get replicated on a PITR slave? I > assume so, but if not, you would need autovac to do analyzes. The replication is an exact block-level replication of the master. We can't write very much at all on the slave. So if a query runs slow because of lack of stats you'd need to run ANALYZE on the master, which would then propagate the stats to the slave which could then use them. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Implicit casts with generic arrays
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> It looks to me like we could eliminate the conflict if we invented a
>> new polymorphic pseudotype called "anynonarray" or some such, which
>> would act like anyelement *except* it would not match an array.
> ...
> On the contrary, I would think that it fits nicely to "close the loop"
> on the anyarray/anyelement feature set.

OK, I hacked this together and it seems to behave at least as reasonably
as 8.2 does. "8.3" here means HEAD + anynonarray + capturing concat
operators. I used integer as an example of a type for which 8.2 has an
implicit cast to text, and point as an example of a type for which it
doesn't:

	Expression			8.3		8.2

	text || text			text concat	text concat
	text || 'unknown'		text concat	text concat
	text || text[]			array concat	array concat
	text || non-text array		error		error
	text || non-text scalar		text concat	text concat [1]

	integer || integer		error		text concat
	integer || 'unknown'		text concat	text concat
	integer || integer[]		array concat	array concat
	integer || non-integer array	error		error
	integer || non-integer scalar	error		text concat [1]

	point || point			error		error
	point || 'unknown'		text concat	'array value must start ...'
	point || point[]		array concat	array concat
	point || non-point array	error		error
	point || non-point scalar	error		error

	text[] || text[]		array concat	array concat
	text[] || 'unknown'		error		error
	text[] || non-text array	error		error
	text[] || non-text scalar	error		error

[1] for types for which 8.2 has an implicit cast to text, else it fails.
These are: bigint smallint integer oid real double precision numeric
date time without time zone time with time zone timestamp without time
zone timestamp with time zone interval

(I was interested to find that there were cases where 8.2 would come out
with the dreaded "array value must start with "{" or dimension
information" error.)
I think that the above chart is pretty defensible; the only cases that
fail now where they worked before are concatenations where neither side
is either text or an unadorned string literal. Frankly, I think this:

catany=# select 3 || 0.4;
ERROR:  operator does not exist: integer || numeric

is way preferable to this:

regression=# select 3 || 0.4;
 ?column?
----------
 30.4
(1 row)

which is what 8.2 does --- if you want text concatenation you should
make at least *some* effort to signal that, like casting one side to
text or at least quoting it. Run-together concatenations like

catany=# select 'sin(' || 2 || ')';
 ?column?
----------
 sin(2)
(1 row)

will work as long as at least one of the first two concatenated items is
textual or an unadorned string literal.

Barring objections I'll clean this up and commit it.

			regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly
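The 8.3 column of the chart above can be recast as a small lookup
function, which makes the rules easy to check mechanically. This is a
summary of the chart only, not of the real operator-resolution code;
"unknown" stands for an unadorned string literal, and types are modeled
as plain strings with arrays spelled `type[]`.

```python
def concat_83(left, right):
    """Resolve left || right per the 8.3 column of the chart."""
    def is_array(t):
        return t.endswith('[]')

    def is_textual(t):
        # text, or an unadorned string literal
        return t in ('text', 'unknown')

    if is_array(left) and is_array(right):
        # array || array: only for matching element types
        return 'array concat' if left == right else 'error'
    if is_array(left) or is_array(right):
        arr = left if is_array(left) else right
        elem = right if is_array(left) else left
        # scalar || array (array_prepend/array_append): only when the
        # scalar matches the array's element type
        return 'array concat' if arr == elem + '[]' else 'error'
    # scalar || scalar: textcat when at least one side is textual
    return 'text concat' if is_textual(left) or is_textual(right) else 'error'
```

This reproduces, for instance, that `integer || integer` now errors
while `integer || 'unknown'` still concatenates as text.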
Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far
Simon Riggs wrote: > On Wed, 2007-06-06 at 12:17 -0400, Matthew T. O'Connor wrote: > > Florian G. Pflug wrote: > > > Work done so far: > > > - > > > .) Don't start autovacuum and bgwriter. > > > > Do table stats used by the planner get replicated on a PITR slave? I > > assume so, but if not, you would need autovac to do analyzes. > > The replication is an exact block-level replication of the master. We > can't write very much at all on the slave. Hmm, something to keep in mind is forcing cache invals when the master causes them (for example relation cache, catalog caches and plan caches). -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far
On Wed, 2007-06-06 at 16:11 +0200, Florian G. Pflug wrote: > .) Added a new GUC operational_mode, which can be set to either > readwrite or readonly. If it is set to readwrite (the default), > postgres behaves as usual. All the following changes are only > in effect if operational_mode is set to readonly. Do we need this? We are already InArchiveRecovery. > .) Created a macro ASSUME_OPMODE_READWRITE that does elog(ERROR) > if postgre is not in readwrite mode. This macro protects the > following functions to make sure that no writes occur in > readonly mode. > SimpleLruWritePage, SLruPhysicalWritePage > EndPrepare, FinishPreparedTransaction > XLogInsert, XLogWrite, ShutdownXLog > CreateCheckpoint > MarkBufferDirty. These are Asserts? > .) All transactions are set to readonly mode (An implicit > SET TRANSACTION READONLY), and are not allowed to do > SET TRANSACTION READWRITE. OK > .) Don't start autovacuum and bgwriter. Instead of bgwriter, bgreplay > is started, and it takes over that role that bgwriter play in the > shutdown process. Autovacuum -> understood. What does bgreplay do? Why not just start bgwriter earlier and disable some of its other functionality while InRecovery? > .) Transactions are assigned a dummy xid ReadOnlyTransactionId, that > is considered to be "later" than any other xid. So you are bumping FirstNormalTransactionId up by one for this? You're assuming then that we will "freeze" replay while we run a query? Otherwise doing this will mean the snapshot changes as a query executes. > .) A global ReadOnlySnapshot is maintained in shared memory. This is > copied into backend local memory by GetReadonlySnapshotData (which > replaces GetSnapshotData in readonly mode). >.) Crash recovery is not performed in readonly mode - instead, postgres > PANICs, and tells the DBA to restart in readwrite mode. Archive > recovery of course *will* be allowed, but I'm not that far yet. This is the very heart of the matter. 
This isn't just a technical issue, it goes to the heart of the use case for this feature. Can we recover while running queries? If not, how much time will we spend in replay mode v query mode? Will we be able to run long running queries *and* maintain a reasonable time to recover? Is this a mechanism for providing HA and additional query capacity, or is it just a mechanism for additional query capacity only? Those are open questions to which I don't have any answers yet myself. Will we switch back and forth between replay and query mode. Do we connect to the master, or to the slave? If we connect to the slave will we accept new queries when in replay mode and pause them before we switch back to query mode. > Open Problems: > -- >.) Protecting MarkBufferDirty with ASSUME_OPMODE_READWRITE is troublesome, > because callers usually call MarkBufferDirty from within a critical > section, and thus elog(ERRROR) is turned into elog(PANIC). This e.g. > happens with my patch if you call nextval() in readonly mode. > Does anyone see a better solution then adding checks into > all callers that are not otherwise protected from being called > in readonly mode? Do we need to do this at all? >.) Since the slaves needs to track an Snapshot in shared memory, it cannot > resize that snapshot to accomodate however many concurrent transactions > might have been running on the master. My current plan is to detect if > that global snapshot overflows, and to lock out readonly queries on the > slave (and therefore remove the need of keeping the snapshot current) > until the number of active xids on the master has dropped below > max_connections on the slave. A warning will be written to the postgres > log that suggest that the DBA increases the max_connections value on > the slave. Sized according to max_connections on the master? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
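The "dummy xid that is considered later than any other xid" idea in the
proposal above can be modeled minimally. This is a toy Python sketch --
real xids are 32-bit values with modulo-2^31 comparison, and the names
here are invented -- but it shows the visibility consequence: a readonly
backend's sentinel xid never makes its own work visible, so it sees
exactly what the published snapshot allows.

```python
READONLY_XID = float('inf')   # sentinel: sorts after every real xid

def xid_precedes(a, b):
    # stand-in for TransactionIdPrecedes, without wraparound handling
    return a < b

def tuple_visible(tuple_xmin, snapshot_xmax, snapshot_xip):
    # a tuple is visible iff its inserting transaction is before the
    # snapshot's xmax and was not in-progress when the snapshot was taken
    return (xid_precedes(tuple_xmin, snapshot_xmax)
            and tuple_xmin not in snapshot_xip)
```

Simon's question stands, though: if the snapshot in shared memory keeps
advancing while a query runs, this simple model only works if replay is
frozen (or the snapshot pinned) for the query's duration.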
Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far
On Wed, 2007-06-06 at 17:14 -0400, Alvaro Herrera wrote: > Simon Riggs wrote: > > On Wed, 2007-06-06 at 12:17 -0400, Matthew T. O'Connor wrote: > > > Florian G. Pflug wrote: > > > > Work done so far: > > > > - > > > > .) Don't start autovacuum and bgwriter. > > > > > > Do table stats used by the planner get replicated on a PITR slave? I > > > assume so, but if not, you would need autovac to do analyzes. > > > > The replication is an exact block-level replication of the master. We > > can't write very much at all on the slave. > > Hmm, something to keep in mind is forcing cache invals when the master > causes them (for example relation cache, catalog caches and plan > caches). Many things will need to work radically differently. Best we think of this as Research rather than Development. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far
Alvaro Herrera wrote:
> Simon Riggs wrote:
>> On Wed, 2007-06-06 at 12:17 -0400, Matthew T. O'Connor wrote:
>>> Florian G. Pflug wrote:
>>>> Work done so far:
>>>> .) Don't start autovacuum and bgwriter.
>>> Do table stats used by the planner get replicated on a PITR slave? I
>>> assume so, but if not, you would need autovac to do analyzes.
>> The replication is an exact block-level replication of the master. We
>> can't write very much at all on the slave.
> Hmm, something to keep in mind is forcing cache invals when the master
> causes them (for example relation cache, catalog caches and plan
> caches).

Perhaps if you are a PITR master and you have active readonly slaves
then there should be a WAL record to note plan invalidations, etc?

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] How do we create the releases?
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

--On Tuesday, June 05, 2007 10:28:58 +0300 Devrim GÜNDÜZ
<[EMAIL PROTECTED]> wrote:

> Hi Marc,
>
> Is there a written procedure about creating tarballs? I'd like to
> start working on 8.3 RPMs and I want to know what I should do to
> create a tarball.

Just a script ... relevant bits:

/usr/bin/cvs -d /cvsroot -q export -rREL7_3_19 pgsql
cd pgsql
./configure
cd doc/src
gmake postgres.tar.gz
mv postgres.tar.gz ..
gmake postgres.tar.gz
mv postgres.tar.gz ..
gmake man.tar.gz
mv man.tar.gz ..
cd sgml
gmake HISTORY INSTALL
mv -f HISTORY INSTALL ../../..
cd ../../..
gmake split-dist=yes dist
gmake maintainer-clean

- Marc

Marc G. Fournier    Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]    MSN . [EMAIL PROTECTED]
Yahoo . yscrappy    Skype: hub.org    ICQ . 7615664

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (FreeBSD)

iD8DBQFGZzuZ4QvfyHIvDvMRAij1AKDWTwj6DZWthxvuEIZE6P1PoGDHcgCg2tV/
IR2CdNIokfs363FGifAklUY=
=tcIx
-----END PGP SIGNATURE-----

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Composite index planner issues Was: Re: [HACKERS] Constraint exclusion oddity with composite index
Tom Lane wrote:
> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>>>> Assume the following:
>>>> index on: (id, adate)
>>>> constraint CHECK(adate > '01-01-2007' AND adate < '04-01-2007');
>>>> The planner will not use the index listed above.
>>> For what?
>> select adate from parent where adate = '01-25-2007'
> That's unsurprising. Searching with only a lower-order index column
> value seldom wins, 'cause you've got to scan the entire index. The
> constraint is irrelevant to this.

I guess where I got confused is:

http://www.postgresql.org/docs/8.1/static/indexes-multicolumn.html

And explicitly:

  A multicolumn B-tree index can be used with query conditions that
  involve any subset of the index's columns, but the index is most
  efficient when there are constraints on the leading (leftmost)
  columns.

Considering the paragraph from the documentation above, should we change
the documentation?

Sincerely,

Joshua D. Drake

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at
    http://www.postgresql.org/about/donate
Re: Composite index planner issues Was: Re: [HACKERS] Constraint exclusion oddity with composite index
"Joshua D. Drake" <[EMAIL PROTECTED]> writes: >> I guess where I got confused is: >> >> http://www.postgresql.org/docs/8.1/static/indexes-multicolumn.html >> >> And explicitly: >> >> A multicolumn B-tree index can be used with query conditions that >> involve any subset of the index's columns, but the index is most >> efficient when there are constraints on the leading (leftmost) columns. > Considering the paragraph from the documentation above, should we change > the documentation? That statement seems perfectly accurate to me. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: Composite index planner issues Was: Re: [HACKERS] Constraint exclusion oddity with composite index
Tom Lane wrote:
> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>> I guess where I got confused is:
>>
>> http://www.postgresql.org/docs/8.1/static/indexes-multicolumn.html
>>
>> And explicitly:
>>
>> A multicolumn B-tree index can be used with query conditions that
>> involve any subset of the index's columns, but the index is most
>> efficient when there are constraints on the leading (leftmost)
>> columns.
>>
>> Considering the paragraph from the documentation above, should we
>> change the documentation?
> That statement seems perfectly accurate to me.

O.k. then perhaps I am being dense, but that statement says to me that
the planner should be able to use the right element of a composite index
but that it will not always do so.

Considering an index of a,b if I search for b I would expect that the
planner could use the index. Assuming of course that the planner would
use the same index if it was just b. Further, I would expect a smaller
chance of it using b if the index was a,c,b but that it "might" still
use it. Is that not the case? Should I expect that even in the simplest
of cases that we will not use an index unless it is *the* leftmost
element?

Sincerely,

Joshua D. Drake

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: Composite index planner issues Was: Re: [HACKERS] Constraint exclusion oddity with composite index
"Joshua D. Drake" <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> That statement seems perfectly accurate to me. > Considering an index of a,b if I search for b I would expect that the > planner could use the index. It can. Whether it will think that's a good idea is another question entirely, and one that seems a bit beyond the scope of the discussion you're mentioning. Try forcing the issue with enable_seqscan, and see what sort of estimated and actual costs you get ... regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far
On Wed, 2007-06-06 at 22:36 +0100, Simon Riggs wrote: > > .) Transactions are assigned a dummy xid ReadOnlyTransactionId, that > > is considered to be "later" than any other xid. > > So you are bumping FirstNormalTransactionId up by one for this? > > You're assuming then that we will "freeze" replay while we run a query? > Otherwise doing this will mean the snapshot changes as a query executes. Is it possible to put a normal xmax for the snapshot? It wouldn't be a real transaction on the slave, and also the master will use that ID for a real transaction itself. However, I don't see a real problem on the slave because it would only be used for the purpose of the snapshot we need at that moment. Regards, Jeff Davis ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[HACKERS] Vacuuming anything zeroes shared table stats
Is vacuuming any table supposed to zero the statistics for all shared
tables? Doesn't that have implications for autovacuum? The example below
is in 8.2.4 but I'm seeing similar behavior in 8.1.9 and 8.3devel.
Additionally, in 8.3devel doing anything that queries or modifies a
shared table seems to zero the statistics for all shared tables.

test=> select relname, seq_scan, idx_scan, n_tup_ins, n_tup_upd, n_tup_del
test->   from pg_stat_all_tables
test->  where relid in (select oid from pg_class where relisshared)
test->  order by relname;
     relname      | seq_scan | idx_scan | n_tup_ins | n_tup_upd | n_tup_del
------------------+----------+----------+-----------+-----------+-----------
 pg_auth_members  |       25 |        3 |         1 |         0 |         1
 pg_authid        |        7 |       40 |         0 |         0 |         0
 pg_database      |        2 |        7 |         0 |         0 |         0
 pg_pltemplate    |        2 |        0 |         0 |         0 |         0
 pg_shdepend      |        0 |        4 |         2 |         0 |         2
 pg_shdescription |        2 |        0 |         0 |         0 |         0
 pg_tablespace    |        2 |        0 |         0 |         0 |         0
 pg_toast_1260    |        1 |        0 |         0 |         0 |         0
 pg_toast_1262    |        1 |        0 |         0 |         0 |         0
 pg_toast_2396    |        1 |        0 |         0 |         0 |         0
(10 rows)

test=> vacuum foo;
VACUUM
test=> select relname, seq_scan, idx_scan, n_tup_ins, n_tup_upd, n_tup_del
test->   from pg_stat_all_tables
test->  where relid in (select oid from pg_class where relisshared)
test->  order by relname;
     relname      | seq_scan | idx_scan | n_tup_ins | n_tup_upd | n_tup_del
------------------+----------+----------+-----------+-----------+-----------
 pg_auth_members  |        0 |        0 |         0 |         0 |         0
 pg_authid        |        0 |        0 |         0 |         0 |         0
 pg_database      |        1 |        0 |         0 |         0 |         0
 pg_pltemplate    |        0 |        0 |         0 |         0 |         0
 pg_shdepend      |        0 |        0 |         0 |         0 |         0
 pg_shdescription |        0 |        0 |         0 |         0 |         0
 pg_tablespace    |        0 |        0 |         0 |         0 |         0
 pg_toast_1260    |        0 |        0 |         0 |         0 |         0
 pg_toast_1262    |        0 |        0 |         0 |         0 |         0
 pg_toast_2396    |        0 |        0 |         0 |         0 |         0
(10 rows)

--
Michael Fuhr

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
   http://www.postgresql.org/docs/faq
Re: [HACKERS] Vacuuming anything zeroes shared table stats
Michael Fuhr wrote: > Is vacuuming any table supposed to zero the statistics for all > shared tables? Huh, certainly not. In any case, I think the problem may be related to the fact that stats for shared tables are kept in a separate hash from regular tables. I'll investigate the issue tomorrow -- thanks for reporting. -- Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4 "The Gord often wonders why people threaten never to come back after they've been told never to return" (www.actsofgord.com) ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [COMMITTERS] pgsql: Create a GUC parametertemp_tablespacesthat allows selection of
On 6/4/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> Perhaps a reasonable compromise could work like this: at the first
> point in a transaction where a temp file is created, choose a random
> list element, and thereafter advance cyclically for the duration of
> that transaction. This ensures within-transaction spread-out while
> still having some randomness across backends.
>
> The reason I'm thinking per-transaction is that we could tie this to
> setting up a cached list of tablespace OIDs, which would avoid the
> overhead of repeat parsing and tablespace validity checking. We had
> rejected using a long-lived cache because of the problem of
> tablespaces getting dropped, but I think one that lasts only across a
> transaction would be OK. And the reason I'm thinking a cache is
> important is that if you really want to get any win from this idea,
> you need to spread the temp files across tablespaces *per file*, which
> is not the way it works now.

ok. are you doing this? or can i prepare a patch that implements this?
i guess we can allocate the memory for the list in TopTransactionContext.

--
regards,
Jaime Casanova

"Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs and the universe trying to
produce bigger and better idiots. So far, the universe is winning."
                                                       Richard Cook

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match
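Tom's quoted proposal -- random start at the first temp file of a
transaction, then cyclic advance -- can be sketched like this. A toy
Python model, not a patch: the class name is invented, and in the
backend this would be a cached OID list in TopTransactionContext plus a
couple of static variables.

```python
import random

class TempTablespaceChooser:
    """Per-transaction chooser: random first pick, then round-robin."""

    def __init__(self, tablespaces):
        # cached once per transaction, so dropped-tablespace staleness
        # is bounded by transaction lifetime
        self.tablespaces = list(tablespaces)
        self.next_index = None

    def next_tablespace(self):
        if self.next_index is None:
            # first temp file in this transaction: random start, which
            # gives the cross-backend randomness Tom mentions
            self.next_index = random.randrange(len(self.tablespaces))
        ts = self.tablespaces[self.next_index]
        # advance cyclically for subsequent temp files
        self.next_index = (self.next_index + 1) % len(self.tablespaces)
        return ts
```

Within one transaction every tablespace gets used before any repeats,
which is the "within-transaction spread-out" property.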
Re: [HACKERS] To all the pgsql developers..Have a look at the operators proposed by me in my researc
On Sat, Jun 02, 2007 at 01:37:19PM +, Tasneem Memon wrote:
> We can make the system ask the user as to what membership degree s/he
> wants to get the values, but we don't want to make the system
> interactive, where a user gives a membership degree value of his/her
> choice. These operators are supposed to work just like the other
> operators in SQL.. you just put them in the query and get a result. I
> have put 0.8 because all the case studies I have made for the NEAR,
> 0.8 seems to be the best choice.. 0.9 narrows the range.. 0.75 or 0.7
> gets those values also that are irrelevant.. However, these values
> will no more seem to be irrelevant when we haven't got any values till
> the md 0.8, so the operator fetches them when they are the NEARest.

While having them function just like any other operator is good, it
seems like you're making quite a bit of an assumption for the user;
namely that you know what their data looks like better than they might.
Is it not possible that someone would come along with a dataset that
looks different enough from your test cases so that the values you
picked wouldn't work?

--
Jim Nasby    [EMAIL PROTECTED]
EnterpriseDB    http://enterprisedb.com    512.569.9461 (cell)

pgpqZoDw3txQR.pgp
Description: PGP signature
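Jim's objection can be made concrete with a sketch. Everything here is
hypothetical -- the membership function, the fallback rule, and the
names are invented for illustration, not taken from the proposal -- but
it shows that the 0.8 cutoff is just one parameter choice, and a
different dataset (or a parameterizable operator) might call for
another.

```python
def near(candidates, target, membership, cutoff=0.8):
    """Return candidates whose membership degree w.r.t. target meets
    the cutoff; if none do, fall back to the single NEARest one,
    mirroring the 'fetch them when they are the NEARest' behavior."""
    scored = [(membership(c, target), c) for c in candidates]
    hits = [c for md, c in scored if md >= cutoff]
    if hits:
        return hits
    # no candidate reaches the cutoff: return the best available
    return [max(scored, key=lambda pair: pair[0])[1]]
```

With a hardcoded `cutoff=0.8` the behavior is fixed; exposing it as a
parameter (or a GUC) would let users whose data is scaled differently
pick their own threshold.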
[HACKERS] is_array_type vs type_is_array
Was there some change in functionality reason for renaming is_array_type
to type_is_array? It broke compilation of fulldisjunctions, which I
build and run regression tests on in my sandbox to keep it from getting
too horribly broken with respect to current HEAD. I got it to build and
pass its regression tests by adding this:

+ #if !defined(is_array_type) && defined(type_is_array)
+ #define is_array_type(x) type_is_array(x)
+ #endif

to the beginning of the one file which uses is_array_type. Is this
reasonable to send back to the fulldisjunctions maintainer, or is there
some subtle change that prompted the name change to make uses of this
function immediately apparent?

--
Ducharme's Axiom: If you view your problem closely enough you will
recognize yourself as part of the problem.

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at
    http://www.postgresql.org/about/donate
Re: [HACKERS] is_array_type vs type_is_array
Jeremy Drake <[EMAIL PROTECTED]> writes: > Was there some change in functionality reason for renaming is_array_type > to type_is_array? Just to sync style with type_is_enum ... there were more of the latter than the former. > It broke compilation of fulldisjunctions, Sorry, but we change internal APIs every day, and twice on Sundays. Deal with it. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] is_array_type vs type_is_array
On Thu, 7 Jun 2007, Tom Lane wrote: > Jeremy Drake <[EMAIL PROTECTED]> writes: > > Was there some change in functionality reason for renaming is_array_type > > to type_is_array? > > Just to sync style with type_is_enum ... there were more of the latter > than the former. OK, so it is safe to just #define one to the other, right? > > It broke compilation of fulldisjunctions, > > Sorry, but we change internal APIs every day, and twice on Sundays. > Deal with it. This is why I build fulldisjunctions in my sandbox, because when I decided to try it out one time, I could not get it to compile. After much effort getting it happy with HEAD and sending the changes back to the maintainer of fulldisjunctions, I thought a good thing for me to contribute is to make sure it continues to compile cleanly against HEAD and send patches when it breaks. I just wanted to make sure that the functionality of this function did not change in some way that I did not see before sending a patch to the maintainer of fulldisjunctions. "Deal with it" was not the response I was expecting, as that is exactly what I am trying to do ;) -- It is generally agreed that "Hello" is an appropriate greeting because if you entered a room and said "Goodbye," it could confuse a lot of people. -- Dolph Sharp, "I'm O.K., You're Not So Hot" ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org