Re: [HACKERS] Selecting a constant question: A summary

2007-06-13 Thread Chuck McDevitt
I see... PostgreSQL wants to guess the datatype, given no clear
syntactic information, and perhaps a varchar(n) wouldn't be a valid cast
to some of the possible datatypes.

So, a comparison like where x = '(1,2)' might be legal, but comparing x to a
field of type varchar(5) might not be, as in where x = y, where y is of type
varchar(5) and contains '(1,2)'.

(Time values don't have this problem in pure ANSI SQL, since the literal
is written TIME '12:34', but I can see that for user types it might be ambiguous.)

I find PostgreSQL's handling of this strange, as I come from systems
where 'xxx' is either a varchar or char type, in all contexts, and
implicit casts handle any needed conversions.
But now I understand why it does things this way.

Thanks.


-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, June 12, 2007 9:50 PM
To: Chuck McDevitt
Cc: Andrew Hammond; Josh Berkus; pgsql-hackers@postgresql.org; Dann
Corbit; Larry McGhaw
Subject: Re: [HACKERS] Selecting a constant question: A summary

Chuck McDevitt [EMAIL PROTECTED] writes:
 Just a curiosity question:  Why is the type of a literal '1' unknown
 instead of varchar(1)?

Because, for instance, it might be intended as an integer or float or
numeric value.  Change the content a little, like '(1,2)' or '12:34',
and maybe it's a point or time value.  There are plenty of contexts in
which the intended type of a literal is obviously not text/varchar.

We assign unknown initially as a way of flagging that the type
assignment is uncertain.  Once we have a value that we think is varchar
(a table column for instance), the rules for deciding to cast it to a
different type get a lot more stringent.
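
Tom's point about context-dependent resolution can be sketched with a few
queries (illustrative only; exact behaviour varies by version):

```sql
-- The same literal text resolves to different types depending on context:
SELECT '12:34'::time;        -- resolved as a time value
SELECT '(1,2)'::point;       -- resolved as a point value
SELECT 1 + '1';              -- the unknown literal is resolved as integer

-- Compared against a varchar column, the same literal becomes varchar:
--   SELECT * FROM t WHERE vcol = '(1,2)';
```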

regards, tom lane



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[HACKERS] DROP TABLE and autovacuum

2007-06-13 Thread ITAGAKI Takahiro
If we try to drop a table on which autovacuum is running, we have to
wait for the vacuum to finish. However, that vacuuming effort goes to waste
when the table is being dropped or rewritten. Meanwhile, we already have the
autovacuum killer triggered by CREATE/DROP/RENAME DATABASE commands.
Can we extend that feature to several TABLE commands?

One simple solution is that every time a non-autovacuum backend tries to
acquire a lock on a table equal to or stronger than SHARE UPDATE EXCLUSIVE,
the backend checks whether any autovacuum workers are vacuuming the table
and sends SIGINT to them.

Is this worth doing? Or are there any dangerous situations in it?
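
As a rough illustration of the mechanism being proposed (hypothetical query;
the pg_stat_activity column names are the 8.2-era ones, and the autovacuum
activity string shown is an assumption):

```sql
-- Find autovacuum workers currently vacuuming a given table:
SELECT procpid
FROM pg_stat_activity
WHERE current_query LIKE 'autovacuum: VACUUM public.foo%';

-- SIGINT could then be delivered to such a worker via
--   SELECT pg_cancel_backend(procpid);
```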

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Selecting a constant question: A summary

2007-06-13 Thread Martijn van Oosterhout
On Wed, Jun 13, 2007 at 02:12:37AM -0400, Chuck McDevitt wrote:
 So, a comparison like where x = '(1,2)' might be legal, but comparing x to a
 field of type varchar(5) might not be, as in where x = y, where y is of type
 varchar(5) and contains '(1,2)'.

Normally, just about every type can be converted to or from text. So if
Postgres converted literals to varchar first, a statement like x = 'const',
where x is of type foo, would be ambiguous: you could either convert x to
varchar or convert the constant to foo. Instead, Postgres marks the constant
as unknown, and it is then always the constant that gets converted, because
nothing can be converted to unknown.

Thus far this system has worked excellently, though obviously not perfectly.
It matches people's expectations well, which is the most important part.

 I find PostgreSQL's handling of this strange, as I come from systems
 where 'xxx' is either a varchar or char type, in all contexts, and
 implicit casts handle any needed conversions.
 But now I understand why it does things this way.

User-defined types make for lots of interesting choices, but they are
by far the most powerful feature of Postgres and we don't want to mess
with that.

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


[HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Dave Page
I was just looking at implementing some query tuning/debugging features
in pgAdmin, and was looking to use EXPLAIN output to get a list of the
base tables involved in the users' query. Unfortunately though it
doesn't include the schema name in the output which means I have no way
of telling for sure which table is being referred to (even in a single
query, consider SELECT * FROM s1.foo, s2.foo).
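
A minimal reproduction of the ambiguity (plan output abridged and
version-dependent):

```sql
CREATE SCHEMA s1; CREATE TABLE s1.foo (a int);
CREATE SCHEMA s2; CREATE TABLE s2.foo (b int);

EXPLAIN SELECT * FROM s1.foo, s2.foo;
--  Nested Loop  (...)
--    ->  Seq Scan on foo  (...)   -- which foo is which?
--    ->  Seq Scan on foo  (...)
```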

Looking to fix this, a comment in src/backend/commands/explain.c
indicates that this is intentional:

/* We only show the rel name, not schema name */
relname = get_rel_name(rte->relid);

Anyone know why? This seems like a bug to me given the ambiguity of
possible output.

Regards, Dave.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Selecting a constant question: A summary

2007-06-13 Thread Gregory Stark

Chuck McDevitt [EMAIL PROTECTED] writes:

 Just a curiosity question:  Why is the type of a literal '1' unknown
 instead of varchar(1)?

Even if it were assigned a text datatype, it would be unconstrained text,
not varchar(1). If we used varchar(1) then things like:

 create table foo as select 'foo';

would create a table with a varchar(3) column, which would then complain if you
tried to insert 'foobar' into it. There doesn't seem to be enough evidence that
the user intended to constrain the input to just 'foo' in that case.

Of course, right now you get a table with a column of type unknown, which is
very unlikely to be what the user expects.
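
The behaviour Gregory describes can be seen directly (sketch; the unknown
column type is the pre-8.3 behaviour he refers to):

```sql
CREATE TABLE foo AS SELECT 'foo';             -- column comes out as type unknown
CREATE TABLE bar AS SELECT 'foo'::text AS t;  -- an explicit cast avoids that
```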

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] comparing index columns

2007-06-13 Thread Heikki Linnakangas

Tom Lane wrote:

Pavan Deolasee [EMAIL PROTECTED] writes:

I don't have much insight into operator classes and operator families
and how they work. Where should I look for the related code?


Primary opclass members are stored right in the Relation data struct for
you.  Since (I trust) you're only supporting this for btree, you could
just use rd_supportinfo[0] which will not even cost an fmgr lookup.
See index_getprocinfo() and callers.


There's currently no reason to limit HOT to b-trees.

How about just doing a memcmp? That would be safe, simple and fast and 
covers all interesting use cases.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] Selecting a constant question: A summary

2007-06-13 Thread Zeugswetter Andreas ADI SD

 For some Unicode character sets, element_width can be as much as 4

In UTF8 one char can be up to 6 bytes, so 4 is not correct in general.
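
A quick way to check byte widths in a UTF8-encoded database is octet_length
(examples assume a UTF8 server encoding; the 5- and 6-byte forms come from the
original UTF-8 definition, while characters seen in practice are at most 4 bytes):

```sql
SELECT octet_length('a');    -- 1 byte
SELECT octet_length('é');    -- 2 bytes
SELECT octet_length('€');    -- 3 bytes
SELECT octet_length('𝄞');    -- 4 bytes (a supplementary-plane character)
```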

Andreas

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[HACKERS] Load Distributed Checkpoints test results

2007-06-13 Thread Heikki Linnakangas
Here are results from a batch of test runs with LDC. This patch only 
spreads out the writes; fsyncs work as before. The patch also includes 
the optimization of not writing buffers that were dirtied after the 
checkpoint started.


http://community.enterprisedb.com/ldc/

See tests 276-280. 280 is the baseline with no patch applied; the 
others are load distributed checkpoints with different values for 
checkpoint_write_percent. But after running the tests I noticed that the 
spreading was actually controlled by checkpoint_write_rate, which sets 
the minimum rate for the writes, so all the tests with the patch 
applied are effectively the same: the writes were spread over a period 
of 1 minute. I'll fix that setting and run more tests.
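
For context, this is roughly how the phase-percent settings divide a
checkpoint interval (defaults taken from the patch's documentation fragment
included below; purely illustrative arithmetic):

```sql
-- With checkpoint_timeout = 300 s and the patch defaults:
--   write phase: 50% -> 150 s
--   nap phase:   10% ->  30 s
--   sync phase:  20% ->  60 s
SELECT 300 * 0.5 AS write_s, 300 * 0.1 AS nap_s, 300 * 0.2 AS sync_s;
```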


The response time graphs show that the patch reduces the max (new-order) 
response times during checkpoints from ~40-60 s to ~15-20 s. The change 
in minute by minute average is even more significant.


The change in overall average response times is also very significant: 
1.5s without the patch, and ~0.3-0.4s with the patch, for new-order 
transactions. That also means that we pass the TPC-C requirement that the 
90th percentile of response times must be > the average.



All that said, there's still significant checkpoint spikes present, even 
though they're much less severe than without the patch. I'm willing to 
settle with this for 8.3. Does anyone want to push for more testing and 
thinking of spreading the fsyncs as well, and/or adding a delay between 
writes and fsyncs?


Attached is the patch used in the tests. It still needs some love...

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
Index: doc/src/sgml/config.sgml
===
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/doc/src/sgml/config.sgml,v
retrieving revision 1.126
diff -c -r1.126 config.sgml
*** doc/src/sgml/config.sgml	7 Jun 2007 19:19:56 -	1.126
--- doc/src/sgml/config.sgml	12 Jun 2007 08:16:55 -
***
*** 1565,1570 
--- 1565,1619 
/listitem
   /varlistentry
  
+  <varlistentry id="guc-checkpoint-write-percent" xreflabel="checkpoint_write_percent">
+   <term><varname>checkpoint_write_percent</varname> (<type>floating point</type>)</term>
+   <indexterm>
+    <primary><varname>checkpoint_write_percent</> configuration parameter</primary>
+   </indexterm>
+   <listitem>
+    <para>
+     To spread out the work done by a checkpoint, the writing of all dirty
+     buffers in the shared buffer pool is spread over the specified fraction
+     of the checkpoint interval. The default value is 50.0 (50% of
+     <varname>checkpoint_timeout</>).
+     This parameter can only be set in the <filename>postgresql.conf</>
+     file or on the server command line.
+    </para>
+   </listitem>
+  </varlistentry>
+ 
+  <varlistentry id="guc-checkpoint-nap-percent" xreflabel="checkpoint_nap_percent">
+   <term><varname>checkpoint_nap_percent</varname> (<type>floating point</type>)</term>
+   <indexterm>
+    <primary><varname>checkpoint_nap_percent</> configuration parameter</primary>
+   </indexterm>
+   <listitem>
+    <para>
+     Specifies the delay between writing out all dirty buffers and flushing
+     all modified files. This gives the kernel's disk writer time to flush
+     dirty buffers, reducing the work in the following flush phase.
+     The default value is 10.0 (10% of <varname>checkpoint_timeout</>).
+     This parameter can only be set in the <filename>postgresql.conf</>
+     file or on the server command line.
+    </para>
+   </listitem>
+  </varlistentry>
+ 
+  <varlistentry id="guc-checkpoint-sync-percent" xreflabel="checkpoint_sync_percent">
+   <term><varname>checkpoint_sync_percent</varname> (<type>floating point</type>)</term>
+   <indexterm>
+    <primary><varname>checkpoint_sync_percent</> configuration parameter</primary>
+   </indexterm>
+   <listitem>
+    <para>
+     To spread out the work done by a checkpoint, the flushing of all
+     modified files is spread over the specified fraction of the checkpoint
+     interval. The default value is 20.0 (20% of <varname>checkpoint_timeout</>).
+     This parameter can only be set in the <filename>postgresql.conf</>
+     file or on the server command line.
+    </para>
+   </listitem>
+  </varlistentry>
+ 
   <varlistentry id="guc-checkpoint-warning" xreflabel="checkpoint_warning">
    <term><varname>checkpoint_warning</varname> (<type>integer</type>)</term>
    <indexterm>
Index: src/backend/access/transam/xlog.c
===
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.272
diff -c -r1.272 xlog.c
*** src/backend/access/transam/xlog.c	31 May 2007 15:13:01 -	1.272
--- src/backend/access/transam/xlog.c	12 Jun 2007 08:16:55 -
***
*** 398,404 
  static void exitArchiveRecovery(TimeLineID endTLI,
  	uint32 

Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Stephen Frost
* Dave Page ([EMAIL PROTECTED]) wrote:
 /* We only show the rel name, not schema name */
 relname = get_rel_name(rte->relid);
 
 Anyone know why? This seems like a bug to me given the ambiguity of
 possible output.

I'd assume it's to keep the explain output smaller with the
expectation/assumption that in general you're going to know.  A possible
work-around would be to just always provide table aliases for your
queries, as those are shown in the explain.

In terms of behaviour changes, I think it'd be nice to show the schema
name when necessary but otherwise don't, ala how '\d view' works.

Another option might be to omit the schema when an alias is provided, or
maybe even omit the entire table name in favor of the alias.

Just my 2c.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Dave Page

Stephen Frost wrote:

* Dave Page ([EMAIL PROTECTED]) wrote:

/* We only show the rel name, not schema name */
relname = get_rel_name(rte->relid);

Anyone know why? This seems like a bug to me given the ambiguity of
possible output.


I'd assume it's to keep the explain output smaller with the
expectation/assumption that in general you're going to know.  A possible
work-around would be to just always provide table aliases for your
queries, as those are shown in the explain.


I have no control over the queries themselves.


In terms of behaviour changes, I think it'd be nice to show the schema
name when necessary but otherwise don't, ala how '\d view' works.


In my case that would be awkward, as pgAdmin would then need to work out 
what the actual table was based on the search path used for the user's 
query.



Another option might be to omit the schema when an alias is provided, or
maybe even omit the entire table name in favor of the alias.


That would make it very painful as I'd need to parse the query client 
side to resolve the table names. Yeuch.


Just adding the schema name seems the most sensible and usable option - 
not to mention the easiest!


Regards, Dave

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Stephen Frost
* Dave Page ([EMAIL PROTECTED]) wrote:
 Stephen Frost wrote:
 In terms of behaviour changes, I think it'd be nice to show the schema
 name when necessary but otherwise don't, ala how '\d view' works.
 
 In my case that would be awkward, as pgAdmin would then need to work out 
 what the actual table was based on the search path used for the user's 
 query.

Actually, as mentioned in another thread, a function to take a table
name and a search_path and return the 'fully qualified' table name would
make that much easier, and would be useful in other situations.
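
Something close already exists via the regclass type, which resolves an
unqualified name using the current search_path (sketch; output depends on
your search_path setting):

```sql
SELECT 'foo'::regclass;    -- resolves 'foo' via the current search_path

-- The reverse direction: a regclass rendered as text is schema-qualified
-- only when the relation is NOT visible in the current search_path:
SELECT oid::regclass::text FROM pg_class WHERE relname = 'foo';
```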

 Another option might be to omit the schema when an alias is provided, or
 maybe even omit the entire table name in favor of the alias.
 
 That would make it very painful as I'd need to parse the query client 
 side to resolve the table names. Yeuch.

Indeed, if you're not constructing the queries that would make things
somewhat difficult.  Then again, parsing the explain output seems like
it's going to be rather difficult itself anyway.

 Just adding the schema name seems the most sensible and usable option - 
 not to mention the easiest!

While completely ignoring the current behaviour and likely the reason
it's done the way it is now...  explain output was, and still is
primarily, for humans to read.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Dave Page

Stephen Frost wrote:

Indeed, if you're not constructing the queries that would make things
somewhat difficult.  Then again, parsing the explain output seems like
it's going to be rather difficult itself anyway.


Well, we do that anyway - and just grabbing the base table names isn't 
too hard.


Just adding the schema name seems the most sensible and usable option - 
not to mention the easiest!


While completely ignoring the current behaviour and likely the reason
it's done the way it is now...  explain output was, and still is
primarily, for humans to read.


Humans deserve schemas as well!! :-). As for the likely reason for the 
current behaviour, well, I'd rather have precise, 
non-potentially-ambiguous info than save a few characters.


Regards, Dave

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Lukas Kahwe Smith

Stephen Frost wrote:

* Dave Page ([EMAIL PROTECTED]) wrote:

/* We only show the rel name, not schema name */
relname = get_rel_name(rte->relid);

Anyone know why? This seems like a bug to me given the ambiguity of
possible output.


I'd assume it's to keep the explain output smaller with the
expectation/assumption that in general you're going to know.  A possible
work-around would be to just always provide table aliases for your
queries, as those are shown in the explain.


I am hoping that once we have WITH RECURSIVE, we could optionally 
provide a normalized dump of the EXPLAIN output into a table, which could 
then be easily connected to the old output using WITH RECURSIVE.


regards,
Lukas

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Magnus Hagander
On Wed, Jun 13, 2007 at 01:20:25PM +0100, Dave Page wrote:
 Just adding the schema name seems the most sensible and usable option - 
 not to mention the easiest!
 
 While completely ignoring the current behaviour and likely the reason
 it's done the way it is now...  explain output was, and still is
 primarily, for humans to read.
 
 Humans deserve schemas as well!! :-). As for the likely reason for the 
 current behaviour, well, I'd rather have precise, 
 non-potentially-ambiguous info than save a few characters.

Just to open a whole new can of worms ;-)

I read an article a couple of days ago about the machine readable showplan
output in SQL Server 2005 (basically, it's EXPLAIN output but in XML
format). It does make a lot of sense if your primary interface is !=
commandline (psql), such as pgadmin or phppgadmin. The idea being that you
can stick in *all* the details you want, since you can't possibly clutter
up the display. And you stick them in a well-defined XML format (or another
format if you happen to hate XML) where the client-side program can easily
parse out whatever it needs. It's also future-proof - if you add a new
field somewhere, the client program parser won't break.

Something worth doing? Not to replace the current explain output, but as a
second option (EXPLAIN XML whatever)?

//Magnus


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Alvaro Herrera
Magnus Hagander wrote:

 Just to open a whole new can of worms ;-)
 
 I read an article a couple of days ago about the machine readable showplan
 output in SQL Server 2005 (basically, it's EXPLAIN output but in XML
 format). It does make a lot of sense if your primary interface is !=
 commandline (psql), such as pgadmin or phppgadmin. The idea being that you
 can stick in *all* the details you want, since you can't possibly clutter
 up the display. And you stick them in a well-defined XML format (or another
 format if you happen to hate XML) where the client-side program can easily
 parse out whatever it needs. It's also future-proof - if you add a new
 field somewhere, the client program parser won't break.
 
 Something worth doing? Not to replace the current explain output, but as a
 second option (EXPLAIN XML whatever)?

FYI a patch was posted for this some time ago, because a friend of mine
wanted to help a student to write an EXPLAIN parsing tool.

-- 
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
Having your biases confirmed independently is how scientific progress is
made, and hence made our great society what it is today (Mary Gardiner)

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Magnus Hagander
On Wed, Jun 13, 2007 at 08:47:30AM -0400, Alvaro Herrera wrote:
 Magnus Hagander wrote:
 
  Just to open a whole new can of worms ;-)
  
  I read an article a couple of days ago about the machine readable showplan
  output in SQL Server 2005 (basically, it's EXPLAIN output but in XML
  format). It does make a lot of sense if your primary interface is !=
  commandline (psql), such as pgadmin or phppgadmin. The idea being that you
  can stick in *all* the details you want, since you can't possibly clutter
  up the display. And you stick them in a well-defined XML format (or another
  format if you happen to hate XML) where the client-side program can easily
  parse out whatever it needs. It's also future-proof - if you add a new
  field somewhere, the client program parser won't break.
  
  Something worth doing? Not to replace the current explain output, but as a
  second option (EXPLAIN XML whatever)?
 
 FYI a patch was posted for this some time ago, because a friend of mine
 wanted to help a student to write an EXPLAIN parsing tool.

Didn't see that one. Explain in XML format? Got a URL for it? I can't seem
to find it on -patches.

//Magnus

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Pavel Stehule


Something worth doing? Not to replace the current explain output, but as a
second option (EXPLAIN XML whatever)?

//Magnus



It's a good idea. A similar situation exists for stack trace output.

Pavel

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Heikki Linnakangas

Magnus Hagander wrote:

On Wed, Jun 13, 2007 at 01:20:25PM +0100, Dave Page wrote:
Just adding the schema name seems the most sensible and usable option - 
not to mention the easiest!

While completely ignoring the current behaviour and likely the reason
it's done the way it is now...  explain output was, and still is
primarily, for humans to read.
Humans deserve schemas as well!! :-). As for the likely reason for the 
current behaviour, well, I'd rather have precise, 
non-potentially-ambiguous info than save a few characters.


Just to open a whole new can of worms ;-)

I read an article a couple of days ago about the machine readable showplan
output in SQL Server 2005 (basically, it's EXPLAIN output but in XML
format). It does make a lot of sense if your primary interface is !=
commandline (psql), such as pgadmin or phppgadmin. The idea being that you
can stick in *all* the details you want, since you can't possibly clutter
up the display. And you stick them in a well-defined XML format (or another
format if you happen to hate XML) where the client-side program can easily
parse out whatever it needs. It's also future-proof - if you add a new
field somewhere, the client program parser won't break.

Something worth doing? Not to replace the current explain output, but as a
second option (EXPLAIN XML whatever)?


I agree it would be nice to have machine readable explain output.

DB2 has the concept of explain tables: explain output is written to 
tables, which tools then query and pretty-print. I like that idea 
in principle. PostgreSQL is a relational database, so having the explain 
output in relations makes sense. No need for XML or any other extra 
libraries, in either the server or the client. Having the data in relational 
format also lets you query it: for example, show me all sequential 
scans, or all nodes where the estimated number of rows is off by a 
certain factor.
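
A sketch of what such explain tables might look like (hypothetical schema,
loosely modeled on DB2's explain tables; nothing like this exists in
PostgreSQL):

```sql
CREATE TABLE explain_node (
    plan_id   int,        -- one set of rows per EXPLAIN invocation
    node_id   int,
    parent_id int,        -- encodes the tree structure of the plan
    node_type text,       -- 'Seq Scan', 'Index Scan', ...
    relation  text,
    est_rows  float8,
    act_rows  float8      -- NULL unless EXPLAIN ANALYZE was run
);

-- Queries like "show me badly mis-estimated nodes" then become trivial:
SELECT plan_id, node_id, node_type, est_rows, act_rows
FROM explain_node
WHERE act_rows > 0
  AND est_rows / act_rows NOT BETWEEN 0.1 AND 10;
```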


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Stephen Frost
* Heikki Linnakangas ([EMAIL PROTECTED]) wrote:
 Something worth doing? Not to replace the current explain output, but as a
 second option (EXPLAIN XML whatever)?
 
 I agree it would be nice to have machine readable explain output.

Seconded here; I'd much rather see this as a separate option rather than
cluttering up regular 'explain' output for the humans.  I do think we
should provide the schema name when it's not clear from the search_path
though, since that helps the humans too. :)

 DB2 has the concept of explain tables. Explain output is written to 
 tables, which tools query and pretty print the output. I like that idea 
 in principle. PostgreSQL is a relational database, so having the explain 
 output in relations make sense. No need for XML or any other extra 
 libraries, in either the server or client. Having the data in relational 
 format allows you to query them. For example, show me all sequential 
 scans, or all nodes where the estimated number of rows is off by a 
 certain factor.

I like this approach; the only downside is that someone or something needs
to manage those tables, unless you can specify where the explain output
should be written, or similar.  Also, with tables, if someone really wants
XML, the tables can be extracted as XML.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Magnus Hagander
On Wed, Jun 13, 2007 at 02:02:24PM +0100, Heikki Linnakangas wrote:
 Magnus Hagander wrote:
 On Wed, Jun 13, 2007 at 01:20:25PM +0100, Dave Page wrote:
 Just adding the schema name seems the most sensible and usable option - 
 not to mention the easiest!
 While completely ignoring the current behaviour and likely the reason
 it's done the way it is now...  explain output was, and still is
 primarily, for humans to read.
 Humans deserve schemas as well!! :-). As for the likely reason for the 
 current behaviour, well, I'd rather have precise, 
 non-potentially-ambiguous info than save a few characters.
 
 Just to open a whole new can of worms ;-)
 
 I read an article a couple of days ago about the machine readable showplan
 output in SQL Server 2005 (basically, it's EXPLAIN output but in XML
 format). It does make a lot of sense if your primary interface is !=
 commandline (psql), such as pgadmin or phppgadmin. The idea being that you
 can stick in *all* the details you want, since you can't possibly clutter
 up the display. And you stick them in a well-defined XML format (or another
 format if you happen to hate XML) where the client-side program can easily
 parse out whatever it needs. It's also future-proof - if you add a new
 field somewhere, the client program parser won't break.
 
 Something worth doing? Not to replace the current explain output, but as a
 second option (EXPLAIN XML whatever)?
 
 I agree it would be nice to have machine readable explain output.
 
 DB2 has the concept of explain tables. Explain output is written to 
 tables, which tools query and pretty print the output. I like that idea 
 in principle. PostgreSQL is a relational database, so having the explain 
 output in relations makes sense. No need for XML or any other extra 
 libraries, in either the server or client. Having the data in relational 
 format allows you to query them. For example, show me all sequential 
 scans, or all nodes where the estimated number of rows is off by a 
 certain factor.

Assuming you can actually *represent* the whole plan as tables, that would
of course work fine. But I assume you mean virtual tables? So I do
EXPLAIN whatever, and get back one or more result sets with the data? Or do
they write it to *actual* tables in the database?

Machine-readable is of course the main point - the exact format is more of
an implementation detail.

//Magnus


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Heikki Linnakangas

Magnus Hagander wrote:

On Wed, Jun 13, 2007 at 02:02:24PM +0100, Heikki Linnakangas wrote:
DB2 has the concept of explain tables. Explain output is written to 
tables, which tools query and pretty print the output. I like that idea 
in principle. PostgreSQL is a relational database, so having the explain 
output in relations makes sense. No need for XML or any other extra 
libraries, in either the server or client. Having the data in relational 
format allows you to query them. For example, show me all sequential 
scans, or all nodes where the estimated number of rows is off by a 
certain factor.


Assuming you can actually *represent* the whole plan as tables, that would
of course work fine. 


Sure you can. It's just a question of how complex the schema is :).


But I assume you mean virtual tables? So I do
EXPLAIN whatever, and get back one or more result sets with the data? Or do
they write it to *actual* tables in the database?


I'm not sure. DB2 had real tables, but I found that a bit clumsy. It was 
nice because your old explain results were accumulated, but it was also 
not nice because of that same thing.


One idea would be temporary tables.


Machine-readable is of course the main point - the exact format is more of
an implementation detail.


Agreed.

A potential problem is that as we add new node types etc., we need to 
extend the schema (whether it's a real relational schema or XML), and 
clients need to understand it. But I guess we already have the same 
problem with clients that parse the current explain output.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] comparing index columns

2007-06-13 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 How about just doing a memcmp? That would be safe, simple and fast and 
 covers all interesting use cases.

You'd have to use datumIsEqual() or equivalent, and figure out what to
do about nulls.  I think it'd work though, at least for the purposes
that HOT needs.  There are failure cases; for example a previously
not-toasted index key column could get toasted due to expansion of an
unrelated data column.  But +1 for speed over accuracy here, as long as
it can never make a false equality report.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] DROP TABLE and autovacuum

2007-06-13 Thread Alvaro Herrera
ITAGAKI Takahiro wrote:
 If we try to drop a table on which autovacuum is running, we have to
 wait for the vacuum to finish. However, that vacuuming effort goes to waste
 when the table is being dropped or rewritten. Meanwhile, we already have the
 autovacuum killer triggered by CREATE/DROP/RENAME DATABASE commands.
 Can we extend that feature to several TABLE commands?
 
 One simple solution is that every time a non-autovacuum backend tries to
 acquire a lock on a table equal to or stronger than SHARE UPDATE EXCLUSIVE,
 the backend checks whether any autovacuum workers are vacuuming the table
 and sends SIGINT to them.
 
 Is this worth doing? Or are there any dangerous situations in it?

Well, one problem with this is that currently SIGINT cancels the whole
autovacuum worker, not just the table currently being processed.  I
think this can be fixed easily by improving the signal handling.

Aside from that, I don't see any problem in handling DROP TABLE like you
suggest.  But I don't feel comfortable with doing it with just any
strong locker, because that would easily starve tables from being
vacuumed at all.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] DROP TABLE and autovacuum

2007-06-13 Thread Tom Lane
ITAGAKI Takahiro [EMAIL PROTECTED] writes:
 If we try to drop a table on which autovacuum is running, we have to
 wait for the vacuum to finish. But that vacuuming effort is wasted for
 a table that is about to be dropped or rewritten. Meanwhile, we already
 have an autovacuum killer triggered by CREATE/DROP/RENAME DATABASE.
 Can we extend that feature to the corresponding TABLE commands?

 One simple solution: every time a non-autovacuum backend tries to
 access a table with a lock equal to or stronger than SHARE UPDATE
 EXCLUSIVE, the backend checks whether any autovacuum workers are
 vacuuming that table and sends them SIGINT.

I don't think this is a good idea at all.  You're proposing putting a
dangerous sledgehammer into a core part of the system in order to fix a
fairly minor annoyance.

For the specific case of DROP TABLE, a SIGINT might be a good idea
but I don't agree with it for any weaker action.

regards, tom lane



Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Alvaro Herrera
Magnus Hagander wrote:
 On Wed, Jun 13, 2007 at 08:47:30AM -0400, Alvaro Herrera wrote:
  Magnus Hagander wrote:
  
   Just to open a whole new can of worms ;-)
   
   I read an article a couple of days ago about the machine readable 
   showplan
   output in SQL Server 2005 (basically, it's EXPLAIN output but in XML
   format). It does make a lot of sense if your primary interface is !=
   commandline (psql), such as pgadmin or phppgadmin. The idea being that you
   can stick in *all* the details you want, since you can't possibly clutter
   up the display. And you stick them in a well-defined XML format (or 
   another
   format if you happen to hate XML) where the client-side program can easily
   parse out whatever it needs. It's also future-proof - if you add a new
   field somewhere, the client program parser won't break.
   
   Something worth doing? Not to replace the current explain output, but as a
   second option (EXPLAIN XML whatever)?
  
  FYI a patch was posted for this some time ago, because a friend of mine
  wanted to help a student to write an EXPLAIN parsing tool.
 
 Didn't see that one. Explain in XML format? Got an URL for it, I can't seem
 to find it on -patches.

I can't find the patch itself ... maybe he didn't ever post it.  He last
talked about it here:
http://archives.postgresql.org/pgsql-hackers/2006-04/msg00455.php

BTW can I bug you to add the Message-Ids in the messages as displayed in
our archives?

-- 
Alvaro Herrera   http://www.PlanetPostgreSQL.org/
El día que dejes de cambiar dejarás de vivir

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Aidan Van Dyk
* Alvaro Herrera [EMAIL PROTECTED] [070613 09:58]:
 
 BTW can I bug you to add the Message-Ids in the messages as displayed in
 our archives?

Yes! Yes! Yes!  Pretty please!

-- 
Aidan Van Dyk Create like a god,
[EMAIL PROTECTED]   command like a king,
http://www.highrise.ca/   work like a slave.


signature.asc
Description: Digital signature


[HACKERS] Tom Lane's presentation on SERIALIZABLE etc?

2007-06-13 Thread Alvaro Herrera
Hi,

I remember there used to be somewhere a link to a talk Tom gave (on
OSCON 2001 maybe?) on transaction processing, SERIALIZABLE, SELECT FOR
UPDATE, etc.  I'm pretty sure it was linked to somewhere in the old
developers page, but I cannot find it in the new one or elsewhere.  I
searched the OSCON site but it isn't showing up either.

This one is not it:
http://www.postgresql.org/files/developer/transactions.pdf
Transaction Processing in PostgreSQL

Any clues?

Thanks,

-- 
Alvaro Herrera                 http://www.advogato.org/person/alvherre
Estoy de acuerdo contigo en que la verdad absoluta no existe...
El problema es que la mentira sí existe y tu estás mintiendo (G. Lama)



Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Tom Lane
Dave Page [EMAIL PROTECTED] writes:
 Looking to fix this, a comment in src/backend/commands/explain.c
 indicates that this is intentional:

Quite.

 Anyone know why?

As already noted, it'd usually be clutter in lines that are too long
already.  Also, conditionally adding a schema name isn't very good
because it makes life even more complicated for programs that are
parsing EXPLAIN output (yes, there are some).

I agree with the idea of having an option to get EXPLAIN's output in
an entirely different, more machine-readable format.  Not wedded to
XML, but I fear that a pure relational structure might be too strict ---
there's a lot of variability in the entries already.  XML also could
deal naturally with nesting, whereas we'd have to jump through hoops
to represent the plan tree structure in relational form.
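
Tom's point about nesting can be illustrated with a sketch: a plan tree maps directly onto nested XML elements, so extracting nodes needs no join keys or recursive queries. Element and attribute names below are invented for illustration, not any actual PostgreSQL output format:

```python
import xml.etree.ElementTree as ET

# Hypothetical XML plan; the tree structure of the plan is the
# document structure, with child plan nodes as child elements.
plan_xml = """
<plan>
  <node type="Nested Loop" cost="1337.74">
    <node type="Hash Join" cost="92.30">
      <node type="Seq Scan" relation="pg_catalog.pg_proc" cost="78.49"/>
      <node type="Hash" cost="2.23"/>
    </node>
    <node type="Index Scan" relation="pg_catalog.pg_operator" cost="10.11"/>
  </node>
</plan>
"""

root = ET.fromstring(plan_xml)
# A client tool can pull out whatever it needs, ignoring unknown
# attributes -- which is what makes the format future-proof.
scans = [n.get("relation") for n in root.iter("node")
         if n.get("type") == "Seq Scan"]
print(scans)  # ['pg_catalog.pg_proc']
```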

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Magnus Hagander
On Wed, Jun 13, 2007 at 09:55:19AM -0400, Alvaro Herrera wrote:
 Magnus Hagander wrote:
  On Wed, Jun 13, 2007 at 08:47:30AM -0400, Alvaro Herrera wrote:
   Magnus Hagander wrote:
   
Just to open a whole new can of worms ;-)

I read an article a couple of days ago about the machine readable 
showplan
output in SQL Server 2005 (basically, it's EXPLAIN output but in XML
format). It does make a lot of sense if your primary interface is !=
commandline (psql), such as pgadmin or phppgadmin. The idea being that 
you
can stick in *all* the details you want, since you can't possibly 
clutter
up the display. And you stick them in a well-defined XML format (or 
another
format if you happen to hate XML) where the client-side program can 
easily
parse out whatever it needs. It's also future-proof - if you add a new
field somewhere, the client program parser won't break.

Something worth doing? Not to replace the current explain output, but 
as a
second option (EXPLAIN XML whatever)?
   
   FYI a patch was posted for this some time ago, because a friend of mine
   wanted to help a student to write an EXPLAIN parsing tool.
  
  Didn't see that one. Explain in XML format? Got an URL for it, I can't seem
  to find it on -patches.
 
 I can't find the patch itself ... maybe he didn't ever post it.  He last
 talked about it here:
 http://archives.postgresql.org/pgsql-hackers/2006-04/msg00455.php
 
 BTW can I bug you to add the Message-Ids in the messages as displayed in
 our archives?

No. Because I don't know how to do that :-) And any work done to the
archives should go into redoing the whole thing, not band-aiding what we
have now.

That said, you can get the message-id if you do a view-source. It's in a
comment at the beginning of the page.
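
Scraping that comment is a one-liner with a regex. The comment shape and Message-Id value below are assumptions for illustration; the real archive markup may differ:

```python
import re

# Hypothetical archive page source with the Message-Id in a comment.
page = """<!-- X-Message-Id: some-id@example.org -->
<html><body>...</body></html>"""

m = re.search(r"<!--\s*X-Message-Id:\s*(\S+)\s*-->", page)
print(m.group(1) if m else None)  # some-id@example.org
```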

//Magnus


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Josh Tolley

On 6/13/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:

Magnus Hagander wrote:
 On Wed, Jun 13, 2007 at 02:02:24PM +0100, Heikki Linnakangas wrote:
 DB2 has the concept of explain tables. Explain output is written to
 tables, which tools query and pretty print the output. I like that idea
 in principle. PostgreSQL is a relational database, so having the explain
output in relations makes sense. No need for XML or any other extra
 libraries, in either the server or client. Having the data in relational
 format allows you to query them. For example, show me all sequential
 scans, or all nodes where the estimated number of rows is off by a
 certain factor.

 Assuming you can actually *represent* the whole plan as tables, that would
 of course work fine.

Sure you can. It's just a question of how complex the schema is :).

 But I assume you mean virtual tables? So I do
 EXPLAIN whatever, and get back one or more resultssets with the data? Or do
 they write it to *actual* tables in the database?

I'm not sure. DB2 had real tables, but I found that a bit clumsy. It was
nice because your old explain results were accumulated, but it was also
not nice because of that same thing.

One idea would be temporary tables.

 Machine-readable is of course the main point - the exact format is more of
 an implementation detail.

Agreed.

A potential problem is that as we add new node types etc., we need to
extend the schema (whether it's a real relational schema or XML), and
clients need to understand it. But I guess we already have the same
problem with clients that parse the current explain output.


Oracle forces you (AFAIK) to create a set of tables to store explain
plan output, so when you EXPLAIN, it populates those tables, and then
you have to query to get it out. This is nice for admin tools that
have to parse the explain output, though it's obviously a pain for
explain-ing inside a command-line. An XML explain would be neat.

On a different sideline based on the original note of this thread,
much as EXPLAIN doesn't include the schema,  \d doesn't include the
schema to describe INHERIT relationships in 8.2.4. If you have two
tables called PARENT, in two different schemas, and a child that
inherits from one of them, \d won't tell you which of the two it
inherits from.

- Josh

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Tom Lane's presentation on SERIALIZABLE etc?

2007-06-13 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 I remember there used to be somewhere a link to a talk Tom gave (on
 OSCON 2001 maybe?) on transaction processing, SERIALIZABLE, SELECT FOR
 UPDATE, etc.  I'm pretty sure it was linked to somewhere in the old
 developers page, but I cannot find it in the new one or elsewhere.  I
 searched the OSCON site but it isn't showing up either.

I think you mean

\Header{PostgreSQL Concurrency Issues}
\LFooter{Tom Lane}
\CFooter{O'Reilly Open Source Convention, July 2002}

I'm not sure where it is on the website either; there used to be a link
in the developer materials but it may be gone.  Anyway there's still a
PDF of the slides in my home directory on cvs.postgresql.org.

regards, tom lane



Re: [HACKERS] Truncate Permission

2007-06-13 Thread Nicholas Barr
 * Zeugswetter Andreas ADI SD ([EMAIL PROTECTED]) wrote:

   Wouldn't it be far more logical to decide that if a user has the
   permissions to do a DELETE FROM table; then they have permission to
 do
   a TRUNCATE? Why make an additional permission?
 
  Truncate doesn't fire ON DELETE triggers.

 Yes, but it would imho be OK if there aren't any ON DELETE triggers on
 the table.

 Nope, it doesn't follow MVCC rules properly either.  It really needs to
 be a separate permission.

   Thanks,

   Stephen

Hi,

Thanks for all the replies. I was primarily looking for some development
to do in my spare time, and have since produced a patch for this. I assume
this patch will be put on hold, which is fine.

Would the core developers accept a patch that extended the ACL types to
support more possible permissions?

At the moment it seems as if a single 32 bit integer is used for the
permissions, with the top half being the grantable rights. I assume I
would need to extend this into two 32 bit integers, or one 64 bit integer?

Would it be worth making this two 64 bit integers whilst we are at it, or
is that just silly? I agree that making a permission for every possible
command would be overkill and somewhat time consuming, so I assume that
two 64 bit integers would also be overkill.


Nick



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Load Distributed Checkpoints test results

2007-06-13 Thread Gregory Stark

Heikki Linnakangas [EMAIL PROTECTED] writes:

 The response time graphs show that the patch reduces the max (new-order)
 response times during checkpoints from ~40-60 s to ~15-20 s. 

I think that's the headline number here. The worst-case response time is
reduced from about 60s to about 17s. That's pretty impressive on its own. It
would be worth knowing if that benefit goes away if we push the machine again
to the edge of its i/o bandwidth.

 The change in overall average response times is also very significant. 1.5s
 without patch, and ~0.3-0.4s with the patch for new-order transactions. That
 also means that we pass the TPC-C requirement that the 90th percentile of
 response times must be < average.

Incidentally this is backwards: the 90th percentile response time must be
greater than the average response time for that transaction.

This isn't actually a very stringent test given that most of the data points
in the 90th percentile are actually substantially below the maximum. It's
quite possible to achieve it even with maximum response times above 60s.

However TPC-E has even more stringent requirements:

During Steady State the throughput of the SUT must be sustainable for the
remainder of a Business Day started at the beginning of the Steady State.

Some aspects of the benchmark implementation can result in rather
insignificant but frequent variations in throughput when computed over
somewhat shorter periods of time. To meet the sustainable throughput
requirement, the cumulative effect of these variations over one Business
Day must not exceed 2% of the Reported Throughput.

Comment 1: This requirement is met when the throughput computed over any
period of one hour, sliding over the Steady State by increments of ten
minutes, varies from the Reported Throughput by no more than 2%.

Some aspects of the benchmark implementation can result in rather
significant but sporadic variations in throughput when computed over some
much shorter periods of time. To meet the sustainable throughput
requirement, the cumulative effect of these variations over one Business
Day must not exceed 20% of the Reported Throughput.

Comment 2: This requirement is met when the throughput level computed over
any period of ten minutes, sliding over the Steady State by increments one
minute, varies from the Reported Throughput by no more than 20%.
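
The sliding-window requirement quoted above is straightforward to check mechanically. A minimal sketch, with entirely fabricated per-minute throughput numbers:

```python
def passes_steady_state(per_minute_tx, reported_tpm,
                        window=60, step=10, tolerance=0.02):
    """Throughput over every `window`-minute span, sliding by `step`
    minutes, must stay within `tolerance` of the reported throughput
    (the TPC-E Comment 1 style check)."""
    for start in range(0, len(per_minute_tx) - window + 1, step):
        avg = sum(per_minute_tx[start:start + window]) / window
        if abs(avg - reported_tpm) > tolerance * reported_tpm:
            return False
    return True

# Two hours of made-up traffic around 1000 tpm:
flat = [1000 + (i % 10) for i in range(120)]
print(passes_steady_state(flat, 1000))    # True: small jitter only

# A sustained ten-minute dip to 800 tpm drags a one-hour window
# more than 2% below the reported throughput:
dip = flat[:60] + [800] * 10 + flat[70:]
print(passes_steady_state(dip, 1000))     # False
```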


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Dave Page

Magnus Hagander wrote:


Something worth doing? Not to replace the current explain output, but as a
second option (EXPLAIN XML whatever)?


Yeah, that's been mentioned before. I was looking to bring it up for 8.4.

/D

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Tom Lane
Josh Tolley [EMAIL PROTECTED] writes:
 On a different sideline based on the original note of this thread,
 much as EXPLAIN doesn't include the schema,  \d doesn't include the
 schema to describe INHERIT relationships in 8.2.4. If you have two
 tables called PARENT, in two different schemas, and a child that
 inherits from one of them, \d won't tell you which of the two it
 inherits from.

Yes it does, because that's actually regclass output.  It'll be
schema-qualified if the table is not visible in your search path.

regards, tom lane



Re: [HACKERS] Truncate Permission

2007-06-13 Thread Tom Lane
Nicholas Barr [EMAIL PROTECTED] writes:
 At the moment it seems as if a single 32 bit integer is used for the
 permissions, with the top half being the grantable rights. I assume I
 would need to extend this into two 32 bit integers, or one 64 bit integer?

Two 32-bit please.  We are still trying not to depend on 64-bit
arithmetic for any core functionality.

regards, tom lane



Re: [HACKERS] Tom Lane's presentation on SERIALIZABLE etc?

2007-06-13 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  I remember there used to be somewhere a link to a talk Tom gave (on
  OSCON 2001 maybe?) on transaction processing, SERIALIZABLE, SELECT FOR
  UPDATE, etc.  I'm pretty sure it was linked to somewhere in the old
  developers page, but I cannot find it in the new one or elsewhere.  I
  searched the OSCON site but it isn't showing up either.
 
 I think you mean
 
 \Header{PostgreSQL Concurrency Issues}
 \LFooter{Tom Lane}
 \CFooter{O'Reilly Open Source Convention, July 2002}

Yes, this one was it.

 I'm not sure where it is on the website either; there used to be a link
 in the developer materials but it may be gone.  Anyway there's still a
 PDF of the slides in my home directory on cvs.postgresql.org.

Got it, thanks!  This is excellent material and it should definitely
be on the website.  The 7.4 optimizer talk (OSCON 2003) is great too.
Too bad you didn't write new ones for subsequent releases!

Since this was publicly released in the past, I assume this is OK to
distribute?

-- 
Alvaro Herrera   http://www.PlanetPostgreSQL.org/
We're here to devour each other alive(Hobbes)



Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Gregory Stark

Magnus Hagander [EMAIL PROTECTED] writes:

 Assuming you can actually *represent* the whole plan as tables, that would
 of course work fine. But I assume you mean virtual tables? 

Are you saying there are data structures relational databases aren't good at
representing?

In Oracle you had to run EXPLAIN and then go run a query against your
plan_table to generate a text report of the plan. It was kind of cumbersome in
the usual case but it's very convenient for gui tools which can then perform
different queries than users would run from the command-line client.

The main advantage of using real tables is that you can then let your
application run unchanged and go look at the plans that it generated from
another connection. The plan it generated may well be different from what you
would get if you tried to run the same query later in a different connection.

It also means you could go query for things like what query performed the
largest disk sort or what is the average cost/millisecond ratio or which
query nodes had the largest and smallest expected-rows/actual rows ratio etc.
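
A toy illustration of the kind of after-the-fact analysis this enables, using SQLite as a stand-in plan table. Column names here are invented for the sketch, not Oracle's or PostgreSQL's:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE plan_table (
    query_id INTEGER, node_type TEXT, relation TEXT,
    est_rows INTEGER, actual_rows INTEGER, cost REAL, ms REAL)""")

# Fabricated plan nodes captured from two hypothetical queries:
con.executemany(
    "INSERT INTO plan_table VALUES (?, ?, ?, ?, ?, ?, ?)",
    [(1, 'Seq Scan',   'orders',    100,  100,  78.5,  12.0),
     (1, 'Index Scan', 'customer',   10,   95,  10.1,  30.0),
     (2, 'Sort',       None,       5000, 4800, 900.0, 250.0)])

# Nodes whose row estimate was off by more than a factor of two --
# exactly the sort of query a text plan cannot answer directly:
rows = con.execute("""SELECT node_type, relation FROM plan_table
                      WHERE actual_rows > est_rows * 2""").fetchall()
print(rows)  # [('Index Scan', 'customer')]
```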

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Dave Page

Tom Lane wrote:

As already noted, it'd usually be clutter in lines that are too long
already.  Also, conditionally adding a schema name isn't very good
because it makes life even more complicated for programs that are
parsing EXPLAIN output (yes, there are some).


Well, yes - that's precisely what pgAdmin does, which is why I'd want to 
see the schema name all the time. Up until now though all we've done is 
graphically represent the plan, so the object names haven't really been 
an issue. To take that further and allow the user to drill down to 
further information, or to provide tools to help tune queries we need to 
know for certain what table we're dealing with.


Regards, Dave




Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Andrew Dunstan



Tom Lane wrote:

I agree with the idea of having an option to get EXPLAIN's output in
an entirely different, more machine-readable format.  Not wedded to
XML, but I fear that a pure relational structure might be too strict ---
there's a lot of variability in the entries already.  XML also could
deal naturally with nesting, whereas we'd have to jump through hoops
to represent the plan tree structure in relational form.

  


I agree. XML seems like a fairly natural fit for this. Just as people 
should not try to shoehorn everything into XML, neither should they try 
to shoehorn everything into a relational format either.


Now all we need is an XML schema for it ;-)

cheers

andrew



Re: [HACKERS] Truncate Permission

2007-06-13 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 Nicholas Barr [EMAIL PROTECTED] writes:
  At the moment it seems as if a single 32 bit integer is used for the
  permissions, with the top half being the grantable rights. I assume I
  would need to extend this into two 32 bit integers, or one 64 bit integer?
 
 Two 32-bit please.  We are still trying not to depend on 64-bit
 arithmetic for any core functionality.

Agreed.  Also, most of the time you'll only be pulling in the first one
(for a permissions check).  The second would only ever be used when a
'GRANT' is done.
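
The two-word split Stephen describes can be sketched as follows. Bit assignments are invented for illustration and do not match PostgreSQL's actual AclMode layout:

```python
# Word 1 holds the privileges themselves; word 2 marks which of those
# the holder may pass on with GRANT OPTION.
ACL_SELECT, ACL_INSERT, ACL_DELETE, ACL_TRUNCATE = (
    1 << 0, 1 << 1, 1 << 2, 1 << 3)

class AclItem:
    def __init__(self):
        self.privs = 0      # word 1: consulted on every permissions check
        self.grantable = 0  # word 2: consulted only when GRANT is executed

    def grant(self, priv, with_grant_option=False):
        self.privs |= priv
        if with_grant_option:
            self.grantable |= priv

    def has_priv(self, priv):
        return self.privs & priv != 0

    def may_grant(self, priv):
        return self.grantable & priv != 0

acl = AclItem()
acl.grant(ACL_SELECT, with_grant_option=True)
acl.grant(ACL_TRUNCATE)
print(acl.has_priv(ACL_TRUNCATE), acl.may_grant(ACL_TRUNCATE))  # True False
```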

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Tom Lane's presentation on SERIALIZABLE etc?

2007-06-13 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Since this was publicly released in the past, I assume this is OK to
 distribute?

Sure, no problem.

regards, tom lane



Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Lukas Kahwe Smith

Tom Lane wrote:


I agree with the idea of having an option to get EXPLAIN's output in
an entirely different, more machine-readable format.  Not wedded to
XML, but I fear that a pure relational structure might be too strict ---
there's a lot of variability in the entries already.  XML also could
deal naturally with nesting, whereas we'd have to jump through hoops
to represent the plan tree structure in relational form.


Which was my point regarding needing WITH RECURSIVE to make this truly
useful.


XML output is nice, but only as an addition imho. Then again it would 
indeed be quite useful for external development tools.


regards,
Lukas



Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Aidan Van Dyk
  BTW can I bug you to add the Message-Ids in the messages as displayed in
  our archives?
 
 That said, you can get the message-id if you do a view-source. It's in a
 comment at the beginning of the page.

I hadn't seen that before...

2 bookmarklets I find useful and have in my Personal Toolbar Folder:
  - GMANE message lookup:
    javascript:var id=prompt('Enter Message-ID','');window.location='http://news.gmane.org/find-root.php?message_id='+id;

  - Mhonarc message - GMANE lookup:
    javascript:var id=prompt('Enter Mhonarc URL','');window.location='http://www.highrise.ca/cgi-bin/mhonarc/'+id;

a.

-- 
Aidan Van Dyk Create like a god,
[EMAIL PROTECTED]   command like a king,
http://www.highrise.ca/   work like a slave.


signature.asc
Description: Digital signature


Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Joshua D. Drake

Andrew Dunstan wrote:



Tom Lane wrote:

I agree with the idea of having an option to get EXPLAIN's output in
an entirely different, more machine-readable format.  Not wedded to
XML, but I fear that a pure relational structure might be too strict ---
there's a lot of variability in the entries already.  XML also could
deal naturally with nesting, whereas we'd have to jump through hoops
to represent the plan tree structure in relational form.

  


I agree. XML seems like a fairly natural fit for this. Just as people 
should not try to shoehorn everything into XML, neither should they try 
to shoehorn everything into a relational format either.


Now all we need is an XML schema for it ;-)


Well I am not a big fan of XML but it certainly seems applicable in this 
case.


Joshua D. Drake




cheers

andrew





--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Joshua D. Drake

Tom Lane wrote:

Dave Page [EMAIL PROTECTED] writes:

Looking to fix this, a comment in src/backend/commands/explain.c
indicates that this is intentional:


Quite.


Anyone know why?


As already noted, it'd usually be clutter in lines that are too long
already.  Also, conditionally adding a schema name isn't very good
because it makes life even more complicated for programs that are
parsing EXPLAIN output (yes, there are some).


We shouldn't do it conditionally. We should do it explicitly. If I have 
a partitioned table with 30 child partitions, how do I know which table 
is getting the seqscan?


Joshua D. Drake




I agree with the idea of having an option to get EXPLAIN's output in
an entirely different, more machine-readable format.  Not wedded to
XML, but I fear that a pure relational structure might be too strict ---
there's a lot of variability in the entries already.  XML also could
deal naturally with nesting, whereas we'd have to jump through hoops
to represent the plan tree structure in relational form.

regards, tom lane





--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 Dave Page [EMAIL PROTECTED] writes:
 Looking to fix this, a comment in src/backend/commands/explain.c
 indicates that this is intentional:

 Quite.

 Anyone know why?

 As already noted, it'd usually be clutter in lines that are too long
 already.  Also, conditionally adding a schema name isn't very good
 because it makes life even more complicated for programs that are
 parsing EXPLAIN output (yes, there are some).

Arguably this is a bug if it's causing pg_admin difficulties in parsing the
output. Even for a user in an environment where, for example, he has several
identical schemas and may accidentally be getting a different table than he's
expecting, the current output is ambiguous.

Attached is a small patch which adds this conditionally on a guc that pg_admin
or other GUI tools could set, leaving it unchanged for users.

But it doesn't really seem like all that much clutter to add it to the scans
all the time:

                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=3.77..1337.74 rows=62 width=8)
   ->  Hash Join  (cost=3.77..92.30 rows=123 width=8)
         Hash Cond: (p.oid = (a.aggfnoid)::oid)
         ->  Seq Scan on pg_catalog.pg_proc p  (cost=0.00..78.49 rows=2349 width=4)
         ->  Hash  (cost=2.23..2.23 rows=123 width=8)
               ->  Seq Scan on pg_catalog.pg_aggregate a  (cost=0.00..2.23 rows=123 width=8)
   ->  Index Scan using pg_operator_oid_index on pg_catalog.pg_operator o  (cost=0.00..10.11 rows=1 width=4)
         Index Cond: (o.oid = a.aggsortop)
         Filter: (NOT (subplan))
         SubPlan
           ->  Index Scan using pg_amop_opr_fam_index on pg_catalog.pg_amop  (cost=1.05..9.33 rows=1 width=0)
                 Index Cond: (amopopr = $1)
                 Filter: ((amopmethod = $0) AND (amoplefttype = $2) AND (amoprighttype = $3))
                 InitPlan
                   ->  Seq Scan on pg_catalog.pg_am  (cost=0.00..1.05 rows=1 width=4)
                         Filter: (amname = 'btree'::name)
(16 rows)




explain-with-schema-guc.patch.gz
Description: Binary data


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Gregory Stark
Joshua D. Drake [EMAIL PROTECTED] writes:

 I agree. XML seems like a fairly natural fit for this. Just as people should
 not try to shoehorn everything into XML, neither should they try to shoehorn
 everything into a relational format either.

 Now all we need is an XML schema for it ;-)

 Well I am not a big fan of XML but it certainly seems applicable in this
 case.

I'm not a fan either so perhaps I'm biased, but this seems like a good example
of where it would be an *awful* idea.

Once you have an XML plan what can you do with it? All you can do is parse it
into constituent bits and display it. You can't do any sort of comparison
between plans, aggregate results, search for plans matching constraints, etc.

How would I, with XML output, do something like:

SELECT distinct node.relation
  FROM plan_table
 WHERE node.expected_rows > node.actual_rows*2;

or

SELECT node.type, average(node.ms/node.cost)
  FROM plan_table 
 GROUP BY node.type;

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Greg Sabino Mullane

-BEGIN PGP SIGNED MESSAGE-
Hash: RIPEMD160


 And you stick them in a well-defined XML format (or another
 format if you happen to hate XML) where the client-side program can easily
 parse out whatever it needs. It's also future-proof - if you add a new
 field somewhere, the client program parser won't break.

 Something worth doing? Not to replace the current explain output, but as a
 second option (EXPLAIN XML whatever)?

This reminded me of a quick function I wrote up for my PGCon talk last month. 
I've posted it on the blog:

http://people.planetpostgresql.org/greg/index.php?/archives/106-Putting-EXPLAIN-results-into-a-table.html

I'd rather see tables with a convert-to-XML function than direct XML FWIW.

- --
Greg Sabino Mullane [EMAIL PROTECTED]
End Point Corporation
PGP Key: 0x14964AC8 200706131211
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-BEGIN PGP SIGNATURE-

iD8DBQFGcBdnvJuQZxSWSsgRA/irAJsH0ZT3wLNN4mLirsTryiK1m9gyHwCg6+9A
0MuJqGxJ9gkEIWVUeq4iXag=
=NeB/
-END PGP SIGNATURE-





[HACKERS] \d omits schema on inherited tables (Was: EXPLAIN omits schema?)

2007-06-13 Thread Josh Tolley

On 6/13/07, Tom Lane [EMAIL PROTECTED] wrote:

Josh Tolley [EMAIL PROTECTED] writes:
 On a different sideline based on the original note of this thread,
 much as EXPLAIN doesn't include the schema,  \d doesn't include the
 schema to describe INHERIT relationships in 8.2.4. If you have two
 tables called PARENT, in two different schemas, and a child that
 inherits from one of them, \d won't tell you which of the two it
 inherits from.

Yes it does, because that's actually regclass output.  It'll be
schema-qualified if the table is not visible in your search path.


I figured it was better to start a new thread, since this changes from
the original topic. My test didn't display the schema despite the
parent not being in my search path, as shown below:

[EMAIL PROTECTED] ~]$ psql
Welcome to psql 8.2.4, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
  \h for help with SQL commands
  \? for help with psql commands
  \g or terminate with semicolon to execute query
  \q to quit

jtolley=# create schema a;
CREATE SCHEMA
jtolley=# create schema b;
CREATE SCHEMA
jtolley=# set search_path to a, public;
SET
jtolley=# create table parent (f int);
CREATE TABLE
jtolley=# set search_path to b, public;
SET
jtolley=# create table parent (g text);
CREATE TABLE
jtolley=# create table child () inherits (a.parent);
CREATE TABLE
jtolley=# \d child
  Table b.child
Column |  Type   | Modifiers
+-+---
f  | integer |
Inherits: parent

jtolley=# \d parent
Table b.parent
Column | Type | Modifiers
+--+---
g  | text |

jtolley=# \d a.parent
  Table a.parent
Column |  Type   | Modifiers
+-+---
f  | integer |

jtolley=# set search_path to b;
SET
jtolley=# \d child
  Table b.child
Column |  Type   | Modifiers
+-+---
f  | integer |
Inherits: parent

jtolley=# set search_path to a;
SET
jtolley=# \d b.child
  Table b.child
Column |  Type   | Modifiers
+-+---
f  | integer |
Inherits: parent

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Tom Lane's presentation on SERIALIZABLE etc?

2007-06-13 Thread Oleg Bartunov

On Wed, 13 Jun 2007, Tom Lane wrote:


Alvaro Herrera [EMAIL PROTECTED] writes:

I remember there used to be somewhere a link to a talk Tom gave (on
OSCON 2001 maybe?) on transaction processing, SERIALIZABLE, SELECT FOR
UPDATE, etc.  I'm pretty sure it was linked to somewhere in the old
developers page, but I cannot find it in the new one or elsewhere.  I
searched the OSCON site but it isn't showing up either.


I think you mean

\Header{PostgreSQL Concurrency Issues}
\LFooter{Tom Lane}
\CFooter{O'Reilly Open Source Convention, July 2002}

I'm not sure where it is on the website either; there used to be a link
in the developer materials but it may be gone.  Anyway there's still a
PDF of the slides in my home directory on cvs.postgresql.org.


Interesting, that google points to my page
http://www.sai.msu.su/~megera/postgres/gist/papers/concurrency/concurrency.pdf




regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] [PATCHES] Autovacuum launcher doesn't notice death of postmaster immediately

2007-06-13 Thread Alvaro Herrera
ITAGAKI Takahiro wrote:
 
 Alvaro Herrera [EMAIL PROTECTED] wrote:
 
   No, I meant a while (sleep 1 (or 10) and counter < longtime) check for
   exit instead of sleep longtime.
  
  Ah; yes, what I was proposing (or thought about proposing, not sure if I
  posted it or not) was putting an upper limit of 10 seconds in the sleep
  (bgwriter sleeps 10 seconds if configured to not do anything).  Though
  10 seconds may seem like an eternity for systems like the ones Peter was
  talking about, where there is a script trying to restart the server as
  soon as the postmaster dies.
 
 Here is a patch for split-sleep of autovacuum_naptime.
 
 There are some other issues in CVS HEAD; we use the calculation
 {autovacuum_naptime * 1000000} in launcher_determine_sleep().
 The result will be corrupted if we set autovacuum_naptime to more than 2147.

Ugh.  How about this patch; this avoids the overflow issue altogether.
I am not sure that this works on Win32 but it seems we are already using
struct timeval elsewhere, so I don't see why it wouldn't work.


 In another place, we use {autovacuum_naptime * 1000}, so we should
 set the upper bound to INT_MAX/1000 instead of INT_MAX.
 Incidentally, we've already had the same protections for 
 log_min_duration_statement and log_autovacuum.

Hmm, yes, the naptime should have an upper bound of INT_MAX/1000.  It
doesn't seem worth the trouble of changing those places, when we know
that such a high value of naptime is uselessly high.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Index: src/backend/postmaster/autovacuum.c
===
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/postmaster/autovacuum.c,v
retrieving revision 1.49
diff -c -p -r1.49 autovacuum.c
*** src/backend/postmaster/autovacuum.c	8 Jun 2007 21:21:28 -	1.49
--- src/backend/postmaster/autovacuum.c	13 Jun 2007 17:27:39 -
***
*** 18,23 
--- 18,24 
  
  #include <signal.h>
  #include <sys/types.h>
+ #include <sys/time.h>
  #include <time.h>
  #include <unistd.h>
  
*** int			autovacuum_vac_cost_limit;
*** 73,78 
--- 74,83 
  
  int			Log_autovacuum = -1;
  
+ 
+ /* maximum sleep duration in the launcher */
+ #define AV_SLEEP_QUANTUM 10
+ 
  /* Flags to tell if we are in an autovacuum process */
  static bool am_autovacuum_launcher = false;
  static bool am_autovacuum_worker = false;
*** NON_EXEC_STATIC void AutoVacWorkerMain(i
*** 197,203 
  NON_EXEC_STATIC void AutoVacLauncherMain(int argc, char *argv[]);
  
  static Oid do_start_worker(void);
! static uint64 launcher_determine_sleep(bool canlaunch, bool recursing);
  static void launch_worker(TimestampTz now);
  static List *get_database_list(void);
  static void rebuild_database_list(Oid newdb);
--- 202,209 
  NON_EXEC_STATIC void AutoVacLauncherMain(int argc, char *argv[]);
  
  static Oid do_start_worker(void);
! static void launcher_determine_sleep(bool canlaunch, bool recursing,
! 		 struct timeval *nap);
  static void launch_worker(TimestampTz now);
  static List *get_database_list(void);
  static void rebuild_database_list(Oid newdb);
*** AutoVacLauncherMain(int argc, char *argv
*** 487,493 
  
  	for (;;)
  	{
! 		uint64		micros;
  		bool	can_launch;
  		TimestampTz current_time = 0;
  
--- 493,499 
  
  	for (;;)
  	{
! 		struct timeval nap;
  		bool	can_launch;
  		TimestampTz current_time = 0;
  
*** AutoVacLauncherMain(int argc, char *argv
*** 498,508 
  		if (!PostmasterIsAlive(true))
  			exit(1);
  
! 		micros = launcher_determine_sleep(AutoVacuumShmem->av_freeWorkers !=
! 		  INVALID_OFFSET, false);
  
! 		/* Sleep for a while according to schedule */
! 		pg_usleep(micros);
  
  		/* the normal shutdown case */
  		if (avlauncher_shutdown_request)
--- 504,542 
  		if (!PostmasterIsAlive(true))
  			exit(1);
  
! 		launcher_determine_sleep(AutoVacuumShmem->av_freeWorkers !=
!    INVALID_OFFSET, false, &nap);
! 
! 		/*
! 		 * Sleep for a while according to schedule.  We only sleep in
! 		 * AV_SLEEP_QUANTUM sec intervals, in order to promptly notice
! 		 * postmaster death.
! 		 */
! 		while (nap.tv_sec > 0 || nap.tv_usec > 0)
! 		{
! 			uint32	sleeptime;
! 
! 			sleeptime = nap.tv_usec;
! 			nap.tv_usec = 0;
  
! 			if (nap.tv_sec > 0)
! 			{
! 				sleeptime += Min(nap.tv_sec, AV_SLEEP_QUANTUM) * 1000000;
! 				nap.tv_sec -= Min(nap.tv_sec, AV_SLEEP_QUANTUM);
! 			}
! 			
! 			pg_usleep(sleeptime);
! 
! 			/*
! 			 * Emergency bailout if postmaster has died.  This is to avoid the
! 			 * necessity for manual cleanup of all postmaster children.
! 			 */
! 			if (!PostmasterIsAlive(true))
! 				exit(1);
! 
! 			if (avlauncher_shutdown_request || got_SIGHUP || got_SIGUSR1)
! 				break;
! 		}
  
  		/* the normal shutdown case */
  		if (avlauncher_shutdown_request)
*** AutoVacLauncherMain(int argc, char 

[HACKERS] tsearch_core patch: permissions and security issues

2007-06-13 Thread Tom Lane
I've been looking at the tsearch patch a bit, and I think there needs to
be more thought given to the permissions required to mess around with
tsearch configuration objects.

The TSParser objects reference functions declared to take and return
INTERNAL arguments.  This means that the underlying functions must be
coded in C and can only be installed by a superuser, which in turn means
that there is no scenario where it is really useful for a non-superuser
to execute CREATE PARSER.  What's more, allowing a non-superuser to do
it creates security holes: if you can find an unrelated function taking
the right number of INTERNAL arguments, you can install it as a TSParser
support function.  That trivially allows crashing the backend, and it
could allow worse security holes than that.

TSDictionary objects have exactly the same issues since they also depend
on functions with INTERNAL arguments.

At minimum this means that we should restrict CREATE/DROP/ALTER commands
for these objects to superusers.  (Which in turn means there's no point
in tracking an ownership column for them; every superuser is the same as
every other one, permissions-wise.)  I'm wondering though whether this
doesn't mean that we don't need manipulation commands for them at all.
Is it likely that people will be adding parser or dictionary support to
an installation on the fly?  Maybe we can just create 'em all at initdb
time and be done, similar to the way index access methods are treated.
This doesn't say that it's not possible to add more; you can add an
index access method on the fly too, if you want, by inserting stuff into
pg_am by hand.  I'm just wondering whether all that SQL-statement
support and pg_dump support for custom parsers and dictionaries is
really worth the code space and future maintenance effort it'll eat up.

You could remove the immediate source of this objection if you could
redesign the APIs for the underlying support functions to be more
type-safe.  I'm not sure how feasible or useful that would be though.
The bottom-line question here is whether developing a new parser or
dictionary implementation is really something that ordinary users might
do.  If not, then having all this SQL-level support for setting up
catalog entries seems like wasted effort.

TSConfiguration objects are a different story, since they have only
type-safe dependencies on parsers, locales, and dictionaries.  But they
still need some more thought about permissions, because AFAICS mucking
with a configuration can invalidate some other user's data.  Do we want
to allow runtime changes in a configuration that existing tsvector
columns already depend on?  How can we even recognize whether there is
stored data that will be affected by a configuration change?  (AFAICS
the patch doesn't put anything into the pg_depend machinery that could
deal with this.)  And who gets to decide which configuration is default,
anyway?

I'm also a bit disturbed that you've made searches for TSConfiguration
objects be search-path-sensitive.  That is likely to create problems
similar to those we've recently recognized for function lookup, eg,
an insertion into a full-text-indexed column gets treated differently
depending on the caller's search path.  It's particularly bad to have
the default object be search-path-dependent.  We learned the hard way
not to do that for default index operator classes; let's not make the
same mistake again for tsearch configurations.

Next, it took me a while to understand how Mapping objects fit into
the scheme at all, and now that (I think) I understand, I'm wondering
why treat them as an independent concept.  Seems like the mapping from
token types to dictionaries is really a property of a configuration,
and we ought to be handling it through options of CREATE/ALTER
CONFIGURATION commands, not as an apparently independent object type.
The way the patch is doing it feels like implementing CREATE ATTRIBUTE
as a separate command instead of having ALTER TABLE ADD COLUMN; it's
just weird, and it's not obvious that dropping a configuration should
make the associated mapping object go away.

Lastly, I'm unhappy that the patch still keeps a lot of configuration
information, such as stop word lists, in the filesystem rather than the
database.  It seems to me that the single easiest and most useful part
of a configuration to change is the stop word list; but this setup
guarantees that no one but a DBA can do that, and what's more that
pg_dump won't record your changes.  What's the point of having any
non-superuser configuration capability at all, if stop words aren't part
of what you can change?

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] \d omits schema on inherited tables (Was: EXPLAIN omits schema?)

2007-06-13 Thread Tom Lane
Josh Tolley [EMAIL PROTECTED] writes:
 On 6/13/07, Tom Lane [EMAIL PROTECTED] wrote:
 Yes it does, because that's actually regclass output.  It'll be
 schema-qualified if the table is not visible in your search path.

 I figured it was better to start a new thread, since this changes from
 the original topic. My test didn't display the schema despite the
 parent not being in my search path, as shown below:

Oh, that's what I get for looking at CVS HEAD instead of the 8.2
branch... it was changed here:

http://archives.postgresql.org/pgsql-committers/2007-02/msg00397.php

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Arguably this is a bug if it's causing pg_admin difficulties in parsing the
 output. Even for a user in an environment where, for example, he has several
 identical schemas and may be accidentally getting a different table than he's
 expecting the current output is ambiguous.

 Attached is a small patch which adds this conditionally on a guc that
 pg_admin or other GUI tools could set, leaving it unchanged for users.

That makes things *worse* not better, since now tools would have to
deal with both possibilities.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Controlling Load Distributed Checkpoints

2007-06-13 Thread Jim C. Nasby
On Sun, Jun 10, 2007 at 08:49:24PM +0100, Heikki Linnakangas wrote:
 Jim C. Nasby wrote:
 On Thu, Jun 07, 2007 at 10:16:25AM -0400, Tom Lane wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
 Thinking about this whole idea a bit more, it occured to me that the 
 current approach to write all, then fsync all is really a historical 
 artifact of the fact that we used to use the system-wide sync call 
 instead of fsyncs to flush the pages to disk. That might not be the best 
 way to do things in the new load-distributed-checkpoint world.
 How about interleaving the writes with the fsyncs?
 I don't think it's a historical artifact at all: it's a valid reflection
 of the fact that we don't know enough about disk layout to do low-level
 I/O scheduling.  Issuing more fsyncs than necessary will do little
 except guarantee a less-than-optimal scheduling of the writes.
 
 If we extended relations by more than 8k at a time, we would know a lot
 more about disk layout, at least on filesystems with a decent amount of
 free space.
 
 I doubt it makes that much difference. If there was a significant amount 
 of fragmentation, we'd hear more complaints about seq scan performance.
 
 The issue here is that we don't know which relations are on which drives 
 and controllers, how they're striped, mirrored etc.

Actually, isn't pre-allocation one of the tricks that Greenplum uses to
get its seqscan performance?
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)


pgp9v0jJYJxA0.pgp
Description: PGP signature


Re: [HACKERS] Load Distributed Checkpoints test results

2007-06-13 Thread Josh Berkus
Greg,

 However TPC-E has even more stringent requirements:

I'll see if I can get our TPCE people to test this, but I'd say that the 
existing patch is already good enough to be worth accepting based on the TPCC 
results.

However, I would like to see some community testing on oddball workloads (like 
huge ETL operations and read-only workloads) to see if the patch imposes any 
extra overhead on non-OLTP databases.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [PATCHES] Autovacuum launcher doesn't notice death of postmaster immediately

2007-06-13 Thread Alvaro Herrera
Alvaro Herrera wrote:

  Ah; yes, what I was proposing (or thought about proposing, not sure if I
  posted it or not) was putting an upper limit of 10 seconds in the sleep
  (bgwriter sleeps 10 seconds if configured to not do anything).  Though
  10 seconds may seem like an eternity for systems like the ones Peter was
  talking about, where there is a script trying to restart the server as
  soon as the postmaster dies.

Peter, is 10 seconds good enough for you?

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Joshua D. Drake

Gregory Stark wrote:

Joshua D. Drake [EMAIL PROTECTED] writes:


I agree. XML seems like a fairly natural fit for this. Just as people should
not try to shoehorn everything into XML, neither should they try to shoehorn
everything into a relational format either.

Now all we need is an XML schema for it ;-)

Well I am not a big fan of XML but it certainly seems applicable in this
case.


I'm not a fan either so perhaps I'm biased, but this seems like a good example
of where it would be an *awful* idea.

Once you have an XML plan what can you do with it? All you can do is parse it
into constituent bits and display it. You can't do any sort of comparison
between plans, aggregate results, search for plans matching constraints, etc.


Honestly, I had never even considered doing such a thing. I would just 
like a nice way to parse explain output :)


Joshua D. Drake




How would I, with XML output, do something like:

SELECT distinct node.relation 
  FROM plan_table 
 WHERE node.expected_rows > node.actual_rows*2;


or

SELECT node.type, average(node.ms/node.cost)
  FROM plan_table 
 GROUP BY node.type;





--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


[HACKERS] Can autovac try to lock multiple tables at once?

2007-06-13 Thread Tom Lane
$subject would be bad because of potential deadlocks against other
transactions that might try to exclusive-lock more than one table.

We should be OK for actual vacuum operations, but I think that if
autovac chooses to just ANALYZE multiple tables, it will do it in
one transaction and accumulate locks.  Probably the use_own_xacts
logic in vacuum() ought to special-case IsAutoVacuumWorkerProcess.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Florian G. Pflug

Gregory Stark wrote:

Joshua D. Drake [EMAIL PROTECTED] writes:

I agree. XML seems like a fairly natural fit for this. Just as people should
not try to shoehorn everything into XML, neither should they try to shoehorn
everything into a relational format either.

Now all we need is an XML schema for it ;-)

Well I am not a big fan of XML but it certainly seems applicable in this
case.


I'm not a fan either so perhaps I'm biased, but this seems like a good example
of where it would be an *awful* idea.

Once you have an XML plan what can you do with it? All you can do is parse it
into constituent bits and display it. You can't do any sort of comparison
between plans, aggregate results, search for plans matching constraints, etc.

How would I, with XML output, do something like:

SELECT distinct node.relation 
  FROM plan_table 
 WHERE node.expected_rows > node.actual_rows*2;


or

SELECT node.type, average(node.ms/node.cost)
  FROM plan_table 
 GROUP BY node.type;


I believe that XQuery actually supports such queries. So if postgres
supported XQuery (or does it already? I honestly don't know), writing
such a query wouldn't be that hard I think. The execution probably
won't be super-efficient, but for query plans that seems OK.

greetings, Florian Pflug

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Controlling Load Distributed Checkpoints

2007-06-13 Thread Florian G. Pflug

Heikki Linnakangas wrote:

Jim C. Nasby wrote:

On Thu, Jun 07, 2007 at 10:16:25AM -0400, Tom Lane wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:
Thinking about this whole idea a bit more, it occured to me that the 
current approach to write all, then fsync all is really a historical 
artifact of the fact that we used to use the system-wide sync call 
instead of fsyncs to flush the pages to disk. That might not be the 
best way to do things in the new load-distributed-checkpoint world.

How about interleaving the writes with the fsyncs?

I don't think it's a historical artifact at all: it's a valid reflection
of the fact that we don't know enough about disk layout to do low-level
I/O scheduling.  Issuing more fsyncs than necessary will do little
except guarantee a less-than-optimal scheduling of the writes.


If we extended relations by more than 8k at a time, we would know a lot
more about disk layout, at least on filesystems with a decent amount of
free space.


I doubt it makes that much difference. If there was a significant amount 
of fragmentation, we'd hear more complaints about seq scan performance.


OTOH, extending a relation that uses N pages by something like
min(ceil(N/1024), 1024) pages might help some filesystems to
avoid fragmentation, and hardly introduce any waste (about 0.1%
in the worst case). So if it's not too hard to do it might
be worthwhile, even if it turns out that most filesystems deal
well with the current allocation pattern.

greetings, Florian Pflug

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] tsearch_core patch: permissions and security issues

2007-06-13 Thread Bruce Momjian

You bring up a very good point.  There are fifteen new commands being
added for full text indexing:

alter-fulltext-config.sgml  alter-fulltext-owner.sgml
create-fulltext-dict.sgml   drop-fulltext-dict.sgml
alter-fulltext-dict.sgmlalter-fulltext-parser.sgml
create-fulltext-map.sgmldrop-fulltext-map.sgml
alter-fulltext-dictset.sgml comment-fulltext.sgml
create-fulltext-parser.sgml drop-fulltext-parser.sgml
alter-fulltext-map.sgml create-fulltext-config.sgml
drop-fulltext-config.sgml

I think encoding is a good example to follow.  We allow users to create
new conversions (CREATE CONVERSION), but we don't allow them to create
new encodings --- those are hard-coded in the backend.  Which of the
following full text objects:

config
dict
map
dictset
parser

can we hard-code into the backend, and just update for every major
release like we do for encodings?

---

Tom Lane wrote:
 I've been looking at the tsearch patch a bit, and I think there needs to
 be more thought given to the permissions required to mess around with
 tsearch configuration objects.
 
 The TSParser objects reference functions declared to take and return
 INTERNAL arguments.  This means that the underlying functions must be
 coded in C and can only be installed by a superuser, which in turn means
 that there is no scenario where it is really useful for a non-superuser
 to execute CREATE PARSER.  What's more, allowing a non-superuser to do
 it creates security holes: if you can find an unrelated function taking
 the right number of INTERNAL arguments, you can install it as a TSParser
 support function.  That trivially allows crashing the backend, and it
 could allow worse security holes than that.
 
 TSDictionary objects have exactly the same issues since they also depend
 on functions with INTERNAL arguments.
 
 At minimum this means that we should restrict CREATE/DROP/ALTER commands
 for these objects to superusers.  (Which in turn means there's no point
 in tracking an ownership column for them; every superuser is the same as
 every other one, permissions-wise.)  I'm wondering though whether this
 doesn't mean that we don't need manipulation commands for them at all.
 Is it likely that people will be adding parser or dictionary support to
 an installation on the fly?  Maybe we can just create 'em all at initdb
 time and be done, similar to the way index access methods are treated.
 This doesn't say that it's not possible to add more; you can add an
 index access method on the fly too, if you want, by inserting stuff into
 pg_am by hand.  I'm just wondering whether all that SQL-statement
 support and pg_dump support for custom parsers and dictionaries is
 really worth the code space and future maintenance effort it'll eat up.
 
 You could remove the immediate source of this objection if you could
 redesign the APIs for the underlying support functions to be more
 type-safe.  I'm not sure how feasible or useful that would be though.
 The bottom-line question here is whether developing a new parser or
 dictionary implementation is really something that ordinary users might
 do.  If not, then having all this SQL-level support for setting up
 catalog entries seems like wasted effort.
 
 TSConfiguration objects are a different story, since they have only
 type-safe dependencies on parsers, locales, and dictionaries.  But they
 still need some more thought about permissions, because AFAICS mucking
 with a configuration can invalidate some other user's data.  Do we want
 to allow runtime changes in a configuration that existing tsvector
 columns already depend on?  How can we even recognize whether there is
 stored data that will be affected by a configuration change?  (AFAICS
 the patch doesn't put anything into the pg_depend machinery that could
 deal with this.)  And who gets to decide which configuration is default,
 anyway?
 
 I'm also a bit disturbed that you've made searches for TSConfiguration
 objects be search-path-sensitive.  That is likely to create problems
 similar to those we've recently recognized for function lookup, eg,
 an insertion into a full-text-indexed column gets treated differently
 depending on the caller's search path.  It's particularly bad to have
 the default object be search-path-dependent.  We learned the hard way
 not to do that for default index operator classes; let's not make the
 same mistake again for tsearch configurations.
 
 Next, it took me a while to understand how Mapping objects fit into
 the scheme at all, and now that (I think) I understand, I'm wondering
 why treat them as an independent concept.  Seems like the mapping from
 token types to dictionaries is really a property of a configuration,
 and we ought to be handling it through options of CREATE/ALTER
 CONFIGURATION commands, not as 

Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Once you have an XML plan what can you do with it? All you can do is parse it
 into constituent bits and display it. You can't do any sort of comparison
 between plans, aggregate results, search for plans matching constraints, etc.

Sure you can, just not in SQL ;-)

Given the amount of trouble we'd have to go to to put the data into a
pure SQL format, I don't think that's exactly an ideal answer either.
I'm for making the raw EXPLAIN output be in a simple and robust format,
which people can then postprocess however they want --- including
forcing it into SQL if that's what they want.  But just because we're a
SQL database doesn't mean we should think SQL is the best answer to
every problem.

While I'm surely not an XML fanboy, it looks better suited to this
problem than a pure relational representation would be.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Controlling Load Distributed Checkpoints

2007-06-13 Thread PFC

If we extended relations by more than 8k at a time, we would know a lot
more about disk layout, at least on filesystems with a decent amount of
free space.

I doubt it makes that much difference. If there was a significant amount
of fragmentation, we'd hear more complaints about seq scan performance.

The issue here is that we don't know which relations are on which drives
and controllers, how they're striped, mirrored etc.


Actually, isn't pre-allocation one of the tricks that Greenplum uses to
get it's seqscan performance?


	My tests here show that, at least on reiserfs, after a few hours of  
benchmark torture (this represents several million write queries), table  
files become significantly fragmented. I believe the table and index files  
get extended more or less simultaneously and end up somehow a bit mixed up  
on disk. Seq scan perf suffers. reiserfs doesn't have an excellent  
fragmentation behaviour... NTFS is worse than hell in this respect. So,  
pre-alloc could be a good idea. Brutal Defrag (cp /var/lib/postgresql to  
somewhere and back) gets seq scan perf back to disk throughput.


	Also, by the way, InnoDB uses a BTree organized table. The advantage is  
that data is always clustered on the primary key (which means you have to  
use something as your primary key that isn't necessarily natural, you have  
to choose it to get good clustering, and you can't always do it right, so  
it somehow, in the end, sucks rather badly). Anyway, seq-scan on InnoDB is  
very slow because, as the btree grows (just like postgres indexes) pages  
are split and scanning the pages in btree order becomes a mess of seeks.  
So, seq scan in InnoDB is very very slow unless periodic OPTIMIZE TABLE is  
applied. (caveat to the postgres TODO item implement automatic table  
clustering...)


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Bug in UTF8-Validation Code?

2007-06-13 Thread Andrew Dunstan


What is the state of play with this item? I think this is a must-fix bug 
for 8.3. There was a flurry of messages back in April but since then I 
don't recall seeing anything.


cheers

andrew



Mark Dilger wrote:

Mark Dilger wrote:

Bruce Momjian wrote:

Added to TODO:

* Fix cases where invalid byte encodings are accepted by the database,
  but throw an error on SELECT

http://archives.postgresql.org/pgsql-hackers/2007-03/msg00767.php


Is anyone working on fixing this bug?


Hi, has anyone volunteered to fix this bug?  I did not see any reply 
on the mailing list to your question above.


mark


OK, I can take a stab at fixing this.  I'd like to state some 
assumptions so people can comment and reply:


I assume that I need to fix *all* cases where invalid byte encodings 
get into the database through functions shipped in the core distribution.


I assume I do not need to worry about people getting bad data into the 
system through their own database extensions.


I assume that the COPY problem discussed up-thread goes away once you 
eliminate all the paths by which bad data can get into the system.  
However, existing database installations with bad data already loaded 
will not be magically fixed with these code patches.


Do any of the string functions (see 
http://www.postgresql.org/docs/8.2/interactive/functions-string.html) 
run the risk of generating invalid utf8 encoded strings?  Do I need to 
add checks?  Are there known bugs with these functions in this regard?


If not, I assume I can add mbverify calls to the various input 
routines (textin, varcharin, etc) where invalid utf8 could otherwise 
enter the system.
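The kind of boundary check being proposed can be sketched in Python. This is purely an illustration of the principle (reject ill-formed byte sequences at input time rather than storing them); the real PostgreSQL check is a C routine (the mbverify machinery referred to above), not anything like this helper:

```python
# Illustrative only: validate that a byte string is well-formed UTF-8
# before accepting it, the way an input routine like textin would.
def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")   # strict decode rejects bad sequences
        return True
    except UnicodeDecodeError:
        return False

print(is_valid_utf8(b"caf\xc3\xa9"))   # True  (valid two-byte sequence)
print(is_valid_utf8(b"caf\xc3"))       # False (truncated sequence)
print(is_valid_utf8(b"\xed\xa0\x80"))  # False (UTF-16 surrogate, invalid in UTF-8)
```

The third case is the subtle one: some validators accept surrogate code points even though UTF-8 forbids them, which is exactly how "accepted on input, error on SELECT" situations arise.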


I assume that this work can be limited to HEAD and that I don't need 
to back-patch it.  (I suspect this assumption is a contentious one.)


Advice and comments are welcome,




---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Can autovac try to lock multiple tables at once?

2007-06-13 Thread Alvaro Herrera
Tom Lane wrote:
 $subject would be bad because of potential deadlocks against other
 transactions that might try to exclusive-lock more than one table.
 
 We should be OK for actual vacuum operations, but I think that if
 autovac chooses to just ANALYZE multiple tables, it will do it in
 one transaction and accumulate locks.  Probably the use_own_xacts
 logic in vacuum() ought to special-case IsAutoVacuumWorkerProcess.

Sure, it can do that.  I think it's easy enough to correct this problem;
see attached patch.  Should this be backpatched?  Earlier releases also
fall foul of this problem AFAICT.

-- 
Alvaro Herrera                         http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
? cscope.files
? cscope.out
Index: src/backend/commands/vacuum.c
===
RCS file: /cvsroot/pgsql/src/backend/commands/vacuum.c,v
retrieving revision 1.352
diff -c -p -r1.352 vacuum.c
*** src/backend/commands/vacuum.c	30 May 2007 20:11:57 -	1.352
--- src/backend/commands/vacuum.c	14 Jun 2007 01:17:19 -
*** vacuum(VacuumStmt *vacstmt, List *relids
*** 359,372 
  	 * For ANALYZE (no VACUUM): if inside a transaction block, we cannot
  	 * start/commit our own transactions.  Also, there's no need to do so if
  	 * only processing one relation.  For multiple relations when not within a
! 	 * transaction block, use own transactions so we can release locks sooner.
  	 */
  	if (vacstmt->vacuum)
  		use_own_xacts = true;
  	else
  	{
  		Assert(vacstmt->analyze);
! 		if (in_outer_xact)
  			use_own_xacts = false;
  		else if (list_length(relations) > 1)
  			use_own_xacts = true;
--- 359,375 
  	 * For ANALYZE (no VACUUM): if inside a transaction block, we cannot
  	 * start/commit our own transactions.  Also, there's no need to do so if
  	 * only processing one relation.  For multiple relations when not within a
! 	 * transaction block, and also in an autovacuum worker, use own
! 	 * transactions so we can release locks sooner.
  	 */
  	if (vacstmt->vacuum)
  		use_own_xacts = true;
  	else
  	{
  		Assert(vacstmt->analyze);
! 		if (IsAutoVacuumWorkerProcess())
! 			use_own_xacts = true;
! 		else if (in_outer_xact)
  			use_own_xacts = false;
  		else if (list_length(relations) > 1)
  			use_own_xacts = true;

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] little PITR annoyance

2007-06-13 Thread Jim Nasby
How about adding to the general TODO in case someone finds time  
before Simon?


On Jun 10, 2007, at 5:55 PM, Simon Riggs wrote:

On Sun, 2007-06-10 at 20:48 +0200, [EMAIL PROTECTED] wrote:


My questions was: why don't we start the archiving *BEFORE*  
postmaster to

make room.


The archiver is executed from the postmaster, so that's not possible.

I'm aware of that, my point is maybe the archiver doesn't need  
postmaster

to be fully operational  (responding to db requests) to be started

We could investigate where best to put some code, but it wouldn't be
executed very frequently.

I agree. OTOH, the more PITR is used on big, busy databases, the more this
may happen.


Why not just execute the archive_command in a script, replacing
the .ready with .done files in archive_status directory when its
processed?


Sure, but if *I* can do it, why can't the system?

What do you think,


Just looked at the code. Does seem possible to start archiver  
earlier -

it has no hooks into anything else and doesn't need transactions.

Starting archiver earlier would not be the only change required, since
recovery could be very short. That will take some thought on how to
resolve.

I have other things pressing on me now, but I'll add this to my todo,
though I'll be relying on you to test it when I get round to it.
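The manual workaround described above — running the archive command by hand and flipping `.ready` status files to `.done` — can be sketched as a script. This is a hedged illustration using the pre-8.3 pg_xlog layout and invented paths; the real bookkeeping lives in the archiver process (pgarch), and this is not a supported tool:

```shell
# Sketch: archive every WAL segment flagged .ready, then mark it .done,
# mimicking what the archiver process does. Paths are illustrative.
archive_ready() {
    pgdata=$1
    archive=$2
    for ready in "$pgdata"/pg_xlog/archive_status/*.ready; do
        [ -e "$ready" ] || continue        # glob matched nothing
        seg=$(basename "$ready" .ready)
        # copy the segment out, and only then flip the status file
        cp "$pgdata/pg_xlog/$seg" "$archive/$seg" &&
            mv "$ready" "$pgdata/pg_xlog/archive_status/$seg.done"
    done
}
```

The point of the sketch is the complaint itself: nothing in this loop needs a postmaster that is accepting connections, which is why starting the archiver before recovery completes seems plausible.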

--
Jim Nasby                [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Can autovac try to lock multiple tables at once?

2007-06-13 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Sure, it can do that.  I think it's easy enough to correct this problem;
 see attached patch.  Should this be backpatched?  Earlier releases also
 fall foul of this problem AFAICT.

Yeah, because what made me think about it was a gripe from an 8.2
user ... maybe this wasn't his problem, but it could be.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] EXPLAIN omits schema?

2007-06-13 Thread Nikolay Samokhvalov

On 6/13/07, Gregory Stark [EMAIL PROTECTED] wrote:

I'm not a fan either so perhaps I'm biased, but this seems like a good example
of where it would be an *awful* idea.

Once you have an XML plan what can you do with it? All you can do is parse it
into constituent bits and display it.


...and display it -- this, I suppose, covers the most frequent needs
(starting from displaying entire plans in some tools and finishing
with odd but useful examples like
http://archives.postgresql.org/pgsql-sql/2005-08/msg00046.php).


You can't do any sort of comparison
between plans, aggregate results, search for plans matching constraints, etc.


Wrong.


How would I, with XML output, do something like:

SELECT distinct node.relation
  FROM plan_table
 WHERE node.expected_rows > node.actual_rows*2;

or

SELECT node.type, average(node.ms/node.cost)
  FROM plan_table
 GROUP BY node.type;


XPath can help here. Almost every language now ships XML support with
XPath. That's the point, that's why XML is suitable here -- it
simplifies application development (in this specific case ;-) ).
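Nikolay's claim can be illustrated with a sketch. The XML plan format below is entirely invented (PostgreSQL had no XML EXPLAIN output at the time of this thread), and Python's stdlib ElementTree supports only a subset of XPath, so the "off by 2x" predicate from Stark's first query is applied in Python; a full XPath engine such as lxml could push it into the path expression:

```python
# Hypothetical XML EXPLAIN output; <Node> and its attribute names are
# invented for illustration.
import xml.etree.ElementTree as ET

plan = ET.fromstring("""
<Plan>
  <Node type="SeqScan"   relation="orders"    expected-rows="5000" actual-rows="100"/>
  <Node type="IndexScan" relation="customers" expected-rows="10"   actual-rows="12"/>
</Plan>
""")

# XPath selects the plan nodes; the 2x mis-estimate filter is done in
# Python because ElementTree's XPath subset has no arithmetic.
bad = {node.get("relation")
       for node in plan.findall(".//Node")
       if int(node.get("expected-rows")) > 2 * int(node.get("actual-rows"))}
print(bad)  # {'orders'}
```

Which side of the argument this supports is debatable: the selection is easy, but the numeric aggregation in Stark's second query is exactly the part plain XPath 1.0 handles poorly.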

--
Best regards,
Nikolay

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match