Re: [HACKERS] sp-gist porting to postgreSQL

2004-11-10 Thread Oleg Bartunov
On Wed, 10 Nov 2004, Ramy M. Hassan wrote:
Oleg,
Thanks for your prompt reply.
Actually, I am able to create a new access method for testing and add an
operator class for the type integer using the new access method. Then I
created a table with two integer fields, one indexed using the new access
method and the other using a btree index, and everything is ok so far. Even
using EXPLAIN for queries shows that the indexes are used correctly,
as they should be.
I am using postgresql version 8.0.0beta3 from CVS.
I was wrong, Ramy. You can have several indices for the same type as long
as they support different operations. I don't know if it's possible
to have them for the same operation but for different conditions.
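A minimal illustration of Oleg's point (table and index names are made up): a btree index serves the ordering operators while a hash index serves only equality, so both can coexist on the same integer column because they support different operations.

```sql
-- Hypothetical example: two indexes on the same integer column,
-- each supporting a different set of operations.
CREATE TABLE t (a integer);
CREATE INDEX t_a_btree ON t USING btree (a);  -- <, <=, =, >=, >
CREATE INDEX t_a_hash  ON t USING hash (a);   -- = only
```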
Thanks
Ramy

-Original Message-
From: Oleg Bartunov [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 10, 2004 12:35 AM
To: Ramy M. Hassan; Pgsql Hackers
Cc: Teodor Sigaev; Walid G. Aref
Subject: Re: sp-gist porting to postgreSQL
Ramy,
glad to hear from you !
AFAIK, postgresql doesn't support several indices for the same type.
I think this is a problem for the optimizer. Probably other hackers know
better. I'm forwarding your message to the -hackers mailing list, which is
the relevant place for GiST discussion.
regards,
Oleg
On Tue, 9 Nov 2004, Ramy M. Hassan wrote:
Dear Oleg and Teodor,
Thanks for offering help.
I have a design question for now.
Currently in the postgresql GiST implementation, I noticed that the way to
have a GiST based index is to define an operator class for a certain type
using the GiST index. There is no new index type defined from the point of
view of postgresql (nothing is added to pg_am). This means that for a
certain type there could only be one GiST based index. I mean that there is
no way, in the same server, to use GiST to implement an xtree index and a
ytree index for the same type, even if they index different fields in
different relations. Is that correct?
What about doing it the other way (I am talking about SP-GiST now), by
providing the extension writer with an API he can use to instantiate a
standalone SP-GiST based index (for example, a trie index) that has a
record in the pg_am relation? In my point of view this would give more
flexibility, and also would not require the extension writer to learn the
postgresql API (maybe one day SP-GiST will be ported to another database
engine); he will just need to learn the SP-GiST API, which will probably be
less to study (and this is what GiST and SP-GiST are all about, if I
understand correctly).
Please let me know your opinions regarding this.
Thanks
Ramy
Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] [Pgsphere-dev] GIST index concurrency concern

2004-11-10 Thread Oleg Bartunov
Patrick,
you didn't tell us about your setup. Have you confirmed that you've seen
a locking issue for reads? Are you sure you have no locks in your own
code? Any test demonstrating your problem would be great.
Oleg
On Tue, 9 Nov 2004, Patrick Clery wrote:
Oleg,
Daniel and I have both been collaborating on this structure for a while now.
We are aware that GiST reads work very fast. But won't they be paralyzed
when there are writes? Both of us are working on dating sites, and the main
problem that concerns us is a very heavy traffic load. At this point I am
planning to queue all changes to a GiST index and commit them every 10-15
minutes. Is that really necessary? It's realistic to assume here that if
there is a problem with locking the table for writes, it will be a problem in
this situation because this structure is going to be hit VERY hard (and
Daniel's situation is on an even larger scale). We hope that we can alleviate
that with a transaction queue, but this is not a simple fix. Have you seen
any projects that were under a heavy load using a GiST index, and were they
able to avoid being paralyzed somehow?
Thanks in advance,
Patrick
On Tuesday 09 November 2004 22:08, Oleg Bartunov wrote:
Daniel,
concurrency is a big issue with the current implementation of GiST.
But it shouldn't bite you for READ ops!
The -hackers mailing list is the relevant mailing list for GiST
discussions. It's a pity: several times we've planned to work on GiST
concurrency and recovery, but never got the chance :)
I see Neil has become interested in GiST concurrency, though.
Oleg
On Tue, 9 Nov 2004, Daniel Ceregatti wrote:
Hi,
It's recently come to my attention that GiST indices suffer from
concurrency issues. I have already developed a dating site using GiST
for attributes via the intarray contrib, and for Earth
distance/radius calculations using pg_sphere.
I'm wondering if I haven't shot myself in the foot here. So far, I
understand that a GIST index will be locked by a backend for any DML.
Basically I'm concerned that my database will not scale in the manner
that I was hoping, because the sites that access the database are to be
used by many concurrent users, doing some DML.
I expect my site to sustain something around 1000-3000 new user
acquisitions per day, all of which will account for an insert into 3
GIST indices. Additionally there will be people that will be updating
their attributes and locations as well, but this will probably only
account for a small fraction of the DML. We don't allow people to delete
stuff.
My concern now is this concurrency issue. My question is: Is there
anyone out there using a GIST index on a database where there's a lot of
DML? Should I be concerned with this issue at all?
If so, what can be done to minimize the impact of heavy DML on a GIST
index? I've pondered rolling all DML into queues via triggers and then
de-queuing them in one transaction every so often, like 15 minutes, via
cron. Any other suggestions?
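A rough sketch of that queue idea, with all table names invented for illustration: writes go into a plain, unindexed staging table, and a cron job flushes it into the GiST-indexed table in a single transaction.

```sql
-- Hypothetical staging table mirroring the GiST-indexed "profiles" table.
CREATE TABLE profile_queue (LIKE profiles);  -- no GiST index here

-- Applications insert into profile_queue instead of profiles.
-- A cron job then flushes the queue every 10-15 minutes:
BEGIN;
INSERT INTO profiles SELECT * FROM profile_queue;
DELETE FROM profile_queue;
-- (Assumes nothing enqueues between the two statements; a real
-- implementation would mark or select the flushed rows explicitly.)
COMMIT;
```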
I'm posting to this list because I understand that both Oleg and Teodor
read it, and I found no other relevant list. If I've misposted, please
accept my apology and please direct me to the appropriate list.
Thanks,
Daniel

___
Pgsphere-dev mailing list
[EMAIL PROTECTED]
http://gborg.postgresql.org/mailman/listinfo/pgsphere-dev


Re: [HACKERS] sp-gist porting to postgreSQL

2004-11-10 Thread Ramy M. Hassan
I believe that it is still possible to have several index access methods for
the same type and the same operations. But this requires that each index
access method have its own tuple in the pg_am relation, so that
postgresql recognizes it by itself. This is not the case with GiST based
indices: they are all recognized by postgresql as the same index access
method, and from there comes the limitation.

Also, I think GiST and SP-GiST are better viewed as index classes, not as
indices by themselves. So maybe it is more logical to say:
CREATE INDEX index_name ON table_name USING spgist_trie(field)
where spgist_trie is an SP-GiST based index instance,

than to say:
CREATE INDEX index_name ON table_name USING spgist(field)
and depend on the operator classes to define the required support methods
for the trie function.

I am not sure I have a complete vision, but this is what I see. I would
appreciate your opinions regarding this design issue.

Thanks
Ramy






Re: [HACKERS] sp-gist porting to postgreSQL

2004-11-10 Thread Oleg Bartunov
On Wed, 10 Nov 2004, Ramy M. Hassan wrote:
I believe that it is still possible to have several index access methods for
the same type and the same operations. But this requires that each index
access method has its own tuple in the pg_am relation and therefore
postgresql recognizes it by itself. But this is not the case with GiST based
indices. They are all recognized by postgresql as same index access method,
and from here comes the limitation.
It's possible; see contrib/intarray, for example. You can specify the
opclass in the CREATE INDEX command:
CREATE INDEX text_idx  on test__int using gist ( a gist__int_ops );
CREATE INDEX text_idx2 on test__int using gist ( a gist__intbig_ops );
Here gist__int_ops and gist__intbig_ops are different opclasses for the
same type, intended for use at different cardinalities. The problem
is how to use them (the indices) automatically: how the planner/optimizer
could select which index to use.
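Until the planner can choose between such indices on its own, EXPLAIN at least shows which one it picked for a given operator (the query below is illustrative, using the intarray overlap operator):

```sql
-- Illustrative: see which of the two gist indexes the planner chose.
EXPLAIN SELECT * FROM test__int WHERE a && '{1,2,3}'::int[];
```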

Also, I think GiST and SP-GiST are better viewed as index classes not as
indices by themselves. So may be it is more logical to say:
CREATE INDEX index_name ON table_name USING spgist_trie(field)
Where spgist_trie is an spgist based index instance.
Than to say:
CREATE INDEX index_name ON table_name USING spgist(field)
And depend on the operator classes to define the required support methods
for the trie function.

why not use the existing syntax?
CREATE INDEX index_name ON table_name USING spgist (field trie_ops)

I am not sure I have a complete vision, but this is what I see. I would
appreciate your opinions regarding to this design issue.
Teodor is rather busy right now, but he certainly knows the GiST
internals better, so we'll wait for his comments.

Re: [HACKERS] A modest proposal: get rid of GUC's USERLIMIT variable

2004-11-10 Thread Stephan Szabo
On Tue, 10 Nov 2004, Greg Stark wrote:


 Tom Lane [EMAIL PROTECTED] writes:

  I'd like to propose that we get rid of GUC's USERLIMIT category and
  convert all the variables in it to plain SUSET.  In my mind, USERLIMIT
  is a failed experiment: it's way too complicated, and it still doesn't
  do quite what it was intended to do, because there are times when it
  can't check whether you're a superuser.
 
  The only variables that are in the category are log-verbosity-related:

 Would that mean I wouldn't be able to change the logging level on the fly at
 all?

I would think you'd still be able to do it through a security definer
wrapper function owned by a superuser.
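A minimal sketch of such a wrapper (the function name is invented for illustration); created by a superuser, it lets ordinary users perform that one SET:

```sql
-- Hypothetical superuser-owned wrapper around a restricted SET.
CREATE OR REPLACE FUNCTION set_log_verbosity(lvl text) RETURNS void AS $$
BEGIN
    EXECUTE 'SET log_min_messages = ' || quote_literal(lvl);
    RETURN;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
```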




Re: [HACKERS] A modest proposal: get rid of GUC's USERLIMIT variable category

2004-11-10 Thread Tom Lane
Andrew McMillan [EMAIL PROTECTED] writes:
 When tracking down gnarly problems in heavily multi-user applications
 enabling higher log levels at selective points has the potential to help
 _a lot_ with diagnostic detail, without smothering you in _every_
 detail.

Sure.  As I pointed out in the other thread, if you want to allow an app
to do this, you can make available a SECURITY DEFINER function that
performs the desired SET on its behalf.  By setting execute permissions
on the function and/or including restrictions in the function's code,
you can make this as tight or as loose a loophole as you like.  So it's
certainly possible to do what you want in any case.  I think the issue
at hand is what's appropriate to provide as hard-wired functionality.
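The permissions half of that can be sketched as follows (function and group names are hypothetical); revoking from PUBLIC and granting to a single group makes the loophole exactly as wide as that group:

```sql
-- Hypothetical: limit who may call a superuser-owned wrapper function.
REVOKE EXECUTE ON FUNCTION set_log_verbosity(text) FROM PUBLIC;
GRANT  EXECUTE ON FUNCTION set_log_verbosity(text) TO GROUP admins;
```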

regards, tom lane



Re: [HACKERS] CREATE or REPLACE function pg_catalog.*

2004-11-10 Thread Tom Lane
John Hansen [EMAIL PROTECTED] writes:
 When doing CREATE OR REPLACE FUNCTION of a builtin function, it seems to
 have no effect if it's in the 'C' language. SQL functions seem to work,
 but as neilc pointed out, that may be due to the SQL function being
 inlined.
 The builtin function is still called, not the user-defined function, for
 'C' language functions.

You can't override a builtin C function that way because there is a
built-in map from function OID to builtin function address, and it's
consulted before trying to look in pg_proc.

This behavior is not really open to negotiation; not only on grounds of
speed, but on grounds of circularity.  (The functions used in the
process of looking up entries in pg_proc itself obviously must have such
a short circuit...)  You'd have to build a modified backend in which the
particular functions you want to replace are not listed in the builtin
mapping table.
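A rough way to see this from SQL (illustrative; the exact OID boundary is a catalog implementation detail): built-in functions carry low, bootstrap-assigned OIDs, so a freshly created replacement gets a much higher OID that cannot match anything in the builtin table.

```sql
-- Illustrative: built-in functions have low, bootstrap-assigned OIDs.
SELECT oid, proname FROM pg_proc WHERE proname = 'lower';
```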

regards, tom lane



Re: [HACKERS] A modest proposal: get rid of GUC's USERLIMIT variable category

2004-11-10 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 I'd like to propose that we get rid of GUC's USERLIMIT category and
 convert all the variables in it to plain SUSET.

 Would that mean I wouldn't be able to change the logging level on the fly at
 all?

No, it would mean that you'd need to be superuser to change it.

regards, tom lane



Re: [HACKERS] sp-gist porting to postgreSQL

2004-11-10 Thread Tom Lane
Oleg Bartunov [EMAIL PROTECTED] writes:
 AFAIK, postgresql doesn't support several indices for the same type.

I think what's really being asked for is several operator classes for
the same type. This most certainly *is* possible.

regards, tom lane



Re: [HACKERS] CREATE or REPLACE function pg_catalog.*

2004-11-10 Thread Tom Lane
elein [EMAIL PROTECTED] writes:
 Isn't there a load/unload function for the .so that would work
 in this case?

Won't affect the fmgr_builtins table ... he wanted to replace a builtin,
not a previously-dynamically-loaded function.

regards, tom lane



Re: [HACKERS] CREATE or REPLACE function pg_catalog.*

2004-11-10 Thread John Hansen
  The builtin function is still called, not the user-defined function,
  for 'C' language functions.
 
 You can't override a builtin C function that way because 
 there is a built-in map from function OID to builtin function 
 address, and it's consulted before trying to look in pg_proc.

That doesn't make sense, since if I delete the entry from pg_proc and
then create the function, everything works fine.

Eg: delete from pg_proc where proname = 'funcname'; create function
pg_catalog.funcname();

 This behavior is not really open to negotiation; not only on 
 grounds of speed, but on grounds of circularity.  (The 
 functions used in the process of looking up entries in 
 pg_proc itself obviously must have such a short circuit...)  
 You'd have to build a modified backend in which the 
 particular functions you want to replace are not listed in 
 the builtin mapping table.
 
   regards, tom lane

Well, as someone pointed out, if it is possible to execute replace
function on a builtin, then it should work.

... John



Re: [HACKERS] Increasing the length of pg_stat_activity.current_query...

2004-11-10 Thread Jan Wieck
On 11/8/2004 5:32 PM, Tom Lane wrote:
Another relevant question is why you are expecting to get this
information through pgstats and not by looking in the postmaster log.
The pgstats were originally designed to give hints for tuning. That's 
why they cover cache hits vs. misses per table and numbers that can be 
used to point out missing as well as obsolete indexes. That was what led 
to the design of the pgstats file, the UDP communication and those fixed 
sizes. The goal was to let it have as little impact on the server 
performance as possible. The whole current query stuff was added later 
on request.

In my opinion it is quite pointless to attempt to transmit the last byte 
of every single query sent to the backend, when all you can get out of 
that view is a random query every 500 milliseconds.

Jan
--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #


Re: [HACKERS] CREATE or REPLACE function pg_catalog.*

2004-11-10 Thread Tom Lane
John Hansen [EMAIL PROTECTED] writes:
 You can't override a builtin C function that way because 
 there is a built-in map from function OID to builtin function 
 address, and it's consulted before trying to look in pg_proc.

 That doesn't make sense, since if I delete the entry from pg_proc and
 then create the funtion, everything works fine.

Sure, because then the new entry has a new OID that doesn't match any
entry in the fmgr_builtin table.

 Well, as someone pointed out, if it is possible to execute replace
 function on a builtin, then it should work.

[ shrug... ]  Nobody promised that you could change any arbitrary thing
without breaking your system.  Superusers are allowed to do delete from
pg_proc, too, but that doesn't mean you'll be pleased with the results.

regards, tom lane



Re: [HACKERS] Increasing the length of

2004-11-10 Thread Simon Riggs
On Wed, 2004-11-10 at 20:25, Jan Wieck wrote:
 On 11/8/2004 5:32 PM, Tom Lane wrote:
 
  Another relevant question is why you are expecting to get this
  information through pgstats and not by looking in the postmaster log.
 
 The pgstats were originally designed to give hints for tuning. That's 
 why they cover cache hits vs. misses per table and numbers that can be 
 used to point out missing as well as obsolete indexes. That was what led 
 to the design of the pgstats file, the UDP communication and those fixed 
 sizes. The goal was to let it have as little impact on the server 
 performance as possible. The whole current query stuff was added later 
 on request.
 

OK, and thanks for writing it. 

Evolution is a wonderful thing... our original design point was slightly
away from where we are now.

 In my opinion it is quite pointless to attempt to transmit the last byte 
 of every single query sent to the backend, when all you can get out of 
 that view is a random query every 500 milliseconds.

If you are certain you have no queries whose text is over 1 KB, or no
SQL that runs longer than a few seconds, then increasing the UDP limit
would just be a painful waste, I agree.

My intention was towards a data warehouse situation, and my comments are
only relevant in that context. Possibly 25+% of the user base use this
style of processing. In that case, I expect queries to run for minutes
or hours.

What are the alternatives when a query is still running when you return
from lunch? Kill it? Hope? These rogue queries can be a problem, using
up much of the capacity of the system for hours.

Many such queries are generated by applications and hard to recreate.
The server is running it, so we should be able to access the SQL and
diagnose.

-- 
Best Regards, Simon Riggs




Re: [HACKERS] Increasing the length of

2004-11-10 Thread Richard Huxton
Simon Riggs wrote:
On Mon, 2004-11-08 at 22:32, Tom Lane wrote:
Another relevant question is why you are expecting to get this
information through pgstats and not by looking in the postmaster log.

This is only available if you log all queries, which isn't normally done
while you are in production. When you hit a long running query, you do
wish you had that enabled, and if it was you could look there. 

It would be best to leave the postmaster logging turned off, then allow
dynamic inspection of the query iff you have a rogue query.
Isn't that:
log_min_duration_statement (integer)
 Sets a minimum statement execution time (in milliseconds) for 
statement to be logged. All SQL statements that run in the time 
specified or longer will be logged with their duration. Setting this to 
zero will print all queries and their durations. Minus-one (the default) 
disables this. For example, if you set it to 250 then all SQL statements 
that run 250ms or longer will be logged. Enabling this option can be 
useful in tracking down unoptimized queries in your applications. Only 
superusers can increase this or set it to minus-one if this option is 
set by the administrator.
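For instance (requires appropriate privileges, per the paragraph above):

```sql
-- Log, on completion, any statement that ran for a second or more.
SET log_min_duration_statement = 1000;
```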

--
  Richard Huxton
  Archonet Ltd


Re: [HACKERS] Increasing the length of

2004-11-10 Thread Simon Riggs
Hi Richard

On Wed, 2004-11-10 at 21:48, Richard Huxton wrote:
 Simon Riggs wrote:
  It would be best to leave the postmaster logging turned off, then allow
  dynamic inspection of the query iff you have a rogue query.

 Isn't that:
 log_min_duration_statement (integer)

That gets written when a statement completes, not during execution.

It's great for finding out if you have them AFTER the fact, but no good
for telling what's going on in the middle of execution.

(It's great BTW, thanks to whoever wrote it)

-- 
Best Regards, Simon Riggs




[HACKERS] Beta time early next week

2004-11-10 Thread Bruce Momjian
FYI, I am going to need a few more days to apply patches submitted in
the past few days and Magnus needs a few more days to fix the windows
signal problems.  So, rather than planning a beta for later this week, I
think we should focus on early next week.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



[HACKERS] MAX/MIN optimization via rewrite (plus query rewrites generally)

2004-11-10 Thread Mark Kirkwood
I am looking at implementing this TODO item. e.g. (max case):
rewrite
SELECT max(foo) FROM bar
as
SELECT foo FROM bar ORDER BY foo DESC LIMIT 1
if there is an index on bar(foo)
Suggestions about the most suitable point in the parser/planner stage to 
perform this sort of rewrite would be most welcome! (This would be my 
first non-trivial dirtying of hands in the code.)
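For concreteness, the intended transformation looks like this (table and index are hypothetical); note that a faithful rewrite also has to exclude NULLs, since max() ignores them but a descending sort can place them first:

```sql
CREATE TABLE bar (foo integer);
CREATE INDEX bar_foo_idx ON bar (foo);

-- Aggregate form (full scan today):
SELECT max(foo) FROM bar;

-- Hand-rewritten form that can walk the index from the top:
SELECT foo FROM bar
 WHERE foo IS NOT NULL
 ORDER BY foo DESC LIMIT 1;
```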

My initial thoughts revolved around extending the existing RULE system 
to be able to handle more general types of rewrite - like conditionals 
in SELECT rules and rewrites that change elements of the query other 
than the target relation.

Planning for future note: I would like whatever mechanism that is added 
for this MAX/MIN stuff to be amenable to more subtle things like 
aggregate navigation (see R.Kimball's article 
http://www.dbmsmag.com/9608d54.html).

regards
Mark


Re: [HACKERS] Increasing the length of

2004-11-10 Thread Andrew Sullivan
On Wed, Nov 10, 2004 at 09:52:17PM +, Simon Riggs wrote:
 On Wed, 2004-11-10 at 21:48, Richard Huxton wrote:
  
  Isn't that:
  log_min_duration_statement (integer)
 
 That gets written when a statement completes, not during execution.

I've been following this thread, and I was thinking the same thing. 
I wonder how much work it'd be to have another log setting -- say
log_statement_after_min_duration (integer) -- which did what Simon
wants.  That'd more than satisfy my need, for sure.  Might the cost
of that be too high, though?
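A rough sketch of the semantics such a hypothetical setting would have (the GUC name and this Python mock-up are mine, nothing in the server): arm a timer when the statement starts and emit the log line while it is still running once the threshold passes, rather than only at completion.

```python
import threading, time

# Hypothetical sketch (names invented here, not a real GUC): log a
# statement *while it is still running* once it has exceeded
# min_duration, instead of only when it completes.
def run_with_duration_log(statement, work, min_duration, log):
    timer = threading.Timer(min_duration, lambda: log.append(statement))
    timer.start()
    try:
        return work()
    finally:
        timer.cancel()  # statement finished; disarm if it never fired

log = []
run_with_duration_log("SELECT slow_fn()", lambda: time.sleep(0.3), 0.05, log)
run_with_duration_log("SELECT 1", lambda: None, 0.05, log)
print(log)  # only the long-running statement was logged mid-flight
```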

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
When my information changes, I alter my conclusions.  What do you do sir?
--attr. John Maynard Keynes



Re: [HACKERS] Increasing the length of

2004-11-10 Thread Andrew Sullivan
On Wed, Nov 10, 2004 at 05:51:01PM -0500, Andrew Sullivan wrote:
> log_statement_after_min_duration (integer) -- which did what Simon
> wants.

Uh, well, not what Simon wants, of course, but which gave us a useful
capability anyway.  I agree that the full-bore profiling for the DBA
would be awful nice.  But in its absence, if you could see your
long-running query in the log after a minute, then go do an
EXPLAIN, realise "uh-oh, that's gonna take 3 days to complete", and
kill it, it would be a big help.  

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
A certain description of men are for getting out of debt, yet are
against all taxes for raising money to pay it off.
--Alexander Hamilton



[HACKERS] multiline CSV fields

2004-11-10 Thread Andrew Dunstan
Darcy Buskermolen has drawn my attention to unfortunate behaviour of 
COPY CSV when fields contain embedded line-end characters whose 
sequence differs from the line endings of the file containing the CSV 
data. In that case we error out when reading the data back in. This 
means there are cases where we can produce a CSV data file that we 
cannot ourselves read in, which is not at all pleasant.

Possible approaches to the problem:
. make it a documented limitation
. have a csv read mode for backend/commands/copy.c:CopyReadLine() that 
relaxes some of the restrictions on inconsistent line endings
. escape embedded line end chars

The last really isn't an option, because the whole point of CSV is to 
interoperate with other programs, and my understanding is that those 
that understand multiline fields (e.g. Excel) expect them not to be 
escaped, and do not produce them escaped.
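For illustration (Python's csv module standing in for those other programs): a field containing an embedded CRLF is simply quoted, not escaped, and a conforming reader reassembles it even though the file's own records end in a bare LF.

```python
import csv, io

# Write a row whose field embeds a CRLF, while the file itself
# terminates records with a bare LF.
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["id", "note"])
writer.writerow([1, "line one\r\nline two"])
data = buf.getvalue()
# The field is carried verbatim inside quotes: 1,"line one\r\nline two"

# A conforming reader reassembles the multiline field as-is;
# nothing was escaped on the way out, nothing is unescaped here.
rows = list(csv.reader(io.StringIO(data, newline="")))
print(rows[1][1] == "line one\r\nline two")
```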

So right now I'm tossing up in my head between the first two options. Or 
maybe there's another solution I haven't thought of.

Thoughts?
cheers
andrew


Re: [HACKERS] CREATE or REPLACE function pg_catalog.*

2004-11-10 Thread Neil Conway
On Thu, 2004-11-11 at 04:11, Tom Lane wrote:
> You can't override a builtin C function that way because there is a
> built-in map from function OID to builtin function address, and it's
> consulted before trying to look in pg_proc.
>
> This behavior is not really open to negotiation

Then shouldn't we disallow the CREATE OR REPLACE FUNCTION?

-Neil





Re: [HACKERS] CREATE or REPLACE function pg_catalog.*

2004-11-10 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
> On Thu, 2004-11-11 at 04:11, Tom Lane wrote:
> > You can't override a builtin C function that way because there is a
> > built-in map from function OID to builtin function address, and it's
> > consulted before trying to look in pg_proc.

> Then shouldn't we disallow the CREATE OR REPLACE FUNCTION?

We shouldn't disallow it completely; for instance you could validly
change the volatility or other attributes that way.

There might be an argument for rejecting an attempt to replace the
prolang or prosrc values of a built-in, but frankly I think it's a waste
of effort to code up such a thing ...

regards, tom lane



Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites generally)

2004-11-10 Thread Tom Lane
Mark Kirkwood [EMAIL PROTECTED] writes:
> I am looking at implementing this TODO item. e.g. (max case):

> My initial thoughts revolved around extending the existing RULE system
> to be able to handle more general types of rewrite - like conditionals
> in SELECT rules and rewrites that change elements of the query other
> than the target relation.

The rule rewriter is almost certainly the wrong place, because it has
only the most superficial understanding of a query's semantics.
Doing this processing there would require re-inventing (or at least
duplicating the execution of) a lot of the planner's query analysis
work.

My thoughts would run towards doing this after the prepqual and
prepjointree steps (probably somewhere in grouping_planner).  Even there
is a bit early since you'd have to duplicate plancat.c's extraction of
information about related indexes; but possibly it'd be reasonable to
move the add_base_rels_to_query() call out of query_planner and do it in
grouping_planner.

A more radical way of handling it would be to detect the relevance of an
indexscan in indxpath.c and generate a special kind of Path node; this
would not generalize to other sorts of things as you were hoping, but
I'm unconvinced that the mechanism is going to be very general-purpose
anyway.  The major advantage is that this would work conveniently for
comparing the cost of a rewritten query to a non-rewritten one.

How are you planning to represent the association between MIN/MAX and
particular index orderings in the system catalogs?

regards, tom lane



Re: [HACKERS] sp-gist porting to postgreSQL

2004-11-10 Thread Ramy M. Hassan

On Wed, 10 Nov 2004, Oleg Bartunov wrote:
On Wed, 10 Nov 2004, Ramy M. Hassan wrote:
I believe that it is still possible to have several index access methods for
the same type and the same operations. But this requires that each index
access method has its own tuple in the pg_am relation, so that postgresql
recognizes it by itself. This is not the case with GiST based indices:
they are all recognized by postgresql as the same index access method,
and from there comes the limitation.
It's possible, see contrib/intarray, for example. You can specify
opclass in CREATE INDEX command:
CREATE INDEX text_idx  on test__int using gist ( a gist__int_ops );
CREATE INDEX text_idx2 on test__int using gist ( a gist__intbig_ops );
Here gist__int_ops and gist__intbig_ops are different opclasses for the
same type, intended for use with different cardinalities. The problem is how
to use these indices automatically - how the planner/optimizer could
select which index to use.
This is great. I didn't know that. Thanks.

Also, I think GiST and SP-GiST are better viewed as index classes not as
indices by themselves. So may be it is more logical to say:
CREATE INDEX index_name ON table_name USING spgist_trie(field)
Where spgist_trie is an spgist based index instance.
Than to say:
CREATE INDEX index_name ON table_name USING spgist(field)
And depend on the operator classes to define the required support methods
for the trie function.

why not use the existing syntax?
CREATE INDEX index_name ON table_name USING spgist (field trie_ops)
That's ok now.
The only concern now is the portability of the extensions. Are there any 
plans to introduce GiST to some other DBMS ? If yes, then I think all GiST 
based indexes will have to be rewritten, or at least modified to a great 
extent, as they depend on the postgresql API and on how index access methods 
work in postgresql.
Do you see any value in defining an SP-GiST API for the extensions 
to completely isolate the extension code from postgresql ?
Such isolation would require that the SP-GiST code load the extensions itself, 
instead of relying on postgresql to do so, so it would no longer be a matter 
of operator classes defining an extension.



I am not sure I have a complete vision, but this is what I see. I would
appreciate your opinions regarding this design issue.
Teodor is rather busy right now, but he certainly knows the GiST internals
better, so we'll wait for his comments.

Thanks
Ramy

-Original Message-
From: Oleg Bartunov [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 10, 2004 5:21 AM
To: Ramy M. Hassan
Cc: 'Pgsql Hackers'; 'Teodor Sigaev'; 'Walid G. Aref'
Subject: RE: sp-gist porting to postgreSQL
On Wed, 10 Nov 2004, Ramy M. Hassan wrote:
Oleg,
Thanks for your prompt reply.
Actually, I am able to create a new access method for testing and add an
operator class for the type integer using the new access method. Then I
created a table with two integer fields, one indexed using the new access
method and the other using a btree index, and everything is ok so far. Even
using EXPLAIN for queries shows that the indexes are used correctly,
as they should be.
I am using postgresql version 8.0.0beta3 from CVS.
I was wrong, Ramy. You could have several indices for the same type as soon
as they support different operations. I don't know if it's possible
to have them for the same operation but for different conditions.
Thanks
Ramy

-Original Message-
From: Oleg Bartunov [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 10, 2004 12:35 AM
To: Ramy M. Hassan; Pgsql Hackers
Cc: Teodor Sigaev; Walid G. Aref
Subject: Re: sp-gist porting to postgreSQL
Ramy,
glad to hear from you !
AFAIK, postgresql doesn't support several indices for the same type.
I think this is a problem of the optimizer. Probably other hackers know
better. I'm forwarding your message to the -hackers mailing list, which is
the relevant place for GiST discussion.
regards,
Oleg
On Tue, 9 Nov 2004, Ramy M. Hassan wrote:
Dear Oleg and Teodor,
Thanks for offering help.
I have a design question for now.
Currently in the postgresql GiST implementation, I noticed that the way to
have a GiST based index is to define an operator class for a certain type
using GiST index. There is no new index type defined from the point of view
of postgresql ( nothing is added to pg_am ). This means that for a certain
type there could only be one GiST based index. I mean that there is no way in
the same server to use gist to implement an xtree index and a ytree for the
same type even if they index different fields in different relations. is
that correct ?
What about doing it the other way ( I am talking about SP-GiST now ) , by
providing the extension writer with an API to use it to instantiate a
standalone SP-GiST based index ( for example trie index ) that has a record
in the pg_am relation. In my point of view this would give more flexibility,
and also would not require the extension writer to learn the postgresql API (
maybe oneday SP-GiST 

Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites generally)

2004-11-10 Thread Alvaro Herrera
On Wed, Nov 10, 2004 at 07:18:59PM -0500, Tom Lane wrote:
> A more radical way of handling it would be to detect the relevance of an
> indexscan in indxpath.c and generate a special kind of Path node; this
> would not generalize to other sorts of things as you were hoping, but
> I'm unconvinced that the mechanism is going to be very general-purpose
> anyway.  The major advantage is that this would work conveniently for
> comparing the cost of a rewritten query to a non-rewritten one.

What about having a new column in pg_aggregate which would point to a
function that would try to optimize the aggregate's handling?

There could be multiple calls to that function along the query's way to the
executor, each one at a different point (with a parameter specifying
which one it is), each of which would try to rewrite the query to optimize
the aggregate.  So we could optimize some aggregates at one point, and
others at a different point, whichever makes the most sense.

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
Hay dos momentos en la vida de un hombre en los que no debería
especular: cuando puede permitírselo y cuando no puede (Mark Twain)




Re: [HACKERS] Reorganization of the translation files

2004-11-10 Thread Alvaro Herrera
On Tue, Nov 09, 2004 at 12:00:46PM -0500, Tom Lane wrote:
> Peter Eisentraut [EMAIL PROTECTED] writes:
> > I was thinking about organizing the translation files in a more efficient
> > manner. (not for 8.0, obviously)
> > [snip]
> > And it would easily solve issues like translatable strings
> > appearing in the pgport library, which has no makefile structure to support
> > translations (which would be a waste for like 5 strings), and no run-time
> > support either.
>
> Sounds like a win to me on that grounds alone; but probably the
> translators need to have the biggest say here, since they'll be affected
> the most.

The biggest downside I see is the long execution time for msgmerge.
I can live with that, especially if it will save me translating out of
memory six or seven times.

The other downside is that the cool status table will be reduced to one
line.  I can live with that too ...

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
La tristeza es un muro entre dos jardines (Khalil Gibran)




Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites

2004-11-10 Thread Mark Kirkwood
Tom Lane wrote:
> A more radical way of handling it would be to detect the relevance of an
> indexscan in indxpath.c and generate a special kind of Path node; this
> would not generalize to other sorts of things as you were hoping, but
> I'm unconvinced that the mechanism is going to be very general-purpose
> anyway.  The major advantage is that this would work conveniently for
> comparing the cost of a rewritten query to a non-rewritten one.

I like this point - it makes sense to check that the rewritten query is 
less costly to execute than the original!

> How are you planning to represent the association between MIN/MAX and
> particular index orderings in the system catalogs?

That is the next item to think about. We could have a rewrite catalog that 
holds possible transformations for certain functions (certain aggregates 
at this stage, I guess). This is a bit like Alvaro's idea - though it 
may be better to represent it the way he suggested! 

regards
Mark


Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites generally)

2004-11-10 Thread Bruno Wolff III
On Wed, Nov 10, 2004 at 22:21:31 -0300,
  Alvaro Herrera [EMAIL PROTECTED] wrote:
> On Wed, Nov 10, 2004 at 07:18:59PM -0500, Tom Lane wrote:
> > A more radical way of handling it would be to detect the relevance of an
> > indexscan in indxpath.c and generate a special kind of Path node; this
> > would not generalize to other sorts of things as you were hoping, but
> > I'm unconvinced that the mechanism is going to be very general-purpose
> > anyway.  The major advantage is that this would work conveniently for
> > comparing the cost of a rewritten query to a non-rewritten one.
>
> What about having a new column in pg_aggregate which would point to a
> function that would try to optimize the aggregate's handling?

I think you want to store an operator class and a direction. This allows
you to figure out what indexes might be usable. This could be used
on all of the max and min aggregates and the boolean and and or aggregates.
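To make the operator-class-plus-direction idea concrete, a toy check (SQLite through Python as a scratchpad, with 0/1 standing in for boolean): under the ordering false < true, bool_and is just the first value of an ascending scan and bool_or the first of a descending one, so the same machinery that serves min/max would serve them.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (flag INTEGER)")  # 0/1 in place of boolean
con.executemany("INSERT INTO t VALUES (?)", [(1,), (0,), (1,)])

# bool_and == min under false < true: one ascending "index scan"
land = con.execute(
    "SELECT flag FROM t WHERE flag IS NOT NULL ORDER BY flag ASC LIMIT 1"
).fetchone()[0]
# bool_or == max: one descending scan
lor = con.execute(
    "SELECT flag FROM t WHERE flag IS NOT NULL ORDER BY flag DESC LIMIT 1"
).fetchone()[0]

print(land, lor)  # one row is false, so "and" is false and "or" is true
```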



Re: [HACKERS] Reorganization of the translation files

2004-11-10 Thread Aleksander Kmetec
Alvaro Herrera wrote:
> The biggest downside I see is the long execution time for msgmerge.
> I can live with that, especially if it will save me translating out of
> memory six or seven times.
I'm using poedit, which can fill empty translations with previously 
translated values for the same or similar strings. This can save you some 
time when translating common errors and command-line options; 
unfortunately, importing strings from other projects (kde, gnome, ...) 
doesn't help much. :(

Regards,
Aleksander


Re: [HACKERS] Reorganization of the translation files

2004-11-10 Thread Aleksander Kmetec
I'm not sure about other translators, but I'd like to see two files: one 
for the backend and one for the utilities.

Translating strings can be a slow and tedious process, and completing 
them all at the same time is not likely. With two files you can choose 
to translate the tools during one development cycle and the backend 
during the next one, while keeping some overview of your progress at 
the same time.

Regards,
Aleksander


Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites generally)

2004-11-10 Thread Jim C. Nasby
On Thu, Nov 11, 2004 at 11:48:49AM +1300, Mark Kirkwood wrote:
> I am looking at implementing this TODO item. e.g. (max case):
>
> rewrite
> SELECT max(foo) FROM bar
> as
> SELECT foo FROM bar ORDER BY foo DESC LIMIT 1
> if there is an index on bar(foo)
 
Out of curiosity, will you be doing this in such a way that 

SELECT min(foo), max(foo) FROM bar

will end up as

SELECT (SELECT foo FROM bar ORDER BY foo ASC LIMIT 1), (SELECT ... DESC
LIMIT 1)

?
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?



Re: [HACKERS] A modest proposal: get rid of GUC's USERLIMIT variable category

2004-11-10 Thread Greg Stark

Stephan Szabo [EMAIL PROTECTED] writes:
> > Would that mean I wouldn't be able to change the logging level on the fly at
> > all?
>
> I would think you'd still be able to do it through a security definer
> wrapper function owned by a superuser.

Oh yeah, well that would be sufficient for my purposes. 

I must say I thought the behaviour of being able to raise but not lower
logging levels beyond what the system had set was pretty slick when I first
found out about it. But it's not the most important thing in the world, as
long as there's an escape hatch.

-- 
greg




Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites

2004-11-10 Thread Mark Kirkwood
Your example and ones like :
SELECT max(foo), count(foo) FROM bar
SELECT max(a.foo1), max(b.foo2) FROM bar1 AS a NATURAL JOIN bar2 AS b
have made me realize that the scope of what should be optimized is 
somewhat subtle.

I am inclined to keep it simple (i.e rather limited) for a first cut, 
and if that works well, then look at extending to more complex rewrites.

What do you think?
Jim C. Nasby wrote:
> On Thu, Nov 11, 2004 at 11:48:49AM +1300, Mark Kirkwood wrote:
> > I am looking at implementing this TODO item. e.g. (max case):
> > rewrite
> > SELECT max(foo) FROM bar
> > as
> > SELECT foo FROM bar ORDER BY foo DESC LIMIT 1
> > if there is an index on bar(foo)
>
> Out of curiosity, will you be doing this in such a way that
>
> SELECT min(foo), max(foo) FROM bar
> will end up as
> SELECT (SELECT foo FROM bar ORDER BY foo ASC LIMIT 1), (SELECT ... DESC
> LIMIT 1)
> ?
 



Re: [HACKERS] Increasing the length of pg_stat_activity.current_query...

2004-11-10 Thread Greg Stark

Simon Riggs [EMAIL PROTECTED] writes:

> My intention was towards a data warehouse situation, and my comments are
> only relevant in that context. Possibly 25+% of the user base use this
> style of processing. In that case, I expect queries to run for minutes
> or hours.

I come from the opposite angle but have also ended up with the same
conclusion. 

In an OLTP environment you can't be trying to save every single SQL query in
the log file. And saving only queries that take longer than some arbitrary
amount of time might not be capturing enough to give a good picture of what's
going on.

I like the idea of a stats daemon that's isolated from the server by something
like UDP and keeps statistics. It would let me turn off logging while still
being able to peek into what queries are running, which take the longest,
which are being executed the most often, and which are taking the most
cumulative time (which isn't necessarily the same thing as either of the other
two).
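The shape of such a daemon might look like this (a toy sketch with an invented JSON wire format, not pgstats' actual protocol): the server fires UDP datagrams and never blocks, and the collector aggregates per-query call counts, cumulative time, and worst case.

```python
import json, socket, threading

# Collector side: aggregate whatever datagrams arrive; lost packets
# just mean slightly undercounted stats, which is the UDP trade-off.
def collect(sock, stats, n_messages):
    for _ in range(n_messages):
        data, _addr = sock.recvfrom(65536)
        msg = json.loads(data)
        e = stats.setdefault(msg["query"],
                             {"calls": 0, "total_ms": 0.0, "max_ms": 0.0})
        e["calls"] += 1
        e["total_ms"] += msg["ms"]
        e["max_ms"] = max(e["max_ms"], msg["ms"])

recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv.bind(("127.0.0.1", 0))          # kernel-assigned port
addr = recv.getsockname()

stats = {}
t = threading.Thread(target=collect, args=(recv, stats, 3))
t.start()

# Server side: fire-and-forget, one datagram per completed query.
send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for q, ms in [("query-a", 0.2), ("query-a", 0.4), ("query-b", 5.0)]:
    send.sendto(json.dumps({"query": q, "ms": ms}).encode(), addr)
t.join()

print(stats["query-a"]["calls"], stats["query-b"]["max_ms"])
```

Because the whole query text rides in the datagram, the "cumulative time per query" view Greg describes falls out of a simple sort over `stats`.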

The idea of tracking cache misses is great, though in the current design a
postgres buffer cache miss doesn't necessarily mean a real I/O, since the
page may still sit in the kernel's cache. If Postgres moves to O_DIRECT
then it would be a valuable statistic, or if instrumentation to measure
the timing of cache hits and misses is added then it could be a good
statistic to have.

I can say that with Oracle it was *incredibly* useful to have the queries
being executed cached and queryable from the cache. The ora_explain tool that
comes with DBD::Oracle makes it extremely easy to identify queries consuming
resources, experiment with rewrites, and then copy the new query into the
application.

It would be great to have something equivalent for Postgres. It would be
extremely kludgy by comparison to have to dig through the log files for
queries. Much better would be to have an interface to access the data pgstats
gathers. But that only works if the entire query is there.

-- 
greg




[HACKERS] Vacuum info clarification

2004-11-10 Thread Bruce Momjian
The last two lines of VACUUM VERBOSE are:

  INFO:  free space map: 49 relations, 32 pages stored; 784 total pages needed
  DETAIL:  Allocated FSM size: 1000 relations + 2 pages = 182 kB shared 
memory.

I think "total pages needed" should be "total pages used" or "total
pages requested".

I am confused about the difference between "stored" and "requested".  I
read the comments in freespace.c but am still confused.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites

2004-11-10 Thread Bruno Wolff III
On Thu, Nov 11, 2004 at 17:57:42 +1300,
  Mark Kirkwood [EMAIL PROTECTED] wrote:
> Your example and ones like :
>
> SELECT max(foo), count(foo) FROM bar
> SELECT max(a.foo1), max(b.foo2) FROM bar1 AS a NATURAL JOIN bar2 AS b
>
> have made me realize that the scope of what should be optimized is
> somewhat subtle.
>
> I am inclined to keep it simple (i.e. rather limited) for a first cut,
> and if that works well, then look at extending to more complex rewrites.
>
> What do you think?

I don't think you should be rewriting queries so much as providing
alternate plans and letting the rest of the optimizer decide which
plan to use. If you just rewrite a query you might lock yourself into
using a poor plan.



Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites generally)

2004-11-10 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
> What about having a new column in pg_aggregate which would point to a
> function that would try to optimize the aggregate's handling?

I can't get very excited about this, because how would you make a
reasonably stable/narrow API for such a thing?  The function as you
propose it would have to know everything about not only the planner's
data representations but the N specific places it would be called from.

The existing selectivity functions are bad enough on this score ...

regards, tom lane



Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites generally)

2004-11-10 Thread John Hansen
Why not just change the function altogether to 'select $1 from $2
order by $1 desc limit 1;'?

Is there ANY situation where max(col) as it is would be faster?

... John



Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites

2004-11-10 Thread Jim C. Nasby
Certainly handling only one case is better than none. I just wanted to
bring up the multiple aggregate scenario. Also, consider that

SELECT min(a), max(a), min(b), max(c) FROM table

could be optimized as well (into 4 index scans, assuming a, b, and c all
had indexes).
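A quick sanity check of that four-scan shape (again SQLite through Python as a scratchpad, so no real index scans here, just the query-level equivalence):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl (a INTEGER, b INTEGER, c INTEGER)")
con.executemany("INSERT INTO tbl VALUES (?, ?, ?)",
                [(1, 10, 100), (5, 2, 300), (3, 8, 200)])

# The aggregate form...
agg = con.execute("SELECT min(a), max(a), min(b), max(c) FROM tbl").fetchone()

# ...and the rewritten form: one ordered-limit subselect per aggregate,
# each of which could become a separate index scan.
sub = con.execute("""
    SELECT (SELECT a FROM tbl ORDER BY a ASC  LIMIT 1),
           (SELECT a FROM tbl ORDER BY a DESC LIMIT 1),
           (SELECT b FROM tbl ORDER BY b ASC  LIMIT 1),
           (SELECT c FROM tbl ORDER BY c DESC LIMIT 1)
""").fetchone()

print(agg == sub, agg)  # True (1, 5, 2, 300)
```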

I don't think any other aggregates are candidates for optimization right
now, though I guess I could be wrong.

On Thu, Nov 11, 2004 at 05:57:42PM +1300, Mark Kirkwood wrote:
> Your example and ones like :
>
> SELECT max(foo), count(foo) FROM bar
> SELECT max(a.foo1), max(b.foo2) FROM bar1 AS a NATURAL JOIN bar2 AS b
>
> have made me realize that the scope of what should be optimized is
> somewhat subtle.
>
> I am inclined to keep it simple (i.e. rather limited) for a first cut,
> and if that works well, then look at extending to more complex rewrites.
>
> What do you think?
>
> Jim C. Nasby wrote:
> > On Thu, Nov 11, 2004 at 11:48:49AM +1300, Mark Kirkwood wrote:
> > > I am looking at implementing this TODO item. e.g. (max case):
> > >
> > > rewrite
> > > SELECT max(foo) FROM bar
> > > as
> > > SELECT foo FROM bar ORDER BY foo DESC LIMIT 1
> > > if there is an index on bar(foo)
> >
> > Out of curiosity, will you be doing this in such a way that
> >
> > SELECT min(foo), max(foo) FROM bar
> >
> > will end up as
> >
> > SELECT (SELECT foo FROM bar ORDER BY foo ASC LIMIT 1), (SELECT ... DESC
> > LIMIT 1)
> >
> > ?

-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?



Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites

2004-11-10 Thread Mark Kirkwood
There seems to be (as Tom indicated) a choice of approaches:
i) rewrite max/min queries and then plan 'em
ii) provide alternate plans based on presence of certain aggregate types 
in the query

When I first examined this TODO item, I was really thinking about i), 
but I suspect that ii) is probably the best approach.

regards
Mark

Bruno Wolff III wrote:
> On Thu, Nov 11, 2004 at 17:57:42 +1300,
>   Mark Kirkwood [EMAIL PROTECTED] wrote:
> > Your example and ones like :
> > SELECT max(foo), count(foo) FROM bar
> > SELECT max(a.foo1), max(b.foo2) FROM bar1 AS a NATURAL JOIN bar2 AS b
> > have made me realize that the scope of what should be optimized is
> > somewhat subtle.
> >
> > I am inclined to keep it simple (i.e. rather limited) for a first cut,
> > and if that works well, then look at extending to more complex rewrites.
> >
> > What do you think?
>
> I don't think you should be rewriting queries as much as providing
> alternate plans and letting the rest of the optimizer decide which
> plan to use. If you just rewrite a query you might lock yourself into
> using a poor plan.



Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites

2004-11-10 Thread Mark Kirkwood
Probably for a small table, where the machinery of reading the index, 
followed by checking the table for non-visible tuples, is more costly 
than just scanning the table!

regards
Mark
John Hansen wrote:
> Why not just change the function all together to 'select $1 from $2
> order by $1 desc limit 1;'
>
> Is there ANY situation where max(col) as it is, would be faster?
>
> ... John
