Re: [HACKERS] source documentation tool doxygen

2006-01-17 Thread Joachim Wieland
On Tue, Jan 17, 2006 at 12:15:02AM -0500, Tom Lane wrote:
 The thing seems to have only the very vaguest grasp on whether it is
 parsing C or C++ ... or should I say that it is convinced it is parsing
 C++ despite all evidence to the contrary?  I'd be happier with the
 pretty pictures if they had some connection to reality.

True. Doxygen is developed for documenting C++ source code. There is,
however, an option OPTIMIZE_OUTPUT_FOR_C, which is set. There are a few
more options for class and collaboration graphs, as well as a switch to
create a graphical class hierarchy; all of them are turned on at the
moment. So the output might get more concise, but we'll have to play
around with it to see what is useful and what isn't.
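
For reference, the settings in question look roughly like this in the
Doxyfile (a sketch of the configuration described above, not a verbatim
copy of what is actually deployed):

OPTIMIZE_OUTPUT_FOR_C  = YES   # tell doxygen the sources are C, not C++
CLASS_GRAPH            = YES   # currently on; candidate for trimming
COLLABORATION_GRAPH    = YES   # currently on; candidate for trimming
GRAPHICAL_HIERARCHY    = YES   # the graphical class hierarchy switch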


Joachim

-- 
Joachim Wieland  [EMAIL PROTECTED]
C/ Usandizaga 12 1°B   ICQ: 37225940
20002 Donostia / San Sebastian (Spain) GPG key available



Re: [pgsql-www] [HACKERS] source documentation tool doxygen

2006-01-17 Thread Joachim Wieland
On Mon, Jan 16, 2006 at 07:42:35PM -0500, Robert Treat wrote:
 This was my plan all along, just been waiting for someone to make it work
 with the postgresql code and then send instructions to the postgresql web
 team on how to set it up. 

I volunteer to tell you after I've found out for myself ;-)

Some details: what I have put online on my website occupies about 240 MB
of disk space and gets built in 1.5 hours on my PIII 800 laptop. By
removing useless graphs this can be reduced to less than 200 MB, I
imagine.

Do you want to put it on the postgresql.org site nevertheless? Is it too big
to be mirrored, so that it should instead be recreated on every webserver? We
might need one copy for the latest version of every major release as well as
one for CVS. The latter should get updated regularly of course, but I figure
it would be sufficient to do that once a week...


Joachim

-- 
Joachim Wieland  [EMAIL PROTECTED]
C/ Usandizaga 12 1°B   ICQ: 37225940
20002 Donostia / San Sebastian (Spain) GPG key available



Re: [pgsql-www] [HACKERS] source documentation tool doxygen

2006-01-17 Thread Joachim Wieland
On Tue, Jan 17, 2006 at 09:34:06AM +0900, Michael Glaesemann wrote:
 Along those lines, I wonder if a CSS couldn't be worked up to
 integrate the look with the rest of the site.

Yes, it's stylesheet-based. However, I don't know yet to what extent you can
change the look. It allows for a custom header and footer as well. The
PostgreSQL logo on top would be nice, but the navigation menu on the left
would have to be sacrificed for more space.


Joachim

-- 
Joachim Wieland  [EMAIL PROTECTED]
C/ Usandizaga 12 1°B   ICQ: 37225940
20002 Donostia / San Sebastian (Spain) GPG key available



Re: [pgsql-www] [HACKERS] source documentation tool doxygen

2006-01-17 Thread Andrew Dunstan
Joachim Wieland said:
 Do you want to put it on the postgresql.org site nevertheless? Is it
 too big to be mirrored and should be recreated on every webserver? We
 might need one copy for the last version of every major release as well
 as one for cvs. The latter should get updated regularly of course but I
 figure it would be sufficient to do that once a week...



The overwhelming majority of development work gets done against HEAD. I
would start with a once-a-day run against HEAD, and possibly one against
the latest stable branch (currently REL8_1_STABLE in cvs). That would get
you 99% of the possible benefit, I think. I don't see any virtue in doing
it against a release as opposed to a stable branch - this is meant to help
development efforts, so it should be run against what people should be
basing their development efforts on.


cheers

andrew






[HACKERS] Bad estimate on LIKE matching

2006-01-17 Thread Magnus Hagander
I have a table, path, which is:

 pathid | integer | not null default nextval('path_pathid_seq'::regclass)
 path   | text    | not null
Indexes:
    "path_pkey" PRIMARY KEY, btree (pathid)
    "path_name_idx" btree (path)

The table contains approx 1.2 million rows, all of which are unique
(both the path field and the name field are unique, though the
path_name_idx index is not a unique index).


On this table, I do a query like:
SELECT * FROM path WHERE path LIKE 'f:/userdirs/s/super_73/%'

The estimate for this query is completely off, which I believe is the
cause of a very bad choice of query plan when it's used in a big join
(creating nestloops that end up taking 15+ minutes to complete...).


Explain analyze gives:
                               QUERY PLAN
--------------------------------------------------------------------------
 Index Scan using path_name_idx on path  (cost=0.00..3.24 rows=1 width=74) (actual time=0.035..0.442 rows=214 loops=1)
   Index Cond: ((path >= 'f:/userdirs/s/super'::text) AND (path < 'f:/userdirs/s/supes'::text))
   Filter: (path ~~ 'f:/userdirs/s/super_73%'::text)


No matter what I search on (when it's very selective), the estimate is
always 1 row, whereas the actual value is at least a couple of hundred.
If I try with, say, 'f:/us', the difference is 377,759 estimated vs
562,459 returned, which is percentage-wise a lot less, but...

I have tried upping the statistics target to 1000, with no change.
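
(For reference, that change amounts to the following, using the column
from the table definition above:)

ALTER TABLE path ALTER COLUMN path SET STATISTICS 1000;
ANALYZE path;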

Any way to teach the planner about this?

FYI, if I push the result of the select on path into a temp table, and
then join with that one in my main table, I get a hash join instead, and
query time is < 30 seconds instead of 15 minutes...
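
(A sketch of that workaround - the temp table name and the main table
are hypothetical:)

-- materialize the selective LIKE result so the planner sees real counts
SELECT pathid, path INTO TEMP matching_paths
  FROM path WHERE path LIKE 'f:/userdirs/s/super_73/%';
ANALYZE matching_paths;

-- join against the temp table; with ~214 rows known, a hash join wins
SELECT m.* FROM maintable m
  JOIN matching_paths p ON m.pathid = p.pathid;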


//Magnus



[HACKERS] Error working with Temporary Sequences in plpgsql in 8.1 (8.0 works fine)

2006-01-17 Thread Daniel Schuchardt

Hi,

here is a testcase:

CREATE OR REPLACE FUNCTION testseq()
 RETURNS void AS
$BODY$
BEGIN
CREATE TEMP SEQUENCE test;
PERFORM testseq1();
DROP SEQUENCE test;
RETURN;
END; $BODY$
 LANGUAGE 'plpgsql' VOLATILE;
ALTER FUNCTION testseq() OWNER TO postgres;


CREATE OR REPLACE FUNCTION testseq1()
 RETURNS void AS
$BODY$
DECLARE I INTEGER;
BEGIN
I:= nextval('test');
RETURN;
END; $BODY$
 LANGUAGE 'plpgsql' VOLATILE;
ALTER FUNCTION testseq1() OWNER TO postgres;


SELECT testseq();

-- this works fine.

SELECT testseq();


ERROR:  could not open relation with OID 21152
CONTEXT:  PL/pgSQL function "testseq1" line 3 at assignment
SQL statement "SELECT testseq1()"
PL/pgSQL function "testseq" line 3 at perform



Greetings,

Daniel.



Re: [HACKERS] Docs off on ILIKE indexing?

2006-01-17 Thread Magnus Hagander
  http://www.postgresql.org/docs/8.1/static/indexes-types.html
  says:
  "The optimizer can also use a B-tree index for queries involving the
  pattern matching operators LIKE, ILIKE, ~, and ~*, if the pattern is
  a constant and is anchored to the beginning of the string - for
  example, col LIKE 'foo%' or col ~ '^foo', but not col LIKE '%bar'."

  But really, does it use indexes for ILIKE?

 That's pretty poorly phrased.  For ILIKE it'll only work if
 there's a prefix of the pattern that's not letters (and hence
 is unaffected by the case-folding issue).

Ahh. That explains it. Perfectly logical.
And yes, that's pretty poorly phrased - at least I didn't understand it
:-)
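
(To spell out the distinction with a couple of examples - table and
column names made up:)

-- can use the index: the extracted prefix '123' contains no letters,
-- so case folding cannot change it
SELECT * FROM t WHERE col ILIKE '123foo%';

-- cannot use the index: the pattern starts with letters, so 'foo',
-- 'Foo', 'fOo', ... all match and no single btree range covers them
SELECT * FROM t WHERE col ILIKE 'foo%';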

//Magnus



Re: [HACKERS] Large Scale Aggregation (HashAgg Enhancement)

2006-01-17 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 On Mon, 2006-01-16 at 20:02 -0500, Tom Lane wrote:
 But our idea of the number of batches needed can change during that
 process, resulting in some inner tuples being initially assigned to the
 wrong temp file.  This would also be true for hashagg.

 So we correct that before we start reading the outer table.

Why?  That would require a useless additional pass over the data.  With
the current design, we can process and discard at least *some* of the
data in a temp file when we read it, but a reorganization pass would
mean that it *all* goes back out to disk a second time.

Also, you assume that we can accurately tell how many tuples will fit in
memory in advance of actually processing them --- a presumption clearly
false in the hashagg case, and not that easy to do even for hashjoin.
(You can tell the overall size of a temp file, sure, but how do you know
how it will split when the batch size changes?  A perfectly even split
is unlikely.)

 OK, I see what you mean. Sounds like we should have a new definition for
 aggregates, "sort insensitive", that allows them to work when the input
 ordering does not affect the result, since that case can be optimised
 much better when using HashAgg.

Please don't propose pushing this problem onto the user until it's
demonstrated that there's no other way.  I don't want to become the
next Oracle, with forty zillion knobs that it takes a highly trained
DBA to deal with.

 But all of them sound ugly.

I was thinking along the lines of having multiple temp files per hash
bucket.  If you have a tuple that needs to migrate from bucket M to
bucket N, you know that it arrived before every tuple that was assigned
to bucket N originally, so put such tuples into a separate temp file
and process them before the main bucket-N temp file.  This might get a
little tricky to manage after multiple hash resizings, but in principle
it seems doable.

regards, tom lane



Re: [pgsql-www] [HACKERS] source documentation tool doxygen

2006-01-17 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 The overwhelming amount of development work gets done against HEAD. I would
 start with a once a day run against HEAD, and possibly one against the
 latest stable branch (currently REL8_1_STABLE in cvs). That would get you
 99% of the possible benefit, I think.

I agree --- I see no reason for us to maintain such documentation for
anything except HEAD.  If somebody really wants documentation for a
stable branch, they can build it themselves.

regards, tom lane



[HACKERS] function caching problem

2006-01-17 Thread uwcssa
I modified PostgreSQL 8.1 in its optimization part. Basically, I reset
the whole memory context after query optimization (but store the query
plan in a top-level memory context so it is not lost).

Later, I match incoming queries against the pre-computed query plans
(to avoid re-compiling the whole query again). My problem is that when
optimizing the queries, function information (date conversion, say) is
cached in memory (the current memory context). However, this cached
information is not part of the generated query plan, so when I reset
memory after query optimization, this information is gone. As a result,
I get a "cache lookup failed for function ..." error from inside
fmgr_info_cxt_security() in fmgr.c.

If I want that memory never to be released, or never to be used for
caching function definitions, how should I do that? Or are there any
alternatives?

thanks a lot!



Re: [HACKERS] [GENERAL] [PATCH] Better way to check for getaddrinfo function.

2006-01-17 Thread Tom Lane
R, Rajesh (STSD) [EMAIL PROTECTED] writes:
 But the bottom line is that the default test does not include netdb.h in
 the test code.

That's odd.  Is getaddrinfo a macro on Tru64?  If so, the appropriate
patch would probably make the test look more like the tests for finite()
and friends:

dnl Cannot use AC_CHECK_FUNC because finite may be a macro
AC_MSG_CHECKING(for finite)
AC_TRY_LINK([
#include <math.h>
double glob_double;
],
[return finite(glob_double) ? 0 : 1;],
[AC_DEFINE(HAVE_FINITE, 1, [Define to 1 if you have finite().])
AC_MSG_RESULT(yes)],
[AC_MSG_RESULT(no)])

regards, tom lane



Re: [HACKERS] Error working with Temporary Sequences in plpgsql in 8.1 (8.0 works fine)

2006-01-17 Thread Jaime Casanova
On 1/17/06, Daniel Schuchardt [EMAIL PROTECTED] wrote:
 Hi,

 here is a testcase:

 CREATE OR REPLACE FUNCTION testseq()
  RETURNS void AS
 $BODY$
 BEGIN
  CREATE TEMP SEQUENCE test;
  PERFORM testseq1();
  DROP SEQUENCE test;
  RETURN;
 END; $BODY$
  LANGUAGE 'plpgsql' VOLATILE;
 ALTER FUNCTION testseq() OWNER TO postgres;


 CREATE OR REPLACE FUNCTION testseq1()
  RETURNS void AS
 $BODY$
 DECLARE I INTEGER;
 BEGIN
  I:= nextval('test');
  RETURN;
 END; $BODY$
  LANGUAGE 'plpgsql' VOLATILE;
 ALTER FUNCTION testseq1() OWNER TO postgres;


 SELECT testseq();

 -- this works fine.

 SELECT testseq();


 ERROR:  could not open relation with OID 21152
 CONTEXT:  PL/pgSQL function "testseq1" line 3 at assignment
 SQL statement "SELECT testseq1()"
 PL/pgSQL function "testseq" line 3 at perform



 Greetings,

 Daniel.


try this way:

CREATE OR REPLACE FUNCTION testseq() RETURNS void AS
$BODY$
 BEGIN
  EXECUTE 'CREATE TEMP SEQUENCE test';
  PERFORM testseq1();
  DROP SEQUENCE test;
  RETURN;
 END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;
ALTER FUNCTION testseq() OWNER TO postgres;


CREATE OR REPLACE FUNCTION testseq1() RETURNS void AS
$BODY$
 DECLARE I INTEGER;
 BEGIN
  EXECUTE 'select nextval(''test'')' INTO I;
raise notice '%', I;
  RETURN;
 END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;
ALTER FUNCTION testseq1() OWNER TO postgres;

SELECT testseq();
SELECT testseq();


It's the same problem as with temp tables: you must put their creation,
and in this case even the nextval, in an EXECUTE...

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)



Re: [HACKERS] Large Scale Aggregation (HashAgg Enhancement)

2006-01-17 Thread Simon Riggs
On Mon, 2006-01-16 at 12:36 -0500, Tom Lane wrote:

 The tricky part is to preserve the existing guarantee that tuples are
 merged into their aggregate in arrival order.  (This does not matter for
 the standard aggregates but it definitely does for custom aggregates,
 and there will be unhappy villagers appearing on our doorsteps if we
 break it.)  I think this can work correctly under the above sketch but
 it needs to be verified.  It might require different handling of the
 TODO files than what hashjoin does.

You almost had me there... but there isn't any arrival order. The sort
that precedes an aggregation only sorts on the GROUP BY columns, not on
additional columns - so by the SQL standard there is no guaranteed
ordering of the data into an aggregate. That is exactly what windowed
aggregates are for. (There isn't any way of specifying an ORDER BY yet
either.)

The only way of doing this is with a derived table, along the lines of
select a, sum(b) from (select a, b from foo order by a, b) ss group by a;
but AFAICS this is not part of the standard??

It is highly likely that rows are clumped together, but there just isn't
any guarantee that is the case. Any update of any row would change the
arrival order. Should we support something that has worked by luck?

I've been looking into windowed aggregates; these will provide this
functionality should people require it. I don't see how we'd be able to
do windowed aggregates and hashAgg at the same time, so this seems less
relevant. 

Best Regards, Simon Riggs




Re: [HACKERS] Large Scale Aggregation (HashAgg Enhancement)

2006-01-17 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 On Mon, 2006-01-16 at 12:36 -0500, Tom Lane wrote:
 The tricky part is to preserve the existing guarantee that tuples are
 merged into their aggregate in arrival order.

 You almost had me there... but there isn't any arrival order.

The fact that it's not in the spec doesn't mean we don't support it.
Here are a couple of threads on the subject:
http://archives.postgresql.org/pgsql-general/2005-11/msg00304.php
http://archives.postgresql.org/pgsql-sql/2003-06/msg00135.php

Per the second message, this has worked since 7.4, and it was requested
fairly often before that.
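
The usual case is an order-sensitive custom aggregate; a sketch along
the lines of the array_accum example in the docs (the table name is
hypothetical):

CREATE AGGREGATE array_accum (
    basetype = anyelement,
    sfunc = array_append,
    stype = anyarray,
    initcond = '{}'
);

-- users rely on each result array being built in b's order:
SELECT a, array_accum(b)
  FROM (SELECT a, b FROM foo ORDER BY a, b) ss
 GROUP BY a;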

 Should we support something that has worked by luck?

No luck about it, and yes people are depending on it.  You don't get to
break it just because it's not in the spec.

regards, tom lane



Re: [HACKERS] Large Scale Aggregation (HashAgg Enhancement)

2006-01-17 Thread Simon Riggs
On Tue, 2006-01-17 at 14:41 -0500, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  On Mon, 2006-01-16 at 12:36 -0500, Tom Lane wrote:
  The tricky part is to preserve the existing guarantee that tuples are
  merged into their aggregate in arrival order.
 
  You almost had me there... but there isn't any arrival order.
 
 The fact that it's not in the spec doesn't mean we don't support it.
 Here are a couple of threads on the subject:
 http://archives.postgresql.org/pgsql-general/2005-11/msg00304.php
 http://archives.postgresql.org/pgsql-sql/2003-06/msg00135.php
 
 Per the second message, this has worked since 7.4, and it was requested
 fairly often before that.

OK. My interest was in expanding the role of HashAgg, which, as Rod
says, can be used to avoid the sort, so the overlap between those ideas
was low anyway.

On Tue, 2006-01-17 at 09:52 -0500, Tom Lane wrote:
 I was thinking along the lines of having multiple temp files per hash
 bucket.  If you have a tuple that needs to migrate from bucket M to
 bucket N, you know that it arrived before every tuple that was
 assigned
 to bucket N originally, so put such tuples into a separate temp file
 and process them before the main bucket-N temp file.  This might get a
 little tricky to manage after multiple hash resizings, but in
 principle
 it seems doable.

OK, so we do need to do this when we have a defined arrival order; this
idea is the best one so far. I don't see any optimization to be had from
ignoring the arrival order, so it seems best to preserve the ordering
this way in all cases.

You could manage that with file naming. Rows moved from batch N to batch
M would be written to a file named N.M, so you'd be able to use file
ordering to retrieve all the files for *.M.
That scheme would work for multiple splits too, so that filenames could
grow yet retain their sort order and final target batch properties.

Best Regards, Simon Riggs




Re: [HACKERS] Large Scale Aggregation (HashAgg Enhancement)

2006-01-17 Thread Simon Riggs
On Tue, 2006-01-17 at 21:43 +, Simon Riggs wrote:
 OK My interest was in expanding the role of HashAgg, which as Rod
 says can be used to avoid the sort, so the overlap between those ideas
 was low anyway.

Am I right in thinking that HashAgg would almost always be quicker than
SortAgg, even for large (> memory) aggregation sets? (Except where the
prior ordering has already been forced via an ORDER BY.)

If that is so, then I will probably look to work on this sooner,
especially since we seem to have a clear design.

I'd originally viewed the spill-to-disk logic as a safety measure rather
than as a performance feature.
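
(For a quick comparison on a given query - the table name is made up;
enable_hashagg is the standard planner setting:)

SET enable_hashagg = off;   -- forces the sort-based GroupAggregate plan
EXPLAIN ANALYZE SELECT a, count(*) FROM big GROUP BY a;

SET enable_hashagg = on;    -- allows HashAggregate again
EXPLAIN ANALYZE SELECT a, count(*) FROM big GROUP BY a;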

Best Regards, Simon Riggs




[HACKERS] debug_query_string and multiple statements

2006-01-17 Thread Neil Conway
While reviewing Joachim Wieland's patch to add a pg_cursors system view,
I noticed that the patch assumes that debug_query_string contains the
portion of the submitted query string that corresponds to the SQL
statement we are currently executing. That is incorrect:
debug_query_string contains the *entire* verbatim query string sent by
the client. So if the client submits the query string "SELECT 1; SELECT
2;", debug_query_string will contain exactly that string. (psql actually
splits queries like the above into two separate FE/BE messages -- to see
what I'm referring to, use libpq directly, or start up a copy of the
standalone backend.)

This makes debug_query_string the wrong thing to use for the pg_cursors
and pg_prepared_statements views, but it affects other parts of the
system as well: for example, given PQexec(conn, "SELECT 1; SELECT 2/0;")
and log_min_error_statement = 'error', the postmaster will log:

ERROR:  division by zero
STATEMENT:  SELECT 1; SELECT 2/0;

which seems misleading, and is inconsistent with the documentation's
description of this configuration parameter. Admittedly this isn't an
enormous problem, but I think the current behavior isn't ideal.

Unfortunately I don't see an easy way to fix this. It might be possible
to extract a semicolon-separated list of query strings from the parser or
lexer, but that would likely have the effect of munging comments and
whitespace in the literal string submitted by the client, which seems
the wrong thing to do for logging purposes. An alternative might be to
do a preliminary scan to look for semicolon-delimited query strings, and
then pass each of those strings into raw_parser() separately, but that
seems quite a lot of work (and perhaps a significant runtime cost) to
fix what is at worst a minor UI wrinkle.

Thoughts?

-Neil







Re: [HACKERS] debug_query_string and multiple statements

2006-01-17 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 While reviewing Joachim Wieland's patch to add a pg_cursors system view,
 I noticed that the patch assumes that debug_query_string contains the
 portion of the submitted query string that corresponds to the SQL
 statement we are currently executing. That is incorrect:

Yeah, this has annoyed me for some time.  debug_query_string is really
just a quick hack and has already been stretched well past its intended
use.  I don't think you get the right result for situations where the
active query has been submitted via SPI, either: debug_query_string will
show the outermost interactive command, which may have little to do with
the query that got the error.

The appropriate way to fix it IMHO is to get the lexer+grammar to
identify the statement boundaries in the source string and add explicit
support in the parser for saving away the appropriate strings.

This would tie nicely into something else I'd like to do someday, which
is improve parse-analysis error reports by being able to finger the
offending construct more precisely.  When we report something like an
unknown variable name in a huge query, it really sucks that we can't
give an error cursor the way simple lexer or grammar errors do.  To
fix that, tokens generated by the lexer/grammar need to carry along
text positions ... and the position of the semicolon token is just what
we'd need to create proper statement strings, too.

regards, tom lane



Re: [HACKERS] Error working with Temporary Sequences in plpgsql in 8.1 (8.0 works fine)

2006-01-17 Thread Michael Fuhr
On Tue, Jan 17, 2006 at 01:28:03PM -0500, Jaime Casanova wrote:
 is the same problem as with temp tables, you must put their creation,
 and in this case even the nextval in an execute...

Curious that it works in 8.0, though.  I wonder if the failure in
8.1 is an artifact of changing sequence functions like nextval()
to take a regclass argument (the sequence OID) instead of a text
argument (the sequence name); that would affect what gets put in
the function's cached plan.

-- 
Michael Fuhr
