Re: [HACKERS] source documentation tool doxygen
On Tue, Jan 17, 2006 at 12:15:02AM -0500, Tom Lane wrote: The thing seems to have only the very vaguest grasp on whether it is parsing C or C++ ... or should I say that it is convinced it is parsing C++ despite all evidence to the contrary? I'd be happier with the pretty pictures if they had some connection to reality. True. Doxygen is developed for documenting C++ source code. There is, however, an option OPTIMIZE_OUTPUT_FOR_C, which is set. There are a few more options for class and collaboration graphs, as well as a switch to create a graphical class hierarchy; all of them are turned on at the moment. So it might get more concise, but we'll have to play around with it to see what is useful and what isn't. Joachim -- Joachim Wieland [EMAIL PROTECTED] C/ Usandizaga 12 1°B ICQ: 37225940 20002 Donostia / San Sebastian (Spain) GPG key available ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [pgsql-www] [HACKERS] source documentation tool doxygen
On Mon, Jan 16, 2006 at 07:42:35PM -0500, Robert Treat wrote: This was my plan all along, just been waiting for someone to make it work with the postgresql code and then send instructions to the postgresql web team on how to set it up. I volunteer to tell you after I've found out for myself ;-) Some details: What I have put online on my website occupies about 240 MB of disk space and gets built in 1.5h on my PIII 800 laptop. Removing useless graphs, this can be reduced to less than 200 MB I imagine. Do you want to put it on the postgresql.org site nevertheless? Is it too big to be mirrored, and should it be recreated on every webserver? We might need one copy for the last version of every major release as well as one for cvs. The latter should get updated regularly of course, but I figure it would be sufficient to do that once a week... Joachim -- Joachim Wieland [EMAIL PROTECTED] C/ Usandizaga 12 1°B ICQ: 37225940 20002 Donostia / San Sebastian (Spain) GPG key available ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [pgsql-www] [HACKERS] source documentation tool doxygen
On Tue, Jan 17, 2006 at 09:34:06AM +0900, Michael Glaesemann wrote: Along those lines, I wonder if a CSS couldn't be worked up to integrate the look with the rest of the site. Yes, it's stylesheet based. However, I don't know yet to what extent you can change the look. It allows for a custom header and footer as well. The PostgreSQL logo on top would be nice, but the navigation menu on the left has to be sacrificed for more space. Joachim -- Joachim Wieland [EMAIL PROTECTED] C/ Usandizaga 12 1°B ICQ: 37225940 20002 Donostia / San Sebastian (Spain) GPG key available ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [pgsql-www] [HACKERS] source documentation tool doxygen
Joachim Wieland said: Do you want to put it on the postgresql.org site nevertheless? Is it too big to be mirrored and should be recreated on every webserver? We might need one copy for the last version of every major release as well as one for cvs. The latter should get updated regularly of course but I figure it would be sufficient to do that once a week... The overwhelming amount of development work gets done against HEAD. I would start with a once a day run against HEAD, and possibly one against the latest stable branch (currently REL8_1_STABLE in cvs). That would get you 99% of the possible benefit, I think. I don't see any virtue in doing it against a release as opposed to a stable branch - this is to help development efforts so it should be against what people should be basing their development efforts on. cheers andrew ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[HACKERS] Bad estimate on LIKE matching
I have a table, path, which is:

 pathid | integer | not null default nextval('path_pathid_seq'::regclass)
 path   | text    | not null
Indexes:
    path_pkey PRIMARY KEY, btree (pathid)
    path_name_idx btree (path)

The table contains approx 1.2 million rows, of which all are unique. (Both the path field and the name field are unique, though the path_name_idx index is not a unique index.) On this table, I do a query like:

SELECT * FROM path WHERE path LIKE 'f:/userdirs/s/super_73/%'

The estimate for this query is completely off, which I believe is the cause of a very bad selection of a query plan when it's used in a big join (creating nestloops that end up taking 15+ minutes to complete..). Explain analyze gives:

 Index Scan using path_name_idx on path  (cost=0.00..3.24 rows=1 width=74) (actual time=0.035..0.442 rows=214 loops=1)
   Index Cond: ((path >= 'f:/userdirs/s/super'::text) AND (path < 'f:/userdirs/s/supes'::text))
   Filter: (path ~~ 'f:/userdirs/s/super_73%'::text)

No matter what I search on (when it's very selective), the estimate is always 1 row, whereas the actual value is at least a couple of hundred. If I try with, say, f:/us, the difference is 377,759 estimated vs 562,459 returned, which is percentage-wise a lot less, but... I have tried upping the statistics target to 1000, with no change. Any way to teach the planner about this? FYI, if I push the result of the select on path into a temp table, and then join with that one in my main table, I get a hashjoin instead, and query time is 30 seconds instead of 15 minutes... //Magnus ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
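For readers wondering where the 'super' / 'supes' bounds in that EXPLAIN output come from: the planner converts an anchored LIKE into a range condition on the pattern's fixed prefix, which ends at the first wildcard. The `_` in `super_73` is a single-character wildcard, which is why the bounds stop at `super` rather than covering the full directory name. A rough Python sketch of the idea (an illustration only; the real backend logic, e.g. make_greater_string, also handles multibyte encodings and byte overflow):

```python
def like_prefix(pattern: str) -> str:
    """Extract the fixed prefix of a LIKE pattern (stops at % or _)."""
    prefix = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c in ('%', '_'):          # wildcard ends the fixed prefix
            break
        if c == '\\' and i + 1 < len(pattern):
            i += 1                    # escaped character: take it literally
            c = pattern[i]
        prefix.append(c)
        i += 1
    return ''.join(prefix)

def range_bounds(pattern: str):
    """Turn an anchored LIKE pattern into (lower, upper) index bounds.

    Simplified: increments the last prefix character, ignoring overflow
    and multibyte issues that the real code must deal with."""
    lo = like_prefix(pattern)
    hi = lo[:-1] + chr(ord(lo[-1]) + 1) if lo else None
    return lo, hi
```

Because the derived range keeps only the prefix up to the `_`, the selectivity estimate is effectively for `f:/userdirs/s/super%`, not the full pattern; escaping the underscore (`LIKE 'f:/userdirs/s/super\_73/%'`) may let the planner use a longer, more selective prefix.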
[HACKERS] Error working with Temporary Sequences in plpgsql in 8.1 (8.0 works fine)
Hi, here is a testcase:

CREATE OR REPLACE FUNCTION testseq() RETURNS void AS
$BODY$
BEGIN
  CREATE TEMP SEQUENCE test;
  PERFORM testseq1();
  DROP SEQUENCE test;
  RETURN;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;
ALTER FUNCTION testseq() OWNER TO postgres;

CREATE OR REPLACE FUNCTION testseq1() RETURNS void AS
$BODY$
DECLARE
  I INTEGER;
BEGIN
  I := nextval('test');
  RETURN;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;
ALTER FUNCTION testseq1() OWNER TO postgres;

SELECT testseq(); -- this works fine.
SELECT testseq();
ERROR:  could not open relation with OID 21152
CONTEXT:  PL/pgSQL function testseq1 line 3 at assignment
SQL statement SELECT testseq1()
PL/pgSQL function testseq line 3 at perform

Greetings, Daniel. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Docs off on ILIKE indexing?
http://www.postgresql.org/docs/8.1/static/indexes-types.html says: The optimizer can also use a B-tree index for queries involving the pattern matching operators LIKE, ILIKE, ~, and ~*, if the pattern is a constant and is anchored to the beginning of the string - for example, col LIKE 'foo%' or col ~ '^foo', but not col LIKE '%bar'. But really, does it use indexes for ILIKE? That's pretty poorly phrased. For ILIKE it'll only work if there's a prefix of the pattern that's not letters (and hence is unaffected by the case-folding issue). Ahh. That explains it. Perfectly logical. And yes, that's pretty poorly phrased - at least I didn't understand it :-) //Magnus ---(end of broadcast)--- TIP 6: explain analyze is your friend
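To make Tom's rule concrete, here is a rough Python sketch (an illustration, not the planner's actual logic) of which leading part of an ILIKE pattern can anchor a B-tree range scan: only characters unaffected by case folding qualify, because a cased letter could match in either case and would need two different index ranges:

```python
def ilike_index_prefix(pattern: str) -> str:
    """Leading part of an ILIKE pattern usable for a B-tree range scan.

    Stops at the first wildcard or at the first cased letter, since a
    letter may match as upper or lower case and so cannot pin down a
    single index range."""
    prefix = []
    for c in pattern:
        if c in ('%', '_'):
            break
        if c.lower() != c.upper():   # a cased letter: case folding matters
            break
        prefix.append(c)
    return ''.join(prefix)
```

So `col ILIKE '123/foo%'` can use an index on the `'123/'` prefix, while `col ILIKE 'foo%'` has an empty usable prefix and falls back to a full scan.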
Re: [HACKERS] Large Scale Aggregation (HashAgg Enhancement)
Simon Riggs [EMAIL PROTECTED] writes: On Mon, 2006-01-16 at 20:02 -0500, Tom Lane wrote: But our idea of the number of batches needed can change during that process, resulting in some inner tuples being initially assigned to the wrong temp file. This would also be true for hashagg. So we correct that before we start reading the outer table. Why? That would require a useless additional pass over the data. With the current design, we can process and discard at least *some* of the data in a temp file when we read it, but a reorganization pass would mean that it *all* goes back out to disk a second time. Also, you assume that we can accurately tell how many tuples will fit in memory in advance of actually processing them --- a presumption clearly false in the hashagg case, and not that easy to do even for hashjoin. (You can tell the overall size of a temp file, sure, but how do you know how it will split when the batch size changes? A perfectly even split is unlikely.) OK, I see what you mean. Sounds like we should have a new definition for Aggregates, Sort Insensitive, that allows them to work when the input ordering does not affect the result, since that case can be optimised much better when using HashAgg. Please don't propose pushing this problem onto the user until it's demonstrated that there's no other way. I don't want to become the next Oracle, with forty zillion knobs that it takes a highly trained DBA to deal with. But all of them sound ugly. I was thinking along the lines of having multiple temp files per hash bucket. If you have a tuple that needs to migrate from bucket M to bucket N, you know that it arrived before every tuple that was assigned to bucket N originally, so put such tuples into a separate temp file and process them before the main bucket-N temp file. This might get a little tricky to manage after multiple hash resizings, but in principle it seems doable. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [pgsql-www] [HACKERS] source documentation tool doxygen
Andrew Dunstan [EMAIL PROTECTED] writes: The overwhelming amount of development work gets done against HEAD. I would start with a once a day run against HEAD, and possibly one against the latest stable branch (currently REL8_1_STABLE in cvs). That would get you 99% of the possible benefit, I think. I agree --- I see no reason for us to maintain such documentation for anything except HEAD. If somebody really wants documentation for a stable branch, they can build it themselves. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[HACKERS] function caching problem
I modified postgresql 8.1 for its optimization part. Basically I reset the whole memory context after query optimization (but store the query plan in the top-level memory context so it is not lost). Later, at a certain time, I match queries to the pre-computed query plans (to avoid re-compiling the whole query again). My problem is that when optimizing the queries, the functions (date conversion, say) are cached in memory (the current memory context). However, this cached information is not part of the generated query plan, so when I reset memory after query optimization, this information is gone. As a result, I get a "cache lookup failed for function ..." error from inside fmgr_info_cxt_security() in fmgr.c. If I wish that heap memory is never released, or I wish that the heap is never used for caching function definitions, how should I do that? Or is there any alternative? Thanks a lot! ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [GENERAL] [PATCH] Better way to check for getaddrinfo function.
R, Rajesh (STSD) [EMAIL PROTECTED] writes: But the bottom line is the default test does not include netdb.h in the test code. That's odd. Is getaddrinfo a macro on Tru64? If so, the appropriate patch would probably make the test look more like the tests for finite() and friends:

dnl Cannot use AC_CHECK_FUNC because finite may be a macro
AC_MSG_CHECKING(for finite)
AC_TRY_LINK([
#include <math.h>
double glob_double;
],
  [return finite(glob_double) ? 0 : 1;],
  [AC_DEFINE(HAVE_FINITE, 1, [Define to 1 if you have finite().])
   AC_MSG_RESULT(yes)],
  [AC_MSG_RESULT(no)])

regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Error working with Temporary Sequences in plpgsql in 8.1 (8.0 works fine)
On 1/17/06, Daniel Schuchardt [EMAIL PROTECTED] wrote:
> Hi, here is a testcase:
> [testcase quoted in full above]
> SELECT testseq(); -- this works fine.
> SELECT testseq();
> ERROR:  could not open relation with OID 21152
> CONTEXT:  PL/pgSQL function testseq1 line 3 at assignment
> SQL statement SELECT testseq1()
> PL/pgSQL function testseq line 3 at perform
> Greetings, Daniel.

Try this way:

CREATE OR REPLACE FUNCTION testseq() RETURNS void AS
$BODY$
BEGIN
  EXECUTE 'CREATE TEMP SEQUENCE test';
  PERFORM testseq1();
  DROP SEQUENCE test;
  RETURN;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;
ALTER FUNCTION testseq() OWNER TO postgres;

CREATE OR REPLACE FUNCTION testseq1() RETURNS void AS
$BODY$
DECLARE
  I INTEGER;
BEGIN
  EXECUTE 'select nextval(''test'')' INTO I;
  raise notice '%', I;
  RETURN;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;
ALTER FUNCTION testseq1() OWNER TO postgres;

SELECT testseq();
SELECT testseq();

It's the same problem as with temp tables: you must put their creation, and in this case even the nextval, in an EXECUTE... -- regards, Jaime Casanova (DBA: DataBase Aniquilator ;) ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Large Scale Aggregation (HashAgg Enhancement)
On Mon, 2006-01-16 at 12:36 -0500, Tom Lane wrote: The tricky part is to preserve the existing guarantee that tuples are merged into their aggregate in arrival order. (This does not matter for the standard aggregates but it definitely does for custom aggregates, and there will be unhappy villagers appearing on our doorsteps if we break it.) I think this can work correctly under the above sketch but it needs to be verified. It might require different handling of the TODO files than what hashjoin does. You almost had me there... but there isn't any arrival order. The sort that precedes an aggregation only sorts on the GROUP BY columns, not on additional columns - so by the SQL standard there is not a guaranteed ordering of the data into an aggregate. That is exactly what windowed aggregates are for. (There isn't any way of specifying an ORDER BY yet either.) The only way of doing this is by doing a derived table: select a, sum(b) from (select a,b order by a,b); but AFAICS this is not part of the standard?? It is highly likely that rows are clumped together, but there just isn't any guarantee that is the case. Any update of any row would change the arrival order. Should we support something that has worked by luck? I've been looking into windowed aggregates; these will provide this functionality should people require it. I don't see how we'd be able to do windowed aggregates and hashAgg at the same time, so this seems less relevant. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Large Scale Aggregation (HashAgg Enhancement)
Simon Riggs [EMAIL PROTECTED] writes: On Mon, 2006-01-16 at 12:36 -0500, Tom Lane wrote: The tricky part is to preserve the existing guarantee that tuples are merged into their aggregate in arrival order. You almost had me there... but there isn't any arrival order. The fact that it's not in the spec doesn't mean we don't support it. Here are a couple of threads on the subject: http://archives.postgresql.org/pgsql-general/2005-11/msg00304.php http://archives.postgresql.org/pgsql-sql/2003-06/msg00135.php Per the second message, this has worked since 7.4, and it was requested fairly often before that. Should we support something that has worked by luck? No luck about it, and yes people are depending on it. You don't get to break it just because it's not in the spec. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Large Scale Aggregation (HashAgg Enhancement)
On Tue, 2006-01-17 at 14:41 -0500, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: On Mon, 2006-01-16 at 12:36 -0500, Tom Lane wrote: The tricky part is to preserve the existing guarantee that tuples are merged into their aggregate in arrival order. You almost had me there... but there isn't any arrival order. The fact that it's not in the spec doesn't mean we don't support it. Here are a couple of threads on the subject: http://archives.postgresql.org/pgsql-general/2005-11/msg00304.php http://archives.postgresql.org/pgsql-sql/2003-06/msg00135.php Per the second message, this has worked since 7.4, and it was requested fairly often before that. OK. My interest was in expanding the role of HashAgg, which as Rod says can be used to avoid the sort, so the overlap between those ideas was low anyway. On Tue, 2006-01-17 at 09:52 -0500, Tom Lane wrote: I was thinking along the lines of having multiple temp files per hash bucket. If you have a tuple that needs to migrate from bucket M to bucket N, you know that it arrived before every tuple that was assigned to bucket N originally, so put such tuples into a separate temp file and process them before the main bucket-N temp file. This might get a little tricky to manage after multiple hash resizings, but in principle it seems doable. OK, so we do need to do this when we have a defined arrival order: this idea is the best one so far. I don't see any optimization of this by ignoring the arrival order, so it seems best to preserve the ordering this way in all cases. You can manage that with file naming. Rows moved from batch N to batch M would be renamed N.M, so you'd be able to use file ordering to retrieve all files for *.M. That scheme would work for multiple splits too, so that filenames could grow yet retain their sort order and final target batch properties.
Best Regards, Simon Riggs ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
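Simon's N.M naming scheme can be sketched in a few lines of Python (a hypothetical file layout for illustration, not backend code): a file named "N.M" holds rows first written to batch N and later reassigned to batch M, and reading the migrated files in source-batch order before the main batch file preserves arrival order, since migrated rows arrived before anything assigned to the target batch directly:

```python
# Sketch of the batch-migration bookkeeping discussed above.
# "<source>.<target>" files hold tuples reassigned from batch <source>
# to batch <target>; they must be read before the plain "<target>" file
# to preserve arrival order. Sorting migrated files by source batch is a
# simplification; repeated resizings may need a longer name scheme.

def files_for_batch(target: int, filenames: list) -> list:
    """Return the temp files to read for one batch, migrated files first."""
    migrated = sorted(
        (f for f in filenames if "." in f and f.split(".")[1] == str(target)),
        key=lambda f: int(f.split(".")[0]),
    )
    main = [f for f in filenames if f == str(target)]
    return migrated + main
```

For example, with temp files `["0.3", "3", "1.3", "0.1"]`, batch 3 is processed by reading `0.3`, then `1.3`, then the main file `3`.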
Re: [HACKERS] Large Scale Aggregation (HashAgg Enhancement)
On Tue, 2006-01-17 at 21:43 +, Simon Riggs wrote: OK. My interest was in expanding the role of HashAgg, which as Rod says can be used to avoid the sort, so the overlap between those ideas was low anyway. Am I right in thinking that HashAgg would almost always be quicker than SortAgg, even for large (> memory) aggregation sets? (Except where the prior ordering has already been forced via an ORDER BY.) If that is so, then I will probably look to work on this sooner, especially since we seem to have a clear design. I'd originally viewed the spill-to-disk logic as a safety measure rather than as a performance feature. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] debug_query_string and multiple statements
While reviewing Joachim Wieland's patch to add a pg_cursors system view, I noticed that the patch assumes that debug_query_string contains the portion of the submitted query string that corresponds to the SQL statement we are currently executing. That is incorrect: debug_query_string contains the *entire* verbatim query string sent by the client. So if the client submits the query string "SELECT 1; SELECT 2;", debug_query_string will contain exactly that string. (psql actually splits queries like the above into two separate FE/BE messages -- to see what I'm referring to, use libpq directly, or start up a copy of the standalone backend.) This makes debug_query_string the wrong thing to use for the pg_cursors and pg_prepared_statements views, but it affects other parts of the system as well: for example, given PQexec(conn, "SELECT 1; SELECT 2/0;") and log_min_error_statement = 'error', the postmaster will log: ERROR: division by zero STATEMENT: SELECT 1; SELECT 2/0; which seems misleading, and is inconsistent with the documentation's description of this configuration parameter. Admittedly this isn't an enormous problem, but I think the current behavior isn't ideal. Unfortunately I don't see an easy way to fix this. It might be possible to extract a semicolon-separated list of query strings from the parser or lexer, but that would likely have the effect of munging comments and whitespace from the literal string submitted by the client, which seems the wrong thing to do for logging purposes. An alternative might be to do a preliminary scan to look for semicolon-delimited query strings, and then pass each of those strings into raw_parser() separately, but that seems quite a lot of work (and perhaps a significant runtime cost) to fix what is at worst a minor UI wrinkle. Thoughts? -Neil ---(end of broadcast)--- TIP 6: explain analyze is your friend
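To illustrate why even the "preliminary scan" option is not entirely trivial, here is a rough Python sketch (an illustration only, not proposed backend code) of splitting on top-level semicolons while respecting quoted literals and identifiers; a real scan would also need to handle dollar quoting and comments:

```python
def split_statements(querystring: str) -> list:
    """Split a submitted query string on top-level semicolons,
    leaving single- and double-quoted sections intact."""
    stmts, buf, quote = [], [], None
    i = 0
    while i < len(querystring):
        c = querystring[i]
        if quote:
            buf.append(c)
            if c == quote:
                # a doubled quote ('' or "") is an escape, stay inside
                if i + 1 < len(querystring) and querystring[i + 1] == quote:
                    buf.append(quote)
                    i += 1
                else:
                    quote = None
        elif c in ("'", '"'):
            quote = c
            buf.append(c)
        elif c == ";":
            stmts.append("".join(buf))   # statement boundary
            buf = []
        else:
            buf.append(c)
        i += 1
    if "".join(buf).strip():
        stmts.append("".join(buf))       # trailing statement without ';'
    return stmts
```

Note that the split pieces keep their original whitespace verbatim, which is the property Neil wants for logging; the hard part in the backend is doing this without a second scan, which is why hooking the real lexer is the more attractive route.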
[HACKERS] test please ignore
sorry for noise cheers andrew ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] debug_query_string and multiple statements
Neil Conway [EMAIL PROTECTED] writes: While reviewing Joachim Wieland's patch to add a pg_cursors system view, I noticed that the patch assumes that debug_query_string contains the portion of the submitted query string that corresponds to the SQL statement we are currently executing. That is incorrect: Yeah, this has annoyed me for some time. debug_query_string is really just a quick hack and has already been stretched well past its intended use. I don't think you get the right result for situations where the active query has been submitted via SPI, either: debug_query_string will show the outermost interactive command, which may have little to do with the query that got the error. The appropriate way to fix it IMHO is to get the lexer+grammar to identify the statement boundary(s) in the source string and add explicit support in the parser for saving away the appropriate strings. This would tie nicely into something else I'd like to do someday, which is improve parse-analysis error reports by being able to finger the offending construct more precisely. When we report something like an unknown variable name in a huge query, it really sucks that we can't give an error cursor the way simple lexer or grammar errors do. To fix that, tokens generated by the lexer/grammar need to carry along text positions ... and the position of the semicolon token is just what we'd need to create proper statement strings, too. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Error working with Temporary Sequences in plpgsql in 8.1 (8.0 works fine)
On Tue, Jan 17, 2006 at 01:28:03PM -0500, Jaime Casanova wrote: is the same problem as with temp tables, you must put their creation, and in this case even the nextval in an execute... Curious that it works in 8.0, though. I wonder if the failure in 8.1 is an artifact of changing sequence functions like nextval() to take a regclass argument (the sequence OID) instead of a text argument (the sequence name); that would affect what gets put in the function's cached plan. -- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
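Michael's theory can be illustrated with a toy cache (purely an illustration of the failure mode, not how the backend's plan caching actually works internally): if the cached plan stores the sequence's OID resolved at first execution, dropping and recreating the sequence leaves the plan pointing at a relation that no longer exists, whereas resolving by name each time keeps working:

```python
class Catalog:
    """Toy system catalog: maps names to OIDs for existing objects."""
    def __init__(self):
        self.next_oid = 21152        # arbitrary starting OID for the demo
        self.by_name = {}
    def create(self, name):
        self.by_name[name] = self.next_oid
        self.next_oid += 1
    def drop(self, name):
        del self.by_name[name]
    def lookup_oid(self, oid):
        if oid not in self.by_name.values():
            raise LookupError("could not open relation with OID %d" % oid)
        return oid

def cached_nextval_by_oid(catalog, plan_cache, name):
    """8.1-style sketch: the cached plan carries the OID, not the name."""
    if name not in plan_cache:
        plan_cache[name] = catalog.by_name[name]   # resolve once, cache OID
    return catalog.lookup_oid(plan_cache[name])
```

The first testseq() call creates "test", caches its OID, and drops it; the second call recreates "test" under a new OID, but the cached plan still carries the old one, reproducing the "could not open relation with OID ..." error from the testcase. Resolving via EXECUTE, as Jaime suggested, amounts to doing the name lookup on every call.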