Re: [PERFORM] Join on incompatible types
Laurent Martelli wrote:
> >>>>> "Shridhar" == Shridhar Daithankar <[EMAIL PROTECTED]> writes:
>
> Shridhar> Laurent Martelli wrote:
> [...]
> >> Should I understand that a join on incompatible types (such as
> >> integer and varchar) may lead to bad performance?
>
> Shridhar> Conversely, you should enforce strict type compatibility
> Shridhar> in comparisons for getting any good plans.. :-)
>
> Ha ha, now I understand why a query of mine was so sluggish. Is there
> a chance I could achieve good performance without having the same
> types? I've tried a CAST in the query, but it's even a little worse
> than without it. However, using a view to cast integers into varchar
> gives acceptable results (see at the end). I'm using PostgreSQL 7.3.4.

I am stripping the analyze outputs and jumping directly to the end. Can you try the following?

1. Make all the fields integer in all the tables.
2. Try the following query:

   EXPLAIN ANALYZE
   SELECT * FROM lists
   JOIN classes ON classes.id = lists.value
   WHERE lists.id = '16'::integer;

How does it affect the runtime?

Shridhar

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly
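[Editor's note: a minimal sketch of the advice in this thread, using the hypothetical lists/classes tables from the quoted queries. The to_number format string is an assumption; adjust it to the width of your values.]

```sql
-- Workaround if the column must stay varchar: PostgreSQL 7.3 cannot
-- cast varchar directly to integer, but you can go through to_number
-- (note the cast happens per row, so an index on lists.value is not
-- usable for this join):
SELECT *
FROM lists
JOIN classes ON classes.id = to_number(lists.value, '999999')::integer
WHERE lists.id = 16;

-- Preferred: store the join key as integer on both sides, so the
-- comparison needs no cast and the planner can consider index scans
-- on either column:
SELECT *
FROM lists
JOIN classes ON classes.id = lists.value
WHERE lists.id = 16;
```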
[PERFORM] TEXT column and indexing
I have this table:

db= \d object_property_value
          Table "db.object_property_value"
        Column         |  Type   |          Modifiers
-----------------------+---------+------------------------------
 obj_property_value_id | integer | not null default nextval(...
 obj_property_id       | integer | not null
 value                 | text    |
Indexes:
    object_property_value_pkey primary key, btree (obj_property_value_id)
    opv_obj_property_id_ix btree (obj_property_id)
    opv_v_ix btree (substr(value, 1, 128))
Foreign-key constraints:
    object_property_fkey FOREIGN KEY (obj_property_id)
        REFERENCES object_property(obj_property_id)
        ON UPDATE CASCADE ON DELETE CASCADE

(long lines edited for readability).

The table contains about 250,000 records and will grow at regular intervals. The 'value' column contains text of various lengths. The table is VACUUMed and ANALYZEd regularly and waxed on Sunday mornings. Database encoding is Unicode. Server is 7.4RC1 or 7.4RC2 and will be 7.4 ASAP.

I want to query this table to match a specific value along the lines of:

SELECT obj_property_id
FROM object_property_value opv
WHERE opv.value = 'foo'

There will only be a few (at the moment 2 or 3) rows exactly matching 'foo'. This query will only be performed with values containing less than around 100 characters, which account for ca. 10% of all rows in the table. The performance is of course lousy:

db= EXPLAIN
db- SELECT obj_property_id
db- FROM object_property_value opv
db- WHERE opv.value = 'foo';
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Seq Scan on object_property_value opv  (cost=0.00..12258.26 rows=2 width=4)
   Filter: (value = 'foo'::text)
(2 rows)

However, if I create a VARCHAR field containing the first 128 characters of the text field and index that, an index scan is used:

db= EXPLAIN
db- SELECT obj_property_id
db- FROM object_property_value opv
db- WHERE opv.opv_vc = 'foo';
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Index Scan using opv_vc_ix on object_property_value opv
   (cost=0.00..6.84 rows=2 width=4)
   Index Cond: ((opv_vc)::text = 'foo'::text)

The question is therefore: can I get an index to work on the TEXT column?
It is currently indexed with:

    opv_v_ix btree (substr(value, 1, 128))

which doesn't appear to have any effect. I am probably missing something obvious though. I can live with maintaining an extra VARCHAR column but would like to keep the table as simple as possible. (For anyone wondering: yes, I can access the data using tsearch2 - via a different table in this case - but this is not always appropriate).

Thanks for any hints.

Ian Barwick
[EMAIL PROTECTED]
Re: [PERFORM] Join on incompatible types
Laurent Martelli wrote:
> >>>>> "Shridhar" == Shridhar Daithankar <[EMAIL PROTECTED]> writes:
> [...]
> Shridhar> 2. Try following query EXPLAIN ANALYZE SELECT * from lists
> Shridhar> join classes on classes.id=lists.value where
> Shridhar> lists.id='16'::integer;
>
> Shridhar> classes.id=lists.value::integer.
>
> With classes.id of type integer and lists.value of type varchar, I
> get "ERROR: Cannot cast type character varying to integer", which is
> not such a surprise.

Try the to_number function to get a number out of the string, then cast it to integer:

http://developer.postgresql.org/docs/postgres/functions-formatting.html

I hope that works. I don't have a postgresql installation handy here..

Shridhar
Re: [PERFORM] constant vs function param differs in performance
Dear Tom,

Thanks for your early response. An addition: the nastier difference increased by adding an index (it was an essential index for this query):

    func with param:    improved from 2700ms to 2300ms
    func with constant: improved from  400ms to   31ms
    inline query:       improved from  390ms to    2ms

So am I reading this correctly, and it is completely normal and can't be helped? (I couldn't try 7.4 yet.) In case it reveals something:

--- cut here ---
SELECT field
FROM (SELECT field, sum(something) = 0 AS boolvalue
      FROM (SELECT * FROM subselect1 NATURAL LEFT JOIN subselect2
            UNION
            SELECT * FROM subselect3 NATURAL LEFT JOIN subselect4
           ) AS u
      GROUP BY field) AS t
WHERE not boolvalue
ORDER BY simple_sql_func_returns_bool(field) DESC
LIMIT 1;
--- cut here ---

G.

- Original Message -
From: Tom Lane [EMAIL PROTECTED]
Sent: Friday, November 14, 2003 9:59 PM

> SZŰCS Gábor [EMAIL PROTECTED] writes:
> > I have two SQL functions that produce different times and I can't
> > understand why.
>
> The planner often produces different plans when there are constants
> in WHERE clauses than when there are variables, because it can get
> more accurate ideas of how many rows will be retrieved.
>
> regards, tom lane
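[Editor's note: a minimal sketch of the effect Tom describes, using a hypothetical table `t`. The SQL-function syntax matches the 7.x quoted-body style; names are illustrative only.]

```sql
-- Hypothetical table and index for illustration.
CREATE TABLE t (id integer, payload text);
CREATE INDEX t_id_ix ON t (id);

-- Constant in the WHERE clause: the planner sees the value 42 at plan
-- time, can consult the column statistics for it, and may pick an
-- index scan.
SELECT * FROM t WHERE id = 42;

-- The same query behind a function parameter: the plan must work for
-- any possible $1, so the planner falls back to a generic row
-- estimate and may settle on a worse plan.
CREATE FUNCTION get_t(integer) RETURNS SETOF t AS '
    SELECT * FROM t WHERE id = $1;
' LANGUAGE sql;
```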
Re: [PERFORM] problem with select count(*) ..
On Thu, Nov 20, 2003 at 07:07:30 +0530, Rajesh Kumar Mallah <[EMAIL PROTECTED]> wrote:
> If i dump and reload, the performance improves and it takes 1 sec.
> This is what i have been doing since the upgrade. But its not a
> solution. The VACUUM FULL is at the end of a loading batch SQL file
> which makes lots of inserts, deletes and updates.

If a dump and reload fixes your problem, most likely you have a lot of dead tuples in the table. You might need to run vacuum more often. You might have an open transaction that is preventing vacuum full from cleaning up the table.
Re: [PERFORM] More detail on settings for pgavd?
Shridhar,

> Will look into it. Give me a day or so. I am planning a couple of
> other patches as well. Maybe over the weekend.

Thanks, appreciated. As I said, I don't think the settings themselves are wrong, I think the documentation is. What are you patching?

--
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] TEXT column and indexing
On Wed, 19 Nov 2003, Ian Barwick wrote:

> I have this table:
>
> db= \d object_property_value
> [...]
> Indexes:
>     object_property_value_pkey primary key, btree (obj_property_value_id)
>     opv_obj_property_id_ix btree (obj_property_id)
>     opv_v_ix btree (substr(value, 1, 128))
> [...]
>
> I want to query this table to match a specific value along the lines of:
>
> SELECT obj_property_id
> FROM object_property_value opv
> WHERE opv.value = 'foo'
>
> The question is therefore: can I get an index to work on the TEXT
> column? It is currently indexed with: opv_v_ix btree
> (substr(value, 1, 128)) which doesn't appear to have any effect. I am
> probably missing something obvious though. I can live with
> maintaining an extra VARCHAR column but

You probably need to be querying like:

    WHERE substr(value, 1, 128) = 'foo';

in order to use that index. While substr(txtcol, 1, 128) happens to have the property that it would probably be useful in a search against a short constant string, that's an internal property of that function.
Re: [PERFORM] TEXT column and indexing
On Wed, 19 Nov 2003 10:18:18 +0100, Ian Barwick <[EMAIL PROTECTED]> wrote:
> Indexes:
> [...]
> opv_v_ix btree (substr(value, 1, 128))
>
> SELECT obj_property_id
> FROM object_property_value opv
> WHERE opv.value = 'foo'

Try

    ... WHERE substr(opv.value, 1, 128) = 'foo'

HTH.
Servus
Manfred
Re: [PERFORM] duration logging setting in 7.4
Ryszard Lach wrote:
> If I turn on log_min_duration_statement (i.e. set to 0),
> log_statement and log_duration, then I receive something like this:
>
> Nov 17 22:33:27 postgres[22945]: [29231-1] LOG:  statement:
> Nov 17 22:33:27 postgres[22945]: [29232-1] LOG:  duration: 0.198 ms
> Nov 17 22:33:27 postgres[22945]: [29233-1] LOG:  duration: 0.198 ms  statement:
> Nov 17 22:33:27 postgres[22946]: [29231-1] LOG:  statement:
> Nov 17 22:33:27 postgres[22946]: [29232-1] LOG:  duration: 0.191 ms
> Nov 17 22:33:27 postgres[22946]: [29233-1] LOG:  duration: 0.191 ms  statement:
> Nov 17 22:33:27 postgres[22678]: [147134-1] LOG:  statement: select * from cms where id=1465
> Nov 17 22:33:27 postgres[22679]: [154907-1] LOG:  statement:
> Nov 17 22:33:27 postgres[22679]: [154908-1] LOG:  duration: 0.867 ms
> Nov 17 22:33:27 postgres[22679]: [154909-1] LOG:  duration: 0.867 ms  statement:
> Nov 17 22:33:27 postgres[22678]: [147135-1] LOG:  duration: 1.458 ms
> Nov 17 22:33:27 postgres[22678]: [147136-1] LOG:  duration: 1.458 ms  statement: select * from cms where id=1465
> Nov 17 22:33:27 postgres[22680]: [158366-1] LOG:  statement:
> Nov 17 22:33:27 postgres[22680]: [158367-1] LOG:  duration: 0.620 ms
> Nov 17 22:33:27 postgres[22680]: [158368-1] LOG:  duration: 0.620 ms  statement:
> Nov 17 22:33:27 postgres[22681]: [161294-1] LOG:  statement:
> Nov 17 22:33:27 postgres[22681]: [161295-1] LOG:  duration: 0.650 ms
>
> It seems that log_duration is responsible only for "duration:" lines,
> log_statement for "statement:" ones, and log_min_duration_statement
> for "duration: .* statement:". I think the above output should rule
> out loss of data in syslog from further deliberations. Do you think
> that could be a bug?

Yes, the problem is not related to syslog. Are you using prepared queries, perhaps? I don't think those show the query, but it seems we should display something better than blanks.
> There is another thing: logs from the same database running on 7.3
> with the same application contained lines like 'select
> getdatabaseencoding()', 'select datestyle()' and similar (not used by
> the application explicitly, probably added by the JDBC driver); now
> they are missing - maybe this is the problem?

No, those are missing because the new 7.4 wire protocol doesn't require those queries anymore --- the data is sent automatically.

--
Bruce Momjian                   | http://candle.pha.pa.us
[EMAIL PROTECTED]               | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup.    | Newtown Square, Pennsylvania 19073
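[Editor's note: for reference, the three settings discussed in this thread live in postgresql.conf. This fragment is a sketch of the combination Ryszard describes; the boolean syntax matches the 7.4-era form of log_statement.]

```sql
-- postgresql.conf fragment (7.4 syntax); each setting contributes a
-- separate log line, which is why statements appear twice above:
--
--   log_statement = true                 -- "statement: ..." lines
--   log_duration = true                  -- "duration: ..." lines
--   log_min_duration_statement = 0       -- "duration: ... statement: ..."
--                                        -- (0 = log every statement)
```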
Re: [PERFORM] TEXT column and indexing
On Wednesday 19 November 2003 17:35, Stephan Szabo wrote:
> On Wed, 19 Nov 2003, Ian Barwick wrote:
> > I have this table: (...)
>
> You probably need to be querying like:
>
>     WHERE substr(value, 1, 128) = 'foo';
>
> in order to use that index. While substr(txtcol, 1, 128) happens to
> have the property that it would probably be useful in a search
> against a short constant string, that's an internal property of that
> function.

That's the one :-). Thanks!

Ian Barwick
[EMAIL PROTECTED]
Re: [PERFORM] TEXT column and indexing
On Wednesday 19 November 2003 17:26, you wrote:
> On Wed, 19 Nov 2003 10:18:18 +0100, Ian Barwick <[EMAIL PROTECTED]> wrote:
> > Indexes:
> > [...]
> > opv_v_ix btree (substr(value, 1, 128))
> >
> > SELECT obj_property_id
> > FROM object_property_value opv
> > WHERE opv.value = 'foo'
>
> Try
>
>     ... WHERE substr(opv.value, 1, 128) = 'foo'
>
> HTH.

Yup:

db= explain
db- SELECT obj_property_id
db- FROM object_property_value opv
db- WHERE substr(opv.value,1,128) = 'foo';
                              QUERY PLAN
-----------------------------------------------------------------------
 Index Scan using opv_v_ix on object_property_value opv
   (cost=0.00..4185.78 rows=1101 width=4)
   Index Cond: (substr(value, 1, 128) = 'foo'::text)
(2 rows)

Many thanks

Ian Barwick
[EMAIL PROTECTED]
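[Editor's note: to recap the fix for anyone finding this thread, an index on an expression is only considered when the query's predicate uses the same expression the index was built on. A minimal sketch using the table from this thread:]

```sql
-- Index on an expression...
CREATE INDEX opv_v_ix ON object_property_value (substr(value, 1, 128));

-- ...is matched only by a predicate on that exact expression:
SELECT obj_property_id
FROM object_property_value opv
WHERE substr(opv.value, 1, 128) = 'foo';   -- can use opv_v_ix

-- A predicate on the bare column does not match the index:
--   WHERE opv.value = 'foo'               -- sequential scan
```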
Re: [PERFORM] duration logging setting in 7.4
On Tue, Nov 18, 2003 at 10:07:48AM -0500, Bruce Momjian wrote:
> Wow, that is strange. If you don't use syslog, do you see the proper
> output?

I've just checked this. It behaves exactly the same way.

> If you turn on log_statement, do you see the statements?

If I turn on log_min_duration_statement (i.e. set to 0), log_statement and log_duration, then I receive something like this:

Nov 17 22:33:27 postgres[22945]: [29231-1] LOG:  statement:
Nov 17 22:33:27 postgres[22945]: [29232-1] LOG:  duration: 0.198 ms
Nov 17 22:33:27 postgres[22945]: [29233-1] LOG:  duration: 0.198 ms  statement:
Nov 17 22:33:27 postgres[22946]: [29231-1] LOG:  statement:
Nov 17 22:33:27 postgres[22946]: [29232-1] LOG:  duration: 0.191 ms
Nov 17 22:33:27 postgres[22946]: [29233-1] LOG:  duration: 0.191 ms  statement:
Nov 17 22:33:27 postgres[22678]: [147134-1] LOG:  statement: select * from cms where id=1465
Nov 17 22:33:27 postgres[22679]: [154907-1] LOG:  statement:
Nov 17 22:33:27 postgres[22679]: [154908-1] LOG:  duration: 0.867 ms
Nov 17 22:33:27 postgres[22679]: [154909-1] LOG:  duration: 0.867 ms  statement:
Nov 17 22:33:27 postgres[22678]: [147135-1] LOG:  duration: 1.458 ms
Nov 17 22:33:27 postgres[22678]: [147136-1] LOG:  duration: 1.458 ms  statement: select * from cms where id=1465
Nov 17 22:33:27 postgres[22680]: [158366-1] LOG:  statement:
Nov 17 22:33:27 postgres[22680]: [158367-1] LOG:  duration: 0.620 ms
Nov 17 22:33:27 postgres[22680]: [158368-1] LOG:  duration: 0.620 ms  statement:
Nov 17 22:33:27 postgres[22681]: [161294-1] LOG:  statement:
Nov 17 22:33:27 postgres[22681]: [161295-1] LOG:  duration: 0.650 ms

It seems that log_duration is responsible only for "duration:" lines, log_statement for "statement:" ones, and log_min_duration_statement for "duration: .* statement:". I think the above output should rule out loss of data in syslog from further deliberations. Do you think that could be a bug?
There is another thing: logs from the same database running on 7.3 with the same application contained lines like 'select getdatabaseencoding()', 'select datestyle()' and similar (not used by the application explicitly, probably added by the JDBC driver); now they are missing - maybe this is the problem?

Richard.

--
"First they ignore you. Then they laugh at you. Then they fight you.
Then you win." - Mohandas Gandhi.
Re: [PERFORM] duration logging setting in 7.4
On Wed, Nov 19, 2003 at 01:58:27PM -0500, Bruce Momjian wrote:
> Ryszard Lach wrote:
> > There is another thing: logs from the same database running on 7.3
> > with the same application contained lines like 'select
> > getdatabaseencoding()', 'select datestyle()' and similar (not used
> > by the application explicitly, probably added by the JDBC driver);
> > now they are missing - maybe this is the problem?
>
> No, those are missing because the new 7.4 wire protocol doesn't
> require those queries anymore --- the data is sent automatically.

Maybe this is a solution? Because of some charset-related problems we are still using an old (AFAIR modified) version of the JDBC driver. I'm not a programmer and don't know exactly what the JDBC driver does, but maybe it sends those queries from the client side and the server doesn't know what to do with them? I'll ask our programmers to try the 7.4 driver and tell you about the results.

Richard.
Re: [PERFORM] duration logging setting in 7.4
Ryszard Lach wrote:
> On Wed, Nov 19, 2003 at 01:58:27PM -0500, Bruce Momjian wrote:
> > No, those are missing because the new 7.4 wire protocol doesn't
> > require those queries anymore --- the data is sent automatically.
>
> Maybe this is a solution? Because of some charset-related problems we
> are still using an old (AFAIR modified) version of the JDBC driver.
> [...] I'll ask our programmers to try the 7.4 driver and tell you
> about the results.

Also, try plain psql and issue a query and see if it appears.
Re: [PERFORM] More detail on settings for pgavd?
Josh Berkus wrote:
> Shridhar,
>
> I was looking at the -V/-v and -A/-a settings in pgavd, and really
> don't understand how the calculation works. According to the readme,
> if I set -v to 1000 and -V to 2 (the defaults) for a table with
> 10,000 rows, pgavd would only vacuum after 21,000 rows had been
> updated. This seems wrong.

No, that is correct. It is calculated as

    threshold = base + scale * number of current rows

which translates to

    21,000 = 1000 + 2 * 10,000

However, I do not agree with this logic entirely. It pegs the next vacuum w.r.t. the current table size, which is not always a good thing. I would rather vacuum the table at 2000 updates, which is what you probably want.

Furthermore, the analyze threshold depends upon inserts+updates. I think it should also depend upon deletes, for obvious reasons.

> Can you clear this up a little? I'd like to tweak these settings but
> can't without being better acquainted with the calculation.

What did you expect in the above example? It is not difficult to tweak the pg_autovacuum calculations. For testing we can play around.

> Also, you may want to reverse your default ratio for vacuum/analyze
> frequency. True, analyze is a less expensive operation than vacuum,
> but it's also needed less often -- only when the *distribution* of
> data changes. I've seen databases where the optimal vacuum/analyze
> frequency was every 10 min/once per day.

OK, the vacuum and analyze thresholds are calculated with the same formula as shown above, but with different parameters:

    vacthresh = vacbase + vacscale * ntuples
    anathresh = anabase + anascale * ntuples

What you are asking for is

    vacthresh = vacbase * vacscale
    anathresh = anabase + anascale * ntuples

Would that tilt things the way you want? I.e. an analyze is triggered when a fixed *percentage* of the table changes, but a vacuum is triggered when a fixed *number of rows* are changed. I am all for experimentation. If you have real life data to play with, I can give you some patches to play around with.
And BTW, this is all the brainchild of Matthew O'Connor (correct? I am not good at either names or spellings). The way I wrote pgavd originally, each table got a separate threshold..:-). That was rather a brute force approach.

Shridhar
Re: [PERFORM] More detail on settings for pgavd?
Shridhar,

> However I do not agree with this logic entirely. It pegs the next
> vacuum w.r.t current table size which is not always a good thing.

No, I think the logic's fine, it's the numbers which are wrong. We want to vacuum when updates reach between 5% and 15% of total rows. NOT when updates reach 110% of total rows ... that's much too late.

Hmmm ... I also think the threshold level needs to be lowered; I guess the purpose was to prevent continuous re-vacuuming of small tables? Unfortunately, in the current implementation, the result is that small tables never get vacuumed at all.

So for defaults, I would peg -V at 0.1 and -v at 100, so our default calculation for a table with 10,000 rows is:

    100 + (0.1 * 10,000) = 1100 rows

> I would rather vacuum the table at 2000 updates, which is what you
> probably want.

Not necessarily. This would be painful if the table has 10,000,000 rows. It *should* be based on a % of rows.

> Furthermore analyze threshold depends upon inserts+updates. I think
> it should also depend upon deletes for obvious reasons.

Yes. The vacuum threshold is counting deletes, I hope?

> What did you expect in the above example? It is not difficult to
> tweak the pg_autovacuum calculations. For testing we can play around.

Can I set the settings to decimals, or are they integers?

> vacthresh = vacbase * vacscale
> anathresh = anabase + anascale * ntuples

Nope, see above. My comment about the frequency of vacuums vs. analyzes is that currently the *default* is to analyze twice as often as you vacuum. Based on my experience as a PG admin on a variety of databases, I believe that the default should be to analyze half as often as you vacuum.

> I am all for experimentation. If you have real life data to play
> with, I can give you some patches to play around.

I will have real data very soon.

--
Josh Berkus
Aglio Database Solutions
San Francisco
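[Editor's note: the two threshold proposals in this thread, worked through for the 10,000-row example table. The arithmetic can be checked directly in psql:]

```sql
-- Current pgavd defaults (-v 1000, -V 2):
--   threshold = base + scale * ntuples
SELECT 1000 + 2 * 10000 AS vacuum_threshold;     -- 21000 updates

-- Josh's proposed defaults (-v 100, -V 0.1), i.e. vacuum after
-- roughly 10% of the table plus a small fixed base has changed:
SELECT 100 + 0.1 * 10000 AS vacuum_threshold;    -- 1100 updates
```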
Re: [PERFORM] Backup/restore of pg_statistics
Joel Jacobson <[EMAIL PROTECTED]> writes:
> I understand that it is not possible to occasionally re-plan the
> queries in a PL/pgSQL function without dropping and re-creating the
> function.

Huh? You only need to start a fresh connection.

regards, tom lane
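[Editor's note: besides reconnecting, a common PL/pgSQL workaround is to build the query as a string and run it with EXECUTE, which plans it afresh on every call instead of caching a plan per connection. A sketch in 7.x syntax (quoted function body, FOR ... IN EXECUTE since EXECUTE INTO is not available in this era); `some_table` and `id` are hypothetical names:]

```sql
CREATE FUNCTION count_matching(integer) RETURNS bigint AS '
DECLARE
    r RECORD;
BEGIN
    -- EXECUTE plans the statement at call time, so it sees current
    -- statistics and the actual parameter value. Note the value is
    -- interpolated into the string, which is safe here only because
    -- $1 is an integer.
    FOR r IN EXECUTE ''SELECT count(*) AS n FROM some_table
                       WHERE id = '' || $1 LOOP
        RETURN r.n;
    END LOOP;
END;
' LANGUAGE plpgsql;
```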