Re: [PERFORM] arrays and indexes
Ross J. Reedstrom [EMAIL PROTECTED] writes:

> In the new schema, the same thing is:
>
>   SELECT * from content where 42 = ANY (authors);
>
> Works fine, but for the life of me I can't find nor figure out how to
> build an index that will be used to speed this along. Any ideas?

Well, that's basically the problem with denormalized data like this. Have you resolved what you're going to do if two sessions try to add a user to the same group at the same time? Or how you'll go about removing a user from all his groups in one shot?

Basically, if you denormalize in this fashion it becomes hard to use the groups as anything but single monolithic objects, whereas normalized data can be queried and updated from other points of view, as in the case you name above.

Postgres does have a way to do what you ask, though. It involves GiST indexes and the operators from the contrib/intarray directory of the Postgres source. However, I warn you in advance that this is fairly esoteric stuff and will take some time to get used to. And at least in my case I found the indexes didn't actually help much for my data sets, probably because they just weren't big enough to benefit.

-- greg

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org
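As a sketch of the intarray approach Greg describes (column and table names are taken from the thread; note the containment operator was spelled `@` in the 7.4-era contrib module and is `@>` in modern PostgreSQL, and on 7.4 you would load the contrib SQL script rather than use CREATE EXTENSION):

```sql
-- Assumes authors is an int[] column on content.
CREATE EXTENSION intarray;

CREATE INDEX content_authors_gist
    ON content USING gist (authors gist__int_ops);

-- Membership lookup that the GiST index can serve
-- (spelled "authors @ '{42}'" on old intarray versions):
SELECT * FROM content WHERE authors @> ARRAY[42];
```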
Re: [PERFORM] hardware raid suggestions
Brian Hirt wrote:

> I've been using the Adaptec ZCR RAID cards in our servers for a while now, mostly small systems with 3 or 6 disks, and we've been very happy with them. However, we're building a new DB machine with 14 U320 15K SCA drives, and we've run into a performance bottleneck with the ZCR card where it just won't scale well. Without going into too many details, we've tested RAID5, RAID10 and RAID50 on pretty much every array size from 4-14 disks (the RAID50 tests used more drives), using JFS, reiserfs and ext3. With every configuration, performance didn't improve once the array size grew beyond 6 disks. We used various benchmarks, including pgbench with scale factors of 10, 100, 1000, 5000 and clients of 10, 15, 30 and 45. We've run many other tests and monitoring tools, and we've come to the conclusion that the ZCR is the problem. We're looking into getting an Adaptec 2200S or the MegaRAID 320-2x, which have better processors and, hopefully, better performance. We feel that the AIC7930 used as the CPU on the ZCR just doesn't cut it, and a faster RAID controller would work better. Does anyone out there have any experience with these cards with PostgreSQL and Linux? If so, would you be willing to share your experiences and possibly give a recommendation?

Did you consider the option of using an external storage array? We are using the Dell/EMC CX600 http://www.dell.com/downloads/emea/products/pvaul/en/Dell_EMC_cx600_specs.pdf and I keep forgetting there are disks behind it at all...

Regards
Gaetano Mendola
Re: [PERFORM] Insert are going slower ...
Josh Berkus wrote:

> Herve', I forgot to ask about your hardware. How much RAM, and what's your disk setup? CPU?
>
>> sort_mem = 512000
>
> Huh? sort_mem is in KB. The above says that you've allocated 512MB of sort memory. Is this process the *only* thing going on on the machine?

And also, it's not system-wide but, let's say, per backend...

Regards
Gaetano Mendola
Re: [PERFORM] Insert are going slower ...
Hervé Piedvache wrote:

> Josh,
>
> On Tuesday 13 July 2004 19:10, Josh Berkus wrote:
>
>>> What can I do to get better results? (configuration options, and/or a hardware upgrade?) What can I give you to provide more information to help me?
>>
>> 1) What PostgreSQL version are you using?
>
> v7.4.3
>
>> 2) What's your VACUUM, ANALYZE, VACUUM FULL, REINDEX schedule?
>
> VACUUM FULL VERBOSE ANALYZE; every day, after the calculation I was talking about...
>
>> 3) Can you list the non-default settings in your postgresql.conf? Particularly shared_buffers, sort_mem, checkpoint_segments, effective_cache_size, and max_fsm_pages?
>
> sort_mem = 512000

This is too much: you are instructing Postgres to use 512MB for each backend (and at times a backend can use this quantity more than once).

> vacuum_mem = 409600
> max_fsm_pages = 5000
> max_fsm_relations = 2000

50 million? Huh. What does Postgres tell you in the log after performing a VACUUM FULL?

> max_files_per_process = 2000
> wal_buffers = 1000
> checkpoint_segments = 3

For massive inserts you have to increase this number; pump it up to 16.

> effective_cache_size = 500

5GB for an 8 GB system is too much.

> random_page_cost = 3

On your hardware you can decrease it to 2, and also decrease the other CPU costs.

Regards
Gaetano Mendola

BTW, I live in Paris too, if you need a hand...
[PERFORM] Timestamp-based indexing
Hello --

To increase query (i.e. SELECT) performance, we're trying to get Postgres to use an index on a timestamp column in a given table.

Event-based data is put into this table several times a minute, with the timestamp indicating when a particular row was placed in the table. The table is purged daily, retaining only rows that are less than 7 days old. That is, any row in the table is less than 1 week old (+ 1 day, since the purge is daily). A typical number of rows in the table is around 400,000. A VACUUM FULL ANALYZE is performed every 3 hours.

The problem: we often query the table to extract the rows that are, say, 10 minutes old or less. Given there are 10080 minutes per week, a properly configured planner could estimate the number of rows returned by such a query to be:

  10 min / 10080 min * 400,000 = 0.001 * 400,000 = 400

making an index scan, with the timestamp field as the index, far faster than a sequential scan. However, we can't get the planner to do a timestamp-based index scan. Anyone know what to do?
Here are the table specs:

monitor=# \d eventtable
                        Table "public.eventtable"
  Column   |            Type             |                  Modifiers
-----------+-----------------------------+--------------------------------------------------------------
 timestamp | timestamp without time zone | not null default ('now'::text)::timestamp(6) with time zone
 key       | bigint                      | not null default nextval('public.eventtable_key_seq'::text)
 propagate | boolean                     |
 facility  | character(10)               |
 priority  | character(10)               |
 host      | character varying(128)      | not null
 message   | text                        | not null
Indexes:
    "eventtable_pkey" primary key, btree ("timestamp", key)
    "eventtable_host" btree (host)
    "eventtable_timestamp" btree ("timestamp")

Here's a query (with EXPLAIN ANALYZE):

monitor=# explain analyze select * from eventtable where timestamp > CURRENT_TIMESTAMP - INTERVAL '10 minutes';
                                 QUERY PLAN
----------------------------------------------------------------------------
 Seq Scan on eventtable  (cost=0.00..19009.97 rows=136444 width=155) (actual time=11071.073..11432.522 rows=821 loops=1)
   Filter: ((timestamp)::timestamp with time zone > (('now'::text)::timestamp(6) with time zone - '@ 10 mins'::interval))
 Total runtime: 11433.384 ms
(3 rows)

Here's something strange: we try to disable sequential scans, but to no avail. The estimated cost skyrockets, though:

monitor=# set enable_seqscan = false;
SET
monitor=# explain analyze select * from eventtable where timestamp > CURRENT_TIMESTAMP - INTERVAL '10 minutes';
                                 QUERY PLAN
----------------------------------------------------------------------------
 Seq Scan on eventtable  (cost=1.00..100019009.97 rows=136444 width=155) (actual time=9909.847..9932.438 rows=1763 loops=1)
   Filter: ((timestamp)::timestamp with time zone > (('now'::text)::timestamp(6) with time zone - '@ 10 mins'::interval))
 Total runtime: 9934.353 ms
(3 rows)

monitor=# set enable_seqscan = true;
SET

Any help is greatly appreciated :)

-- Harmon
Re: [PERFORM] arrays and indexes
Ross wrote:

> Hi all - I've got a schema I'm working on modifying, and I need some help getting the best performance out of it. The original schema has a many-to-many linkage between a couple of tables, using a two-column linkage table. This is used to represent groups of people and their relationship to an object (authors, copyright holders, maintainers). This worked fine and, with the right indexes, is quite zippy. Approximate schema:
>
>   table content (
>     contentid serial,
>     name text,
>     ...
>     authorgroupid int,
>     cpholdergroupid int,
>     maintgroupid int)
>
>   table groups (
>     personid text,
>     groupid int)
>
> Note that neither groupid nor personid is unique.
>
> Now the users want not just groups, but ordered lists. Well, that's just fine: we could do it with another column in the groups linkage table, and some additional logic in the middleware for detecting identical groups, but it occurred to me that PG's array types are just the ticket for ordered lists like this. So, by dropping arrays of personids (authors, copyrightholders, maintainers, ...) into the content table, I can do everything I need.
>
> Only one problem. Retrieving all the content for a particular person/role is fairly common. Queries of the form:
>
>   SELECT * from content c join groups g on c.authorgroupid = g.groupid where g.personid = 'ross';
>
> work fine and use the index on groups.personid.
>
> In the new schema, the same thing is:
>
>   SELECT * from content where 42 = ANY (authors);
>
> Works fine, but for the life of me I can't find nor figure out how to build an index that will be used to speed this along. Any ideas? I'm using 7.4.3, BTW.

Arrays are usually a bad choice to put in your tables, with a couple of exceptions. Keep in mind that you can generate the array at query time using custom aggregates if you prefer to deal with arrays on the client side. The basic problem is that they introduce flexibility issues and are usually better handled by moving the data to a dependent table. Here are the cases where you might want to consider using arrays in your tables:

1. Your array bounds are small and known at design time (think: the pay-by-quarter example in the docs).
2. Your array will not contain more than one or two dependent elements.
3. You are dealing with an extreme performance situation and you have tried doing things the proper way first.

There are other exceptions... arrays can be a powerful tool, albeit a dangerous one... just know what you are getting into. A firm understanding of relational principles is a tremendous help.

If your array bounds are known, it is possible to get around the index problem in limited cases by using a custom function (but only when the array bounds are known):

  create function any_quarter_over_10k (numeric[]) returns boolean as '
    select case
      when $1[1] > 10000 then true
      when $1[2] > 10000 then true
      when $1[3] > 10000 then true
      when $1[4] > 10000 then true
      else false
    end;
  ' language 'sql' IMMUTABLE;

  create index t_q_10k_idx on t (any_quarter_over_10k(salary_qtr));

  select * from t where any_quarter_over_10k(t.salary_qtr) = true;

Good luck!
Merlin
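For contrast, the dependent-table approach Merlin calls "the proper way" might look like this sketch (the table and column names here are assumptions layered on Ross's schema; a position column preserves the ordering the users asked for):

```sql
-- Ordered group membership as a normalized dependent table.
CREATE TABLE group_members (
    groupid  integer NOT NULL,
    personid integer NOT NULL,
    position integer NOT NULL,          -- preserves the list order
    PRIMARY KEY (groupid, position)
);
CREATE INDEX group_members_person ON group_members (personid);

-- "All content authored by person 42", in order, via plain btree indexes:
SELECT c.*, g.position
FROM content c
JOIN group_members g ON g.groupid = c.authorgroupid
WHERE g.personid = 42
ORDER BY g.position;
```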
Re: [PERFORM] Timestamp-based indexing
Harmon S. Nine wrote:

> monitor=# explain analyze select * from eventtable where timestamp > CURRENT_TIMESTAMP - INTERVAL '10 minutes';
> QUERY PLAN

Try:

  SELECT * FROM eventtable
  WHERE timestamp BETWEEN (CURRENT_TIMESTAMP - INTERVAL '10 minutes') AND CURRENT_TIMESTAMP;

This will use a range of valid times. What your query is doing is looking from 10 minutes ago to an infinite future. Statistically speaking, that could encompass most of the table, because you have an infinite range, and no index will be used. If you give it a bounded range, the planner can figure out what you are looking for.

-- Kevin Barnard
Speed Fulfillment and Call Center
[EMAIL PROTECTED]
214-258-0120
Re: [PERFORM] Timestamp-based indexing
VACUUM FULL ANALYZE every 3 hours seems a little severe. You will probably be served just as well by VACUUM ANALYZE, and you probably don't need the VACUUM part most of the time. You might try doing an ANALYZE on just the specific tables you are having issues with. Since ANALYZE should be much quicker and doesn't have the performance impact of a VACUUM, you could do it every hour, or even every 15 minutes. Good luck...

Harmon S. Nine wrote:

> Hello -- To increase query (i.e. SELECT) performance, we're trying to get Postgres to use an index on a timestamp column in a given table. Event-based data is put into this table several times a minute, with the timestamp indicating when a particular row was placed in the table. The table is purged daily, retaining only rows that are less than 7 days old. That is, any row in the table is less than 1 week old (+ 1 day, since the purge is daily). A typical number of rows in the table is around 400,000. A VACUUM FULL ANALYZE is performed every 3 hours.
>
> The problem: we often query the table to extract the rows that are, say, 10 minutes old or less. Given there are 10080 minutes per week, a properly configured planner could estimate the number of rows returned by such a query to be 10 min / 10080 min * 400,000 = 0.001 * 400,000 = 400, making an index scan, with the timestamp field as the index, far faster than a sequential scan. However, we can't get the planner to do a timestamp-based index scan. Anyone know what to do?
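The split suggested above (rare VACUUM, frequent ANALYZE) might be scheduled like this crontab sketch; the database name, table, and times are assumptions, not values from the thread:

```
# hypothetical crontab for the monitor database:
# plain VACUUM ANALYZE once a day, shortly after the daily purge
30 3 * * *    psql -d monitor -c "VACUUM ANALYZE eventtable"
# cheap statistics refresh every 15 minutes
*/15 * * * *  psql -d monitor -c "ANALYZE eventtable"
```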
Re: [PERFORM] Timestamp-based indexing
On Mon, 26 Jul 2004, Harmon S. Nine wrote:

> However, we can't get the planner to do a timestamp-based index scan. Anyone know what to do?

I'd wonder if the type conversion is causing you problems. CURRENT_TIMESTAMP - INTERVAL '10 minutes' is a timestamp with time zone, while the column is timestamp without time zone. Casting CURRENT_TIMESTAMP to timestamp without time zone seemed to make it able to choose an index scan on 7.4.
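Concretely, the cast Stephan describes might be written like this (a sketch against the table from the thread; on 7.4 this makes the comparison same-type and therefore indexable):

```sql
SELECT *
FROM eventtable
WHERE "timestamp" > CAST(CURRENT_TIMESTAMP - INTERVAL '10 minutes'
                         AS timestamp without time zone);
```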
Re: [PERFORM] Timestamp-based indexing
Matthew T. O'Connor [EMAIL PROTECTED] writes:

> VACUUM FULL ANALYZE every 3 hours seems a little severe.

If rows are only deleted once a day, that's a complete waste of time, indeed. I'd suggest running a plain VACUUM just after the deletion pass is done. ANALYZEs are a different matter and possibly need to be done every few hours, seeing that your maximum timestamp value is constantly changing.

> monitor=# set enable_seqscan = false;
> SET
> monitor=# explain analyze select * from eventtable where timestamp > CURRENT_TIMESTAMP - INTERVAL '10 minutes';
> QUERY PLAN
> Seq Scan on eventtable (cost=1.00..100019009.97 rows=136444 width=155) (actual time=9909.847..9932.438 rows=1763 loops=1)
>   Filter: ((timestamp)::timestamp with time zone > (('now'::text)::timestamp(6) with time zone - '@ 10 mins'::interval))
> Total runtime: 9934.353 ms

You've got some datatype confusion, too. CURRENT_TIMESTAMP yields timestamp with time zone, and since you made the timestamp column timestamp without time zone, you've got a cross-type comparison, which is not indexable (at least not in 7.4). My opinion is that you chose the wrong type for the column: values that represent specific real-world instants should always be timestamp with time zone, so that they mean the same thing if you look at them in a different time zone.

Another issue here is that because CURRENT_TIMESTAMP - INTERVAL '10 minutes' isn't a constant, the planner isn't able to make use of the statistics gathered by ANALYZE anyway. That's why the rowcount estimate has nothing to do with reality. Unless you force the decision with enable_seqscan, the planner will never pick an indexscan with this rowcount estimate. The standard advice for getting around this is to hide the nonconstant calculation inside a function that's deliberately mislabeled immutable.
For example:

  create function ago(interval) returns timestamp with time zone as
  'select now() - $1' language sql strict immutable;

  select * from eventtable where timestamp > ago('10 minutes');

The planner folds ago('10 minutes') to a constant, checks the statistics, and should do the right thing. Note however that this technique may break if you put a call to ago() inside a function or prepared statement --- it's only safe in interactive queries, where you don't care that the value is reduced to a constant during planning instead of during execution.

regards, tom lane
Re: [PERFORM] Insert are going slower ...
On Mon, 2004-07-26 at 08:20, Gaetano Mendola wrote:

> Hervé Piedvache wrote:
> [snip]
>>> sort_mem = 512000
>> This is too much, you are instructing Postgres to use 512MB for each backend (at times each backend can use this quantity more than once)

Agreed. If any one process needs this much sort memory, you can set it in that session with SET sort_mem anyway, so letting every sort consume up to 512MB is asking for trouble.

>>> effective_cache_size = 500
>> 5GB for an 8 GB system is too much

No, it's not. Assuming that PostgreSQL with all its shared buffers is using 2GB, it's quite likely that the kernel is caching at least 5GB of disk data. effective_cache_size doesn't set any cache size; it tells the planner roughly how much the kernel is caching.

>>> random_page_cost = 3
>> on your HW you can decrease it to 2 and also decrease the other cpu costs

On fast machines it often winds up needing to be set somewhere around 1.2 to 2.0.
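Pulling Scott's corrections together, the relevant postgresql.conf fragment for an 8 GB box might look roughly like this; the exact numbers are illustrative assumptions, not tuned recommendations from the thread:

```
# sort_mem is per sort, per backend; keep the global default modest and
# raise it per session (SET sort_mem = ...) for known-big sorts
sort_mem = 16384                # 16 MB, in KB

# effective_cache_size only *describes* the OS cache to the planner
effective_cache_size = 655360   # ~5 GB, in 8 KB pages

random_page_cost = 1.5          # 1.2-2.0 is typical on fast hardware
```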
Re: [PERFORM] Timestamp-based indexing
THAT WAS IT!! Thank you very much.

Is there a way to change the type of CURRENT_TIMESTAMP to "timestamp without time zone" so that casting isn't needed? BTW, isn't this a bug?

-- Harmon

Stephan Szabo wrote:

> On Mon, 26 Jul 2004, Harmon S. Nine wrote:
>> However, we can't get the planner to do a timestamp-based index scan. Anyone know what to do?
> I'd wonder if the type conversion is causing you problems. CURRENT_TIMESTAMP - INTERVAL '10 minutes' is a timestamp with time zone, while the column is timestamp without time zone. Casting CURRENT_TIMESTAMP to timestamp without time zone seemed to make it able to choose an index scan on 7.4.
Re: [PERFORM] arrays and indexes
On Mon, Jul 26, 2004 at 02:27:20AM -0400, Greg Stark wrote:

> Ross J. Reedstrom [EMAIL PROTECTED] writes:
>> In the new schema, the same thing is: SELECT * from content where 42 = ANY (authors); Works fine, but for the life of me I can't find nor figure out how to build an index that will be used to speed this along. Any ideas?
> Well that's basically the problem with denormalized data like this. Have you resolved what you're going to do if two sessions try to add a user to the same group at the same time? Or how you'll go about removing a user from all his groups in one shot?

We've got plenty of interlocks in the middleware to handle the first (mainly because this is an authoring system where everyone has to agree to participate and acknowledge the open license on the materials). Second, they _can't_ be removed: we're effectively a write-only archive. Even if we weren't, it would be a rare event and could go slowly (loop over groups in the middleware, probably).

> Basically, if you denormalize in this fashion it becomes hard to use the groups as anything but single monolithic objects. Whereas normalized data can be queried and updated from other points of view like in the case you name above.

These groups _really are_ ideal for Joe Conway's work on arrays: we need ordered vectors, so we'd be sorting all the time otherwise. They're static, and they're read-only. The one thing they're not is fixed, known size (sorry Merlin). They work fine for the query as shown: the only issue is performance.

> Postgres does have a way to do what you ask, though. It involves GiST indexes and the operators from the contrib/intarray directory from the Postgres source.

Well, yes, that's how it used to be done. I figured the new array support should be able to handle it without the addon, however.

> However I warn you in advance that this is fairly esoteric stuff and will take some time to get used to. And at least in my case I found the indexes didn't actually help much for my data sets, probably because they just weren't big enough to benefit.

I know that they should help in this case: we've got lots of content, and any particular author or maintainer will be in a small fraction of it. I.e.: it's ideal for an index. And the current joined case uses an index when it's available. I'll take a look at the GiST/contrib work, anyway.

Thanks -
Ross

-- Ross Reedstrom, Ph.D. [EMAIL PROTECTED]
Research Scientist, The Connexions Project http://cnx.rice.edu
phone: 713-348-6166  fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE
Re: [PERFORM] arrays and indexes
>> SELECT * from content where 42 = ANY (authors);
> Postgres does have a way to do what you ask, though. It involves GiST indexes and the operators from the contrib/intarray directory from the Postgres source.

I have tried these indexes, and the performance was very good. They can be faster (in fact, much faster) than a join with an additional table, because you don't have a join at all. The SQL array syntax is a pain, though.
Re: [PERFORM] arrays and indexes
Ross J. Reedstrom [EMAIL PROTECTED] writes:

> These groups _really are_ ideal for Joe Conway's work on arrays: we need ordered vectors, so we'd be sorting all the time otherwise. They're static, and they're read-only. The one thing they're not is fixed, known size (sorry Merlin). They work fine for the query as shown: the only issue is performance.

Well, just as long as you understand the trade-offs. Denormalizing can be useful, but you need to know what flexibility you're losing too.

>> Postgres does have a way to do what you ask, though. It involves GiST indexes and the operators from the contrib/intarray directory from the Postgres source.
> Well, yes, that's how it used to be done. I figured the new array support should be able to handle it without the addon, however.

I think you can btree-index arrays now, which is new, but that's not useful for the kind of lookup you're doing. It would only be useful for joining on array types or looking for groups with given content, things like that.

>> However I warn you in advance that this is fairly esoteric stuff and will take some time to get used to. And at least in my case I found the indexes didn't actually help much for my data sets, probably because they just weren't big enough to benefit.
> I know that they should help in this case: we've got lots of content. Any particular author or maintainer will be in a small fraction of those. I.e.: it's ideal for an index. And the current joined case uses an index, when it's available. I'll take a look at the GiST/contrib work, anyway.

I would be curious to know how it goes. My own project uses denormalized sets stored as arrays as well, though in my case they're precalculated from the fully normalized data. I tried to use GiST indexes, but ran into problems combining the btree-GiST code with the array GiST code in a multicolumn index. I still don't really know why it failed, but after two days building the index I gave up.

-- greg
Re: [PERFORM] arrays and indexes
Greg Stark [EMAIL PROTECTED] writes:

> I would be curious to know how it goes. My own project uses denormalized sets stored as arrays as well, though in my case they're precalculated from the fully normalized data. I tried to use GiST indexes but ran into problems combining the btree-GiST code with the array GiST code in a multicolumn index. I still don't really know why it failed, but after two days building the index I gave up.

Sounds like a bug to me. Could you put together a test case?

regards, tom lane
Re: [PERFORM] Timestamp-based indexing
Hi,

How about changing:

  CURRENT_TIMESTAMP - INTERVAL '10 minutes'

to:

  'now'::timestamptz - INTERVAL '10 minutes'

It seems to me that Postgres will treat it as a constant.

Thanks,

--- Tom Lane [EMAIL PROTECTED] wrote:

> Matthew T. O'Connor [EMAIL PROTECTED] writes:
>> VACUUM FULL ANALYZE every 3 hours seems a little severe.
> If rows are only deleted once a day, that's a complete waste of time, indeed. I'd suggest running a plain VACUUM just after the deletion pass is done. ANALYZEs are a different matter and possibly need to be done every few hours, seeing that your maximum timestamp value is constantly changing.
> [snip]
> Another issue here is that because CURRENT_TIMESTAMP - INTERVAL '10 minutes' isn't a constant, the planner isn't able to make use of the statistics gathered by ANALYZE anyway. That's why the rowcount estimate has nothing to do with reality. Unless you force the decision with enable_seqscan, the planner will never pick an indexscan with this rowcount estimate. The standard advice for getting around this is to hide the nonconstant calculation inside a function that's deliberately mislabeled immutable. For example:
>
>   create function ago(interval) returns timestamp with time zone as
>   'select now() - $1' language sql strict immutable;
>
>   select * from eventtable where timestamp > ago('10 minutes');
>
> The planner folds ago('10 minutes') to a constant, checks the statistics, and should do the right thing. Note however that this technique may break if you put a call to ago() inside a function or prepared statement --- it's only safe in interactive queries, where you don't care that the value is reduced to a constant during planning instead of during execution.
>
> regards, tom lane
Re: [PERFORM] Timestamp-based indexing
Litao Wu [EMAIL PROTECTED] writes:

> How about changing: CURRENT_TIMESTAMP - INTERVAL '10 minutes' to 'now'::timestamptz - INTERVAL '10 minutes'? It seems to me that Postgres will treat it as a constant.

Yeah, that works too, though again it might burn you if used inside a function or prepared statement. What you're doing here is to push the freezing of the "now" value even further upstream, namely to the initial parsing of the command.

regards, tom lane
Re: [PERFORM] Timestamp-based indexing
>> It seems to me that Postgres will treat it as a constant.
> Yeah, that works too, though again it might burn you if used inside a function or prepared statement. What you're doing here is to push the freezing of the "now" value even further upstream, namely to the initial parsing of the command.

What I do in my apps to get Postgres to use the timestamp indexes in some situations is to generate the current timestamp in ISO format and then just insert it into the query as a constant, for that run of the query.

Chris
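Chris's client-side approach can be sketched like this (Python here purely for illustration; the table and column names come from the thread, and real code should prefer driver-level parameter binding to string interpolation):

```python
from datetime import datetime, timedelta

# Compute the cutoff in the client so the planner sees a plain constant.
cutoff = (datetime.now() - timedelta(minutes=10)).isoformat(sep=" ")

# Interpolate it as an ISO-format literal for this one run of the query.
query = f"SELECT * FROM eventtable WHERE \"timestamp\" > '{cutoff}'"
print(query)
```

Because the literal is fixed at query-build time, the plan can use the column statistics; the trade-off is that the query text must be rebuilt for every run.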
Re: [PERFORM] arrays and indexes
Tom Lane [EMAIL PROTECTED] writes:

>> I still don't really know why it failed, but after two days building the index I gave up.
> Sounds like a bug to me. Could you put together a test case?

At the time I contacted one of the GiST authors and we went over things for a while. They diagnosed the problem as being caused by having a poor-selectivity GiST btree as the leading column in the index. He seemed to think this was fairly fundamental and wasn't something they were going to be able to address. And I was fairly certain I didn't want to turn the index upside down and put the more selective columns first (as is usually normal) for various reasons. So I gave it up as a lost cause.

In any case, in my application it was unlikely to really help. I expect that leading btree index to narrow the search to only a few hundred or few thousand records in the normal case, so access times are already within reason even having to dig through all the records. And since other queries are likely to need other records from that set, I'll need them all in cache eventually. There are a lot of array columns to search through, so the added I/O to read all those indexes would probably be a net loss when they push other things out of cache.

I could try setting up a test case, but I think all it took was having a btree-GiST index that was insufficiently selective. In my case I had about 900 integer values, each with on the order of 100-1000 records.

-- greg