Re: [PERFORM] arrays and indexes

2004-07-26 Thread Greg Stark

Ross J. Reedstrom [EMAIL PROTECTED] writes:

 In the new schema, the same thing is:
 
 SELECT * from content where 42 = ANY (authors);
 
 Works fine, but for the life of me I can't find nor figure out how to
 build an index that will be used to speed this along. Any ideas?

Well that's basically the problem with denormalized data like this.

Have you resolved what you're going to do if two sessions try to add a user to
the same group at the same time? Or how you'll go about removing a user from
all his groups in one shot?

Basically, if you denormalize in this fashion it becomes hard to use the
groups as anything but single monolithic objects. Whereas normalized data can
be queried and updated from other points of view like in the case you name
above.

Postgres does have a way to do what you ask, though. It involves GiST indexes
and the operators from the contrib/intarray directory from the Postgres
source.
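
For reference, a rough sketch of that approach (assuming contrib/intarray has
been installed and that authors is an integer array, as in your query):

-- gist__int_ops is the GiST operator class provided by contrib/intarray
CREATE INDEX content_authors_gist ON content USING gist (authors gist__int_ops);

-- the intarray overlap operator (&&) can use this index,
-- whereas "42 = ANY (authors)" cannot
SELECT * FROM content WHERE authors && '{42}'::int[];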

However I warn you in advance that this is fairly esoteric stuff and will take
some time to get used to. And at least in my case I found the indexes didn't
actually help much for my data sets, probably because they just weren't big
enough to benefit.

-- 
greg


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] hardware raid suggestions

2004-07-26 Thread Gaetano Mendola
Brian Hirt wrote:
I've  been using the adaptec ZCR raid cards in our servers for a while 
now, mostly small systems with 3 or 6 disks, and we've been very happy 
with them.   However, we're building a new DB machine with 14 U320 15K 
SCA drives, and we've run into a performance bottleneck with the ZCR 
card where it just won't scale well.  Without going into too many 
details, we've tested RAID5, RAID10 and RAID50 on pretty much every 
array size from 4-14 disks (raid 50 tests used more drives), using JFS, 
reiserfs and EXT3.  With every different configuration,  performance 
didn't improve after array size became greater than 6 disks..   We used 
various benchmarks, including pgbench with scale factors of 10, 100, 
1000, 5000 and clients of 10, 15, 30 and 45.  We've run many other 
tests and monitoring tools, and we've come to the conclusion that the 
ZCR is the problem.

We're looking into getting an Adaptec 2200S or the Megaraid 320 2x which 
have better processors, and hopefully better performance.  We feel that 
the use of the AIC7930 as the CPU on the ZCR  just doesn't cut it and a 
faster raid controller would work better. Does anyone out there have any 
experience with these cards with postgresql and linux?  If so, would you 
be willing to share your experiences and possibly give a recommendation?

Did you consider the option of using an external storage array?
We are using the dell emc CX600
http://www.dell.com/downloads/emea/products/pvaul/en/Dell_EMC_cx600_specs.pdf
and I forget I even have a disk behind it...

Regards
Gaetano Mendola


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Insert are going slower ...

2004-07-26 Thread Gaetano Mendola
Josh Berkus wrote:
Herve'
I forgot to ask about your hardware.   How much RAM, and what's your disk 
setup?  CPU?


sort_mem =   512000

Huh?   Sort_mem is in K.   The above says that you've allocated 512MB sort 
mem.  Is this process the *only* thing going on on the machine?
And also it's not system-wide but, let's say, per backend...

Regards
Gaetano Mendola
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Insert are going slower ...

2004-07-26 Thread Gaetano Mendola
Hervé Piedvache wrote:
Josh,
Le mardi 13 Juillet 2004 19:10, Josh Berkus a écrit :
What can I do to get better results ?? (configuration option, and/or
hardware update ?)
What can I give you to get more important informations to help me ?
1) What PostgreSQL version are you using?

v7.4.3

2) What's your VACUUM, ANALYZE, VACUUM FULL, REINDEX schedule?

VACUUM FULL VERBOSE ANALYZE;
Every day after the calculation I was talking about ...

3) Can you list the non-default settings in your PostgreSQL.conf?
Particularly, shared_buffers, sort_mem, checkpoint_segments,
estimated_cache, and max_fsm_pages?


sort_mem =   512000
This is too much; you are instructing Postgres to use 512MB
for each backend ( sometimes each backend can use this quantity
more than once )
vacuum_mem = 409600
max_fsm_pages = 50000000
 max_fsm_relations = 2000
50 million? Huh.
What does postgres tell you in the log after performing
a vacuum full?
max_files_per_process = 2000
wal_buffers = 1000
checkpoint_segments = 3
For massive inserts you have to increase this number;
pump it up to 16.

effective_cache_size = 500
5GB for an 8GB system is too much
random_page_cost = 3	
On your HW you can decrease it to 2,
and also decrease the other CPU costs.
Regards
Gaetano Mendola
BTW, I live in Paris too, if you need a hand...


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


[PERFORM] Timestamp-based indexing

2004-07-26 Thread Harmon S. Nine
Hello --
To increase query (i.e. select) performance, we're trying to get 
postgres to use an index based on a timestamp column in a given table.

Event-based data is put into this table several times a minute, with the 
timestamp indicating when a particular row was placed in the table.

The table is purged daily, retaining only the rows that are less than 7 
days old.  That is, any row within the table is less than 1 week old (+ 
1 day, since the purge is daily).
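
(Roughly, the purge amounts to something like the following -- illustrative
only, since the exact statement isn't shown here:)

DELETE FROM eventtable WHERE timestamp < CURRENT_TIMESTAMP - INTERVAL '7 days';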

A typical number of rows in the table is around 400,000.
A VACUUM FULL ANALYZE  is performed every 3 hours.
The problem:
We often query the table to extract those rows that are, say, 10 minutes 
old or less.

Given there are 10080 minutes per week, the planner could, properly 
configured, estimate the number of rows returned by such a query to be:

10 min/ 10080 min  *  400,000 = 0.001 * 400,000 = 400.
That would make an index scan, with the timestamp field as the index, far faster 
than a sequential scan.

However, we can't get the planner to do a timestamp-based index scan.
Anyone know what to do?
Here's the table specs:
monitor=# \d eventtable
   Table public.eventtable
  Column   |            Type             |                          Modifiers
-----------+-----------------------------+--------------------------------------------------------------
 timestamp | timestamp without time zone | not null default ('now'::text)::timestamp(6) with time zone
 key       | bigint                      | not null default nextval('public.eventtable_key_seq'::text)
 propagate | boolean                     |
 facility  | character(10)               |
 priority  | character(10)               |
 host      | character varying(128)      | not null
 message   | text                        | not null
Indexes:
   eventtable_pkey primary key, btree (timestamp, key)
   eventtable_host btree (host)
   eventtable_timestamp btree (timestamp)

Here's a query (with explain analyze):
monitor=# explain analyze select * from eventtable where timestamp > 
CURRENT_TIMESTAMP - INTERVAL '10 minutes';
QUERY PLAN

Seq Scan on eventtable  (cost=0.00..19009.97 rows=136444 width=155) 
(actual time=11071.073..11432.522 rows=821 loops=1)
  Filter: ((timestamp)::timestamp with time zone > 
(('now'::text)::timestamp(6) with time zone - '@ 10 mins'::interval))
Total runtime: 11433.384 ms
(3 rows)

Here's something strange.  We try to disable sequential scans, but to no 
avail.  The estimated cost skyrockets, though:

monitor=# set enable_seqscan = false;
SET
monitor=# explain analyze select * from eventtable where timestamp > 
CURRENT_TIMESTAMP - INTERVAL '10 minutes';
QUERY PLAN
-
Seq Scan on eventtable  (cost=1.00..100019009.97 rows=136444 
width=155) (actual time=9909.847..9932.438 rows=1763 loops=1)
  Filter: ((timestamp)::timestamp with time zone > 
(('now'::text)::timestamp(6) with time zone - '@ 10 mins'::interval))
Total runtime: 9934.353 ms
(3 rows)

monitor=# set enable_seqscan = true;
SET
monitor=#

Any help is greatly appreciated :)
-- Harmon

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [PERFORM] arrays and indexes

2004-07-26 Thread Merlin Moncure
Ross wrote:
 Hi all -
 I've got a schema I'm working on modifying, and I need some help
 getting the best performance out. The original schema has a many-to-many
 linkage between a couple of tables, using a two-column linkage table.
 This is used to represent groups of people and their relationship to an
 object (authors, copyrightholders, maintainers). This worked fine, and,
 with the right indexes, is quite zippy. Approximate schema:
 
 table content (
 contentid serial,
 name text,
 ...
 authorgroupid int,
 cpholdergroupid int,
 maintgroupid int)
 
 table groups (
 personid text,
 groupid int)
 
 Note that neither groupid nor personid is unique.
 
 Now the users want not just groups, but ordered lists. Well, that's
 just fine: we could do it with another column in the groups linkage
 table, and some additional logic in the middleware for detecting
 identical groups, but it occurred to me that PG's array types are just
 the ticket for ordered lists like this.
 
 So, by dropping arrays of personids (authors, copyrightholders,
 maintainers, ...) into the content table, I can do everything I need.
 
 Only one problem. Retrieving all the content for a particular
 person/role is fairly common. Queries of the form:
 
 SELECT * from content c join groups g on c.authorgroupid = g.groupid
 where personid = 'ross';
 
 work fine and use the index on groups.personid.
 
 In the new schema, the same thing is:
 
 SELECT * from content where 42 = ANY (authors);
 
 Works fine, but for the life of me I can't find nor figure out how to
 build an index that will be used to speed this along. Any ideas?
 
 I'm using 7.4.3, BTW.

Arrays are usually a bad choice to put in your tables with a couple of
exceptions.  Keep in mind that you can generate the array in the query
stage using custom aggregates if you prefer to deal with them on the
client side.  The basic problem is they introduce flexibility issues and
are usually better handled by moving the data to a dependent table.
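
As a sketch of that aggregate approach (array_accum here is the polymorphic
example aggregate from the 7.4 documentation, not a built-in, and the column
names follow the schema quoted above):

create aggregate array_accum (
  basetype = anyelement,
  sfunc = array_append,
  stype = anyarray,
  initcond = '{}'
);

-- build the author list at query time instead of storing it
-- (note: element ordering is not guaranteed here)
select g.groupid, array_accum(g.personid) as authors
from groups g
group by g.groupid;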

Here are cases you might want to consider using arrays in your tables:
1.  Your array bounds are small and known at design time (think: pay by
quarter example in the docs).
2.  Your array will not contain more than one or two dependent elements.
3.  You are dealing with an extreme performance situation and you have
tried doing things the proper way first.

There are other exceptions...arrays can be a powerful tool albeit a
dangerous one...just know what you are getting into.  A firm
understanding of relational principles is a tremendous help.

If your array bounds are known, it is possible to get around the index
problem in limited cases by using a custom function (but only when the
array bounds are known):

create function any_quarter_over_10k (numeric[]) returns boolean as
'
select 
case 
  when $1[1] >= 10000 then true
  when $1[2] >= 10000 then true
  when $1[3] >= 10000 then true
  when $1[4] >= 10000 then true
  else false
end;
' language 'sql' IMMUTABLE;

create index t_q_10k_idx on t(any_quarter_over_10k(salary_qtr));

select * from t where any_quarter_over_10k(t.salary_qtr) = true;


Good luck!
Merlin

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Timestamp-based indexing

2004-07-26 Thread Kevin Barnard

Harmon S. Nine wrote:
monitor=# explain analyze select * from eventtable where timestamp > 
CURRENT_TIMESTAMP - INTERVAL '10 minutes';
QUERY PLAN
Try
SELECT * FROM eventtable where timestamp BETWEEN  (CURRENT_TIMESTAMP - 
INTERVAL '10 minutes') AND CURRENT_TIMESTAMP;

This will use a range of valid times.  What your query is doing is
looking from 10 minutes ago to an infinite future.  Statistically
speaking, that should encompass most of the table because you have an
infinite range.  No index will be used.  If you assign a range the
planner can figure out what you are looking for.

--
Kevin Barnard
Speed Fulfillment and Call Center
[EMAIL PROTECTED]
214-258-0120
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] Timestamp-based indexing

2004-07-26 Thread Matthew T. O'Connor
VACUUM FULL ANALYZE every 3 hours seems a little severe.  You will 
probably be served just as well by VACUUM ANALYZE.  But you probably 
don't need the VACUUM part most of the time.   You might try doing an 
ANALYZE on the specific tables you are having issues with.  Since 
ANALYZE should be much quicker and not have the performance impact of a 
VACUUM, you could do it every hour, or even every 15 minutes.

Good luck...
Harmon S. Nine wrote:
Hello --
To increase query (i.e. select) performance, we're trying to get 
postgres to use an index based on a timestamp column in a given table.

Event-based data is put into this table several times a minute, with 
the timestamp indicating when a particular row was placed in the table.

The table is purged daily, retaining only the rows that are less than 
7 days old.  That is, any row within the table is less than 1 week old 
(+ 1 day, since the purge is daily).

A typical number of rows in the table is around 400,000.
A VACUUM FULL ANALYZE  is performed every 3 hours.
The problem:
We often query the table to extract those rows that are, say, 10 
minutes old or less.

Given there are 10080 minutes per week, the planner could, properly 
configured, estimate the number of rows returned by such a query to be:

10 min/ 10080 min  *  400,000 = 0.001 * 400,000 = 400.
That would make an index scan, with the timestamp field as the index, far faster 
than a sequential scan.

However, we can't get the planner to do a timestamp-based index scan.
Anyone know what to do?
Here's the table specs:
monitor=# \d eventtable
   Table public.eventtable
  Column   |            Type             |                          Modifiers
-----------+-----------------------------+--------------------------------------------------------------
 timestamp | timestamp without time zone | not null default ('now'::text)::timestamp(6) with time zone
 key       | bigint                      | not null default nextval('public.eventtable_key_seq'::text)
 propagate | boolean                     |
 facility  | character(10)               |
 priority  | character(10)               |
 host      | character varying(128)      | not null
 message   | text                        | not null
Indexes:
   eventtable_pkey primary key, btree (timestamp, key)
   eventtable_host btree (host)
   eventtable_timestamp btree (timestamp)

Here's a query (with explain analyze):
monitor=# explain analyze select * from eventtable where timestamp > 
CURRENT_TIMESTAMP - INTERVAL '10 minutes';
QUERY PLAN
 

Seq Scan on eventtable  (cost=0.00..19009.97 rows=136444 width=155) 
(actual time=11071.073..11432.522 rows=821 loops=1)
  Filter: ((timestamp)::timestamp with time zone > 
(('now'::text)::timestamp(6) with time zone - '@ 10 mins'::interval))
Total runtime: 11433.384 ms
(3 rows)

Here's something strange.  We try to disable sequential scans, but to 
no avail.  The estimated cost skyrockets, though:

monitor=# set enable_seqscan = false;
SET
monitor=# explain analyze select * from eventtable where timestamp > 
CURRENT_TIMESTAMP - INTERVAL '10 minutes';
QUERY PLAN
- 

Seq Scan on eventtable  (cost=1.00..100019009.97 rows=136444 
width=155) (actual time=9909.847..9932.438 rows=1763 loops=1)
  Filter: ((timestamp)::timestamp with time zone > 
(('now'::text)::timestamp(6) with time zone - '@ 10 mins'::interval))
Total runtime: 9934.353 ms
(3 rows)

monitor=# set enable_seqscan = true;
SET
monitor=#

Any help is greatly appreciated :)
-- Harmon

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] Timestamp-based indexing

2004-07-26 Thread Stephan Szabo

On Mon, 26 Jul 2004, Harmon S. Nine wrote:

 However, we can't get the planner to do a timestamp-based index scan.

 Anyone know what to do?

I'd wonder if the type conversion is causing you problems.
CURRENT_TIMESTAMP - INTERVAL '10 minutes' is a timestamp with time zone
while the column is timestamp without time zone.  Casting
CURRENT_TIMESTAMP to timestamp without time zone seemed to make it able to
choose an index scan on 7.4.
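
For example (a sketch of that cast):

SELECT * FROM eventtable
WHERE timestamp > CAST(CURRENT_TIMESTAMP AS timestamp without time zone)
  - INTERVAL '10 minutes';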


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Timestamp-based indexing

2004-07-26 Thread Tom Lane
Matthew T. O'Connor [EMAIL PROTECTED] writes:
 VACUUM FULL ANALYZE every 3 hours seems a little severe.

If rows are only deleted once a day, that's a complete waste of time,
indeed.

I'd suggest running a plain VACUUM just after the deletion pass is done.
ANALYZEs are a different matter and possibly need to be done every
few hours, seeing that your maximum timestamp value is constantly
changing.

 monitor=# set enable_seqscan = false;
 SET
 monitor=# explain analyze select * from eventtable where timestamp > 
 CURRENT_TIMESTAMP - INTERVAL '10 minutes';
 QUERY PLAN
 -
  
 
 Seq Scan on eventtable  (cost=1.00..100019009.97 rows=136444 
 width=155) (actual time=9909.847..9932.438 rows=1763 loops=1)
 Filter: ((timestamp)::timestamp with time zone > 
 (('now'::text)::timestamp(6) with time zone - '@ 10 mins'::interval))
 Total runtime: 9934.353 ms

You've got some datatype confusion, too.  CURRENT_TIMESTAMP yields
timestamp with time zone, and since you made the timestamp column
timestamp without time zone, you've got a cross-type comparison which is
not indexable (at least not in 7.4).  My opinion is that you chose the
wrong type for the column.  Values that represent specific real-world
instants should always be timestamp with time zone, so that they mean
the same thing if you look at them in a different time zone.

Another issue here is that because CURRENT_TIMESTAMP - INTERVAL '10
minutes' isn't a constant, the planner isn't able to make use of the
statistics gathered by ANALYZE anyway.  That's why the rowcount estimate
has nothing to do with reality.  Unless you force the decision with
set enable_seqscan, the planner will never pick an indexscan with this
rowcount estimate.  The standard advice for getting around this is to
hide the nonconstant calculation inside a function that's deliberately
mislabeled immutable.  For example,

create function ago(interval) returns timestamp with time zone as
'select now() - $1' language sql strict immutable;

select * from eventtable where timestamp > ago('10 minutes');

The planner folds the ago('10 minutes') to a constant, checks the
statistics, and should do the right thing.  Note however that this
technique may break if you put a call to ago() inside a function
or prepared statement --- it's only safe in interactive queries,
where you don't care that the value is reduced to a constant during
planning instead of during execution.

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Insert are going slower ...

2004-07-26 Thread Scott Marlowe
On Mon, 2004-07-26 at 08:20, Gaetano Mendola wrote:
 Hervé Piedvache wrote:

SNIP

  sort_mem =   512000
 
 This is too much, you are instructing Postgres to use 512MB
 for each backend ( some time each backend can use this quantity
 more then one )

agreed.  If any one process needs this much sort mem, you can set it in
that session with set sort_mem anyway, so letting every sort consume up
to 512 meg is asking for trouble.
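
Something like this, for the one session that really needs it (a sketch;
sort_mem is in KB):

SET sort_mem = 512000;   -- 512MB for this backend only
-- ... run the big sort / index build ...
RESET sort_mem;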

  effective_cache_size = 500
 
 5GB for 8 GB system is too much

No, it's not.  Assuming that postgresql with all its shared buffers is
using 2 gig, it's quite likely that the kernel is caching at least 5
gigs of disk data.  Effective cache size doesn't set any cache size, it
tells the planner about how much the kernel is caching.

  random_page_cost = 3
 
 on your HW you can decrease it to 2
 and also decrease the other cpu costs

On fast machines it often winds up needing to be set somewhere around
1.2 to 2.0


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Timestamp-based indexing

2004-07-26 Thread Harmon S. Nine

THAT WAS IT!!

Thank you very much.
Is there a way to change the type of "CURRENT_TIMESTAMP" to "timestamp
without time zone" so that casting isn't needed?


BTW, isn't this a bug?

-- Harmon


Stephan Szabo wrote:

  On Mon, 26 Jul 2004, Harmon S. Nine wrote:

However, we can't get the planner to do a timestamp-based index scan.

Anyone know what to do?

I'd wonder if the type conversion is causing you problems.
CURRENT_TIMESTAMP - INTERVAL '10 minutes' is a timestamp with time zone
while the column is timestamp without time zone.  Casting
CURRENT_TIMESTAMP to timestamp without time zone seemed to make it able to
choose an index scan on 7.4.




Re: [PERFORM] arrays and indexes

2004-07-26 Thread Ross J. Reedstrom
On Mon, Jul 26, 2004 at 02:27:20AM -0400, Greg Stark wrote:
 
 Ross J. Reedstrom [EMAIL PROTECTED] writes:
 
  In the new schema, the same thing is:
  
  SELECT * from content where 42 = ANY (authors);
  
  Works fine, but for the life of me I can't find nor figure out how to
  build an index that will be used to speed this along. Any ideas?
 
 Well that's basically the problem with denormalized data like this.
 
 Have you resolved what you're going to do if two sessions try to add a user to
 the same group at the same time? Or how you'll go about removing a user from
 all his groups in one shot?

We've got plenty of interlocks in the middleware to handle the first
(mainly because this is an authoring system where everyone has to agree
to participate, and acknowledge the open license on the materials)

Second, they _can't_ be removed: we're effectively a write only archive.
Even if we weren't it would be a rare event and could go slowly (loop
over groups in the middleware, probably)

 
 Basically, if you denormalize in this fashion it becomes hard to use the
 groups as anything but single monolithic objects. Whereas normalized data can
 be queried and updated from other points of view like in the case you name
 above.

These groups _really are_ ideal for Joe Conway's work on arrays: we need
ordered vectors, so we'd be sorting all the time, otherwise. They're
static, and they're read only. The one thing they're not is fixed, known
size (Sorry Merlin). They work fine for the query as shown: the only
issue is performance.

 Postgres does have a way to do what you ask, though. It involves GiST
 indexes and the operators from the contrib/intarray directory from the
 Postgres source.

Well, yes, that's how it used to be done. I figured the new array
support should be able to handle it without the addon, however.

 However I warn you in advance that this is fairly esoteric stuff and
 will take some time to get used to. And at least in my case I found
 the indexes didn't actually help much for my data sets, probably
 because they just weren't big enough to benefit.

I know that they should help in this case: we've got lots of content.
Any particular author or maintainer will be in a small fraction of
those. i.e.: it's ideal for an index. And the current joined case uses
an index, when it's available. I'll take a look at the GiST/contrib work,
anyway. 

Thanks - 

Ross 
-- 
Ross Reedstrom, Ph.D. [EMAIL PROTECTED]
Research Scientist  phone: 713-348-6166
The Connexions Project  http://cnx.rice.edu   fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E  F888 D3AE 810E 88F0 BEDE

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] arrays and indexes

2004-07-26 Thread Pierre-Frédéric Caillaud

 SELECT * from content where 42 = ANY (authors);

Postgres does have a way to do what you ask, though. It involves GiST
indexes and the operators from the contrib/intarray directory from the
Postgres source.
I have tried to use these indexes, and the performance was very good. It
can be faster (in fact much faster) than a join with an additional table,
because you don't have a join. The SQL array syntax is a pain, though.
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] arrays and indexes

2004-07-26 Thread Greg Stark
Ross J. Reedstrom [EMAIL PROTECTED] writes:

 These groups _really are_ ideal for Joe Conway's work on arrays: we need
 ordered vectors, so we'd be sorting all the time, otherwise. They're
 static, and they're read only. The one thing they're not is fixed, known
 size (Sorry Merlin). They work fine for the query as shown: the only
 issue is performance.

Well just as long as you understand the trade-offs. Denormalizing can be
useful but you need to know what flexibility you're losing too.

  Postgres does have a way to do what you ask, though. It involves GiST
  indexes and the operators from the contrib/intarray directory from the
  Postgres source.
 
 Well, yes, that's how it used to be done. I figured the new array
 support should be able to handle it without the addon, however.

I think you can btree index arrays now, which is new, but it's not useful for
the kind of lookup you're doing. It would only be useful for joining on array
types or looking for groups with given content, or things like that.

  However I warn you in advance that this is fairly esoteric stuff and
  will take some time to get used to. And at least in my case I found
  the indexes didn't actually help much for my data sets, probably
  because they just weren't big enough to benefit.
 
 I know that they should help in  this case: we've got lots of content.
 Any particular author or maintainer will be in a small fraction of
 those. i.e.: it's ideal for an index. And the current joined case uses
 an index, when it's available. I'll take a look at the GiST/contrib work,
 anyway. 

I would be curious to know how it goes. My own project uses denormalized sets
stored as arrays as well, though in my case they're precalculated from the
fully normalized data. I tried to use GiST indexes but ran into problems
combining the btree-GiST code with array GiST code in a multicolumn index. I
still don't really know why it failed, but after two days building the index I
gave up.

-- 
greg


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] arrays and indexes

2004-07-26 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 I would be curious to know how it goes. My own project uses
 denormalized sets stored as arrays as well, though in my case they're
 precalculated from the fully normalized data. I tried to use GiST
 indexes but ran into problems combining the btree-GiST code with array
 GiST code in a multicolumn index. I still don't really know why it
 failed, but after two days building the index I gave up.

Sounds like a bug to me.  Could you put together a test case?

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Timestamp-based indexing

2004-07-26 Thread Litao Wu
Hi,

How about changing:

CURRENT_TIMESTAMP - INTERVAL '10 minutes'
to
'now'::timestamptz - INTERVAL '10 minutes'

It seems to me that Postgres will treat it as
a constant.

Thanks,

--- Tom Lane [EMAIL PROTECTED] wrote:
 Matthew T. O'Connor [EMAIL PROTECTED] writes:
  VACUUM FULL ANALYZE every 3 hours seems a little
 severe.
 
 If rows are only deleted once a day, that's a
 complete waste of time,
 indeed.
 
 I'd suggest running a plain VACUUM just after the
 deletion pass is done.
 ANALYZEs are a different matter and possibly need to
 be done every
 few hours, seeing that your maximum timestamp value
 is constantly
 changing.
 
  monitor=# set enable_seqscan = false;
  SET
  monitor=# explain analyze select * from
 eventtable where timestamp > 
  CURRENT_TIMESTAMP - INTERVAL '10 minutes';
  QUERY PLAN
 

-
 
  
  Seq Scan on eventtable 
 (cost=1.00..100019009.97 rows=136444 
  width=155) (actual time=9909.847..9932.438
 rows=1763 loops=1)
  Filter: ((timestamp)::timestamp with time zone >
  (('now'::text)::timestamp(6) with time zone - '@
 10 mins'::interval))
  Total runtime: 9934.353 ms
 
 You've got some datatype confusion, too. 
 CURRENT_TIMESTAMP yields
 timestamp with time zone, and since you made the
 timestamp column
 timestamp without time zone, you've got a cross-type
 comparison which is
 not indexable (at least not in 7.4).  My opinion is
 that you chose the
 wrong type for the column.  Values that represent
 specific real-world
 instants should always be timestamp with time zone,
 so that they mean
 the same thing if you look at them in a different
 time zone.
 
 Another issue here is that because CURRENT_TIMESTAMP
 - INTERVAL '10
 minutes' isn't a constant, the planner isn't able to
 make use of the
 statistics gathered by ANALYZE anyway.  That's why
 the rowcount estimate
 has nothing to do with reality.  Unless you force
 the decision with
 set enable_seqscan, the planner will never pick an
 indexscan with this
 rowcount estimate.  The standard advice for getting
 around this is to
 hide the nonconstant calculation inside a function
 that's deliberately
 mislabeled immutable.  For example,
 
 create function ago(interval) returns timestamp with
 time zone as
 'select now() - $1' language sql strict immutable;
 
 select * from eventtable where timestamp > ago('10
 minutes');
 
 The planner folds the ago('10 minutes') to a
 constant, checks the
 statistics, and should do the right thing.  Note
 however that this
 technique may break if you put a call to ago()
 inside a function
 or prepared statement --- it's only safe in
 interactive queries,
 where you don't care that the value is reduced to a
 constant during
 planning instead of during execution.
 
   regards, tom lane
 
 ---(end of
 broadcast)---
 TIP 9: the planner will ignore your desire to choose
 an index scan if your
   joining column's datatypes do not match
 




__
Do you Yahoo!?
Yahoo! Mail - You care about security. So do we.
http://promotions.yahoo.com/new_mail

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Timestamp-based indexing

2004-07-26 Thread Tom Lane
Litao Wu [EMAIL PROTECTED] writes:
 How about changing:

 CURRENT_TIMESTAMP - INTERVAL '10 minutes'
 to
 'now'::timestamptz - INTERVAL '10 minutes'

 It seems to me that Postgres will treat it as
 a constant.

Yeah, that works too, though again it might burn you if used inside a
function or prepared statement.  What you're doing here is to push the
freezing of the now value even further upstream, namely to initial
parsing of the command.

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Timestamp-based indexing

2004-07-26 Thread Christopher Kings-Lynne
It seems to me that Postgres will treat it as
a constant.

Yeah, that works too, though again it might burn you if used inside a
function or prepared statement.  What you're doing here is to push the
freezing of the now value even further upstream, namely to initial
parsing of the command.
What I do in my apps to get postgres to use the timestamp indexes in 
some situations is to generate the current timestamp in ISO format and 
insert it into the query as a constant, for that run of the query.
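
For example (a sketch -- the literal below stands in for whatever "now minus
10 minutes" value the application computed for that run):

SELECT * FROM eventtable WHERE timestamp > '2004-07-26 12:50:00';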

Chris
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] arrays and indexes

2004-07-26 Thread Greg Stark

Tom Lane [EMAIL PROTECTED] writes:

  I still don't really know why it failed, but after two days building the
  index I gave up.
 
 Sounds like a bug to me.  Could you put together a test case?

At the time I contacted one of the GiST authors and we went over things for a
while. They diagnosed the problem as being caused by having a poor selectivity
GiST btree as the leading column in the index.

He seemed to think this was fairly fundamental and wasn't something they were
going to be able to address. And I was fairly certain I didn't want to turn
the index upside down to have the more selective columns first (as is usually
normal) for various reasons.

So I gave it up as a lost cause. In any case in my application it was unlikely
to really help. I expect that leading btree index to narrow the search to only
a few hundred or few thousand records in the normal case. So the access times
are already within reason even having to dig through all the records. And
since other queries are likely to need other records from that set I'll need
them all in cache eventually. There are a lot of array columns to search
through, so the added i/o to read all those indexes would probably be a net
loss when they push other things out of cache.

I could try setting up a test case, but I think all it took was having a
btree-gist index that was insufficiently selective. In my case I had about 900
integer values each on the order of 100-1000 records.

-- 
greg


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])