Re: [PERFORM] Comparing user attributes with bitwise operators
Patrick Clery <[EMAIL PROTECTED]> writes:

> PLAN
> ----
>  Limit  (cost=6.03..6.03 rows=1 width=68) (actual time=69.391..69.504 rows=10 loops=1)
>    ->  Sort  (cost=6.03..6.03 rows=1 width=68) (actual time=69.381..69.418 rows=10 loops=1)
>          Sort Key: age
>          ->  Index Scan using people_attributes_search on people_attributes
>                (cost=0.00..6.02 rows=1 width=68) (actual time=0.068..61.648 rows=937 loops=1)
>              Index Cond: (('{30,31,32,33,34,35,36,37,38,39,40}'::integer[] && age)
>                AND ('{2}'::integer[] && gender) AND ('{1,2,4}'::integer[] && orientation))
>  Total runtime: 69.899 ms
> (6 rows)
...
> - Is there a way of speeding up the sort?

The sort seems to have taken only about 8ms of the 69ms, or just over 10%. As long as the index scan doesn't match too many records, the sort should never be much slower than this, so it shouldn't be the performance bottleneck.

You might consider putting the search in a subquery with a LIMIT and sorting outside it, to ensure the sort never sees more than some safe maximum number of rows. Something like:

  select * from (select * from people_attributes where ... limit 1000) as x order by age limit 10

This means that if the query matches more than 1000 rows, you won't get the proper top 10 by age; you'll get the top 10 out of some random subset of 1000. But you're protected against ever having to sort more than 1000 records.

> - Will using queries like "WHERE orientation IN (1,2,4)" be any better/worse?

They won't use the GiST index, so no. If there were a single column with a btree index, that would be the cleanest way to go.

> - The queries with the GiST index are faster, but is it of any benefit when
>   the int[] arrays all contain a single value?

Well, you've gone from 5 minutes to 60ms. You'll have to do more than one test to be sure, but it certainly seems to be of some benefit.

If they're always a single value, you could make it an expression index instead and not have to change your data model. Just make the fields plain integers and build the index as:

  create index idx on people_attributes using gist (
    (array[age]) gist__int_ops,
    (array[gender]) gist__int_ops,
    ...
  )

However, I would go one step further. I would make the index simply:

  create index idx on people_attributes using gist (
    (array[age,gender,orientation,...]) gist__int_ops
  )

and ensure that all of these attributes have distinct domains, i.e. that they don't share any values. There are 4 billion integer values available, so that shouldn't be an issue. Then you could use query_int to compare them the way you want.

You've misunderstood how query_int is supposed to work. You index an array column, and then you can check it against a query_int just as you're currently checking for overlap. Think of @@ as a more powerful version of the overlap operator, one that can evaluate complex logical expressions. The equivalent of

  where '{30,31,32,33,34,35,36,37,38,39,40}'::int[] && age
    and '{2}'::int[] && gender
    and '{1,2,4}'::int[] && orientation

would then become:

  WHERE array[age,gender,orientation] @@ '(30|31|32|33|34|35|36|37|38|39|40)&(2)&(1|2|4)'

except that you would have to change orientation and gender so they don't both use the value 2.

Actually, the expression index may be overkill. You might consider just storing an "attributes" column with an integer array directly in the table. You would also want a table listing the valid attribute values, to be sure no two attributes share a value:

  1    age          1
  2    age          2
  ...
  101  gender       male
  102  gender       female
  103  orientation  straight
  104  orientation  gay
  105  orientation  bi
  106  bodytype     scrawny
  ...

> - Is there any hope for this structure?
You'll have to test this carefully. I tried using GiST indexes for my project and found that I couldn't load the data and build the GiST indexes fast enough. You'll have to measure the cost of building and maintaining this index, especially since it has so many columns in it. But it looks like your queries are in trouble without it, so hopefully it'll be ok on the insert/update side for you.

--
greg

---(end of broadcast)---
TIP 8: explain analyze is your friend
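[For concreteness, a minimal sketch of the combined-array design Greg describes above. All value assignments and the sample query are illustrative, not from the original thread; requires contrib/intarray.]

  -- Hypothetical registry of valid attribute values: every attribute draws
  -- its values from a distinct range of integers, so one combined array
  -- is unambiguous.
  create table attribute_values (
      value_id  int primary key,  -- globally unique across all attributes
      attribute text not null,    -- e.g. 'age', 'gender', 'orientation'
      label     text not null     -- human-readable meaning, e.g. 'female'
  );

  -- One combined int[] column instead of twenty separate array columns.
  create table people_attributes (
      person_id  int primary key,
      attributes int[] not null
  );

  -- A single GiST index over the combined array.
  create index people_attributes_idx
      on people_attributes using gist (attributes gist__int_ops);

  -- Supposing ages map directly to values 18..99, gender 'female' to 102,
  -- and the orientations to 103..105, the search from the plan above becomes:
  select person_id
  from people_attributes
  where attributes @@ '(30|31|32|33|34|35|36|37|38|39|40)&102&(103|104|105)'::query_int;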
Re: [PERFORM] Comparing user attributes with bitwise operators
I have currently implemented a schema for my "Dating Site" that stores user search preferences and user attributes in int[] arrays using the contrib/intarray package (suggested by Greg Stark). But there are a few problems.

a) query_int can't be cast to int4.

b) query_int can't be indexed:

datingsite=> alter table people_attributes add column bla query_int;
ALTER TABLE
datingsite=> create index idx_query_int on people_attributes (bla);
ERROR:  data type query_int has no default operator class for access method "btree"
HINT:  You must specify an operator class for the index or define a default operator class for the data type.
datingsite=> create index idx_query_int on people_attributes (bla gist__int_ops);
ERROR:  operator class "gist__int_ops" does not exist for access method "btree"
datingsite=> alter table people_attributes drop column bla;
ALTER TABLE

c) query_int can only be used in one operation against int[]:

README.intarray: int[] @@ query_int - returns TRUE if array satisfies query (like '1&(2|3)')

It is not possible to use >=, <=, =, etc. Also, this operator does not work like the example says:

datingsite=> select '{2,3}'::int[] @@ '1'::query_int;
 ?column?
----------
 f
(1 row)

d) I can't find a way to simply check whether an integer is in an array without declaring the column as an array; therefore, I need to use an int[] type for a column that will only ever store one int4 if I want to compare it to an int[] array:

README.intarray: int[] && int[] - overlap - returns TRUE if arrays has at least one common elements.

e) int[] and query_int are somewhat ugly to deal with, since query_int needs to be quoted as a string and int[] is returned as '{1,2,3}'. Or maybe I'm just being anal. :)

Because of these limitations, I've chosen to declare the attribute columns as int[] arrays (even though they will only contain one value) so that I can use '{1,2,3}'::int[] && column_name.
Here is the schema:

create table people (
    person_id serial,
    datecreated timestamp with time zone default now(),
    signup_ip cidr not null,
    username character varying(30) not null,
    password character varying(28) not null,
    email character varying(65) not null,
    dob date not null,
    primary key (person_id)
);

create table people_attributes (
    person_id int references people (person_id) on delete cascade initially deferred,
    askmecount int not null default 0,
    age int[] not null default '{1}'::int[],
    gender int[] not null default '{1}'::int[],
    orientation int[] not null default '{1}'::int[],
    bodytype int[] not null default '{1}'::int[],
    children int[] not null default '{1}'::int[],
    drinking int[] not null default '{1}'::int[],
    education int[] not null default '{1}'::int[],
    ethnicity int[] not null default '{1}'::int[],
    eyecolor int[] not null default '{1}'::int[],
    haircolor int[] not null default '{1}'::int[],
    hairstyle int[] not null default '{1}'::int[],
    height int[] not null default '{1}'::int[],
    income int[] not null default '{1}'::int[],
    occupation int[] not null default '{1}'::int[],
    relation int[] not null default '{1}'::int[], /* multiple answer */
    religion int[] not null default '{1}'::int[],
    seeking int[] not null default '{1}'::int[], /* multiple answer */
    smoking int[] not null default '{1}'::int[],
    want_children int[] not null default '{1}'::int[],
    weight int[] not null default '{1}'::int[],
    primary key (person_id)
) without oids;

create index people_attributes_search on people_attributes using gist (
    age gist__int_ops,
    gender gist__int_ops,
    orientation gist__int_ops,
    bodytype gist__int_ops,
    children gist__int_ops,
    drinking gist__int_ops,
    education gist__int_ops,
    ethnicity gist__int_ops,
    eyecolor gist__int_ops,
    haircolor gist__int_ops,
    hairstyle gist__int_ops,
    height gist__int_ops,
    income gist__int_ops,
    occupation gist__int_ops,
    relation gist__int_ops,
    religion gist__int_ops,
    seeking gist__int_ops,
    smoking gist__int_ops,
    want_children gist__int_ops,
    weight gist__int_ops
);

/* These will be compared against the people_attributes table */
create table people_searchprefs (
    person_id int references people (person_id) on delete cascade initially deferred,
    age int[] not null default '{18,19,20,21,22,23,24,25,26,27,28,29,30}'::int[],
    gender int[] not null default '{1,2,4}'::int[],
    orientation int[] not null default '{1,2,8}'::int[],
    bodytype int[] not null default '{1,2,3,4,5,6}'::int[],
    children int[] not null default '{0}'::int[],
    drinking int[] not null default '{0}'::int[],
    education int[] not null default '{0}'::int[],
    ethnicity int[] not null default '{0}'::int[],
    eyeco
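[Against this schema, a search presumably takes the overlap form shown in the EXPLAIN plan at the top of the thread — a sketch; the ORDER BY and LIMIT are taken from that plan, and the selected columns are assumed.]

  select person_id, age
  from people_attributes
  where '{30,31,32,33,34,35,36,37,38,39,40}'::int[] && age
    and '{2}'::int[] && gender
    and '{1,2,4}'::int[] && orientation
  order by age
  limit 10;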
Re: [PERFORM] Planner having way wrong estimate for group aggregate
On Sat, Sep 18, 2004 at 03:48:13PM -0400, Tom Lane wrote:
> 7.4 doesn't have any statistics on expression indexes.  8.0 will do what
> you want though.  (I just fixed an oversight that prevented it from
> doing so...)

OK, so I'll have to wait for 8.0.0beta3 or 8.0.0 (I tried 8.0.0beta2, and it gave me zero difference) -- fortunately, I can probably wait, at the rate everything else is progressing here. :-)

/* Steinar */
--
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Planner having way wrong estimate for group aggregate
"Steinar H. Gunderson" <[EMAIL PROTECTED]> writes: > Now, my first notion was creating a functional index to help the planner: > ... > However, this obviously didn't help the planner (this came as a surprise to > me, but probably won't come as a surprise to the more seasoned users here :-) 7.4 doesn't have any statistics on expression indexes. 8.0 will do what you want though. (I just fixed an oversight that prevented it from doing so...) > Actually, it seems that the higher I set statistics on "tid", the worse the > estimate becomes. I believe that the estimate of number of groups will be exactly the same as the estimate of the number of values of tid --- there's no knowledge that date_trunc() might reduce the number of distinct values. > Any ideas for speeding this up? In 7.4, the only way I can see to force this to use a hash aggregate is to temporarily set enable_sort false or raise sort_mem. regards, tom lane ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Trying to match Solaris-Oracle performance with directio?
Mischa Sandberg wrote:

> In the meantime, what I gather from browsing mail archives is that
> postgresql on Solaris seems to get hung up on IO rather than CPU.

Well, people more knowledgeable in the secrets of postgres seem confident that this is not your problem. Fortunately, however, there is a simple way to find out. Just download the utinyint var type from pgfoundry (http://pgfoundry.org/projects/sql2pg/). There is some stuff there that you will need to compile yourself from CVS; I'm sorry, but I haven't done a proper release just yet. In any case, the utinyint type should provide you with the data type you seek, and thus allow you to find out whether this is, indeed, the problem.

--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
http://www.lingnu.com/

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your message
can get through to the mailing list cleanly
[PERFORM] Planner having way wrong estimate for group aggregate
Hi,

I'm using PostgreSQL 7.4 on a table with ~700.000 rows looking like this:

            Table "public.enkeltsalg"
   Column   |           Type           |                       Modifiers
------------+--------------------------+-------------------------------------------------------
 id         | integer                  | not null default nextval('"enkeltsalg_id_seq"'::text)
 kommentar  | text                     | not null default ''::text
 antall     | numeric(14,4)            | not null
 belop      | numeric(10,0)            | not null
 type       | character(1)             | not null
 tid        | timestamp with time zone | default now()
 eksternid  | integer                  |
 kasseid    | integer                  |
 baraapning | integer                  |
 salgspris  | integer                  |
 firma      | integer                  |
 bongid     | integer                  |
Indexes:
    "enkeltsalg_pkey" primary key, btree (id)
    "enkeltsalg_aapn" btree (baraapning)
    "enkeltsalg_aapn_pris" btree (baraapning, salgspris)
    "enkeltsalg_aapn_type" btree (baraapning, "type")
    "enkeltsalg_pris" btree (salgspris)
Check constraints:
    "enkeltsalg_type_valid" CHECK ("type" = 'K'::bpchar OR "type" = 'B'::bpchar
        OR "type" = 'M'::bpchar OR "type" = 'T'::bpchar)

And I'm doing the query (after VACUUM ANALYZE):

smt=# explain analyze select sum(belop) as omsetning,date_trunc('day',tid) as dato from enkeltsalg group by date_trunc('day',tid);

                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=108062.34..114477.98 rows=172735 width=17) (actual time=20977.544..23890.020 rows=361 loops=1)
   ->  Sort  (cost=108062.34..109912.99 rows=740263 width=17) (actual time=20947.372..21627.107 rows=710720 loops=1)
         Sort Key: date_trunc('day'::text, tid)
         ->  Seq Scan on enkeltsalg  (cost=0.00..18010.29 rows=740263 width=17) (actual time=0.091..7180.528 rows=710720 loops=1)
 Total runtime: 23908.538 ms
(5 rows)

Now, as you can see, the row estimate for the GroupAggregate is _way_ off (172735 estimated vs. 361 actual), so the planner makes the wrong choice (it should do a hash aggregate). If I set sort_mem to 131072 instead of 16384, it does a hash aggregate (which takes 10 seconds instead of 24), but I can't have sort_mem that high generally.
Now, my first notion was creating a functional index to help the planner:

smt=# create index enkeltsalg_dag on enkeltsalg ( date_trunc('day',tid) );
CREATE INDEX
smt=# vacuum analyze;
VACUUM

However, this obviously didn't help the planner (this came as a surprise to me, but probably won't come as a surprise to the more seasoned users here :-) ):

smt=# explain analyze select sum(belop) as omsetning,date_trunc('day',tid) as dato from enkeltsalg group by date_trunc('day',tid);

                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=103809.15..110017.11 rows=175512 width=17) (actual time=21061.357..23917.370 rows=361 loops=1)
   ->  Sort  (cost=103809.15..105585.95 rows=710720 width=17) (actual time=21032.239..21695.674 rows=710720 loops=1)
         Sort Key: date_trunc('day'::text, tid)
         ->  Seq Scan on enkeltsalg  (cost=0.00..17641.00 rows=710720 width=17) (actual time=0.091..7231.387 rows=710720 loops=1)
 Total runtime: 23937.791 ms
(5 rows)

I also tried to increase the statistics on the "tid" column:

smt=# alter table enkeltsalg alter column tid set statistics 500;
ALTER TABLE
smt=# analyze enkeltsalg;
ANALYZE

However, this made the planner's estimate only _worse_:

smt=# explain analyze select sum(belop) as omsetning,date_trunc('day',tid) as dato from enkeltsalg group by date_trunc('day',tid);

                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=107906.59..114449.09 rows=199715 width=17) (actual time=20947.197..23794.389 rows=361 loops=1)
   ->  Sort  (cost=107906.59..109754.56 rows=739190 width=17) (actual time=20918.001..21588.735 rows=710720 loops=1)
         Sort Key: date_trunc('day'::text, tid)
         ->  Seq Scan on enkeltsalg  (cost=0.00..17996.88 rows=739190 width=17) (actual time=0.092..7166.488 rows=710720 loops=1)
 Total runtime: 23814.624 ms
(5 rows)

Actually, it seems that the higher I set statistics on "tid", the worse the estimate becomes.

Also, I was told (on #postgresql :-) ) to include the following information: smt
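[Given the constraint above that sort_mem can't be raised globally, a sketch of a per-query middle ground; SET LOCAL reverts automatically when the transaction ends.]

  BEGIN;
  -- 131072 (kB) was reported above as enough for the planner to choose
  -- a hash aggregate instead of sort + GroupAggregate.
  SET LOCAL sort_mem = 131072;

  SELECT sum(belop) AS omsetning, date_trunc('day', tid) AS dato
  FROM enkeltsalg
  GROUP BY date_trunc('day', tid);

  COMMIT;  -- sort_mem reverts to its session value here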
Re: [PERFORM] Trying to match Solaris-Oracle performance with directio?
Mischa Sandberg <[EMAIL PROTECTED]> writes:
> Our product (Sophos PureMessage) runs on a Postgres database.
> Some of our Solaris customers have Oracle licenses, and they've
> commented on the performance difference between Oracle and Postgresql
> on such boxes. In-house, we've noticed the 2:1 (sometimes 5:1)
> performance difference in inserting rows (mostly 2-4K), between
> Postgresql on Solaris 8 and on Linux, for machines with comparable
> CPU's and RAM.

You haven't given any evidence at all that I/O is where the problem is. I think it would be good to first work through the conventional issues, such as configuration parameters, foreign key problems, etc. Give us some more detail about the slow INSERT queries ...

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
[PERFORM] Trying to match Solaris-Oracle performance with directio?
Our product (Sophos PureMessage) runs on a Postgres database.

Some of our Solaris customers have Oracle licenses, and they've commented on the performance difference between Oracle and Postgresql on such boxes. In-house, we've noticed a 2:1 (sometimes 5:1) performance difference in inserting rows (mostly 2-4K) between Postgresql on Solaris 8 and on Linux, for machines with comparable CPUs and RAM.

These (big) customers are starting to ask: why don't we just port our dataserver to Oracle for them? I'd like to avoid that, if possible. :-)

What we can test on in-house are leetle Sun workstations, while some of our customers have BIG Sun iron --- so I have no means to date to reproduce what their bottleneck is. :-( Yes, it has been recommended that we talk to Sun about their iForce test lab ... that's in the pipe.

In the meantime, what I gather from browsing mail archives is that postgresql on Solaris seems to get hung up on IO rather than CPU. Furthermore, I notice that Oracle and now MySQL use directio to bypass the system cache when doing heavy writes to disk, and Postgresql does not.

Not wishing to alter backend/storage/file for this test, I figured I could get a customer to mount the UFS volume for pg_xlog with the option "forcedirectio". Any comment on this?

No consideration has been given to what the wal_sync_method is at this point; presumably it's defaulting to fdatasync on Solaris.

BTW this is Postgres 7.4.1, and our customers are on Solaris 8 and 9.

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
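[A sketch of what the proposed mount change might look like on Solaris. The device path and mount point here are hypothetical; forcedirectio is a standard UFS mount option, but this should be tried on a scratch box before any production system.]

  # /etc/vfstab entry for a dedicated UFS slice holding pg_xlog,
  # with directio forced on (device names are hypothetical):
  /dev/dsk/c1t1d0s4  /dev/rdsk/c1t1d0s4  /pg_xlog  ufs  2  yes  forcedirectio

  # Or, for a quick test, remount an already-mounted filesystem:
  mount -o remount,forcedirectio /pg_xlog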
[PERFORM] CPU maximized out!
Hi, there,

I am running PostgreSQL 7.3.4 on Mac OS X on a G5 with dual processors and 8GB of memory. The shared buffer was set to 512MB. The database had been running great until about 10 days ago, when our developers decided to add some indexes to some tables to speed up certain uploading ops.

Now the CPU usage reaches 100% constantly whenever a few users access their information with SELECTs against the databases. If I REINDEX all the indexes, the database performance improves a bit, but before long it goes back to bad again.

My suspicion is that since a few indexes have been added, PostgreSQL now considers them in every operation when calculating cost, and this leads to the downgrade of performance. What do you think of this? What is the possible solution?

Thanks!

Qing

The following is the output from the TOP command:

Processes:  92 total, 4 running, 88 sleeping... 180 threads        13:09:18
Load Avg:  2.81, 2.73, 2.50    CPU usage:  95.2% user, 4.8% sys, 0.0% idle
SharedLibs: num = 116, resident = 11.5M code, 1.66M data, 4.08M LinkEdit
MemRegions: num = 12132, resident = 148M + 2.82M private, 403M shared
PhysMem:  435M wired, 5.04G active, 2.22G inactive, 7.69G used, 316M free
VM: 32.7G + 81.5M   5281127(13) pageins, 8544145(0) pageouts

  PID COMMAND      %CPU   TIME     #TH #PRTS #MREGS RPRVT  RSHRD  RSIZE  VSIZE
27314 postgres    92.2%  2:14.75    1    9    49   12.8M+ 396M   75.0M+ 849M
26099 postgres    91.1% 19:28.04    1    9    67   15.9M+ 396M   298M+  850M
24754 top          2.8%  4:48.33    1   29    26   272K   404K   648K   27.1M
    0 kernel_tas   1.9%  2:12:05   40    2  8476   67.1M  0K     281M   1.03G
  294 hwmond       0.5%  2:26:34    8   75    57   240K   544K   1.09M  31.0M
  347 lookupd      0.3%  1:52:28    2   35    73   3.05M  648K   3.14M  33.6M
   89 configd      0.1% 53:05.16    3  126   151   304K   644K   832K   29.2M
26774 servermgrd   0.1%  0:02.93    1   10    40   344K-  1.17M+ 1.86M  28.2M
  170 coreservic   0.1%  0:09.04    1   40    93   152K   532K   2.64M  28.5M
  223 DirectoryS   0.1% 19:42.47    8   84   135   880K+  1.44M  4.60M+ 37.1M+
  125 dynamic_pa   0.0%  0:26.79    1   12    17   16K    292K   28K    17.7M
   87 kextd        0.0%  0:01.23    2   17    21   0K     292K   36K    28.2M
  122 update       0.0% 14:27.71    1    9    15   16K    300K   44K    17.6M
    1 init         0.0%  0:00.03    1   12    16   28K    320K   76K    17.6M
    2 mach_init    0.0%  3:36.18    2   95    18   76K    320K   148K   18.2M
   81 syslogd      0.0%  0:19.96    1   10    17   96K    320K   148K   17.7M

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your message
can get through to the mailing list cleanly
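[One way to test the "new indexes are to blame" theory is to check whether those indexes are actually being used, and to compare plans with and without index scans. A sketch using the standard statistics views; the table, column, and value in the EXPLAIN lines are hypothetical placeholders, and the per-index counters assume the stats collector is enabled in postgresql.conf.]

  -- How often has each index been scanned since the counters were reset?
  SELECT relname, indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
  FROM pg_stat_user_indexes
  ORDER BY idx_scan DESC;

  -- Compare a slow SELECT's plan with and without index scans, to see
  -- whether a poor index choice explains the CPU load:
  EXPLAIN ANALYZE SELECT * FROM some_table WHERE some_col = 42;
  SET enable_indexscan = off;
  EXPLAIN ANALYZE SELECT * FROM some_table WHERE some_col = 42;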