[PERFORM] View columns calculated
Folks,

I have a question about views: I want to have a fairly wide view (lots of columns) where most of the columns have some heavyish calculations in them, but I'm concerned that it will have to calculate every column even when I'm not selecting them.

So, the question is: if I have 5 columns in a view but only select 1 column, is the system smart enough not to calculate the unused columns, or am I taking a performance hit over a smaller view that doesn't have the extra 4 columns?

Thanks,
Peter Darley

---(end of broadcast)---
TIP 8: explain analyze is your friend
Re: [PERFORM] View columns calculated
"Peter Darley" <[EMAIL PROTECTED]> writes:
> I have a question about views: I want to have a fairly wide view (lots of
> columns) where most of the columns have some heavyish calculations in them,
> but I'm concerned that it will have to calculate every column even when I'm
> not selecting them. So, the question is, if I have 5 columns in a view but
> only select 1 column, is the system smart enough to not calculate the unused
> columns,

It depends on what the rest of your view looks like. If the view is simple enough to be "flattened" into the parent query then the unused columns will disappear into the ether. If it's not flattenable then they will get evaluated. You can check by seeing whether an EXPLAIN shows a separate "Subquery Scan" node corresponding to the view.

(Without bothering to look at the code, an unflattenable view is one that uses GROUP BY, DISTINCT, aggregates, ORDER BY, LIMIT, UNION, INTERSECT, EXCEPT, probably a couple other things.)

regards, tom lane
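Tom's flattening test can be sketched with a toy pair of views. Everything below (the table `t`, the function `slow_fn`) is hypothetical, purely for illustration:

```sql
-- Hypothetical setup: a table and a deliberately non-trivial function.
CREATE TABLE t (x integer, y integer);
CREATE FUNCTION slow_fn(integer) RETURNS integer AS
    'SELECT $1 * $1' LANGUAGE sql;

-- Flattenable view: a plain SELECT of expressions.
CREATE VIEW v_flat AS
    SELECT x, slow_fn(y) AS heavy FROM t;

-- Unflattenable view: the GROUP BY forces a separate subquery.
CREATE VIEW v_grouped AS
    SELECT x, slow_fn(max(y)) AS heavy FROM t GROUP BY x;

-- No "Subquery Scan" expected here: the view is inlined into the parent
-- query and slow_fn(y) drops out because "heavy" isn't selected.
EXPLAIN SELECT x FROM v_flat;

-- A "Subquery Scan" node is expected here, meaning "heavy" gets computed
-- even though only x is selected.
EXPLAIN SELECT x FROM v_grouped;
```

Comparing the two EXPLAIN outputs is the check Tom describes: the presence of the subquery scan node marks the view whose unused columns still get evaluated.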
Re: [PERFORM] configure shmmax on MAC OS X
Hello,

I found that if your SHMALL value was less than your SHMMAX value, the value wouldn't take.

J

Tom Lane wrote:
> Qing Zhao <[EMAIL PROTECTED]> writes:
>> My suspicion is that the change I made in /etc/rc does not take effect.
>> Is there a way to check it?
>
> sysctl has an option to show the values currently in effect. I believe
> that /etc/rc is the correct place to set shmmax on OSX 10.3 or later ...
> but we have seen prior reports of people having trouble getting the
> setting to "take". There may be some other constraint involved.
>
>> sysctl -w kern.sysv.shmmax=4294967296 // bytes
>
> Hmm, does sysctl work for values that exceed the range of int?
>
> There's no particularly good reason to try to set shmmax as high as you
> are trying anyhow; you really don't need more than a couple hundred meg
> in Postgres shared memory. It's better to leave the kernel to manage the
> bulk of your RAM.
>
> regards, tom lane
Re: [PERFORM] configure shmmax on MAC OS X
Qing Zhao <[EMAIL PROTECTED]> writes:
> My suspicion is that the change I made in /etc/rc does not take effect.
> Is there a way to check it?

sysctl has an option to show the values currently in effect. I believe that /etc/rc is the correct place to set shmmax on OSX 10.3 or later ... but we have seen prior reports of people having trouble getting the setting to "take". There may be some other constraint involved.

> sysctl -w kern.sysv.shmmax=4294967296 // bytes

Hmm, does sysctl work for values that exceed the range of int?

There's no particularly good reason to try to set shmmax as high as you are trying anyhow; you really don't need more than a couple hundred meg in Postgres shared memory. It's better to leave the kernel to manage the bulk of your RAM.

regards, tom lane
Re: [PERFORM] configure shmmax on MAC OS X
Tom:

I used sysctl -A to see the kernel state, and I got:

    kern.sysv.shmmax: -1

It looks like the value is too big!

Thanks!

Qing

On Apr 13, 2004, at 12:55 PM, Tom Lane wrote:
> Qing Zhao <[EMAIL PROTECTED]> writes:
>> My suspicion is that the change I made in /etc/rc does not take effect.
>> Is there a way to check it?
>
> sysctl has an option to show the values currently in effect. I believe
> that /etc/rc is the correct place to set shmmax on OSX 10.3 or later ...
> but we have seen prior reports of people having trouble getting the
> setting to "take". There may be some other constraint involved.
>
>> sysctl -w kern.sysv.shmmax=4294967296 // bytes
>
> Hmm, does sysctl work for values that exceed the range of int?
>
> There's no particularly good reason to try to set shmmax as high as you
> are trying anyhow; you really don't need more than a couple hundred meg
> in Postgres shared memory. It's better to leave the kernel to manage the
> bulk of your RAM.
>
> regards, tom lane
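The `-1` reading is consistent with Tom's "range of int" question: 4294967296 is exactly 2^32, which cannot be represented in a 32-bit integer at all. A quick sketch of the arithmetic (this only demonstrates the range problem; exactly how the kernel mangles an out-of-range value is an assumption):

```shell
# The largest value a signed 32-bit int can hold:
max_int32=$(( (1 << 31) - 1 ))
requested=4294967296

echo "signed 32-bit max: $max_int32"      # 2147483647
echo "requested shmmax:  $requested"      # 4294967296

# If the kernel stores shmmax in a 32-bit field, a request this large
# wraps or saturates rather than taking effect.
if [ "$requested" -gt "$max_int32" ]; then
    echo "requested value does not fit in a signed 32-bit int"
fi
```

So a shmmax of a few hundred megabytes, as Tom suggests, sidesteps the overflow entirely.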
Re: [PERFORM] query slows down with more accurate stats
Robert Treat <[EMAIL PROTECTED]> writes:
> live=# analyze cl;
> ANALYZE
> live=# select reltuples from pg_class where relname = 'cl';
>  reltuples
> -----------
>      53580
> (1 row)
> live=# vacuum cl;
> VACUUM
> live=# select reltuples from pg_class where relname = 'cl';
>   reltuples
> -------------
>  1.14017e+06
> (1 row)

Well, the first problem is why is ANALYZE's estimate of the total row count so bad :-( ? I suspect you are running into the situation where the initial pages of the table are thinly populated and ANALYZE mistakenly assumes the rest are too. Manfred is working on a revised sampling method for ANALYZE that should fix this problem in 7.5 and beyond, but for now it seems like a VACUUM FULL might be in order.

> so i guess i am wondering if there is something I should be doing to
> help get the better plan at the more accurate stats levels and/or why it
> doesn't stick with the original plan (I noticed disabling merge joins
> does seem to push it back to the original plan).

With the larger number of estimated rows it's figuring the nestloop will be too expensive. The row estimate for the cl scan went up from 1248 to 10546, so the estimated cost for the nestloop plan would go to about 240000 units vs 80000 for the mergejoin plan. This is obviously off rather badly when the true runtimes are 1.7 vs 8.1 seconds :-(.

I think this is an example of a case where we really need better estimation of nestloop costs --- it's drastically overestimating the relative cost of the nestloop because it's not accounting for the cache benefits of the repeated index searches. You could probably force the nestloop to be chosen by lowering random_page_cost, but that's just a kluge solution ... the real problem is the model is wrong. I have a to-do item to work on this, and will try to bump up its priority a bit.

regards, tom lane
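The two workarounds Tom mentions (disabling merge joins, lowering random_page_cost) can be tried session-locally without touching postgresql.conf. A sketch, with the numeric value chosen here being an assumption rather than a recommendation:

```sql
-- Force the planner away from the merge join for this session only,
-- then re-run the problem query under EXPLAIN ANALYZE:
SET enable_mergejoin = off;
EXPLAIN ANALYZE
    SELECT count(cl.caller_id), npanxx.city, npanxx.state
    FROM cl ... ;   -- the full query from the original post
RESET enable_mergejoin;

-- Alternatively, make index-based plans look cheaper across the board
-- (the default in this era is 4; 2 is an illustrative guess):
SET random_page_cost = 2;
EXPLAIN ANALYZE
    SELECT count(cl.caller_id), npanxx.city, npanxx.state
    FROM cl ... ;
RESET random_page_cost;
```

Because both settings are per-session, they are safe for experimentation; as Tom says, though, they paper over a cost-model problem rather than fix it.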
Re: [PERFORM] configure shmmax on MAC OS X
On OS X, I've always made these changes in:

    /System/Library/StartupItems/SystemTuning/SystemTuning

and manually checked it with sysctl after reboot. Works for me.

100k buffers is probably overkill. There can be a performance penalty with too many buffers. See this list's archives for more. 10k would probably be a better start.

- Jeff

> Hi, all,
>
> I have got a new Mac OS G5 with 8GB RAM. So I tried to increase
> the shmmax in the kernel so that I can take advantage of the RAM.
>
> I searched the web and read the manual for PG 7.4, chapter 16.5.1.
> After that, I edited the /etc/rc file:
>
> sysctl -w kern.sysv.shmmax=4294967296 // bytes
> sysctl -w kern.sysv.shmmin=1
> sysctl -w kern.sysv.shmmni=32
> sysctl -w kern.sysv.shmseg=8
> sysctl -w kern.sysv.shmall=1048576 // 4k pages
>
> for 4G shared RAM.
>
> Then I changed postgresql.conf:
> shared_buffers=100000 // could be bigger?
>
> and restarted the machine and postgres server. To my surprise, the postgres
> server wouldn't start, saying that the requested shared memory exceeds the
> kernel's shmmax.
>
> My suspicion is that the change I made in /etc/rc does not take effect.
> Is there a way to check it? Is there an upper limit for how much RAM can be
> allocated for shared buffers in Mac OS X? Or is there something wrong with
> my calculation in numbers?
>
> Thanks a lot!
>
> Qing

--
Jeff Bohmer
VisionLink, Inc.
303.402.0170
www.visionlink.org
People. Tools. Change. Community.
[PERFORM] configure shmmax on MAC OS X
Hi, all,

I have got a new Mac OS G5 with 8GB RAM. So I tried to increase the shmmax in the kernel so that I can take advantage of the RAM.

I searched the web and read the manual for PG 7.4, chapter 16.5.1. After that, I edited the /etc/rc file:

sysctl -w kern.sysv.shmmax=4294967296 // bytes
sysctl -w kern.sysv.shmmin=1
sysctl -w kern.sysv.shmmni=32
sysctl -w kern.sysv.shmseg=8
sysctl -w kern.sysv.shmall=1048576 // 4k pages

for 4G shared RAM.

Then I changed postgresql.conf:

shared_buffers=100000 // could be bigger?

and restarted the machine and postgres server. To my surprise, the postgres server wouldn't start, saying that the requested shared memory exceeds the kernel's shmmax.

My suspicion is that the change I made in /etc/rc does not take effect. Is there a way to check it? Is there an upper limit for how much RAM can be allocated for shared buffers in Mac OS X? Or is there something wrong with my calculation in numbers?

Thanks a lot!

Qing
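For a sense of scale: with the default 8 kB PostgreSQL page size (an assumption; it matches a stock 7.4 build), the shared_buffers setting above translates to a large chunk of shared memory, which is why the postmaster's request can exceed shmmax. A quick back-of-envelope:

```shell
# Shared memory implied by shared_buffers alone (ignores the smaller
# fixed overheads the postmaster also allocates):
buffers=100000
page_bytes=8192   # default PostgreSQL block size, assumed here

bytes=$(( buffers * page_bytes ))
echo "shared_buffers = $buffers -> $bytes bytes"            # 819200000
echo "which is $(( bytes / 1024 / 1024 )) MB of shared memory"   # 781 MB
```

That ~781 MB is well beyond the "couple hundred meg" recommended elsewhere in this thread, and must fit under whatever shmmax actually took effect.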
Re: [PERFORM] index v. seqscan for certain values
On Tue, 2004-04-13 at 14:04, Jeremy Dunn wrote:
> > There's a hard limit of 1000, I believe. Didn't it give you
> > a warning saying so?
>
> No warning at 2000, and no warning at 100,000 either!
>
> Remember we are still on 7.2.x. The docs here
> http://www.postgresql.org/docs/7.2/static/sql-altertable.html don't say
> anything about a limit.
>
> This is good to know, if it's true. Can anyone confirm?

transform=# alter table data_pull alter column msg set statistics 10000;
WARNING: lowering statistics target to 1000
ERROR: column "msg" of relation "data_pull" does not exist
transform=# select version();
                            version
----------------------------------------------------------------
 PostgreSQL 7.4beta4 on i686-pc-linux-gnu, compiled by GCC 2.96
(1 row)

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
Re: [PERFORM] index v. seqscan for certain values
> There's a hard limit of 1000, I believe. Didn't it give you
> a warning saying so?

No warning at 2000, and no warning at 100,000 either!

Remember we are still on 7.2.x. The docs here http://www.postgresql.org/docs/7.2/static/sql-altertable.html don't say anything about a limit.

This is good to know, if it's true. Can anyone confirm?

- Jeremy
[PERFORM] query slows down with more accurate stats
In the process of optimizing some queries, I have found the following query seems to degrade in performance the more accurate I make the statistics on the table... whether by using increased alter table ... set statistics or by using vacuum.

SELECT count( cl.caller_id ), npanxx.city, npanxx.state
FROM cl
LEFT OUTER JOIN npanxx
  ON substr( cl.caller_id, 1, 3 ) = npanxx.npa
 AND substr( cl.caller_id, 4, 3 ) = npanxx.nxx
LEFT OUTER JOIN cp
  ON cl.caller_id = cp.caller_id
WHERE cl.ivr_system_id = 130
  AND cl.call_time > '2004-03-01 06:00:00.0 CST'
  AND cl.call_time < '2004-04-01 06:00:00.0 CST'
  AND cp.age >= 18
  AND cp.age <= 24
  AND cp.gender = 'm'
GROUP BY npanxx.city, npanxx.state

live=# analyze cl;
ANALYZE
live=# select reltuples from pg_class where relname = 'cl';
 reltuples
-----------
     53580
(1 row)
live=# select count(*) from cl;
  count
---------
 1140166
(1 row)

The plan I get under these conditions is actually pretty good...

HashAggregate (cost=28367.22..28367.66 rows=174 width=32) (actual time=1722.060..1722.176 rows=29 loops=1)
  -> Nested Loop (cost=0.00..28365.92 rows=174 width=32) (actual time=518.592..1716.254 rows=558 loops=1)
     -> Nested Loop Left Join (cost=0.00..20837.33 rows=1248 width=32) (actual time=509.991..1286.755 rows=13739 loops=1)
        -> Index Scan using cl_ivr_system_id on cl (cost=0.00..13301.15 rows=1248 width=14) (actual time=509.644..767.421 rows=13739 loops=1)
             Index Cond: (ivr_system_id = 130)
             Filter: ((call_time > '2004-03-01 07:00:00-05'::timestamp with time zone) AND (call_time < '2004-04-01 07:00:00-05'::timestamp with time zone))
        -> Index Scan using npanxx_pkey on npanxx (cost=0.00..6.02 rows=1 width=32) (actual time=0.025..0.027 rows=1 loops=13739)
             Index Cond: ((substr(("outer".caller_id)::text, 1, 3) = (npanxx.npa)::text) AND (substr(("outer".caller_id)::text, 4, 3) = (npanxx.nxx)::text))
     -> Index Scan using cp_pkey on cp (cost=0.00..6.02 rows=1 width=14) (actual time=0.027..0.027 rows=0 loops=13739)
          Index Cond: (("outer".caller_id)::text = (cp.caller_id)::text)
          Filter: ((age >= 18) AND (age <= 24) AND (gender = 'm'::bpchar))
Total runtime: 1722.489 ms
(12 rows)

but when I do

live=# vacuum cl;
VACUUM
live=# select reltuples from pg_class where relname = 'cl';
  reltuples
-------------
 1.14017e+06
(1 row)

(or alternatively increase the stats target on the table) I now get the following plan:

HashAggregate (cost=80478.74..80482.41 rows=1471 width=32) (actual time=8132.261..8132.422 rows=29 loops=1)
  -> Merge Join (cost=79951.95..80467.70 rows=1471 width=32) (actual time=7794.338..8130.041 rows=558 loops=1)
       Merge Cond: ("outer"."?column4?" = "inner"."?column2?")
       -> Sort (cost=55719.06..55745.42 rows=10546 width=32) (actual time=4031.827..4052.526 rows=13739 loops=1)
            Sort Key: (cl.caller_id)::text
            -> Merge Right Join (cost=45458.30..55014.35 rows=10546 width=32) (actual time=2944.441..3796.787 rows=13739 loops=1)
                 Merge Cond: ((("outer".npa)::text = "inner"."?column2?") AND (("outer".nxx)::text = "inner"."?column3?"))
                 -> Index Scan using npanxx_pkey on npanxx (cost=0.00..8032.99 rows=132866 width=32) (actual time=0.200..461.767 rows=130262 loops=1)
                 -> Sort (cost=45458.30..45484.67 rows=10546 width=14) (actual time=2942.994..2967.935 rows=13739 loops=1)
                      Sort Key: substr((cl.caller_id)::text, 1, 3), substr((cl.caller_id)::text, 4, 3)
                      -> Seq Scan on cl (cost=0.00..44753.60 rows=10546 width=14) (actual time=1162.423..2619.662 rows=13739 loops=1)
                           Filter: ((ivr_system_id = 130) AND (call_time > '2004-03-01 07:00:00-05'::timestamp with time zone) AND (call_time < '2004-04-01 07:00:00-05'::timestamp with time zone))
       -> Sort (cost=24232.89..24457.06 rows=89667 width=14) (actual time=3761.703..3900.340 rows=98010 loops=1)
            Sort Key: (cp.caller_id)::text
            -> Seq Scan on cp (cost=0.00..15979.91 rows=89667 width=14) (actual time=0.128..1772.215 rows=100302 loops=1)
                 Filter: ((age >= 18) AND (age <= 24) AND (gender = 'm'::bpchar))
Total runtime: 8138.607 ms
(17 rows)

so I guess I am wondering if there is something I should be doing to help get the better plan at the more accurate stats levels and/or why it doesn't stick with the original plan (I noticed disabling merge joins does seem to push it back to the original plan).

Alternatively, if anyone has any general suggestions on speeding up the query, I'd be open to that too :-)
Re: [PERFORM] index v. seqscan for certain values
"Jeremy Dunn" <[EMAIL PROTECTED]> writes:
> Interestingly, I tried increasing the stat size for the CID column to
> 2000, analyzing, and checking the accuracy of the stats again.

There's a hard limit of 1000, I believe. Didn't it give you a warning saying so?

At 1000 the ANALYZE sample size would be 300000 rows, or about a quarter of your table. I would have thought this would give frequency estimates with much better precision than you seem to be seeing --- but my statistics are rusty enough that I'm not sure about it. Possibly the nonuniform clumping of CID has something to do with the poor results.

Any stats majors on the list?

regards, tom lane
Re: [PERFORM] index v. seqscan for certain values
> > When I just tried it again with a value of 300, analyze,
> > then run the query, I get a *worse* result for an estimate. I don't
> > understand this.
>
> That's annoying. How repeatable are these results --- if you
> do ANALYZE over again several times, how much does the row
> count estimate change each time? (It should change somewhat,
> since ANALYZE is taking a random sample, but one would like
> to think not a whole lot.) Is the variance more or less at
> the higher stats target? Take a look at a few different CID
> values to get a sense of the accuracy, don't look at just one ...

Yes, it's repeatable. I tried a bunch of times, and there are only small variations in the stats for the higher stat targets.

> (Actually, you might find it more profitable to look at the
> pg_stats entry for the CID column rather than
> reverse-engineering the stats via ANALYZE. Look at how well
> the most-common-values list and associated frequency numbers
> track reality.)

I checked the accuracy of the stats for various values, and there is a wide variation. I see some values where the estimate is 1.75x the actual, and others where the estimate is .44x the actual.

> Also, can you think of any reason for the distribution of CID
> values to be nonuniform within the table? For instance, do
> rows get inserted in order of increasing CID, or is there any
> clustering of rows with the same CID?

This is almost certainly the answer. The data is initially inserted in chunks for each CID; later on there is a more normal distribution of insert/update/deletes across all CIDs; and then again a new CID will come with a large chunk of rows, etc.

Interestingly, I tried increasing the stat size for the CID column to 2000, analyzing, and checking the accuracy of the stats again. Even with this relatively high value, the accuracy of the stats is not that close. The value giving .44x previously now gives an estimate .77x of actual.
Another value which was at 1.38x of actual is now at .71x of actual!

Then just for kicks I set the statistics size to 100,000 (!), analyzed, and ran the query again. For the same CID I still got an estimated row count that is .71x the actual rows returned. Why is this not better? I wonder how high I'd have to set the statistics target to get really good data, given the uneven data distribution of this table. Is there any other technique that works better to get good estimates, given uneven distribution of values?

So I think this explains the inaccurate stats; and the solution as far as I'm concerned is to increase the two params mentioned yesterday (effective_cache_size & random_page_cost).

Thanks again for the help!

- Jeremy
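The direct pg_stats inspection Tom suggests upthread looks something like the sketch below. The table and column names (`mytable`, `cid`) stand in for the poster's actual schema, which isn't given in the thread:

```sql
-- What the planner believes about the column:
SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'mytable' AND attname = 'cid';

-- Ground truth to compare most_common_freqs against
-- (each frequency is a fraction of the table's rows):
SELECT cid, count(*) AS actual_rows
FROM mytable
GROUP BY cid
ORDER BY actual_rows DESC
LIMIT 10;
```

Dividing each `actual_rows` by the table row count and comparing against the matching entry in `most_common_freqs` gives exactly the 1.75x / .44x style ratios reported above, without re-running ANALYZE each time.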
Re: [PERFORM] Deleting certain duplicates
Shea,Dan [CIS] wrote:

The index is
Indexes: "forecastelement_rwv_idx" btree (region_id, wx_element, valid_time)

-----Original Message-----
From: Shea,Dan [CIS] [mailto:[EMAIL PROTECTED]
Sent: Monday, April 12, 2004 10:39 AM
To: Postgres Performance
Subject: [PERFORM] Deleting certain duplicates

We have a large database which recently increased dramatically due to a change in our insert program allowing all entries.

PWFPM_DEV=# select relname,relfilenode,reltuples from pg_class where relname = 'forecastelement';
     relname     | relfilenode |  reltuples
-----------------+-------------+-------------
 forecastelement |   361747866 | 4.70567e+08

     Column     |            Type             | Modifiers
----------------+-----------------------------+-----------
 version        | character varying(99)       |
 origin         | character varying(10)       |
 timezone       | character varying(99)       |
 region_id      | character varying(20)       |
 wx_element     | character varying(99)       |
 value          | character varying(99)       |
 flag           | character(3)                |
 units          | character varying(99)       |
 valid_time     | timestamp without time zone |
 issue_time     | timestamp without time zone |
 next_forecast  | timestamp without time zone |
 reception_time | timestamp without time zone |

The program is supposed to check to ensure that all fields but the reception_time are unique using a select statement, and if so, insert it. Due to an error in a change, reception_time was included in the select to check for duplicates. The reception_time is created by the program creating the dat file to insert, essentially letting all duplicate rows be inserted.

I tried the delete query below.

PWFPM_DEV=# delete from forecastelement where oid not in
  (select min(oid) from forecastelement group by
   version,origin,timezone,region_id,wx_element,value,flag,units,
   valid_time,issue_time,next_forecast);

It ran for 3 days, creating what I assume is an index in pgsql_tmp for the group by statement. The query ended up failing with "ERROR: write failed". Well, the long weekend is over and we do not have the luxury of trying this again. So I was thinking maybe of doing the deletion in chunks, perhaps based on reception_time.
It's more of an SQL question though.

To deduplicate on the basis of version, origin, timezone, region_id, wx_element, value, flag, units, valid_time, issue_time, next_forecast, you could do this:

begin work;
create table temp_table as
  select distinct on
    (version, origin, timezone, region_id, wx_element, value, flag,
     units, valid_time, issue_time, next_forecast)
    * from forecastelement;
truncate table forecastelement;
drop index ... ;
insert into forecastelement select * from temp_table;
commit;
create indexes
analyze forecastelement;

Note that DISTINCT ON will keep only one row out of all rows having distinct values of the specified columns. Kindly go through the DISTINCT ON manual before trying the queries.

regds
mallah.

> Are there any suggestions for a better way to do this, or using multiple
> queries to delete selectively a week at a time based on the
> reception_time? I would say there are a lot of duplicate entries between
> mid March and the first week of April.
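The "one week at a time" variant the original poster asks about could also be done with a windowed delete. This is only a sketch of one plausible shape for it, reusing the poster's min(oid) convention; the date range is illustrative and the column list is abbreviated:

```sql
-- Delete duplicates whose reception_time falls in a one-week window,
-- keeping the lowest-oid row of each duplicate group.
DELETE FROM forecastelement f
WHERE f.reception_time >= '2004-03-15'
  AND f.reception_time <  '2004-03-22'
  AND f.oid <> (SELECT min(oid)
                FROM forecastelement g
                WHERE g.version    = f.version
                  AND g.origin     = f.origin
                  AND g.valid_time = f.valid_time
                  -- ... and so on for the remaining columns of the
                  -- uniqueness list (timezone, region_id, wx_element,
                  -- value, flag, units, issue_time, next_forecast)
               );
```

The per-row correlated subquery makes this slow without a supporting index on the grouping columns, but it bounds the work per transaction, which is the point of chunking: a failure three days in no longer loses everything.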
Re: [PERFORM] Index Backward Scan fast / Index Scan slow !
Pailloncy Jean-Gérard <[EMAIL PROTECTED]> writes:
>> Are you using a nondefault value of
>> BLCKSZ? If so what?
> Sorry, I forgot to specify I use BLCKSZ of 32768,

Okay, the numbers are sensible then. The index density seems a bit low (754 entries/page where the theoretical ideal would be about 1365) but not really out-of-line.

>> could you send me a copy of the index file?
> Do you want the index file now, or may I try something before?

If you're going to reindex, please do send me a copy of the file first.

regards, tom lane
Re: [PERFORM] Index Backward Scan fast / Index Scan slow !
> Hm, this is odd. That says you've got 349519 live index entries in only
> 463 actively-used index pages, or an average of 754 per page, which
> AFAICS could not fit in an 8K page. Are you using a nondefault value of
> BLCKSZ? If so what?

Sorry, I forgot to specify I use a BLCKSZ of 32768 — the same block size as for newfs, and the same as the RAID slice size. I tested the drive some time ago and found a speed win when the slice size, the disk block size, and the read block size were all the same.

I do not think that a different BLCKSZ should exhibit a slowdown like the one I found.

> If you *are* using default BLCKSZ then this index must be corrupt, and
> what you probably need to do is REINDEX it. But before you do that,
> could you send me a copy of the index file?

Do you want the index file now, or may I try something before?

Cordialement,
Jean-Gérard Pailloncy
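The density figures in this exchange follow directly from the numbers quoted. A sketch of the arithmetic (the ~24-byte per-entry cost is an assumption used to reproduce Tom's "about 1365" ideal, not a documented constant):

```shell
blcksz=32768        # the nondefault BLCKSZ in use
live_entries=349519
used_pages=463

# Observed density: ~754 entries per page, impossible in an 8 kB page
# but fine in a 32 kB one.
echo "entries per page: $(( live_entries / used_pages ))"

# Theoretical ideal at roughly 24 bytes per index entry:
echo "ideal entries per page: $(( blcksz / 24 ))"
```

754 observed against ~1365 ideal is the "a bit low but not out-of-line" density Tom describes once the 32 kB page size is known.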