Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle
- "Stephen Frost" wrote:

| From: "Stephen Frost"
| To: "Rajesh Kumar. Mallah"
| Cc: pgsql-performance@postgresql.org
| Sent: Thursday, May 24, 2012 9:27:37 PM
| Subject: Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle
|
| Rajesh,
|
| * Rajesh Kumar. Mallah (mal...@tradeindia.com) wrote:
| > We are puzzled why the CPU and DISK I/O system are not being utilized
| > fully and would seek lists' wisdom on that.
|
| What OS is this?  What kernel version?

Dear Frost,

We are running Linux with kernel 3.2.x (which has the lseek improvements).

| > just a thought, will it be a good idea to partition the host hardware
| > to 4 equal virtual environments, ie 1 for master (r/w) and 3 slaves r/o
| > and distribute the r/o load on the 3 slaves ?
|
| Actually, it might help with 9.1, if you're really running into some
| scalability issues in our locking area..  You might review this:
|
| http://rhaas.blogspot.com/2012/04/did-i-say-32-cores-how-about-64.html
|
| That's a pretty contrived test case, but I suppose it's possible your
| case is actually close enough to be getting affected also..

Thanks for the reference; even I thought so (LockManager), but we are also
actually running out of db max connections (currently at 600). When that
happens, something at the beginning of the application stack also gets
dysfunctional and it changes the very input to the system (think of
negative feedback systems). It is sort of complicated, but I will
definitely update the list when I get to the point of putting the blame
on the DB :-).

Regds
Mallah.

| Thanks,
|
| Stephen

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle
| From: "Steve Crawford"
| To: "Rajesh Kumar. Mallah"
| Cc: "Andy Colson", "Claudio Freire", pgsql-performance@postgresql.org
| Sent: Thursday, May 24, 2012 9:23:47 PM
| Subject: Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle
|
| On 05/24/2012 05:58 AM, Rajesh Kumar. Mallah wrote:
| > Dear Andy,
| >
| > Following the discussion on load average we are now investigating
| > some other parts of the stack (other than db).
| >
| > Essentially we are bumping up the limits (on appserver) so that more
| > requests go to the DB server.
|
| Which leads to the question: what, other than the db, runs on this
| machine?

No, nothing else runs on *this* machine. We are lucky to have such beefy
hardware dedicated to Postgres :). We have a separate machine for the
application server, which has 2 tiers. I am trying to reach the point of
maxing out the db machine; for that to happen we need to work on the
other parts.

regds
mallah.

| Cheers,
| Steve
Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle
Dear Andy,

Following the discussion on load average we are now investigating some
other parts of the stack (other than db). Essentially we are bumping up
the limits (on appserver) so that more requests go to the DB server.

| Maybe you are hitting some locks?  If its not IO and not CPU then
| maybe something is getting locked and queries are piling up.
|
| -Andy
Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle
| Load avg is the number of processes in the running queue, which can
| be either waiting to be run or actually running.
|
| So if you had 100% CPU usage, then you'd most definitely have a load
| avg of 64, which is neither good nor bad. It may simply mean that
| you're using your hardware's full potential.

Dear Claudio,

Thanks for the reply and for clarifying the "actually running" part.
Below is a snapshot of the top output while the system was loaded.

top - 12:15:13 up 101 days, 19:01,  1 user,  load average: 23.50, 18.89, 21.74
Tasks: 650 total,  11 running, 639 sleeping,   0 stopped,   0 zombie
Cpu(s): 26.5%us,  5.7%sy,  0.0%ni, 67.2%id,  0.0%wa,  0.0%hi,  0.6%si,  0.0%st
Mem:  131971752k total, 122933996k used,   9037756k free,    251544k buffers
Swap:  33559780k total,    251916k used,  33307864k free, 116356252k cached

Our applications do slow down when loads are at that level. Can you
please tell what else can be metered?

| If your processes are waiting but not using CPU or I/O time... all I
| can think of is mcelog (it's the only application I've ever witnessed
| doing that). Do check ps/top and try to find out which processes are
| in a waiting state to have a little more insight.

I will read more on process status and try to keep a close eye on it.
I shall respond after a few hours.

regds
mallah.
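As a follow-up to the "what else can be metered?" question, the kernel's
own counters can be sampled directly; a minimal sketch using standard
Linux procfs and ps (nothing Postgres-specific). The 1/5/15-minute load
averages come straight from /proc/loadavg, and counting processes in the
R (runnable) and D (uninterruptible sleep) states shows what is actually
feeding that number on Linux:

```shell
# Print the 1/5/15-minute load averages straight from the kernel.
cat /proc/loadavg

# Count processes per state; R (runnable) and D (uninterruptible
# sleep) are the states that contribute to the load average on Linux.
ps -eo state= | sort | uniq -c
```

A spike in D-state processes with idle CPU usually points at lock or I/O
waits rather than CPU saturation.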
Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle
- "Claudio Freire" wrote:

| From: "Claudio Freire"
| To: "Rajesh Kumar. Mallah"
| Cc: pgsql-performance@postgresql.org
| Sent: Thursday, May 24, 2012 9:23:43 AM
| Subject: Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle
|
| On Thu, May 24, 2012 at 12:39 AM, Rajesh Kumar. Mallah wrote:
| > The problem is that sometimes there are spikes of load avg which
| > jumps to > 50 very rapidly ( ie from 0.5 to 50 within 10 secs) and
| > it remains there for sometime and slowly reduces to normal value.
| >
| > During such times of high load average we observe that there is no
| > IO wait in system and even CPU is 50% idle. In any case the IO Wait
| > always remains < 1.0 % and is mostly 0. Hence the load is not due to
| > high I/O wait which was generally the case with our previous hardware.
|
| Do you experience decreased query performance?

Yes, we do experience substantial application performance degradation.

| Load can easily get to 64 (1 per core) without reaching its capacity.
| So, unless you're experiencing decreased performance I wouldn't think
| much of it.

As far as I understand, load avg is the average number of processes
waiting to be run over the past 1, 5 or 15 minutes. A number > 1 would
mean that a countable number of processes were waiting to be run. How
can a load of more than 1 and up to 64 be OK for a 64-core machine?

| Do you have mcelog running? as a cron or a daemon?

No, we do not have mcelog. BTW the PostgreSQL version is 9.1.3, which I
forgot to mention in my last email.

regds
mallah.

| Sometimes, mcelog tends to crash in that way. We had to disable it in
| our servers because it misbehaved like that. It only makes load avg
| meaningless, no performance impact, but being unable to accurately
| measure load is bad enough.
[PERFORM] High load average in 64-core server , no I/O wait and CPU is idle
Dear List,

We are having scalability issues with high-end hardware. The hardware is:

CPU  = 4 * Opteron 6272 with 16 cores each, ie Total = 64 cores.
RAM  = 128 GB DDR3
Disk = High-performance RAID10 with lots of 15K spindles and a working
       BBU cache.

Normally the 1-min load average of the system remains between 0.5 and
1.0. The problem is that sometimes there are spikes of load avg which
jump to > 50 very rapidly (ie from 0.5 to 50 within 10 secs), and it
remains there for some time and slowly reduces to the normal value.

During such times of high load average we observe that there is no IO
wait in the system and even the CPU is 50% idle. In any case the IO wait
always remains < 1.0% and is mostly 0. Hence the load is not due to high
I/O wait, which was generally the case with our previous hardware.

We are puzzled why the CPU and disk I/O systems are not being utilized
fully and would seek the list's wisdom on that.

We have set up sar to poll the system parameters every minute, and the
data is graphed with cacti. If required, any of the system or postgresql
parameters can easily be put under cacti monitoring and graphed.

The query load is mostly read only.

It is also possible to replicate the problem with pgbench to some
extent. I chose -s 100 and -t 1; the load does shoot up, but not as
spectacularly as with the real-world usage.

Any help shall be greatly appreciated.

Just a thought: will it be a good idea to partition the host hardware
into 4 equal virtual environments, ie 1 for master (r/w) and 3 slaves
r/o, and distribute the r/o load on the 3 slaves?

regds
mallah
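For reference, a sketch of the kind of pgbench run that could better
approximate the described read-mostly load. The scale (-s 100) comes
from the post above; the client/thread counts and database name are
illustrative assumptions, not the poster's values. The commands are
echoed as a dry run so the sketch can be inspected before being pointed
at a real server:

```shell
SCALE=100      # -s used in the original test
CLIENTS=64     # assumption: one client per core
DB=benchdb     # hypothetical database name

# Dry run: print the commands rather than executing them.
echo "pgbench -i -s $SCALE $DB"               # initialise test tables
echo "pgbench -S -c $CLIENTS -j 8 -T 60 $DB"  # read-only (SELECT) workload
```

The -S (select-only) mode matches the "mostly read only" query load
better than the default TPC-B-like mixed workload.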
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
Thanks for the thought, but it (-C) does not work.

> BTW, I think you should use the -C option with pgbench for this kind
> of testing. -C establishes a connection for each transaction, which is
> pretty much similar to real world applications which do not use
> connection pooling. You will be surprised how large the PostgreSQL
> connection overhead is.
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
It looks like pgbench cannot be used for testing with pgbouncer if the
number of pgbench clients exceeds pool_size + reserve_pool_size of
pgbouncer: pgbench keeps waiting, doing nothing. I am using the pgbench
of postgresql 8.1. Are there changes to pgbench in this aspect?

regds
Rajesh Kumar Mallah.

On Sun, Jul 18, 2010 at 11:38 PM, Rajesh Kumar Mallah <
mallah.raj...@gmail.com> wrote:

> On Sun, Jul 18, 2010 at 10:55 PM, Greg Smith wrote:
>
>> Rajesh Kumar Mallah wrote:
>>
>>> the no of clients was 10 (-c 10) carrying out 1 transactions each
>>> (-t 1).
>>> pgbench db was initialised with scaling factor -s 100.
>>>
>>> since client count was less there was no queuing of requests in
>>> pgbouncer; i would prefer to say it was in 'passthrough' mode.
>>
>> Of course pgbouncer is going to decrease performance in this
>> situation. You've added a whole layer to things that all traffic has
>> to pass through, without a setup that gains any benefit from the
>> connection pooling. Try making the client count 1000 instead if you
>> want a useful test.
>
> Dear Greg,
>
> my max_client is 300; shall i test with client count 250?
> if so, what should be the scaling factor while initializing
> the pgbench db?
>
>> --
>> Greg Smith  2ndQuadrant US  Baltimore, MD
>> PostgreSQL Training, Services and Support
>> g...@2ndquadrant.com  www.2ndQuadrant.us
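For context on the observed stall: when the client count exceeds the
pool plus reserve, the surplus connections simply queue inside pgbouncer
until a server connection frees up, which from pgbench's side looks like
"waiting doing nothing". A sketch of the pgbouncer.ini settings involved
(values are illustrative, not taken from the poster's setup):

```ini
; Illustrative pgbouncer.ini fragment -- values are assumptions.
[pgbouncer]
pool_mode = session
default_pool_size = 100    ; server connections per user/database pair
reserve_pool_size = 25     ; extra connections allowed after a delay
reserve_pool_timeout = 3   ; seconds a client waits before reserve kicks in
max_client_conn = 1000     ; clients beyond the pool wait in a queue
```

With -c larger than default_pool_size + reserve_pool_size, the extra
pgbench clients sit in that queue for the whole run.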
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
On Sun, Jul 18, 2010 at 10:55 PM, Greg Smith wrote:

> Rajesh Kumar Mallah wrote:
>
>> the no of clients was 10 (-c 10) carrying out 1 transactions each
>> (-t 1).
>> pgbench db was initialised with scaling factor -s 100.
>>
>> since client count was less there was no queuing of requests in
>> pgbouncer; i would prefer to say it was in 'passthrough' mode.
>
> Of course pgbouncer is going to decrease performance in this
> situation. You've added a whole layer to things that all traffic has
> to pass through, without a setup that gains any benefit from the
> connection pooling. Try making the client count 1000 instead if you
> want a useful test.

Dear Greg,

my max_client is 300; shall i test with client count 250?
if so, what should be the scaling factor while initializing
the pgbench db?

> --
> Greg Smith  2ndQuadrant US  Baltimore, MD
> PostgreSQL Training, Services and Support
> g...@2ndquadrant.com  www.2ndQuadrant.us
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
OK, now the question is: is it possible to dig out from the postgresql
database server whether connection pooling is needed?

In our case, eg, i have kept max_connections = 300; if i reduce it below
250 i get the error "max connection reached." on connecting to the db
directly. If i put pgbouncer in front i get less performance (even if no
clients are waiting); without pooling the dbserver CPU usage increases
but the performance of the apps also becomes good.

Regds
Rajesh Kumar Mallah.

On Sun, Jul 18, 2010 at 10:55 PM, Greg Smith wrote:

> Rajesh Kumar Mallah wrote:
>
>> the no of clients was 10 (-c 10) carrying out 1 transactions each
>> (-t 1).
>> pgbench db was initialised with scaling factor -s 100.
>>
>> since client count was less there was no queuing of requests in
>> pgbouncer; i would prefer to say it was in 'passthrough' mode.
>
> Of course pgbouncer is going to decrease performance in this
> situation. You've added a whole layer to things that all traffic has
> to pass through, without a setup that gains any benefit from the
> connection pooling. Try making the client count 1000 instead if you
> want a useful test.
>
> --
> Greg Smith  2ndQuadrant US  Baltimore, MD
> PostgreSQL Training, Services and Support
> g...@2ndquadrant.com  www.2ndQuadrant.us
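One way to get a first answer from the server itself is to watch how
many backends are doing work at any instant versus merely holding open
connections; a mostly-idle count is a hint that a pooler could serve the
same load with far fewer backends. A hedged sketch against
pg_stat_activity, using the pre-9.2 column names that match the 8.x
servers discussed in this thread:

```sql
-- Snapshot of connection usage (pre-9.2 catalogs: current_query
-- holds '<IDLE>' for idle backends).
SELECT count(*) AS total,
       sum(CASE WHEN current_query = '<IDLE>' THEN 1 ELSE 0 END) AS idle,
       sum(CASE WHEN current_query = '<IDLE> in transaction'
                THEN 1 ELSE 0 END) AS idle_in_txn
FROM pg_stat_activity;
```

If idle dominates total while max_connections is being hit, pooling is
likely to help; if most backends are active, the box is genuinely busy.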
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
Nice suggestion to try; I will put pgbouncer on raw hardware and run
pgbench from the same hardware.

regds
rajesh kumar mallah.

> Why in VM (openvz container) ?
>
> Did you also try it in the same OS as your appserver ?
>
> Perhaps even connecting from appserver via unix sockets ?
>
> > and all my applications on other virtual machines
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
note: my postgresql server & pgbouncer were not in a virtualised
environment in the first setup. Only the application server has many
openvz containers.
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
Hi,

Sorry if posting here was not proper instead of starting a new thread
(I am really not sure if it's a bad thing to do).

I would like to share my recent experience on implementation of
client-side pooling using pgbouncer. By client side i mean that the
pgbouncer process is not on the same machine as the postgresql server.
In the first trial, pgbouncer and postgresql were on the same machine &
pgbouncer was connecting to postgresql using unix domain sockets, but i
shifted it later owing to high CPU usage > 50% (using top). Now i have
shifted pgbouncer into a virtual machine (openvz container) on the
application server hardware, and all my applications on other virtual
machines (web applications) connect to pgbouncer on this virtual
machine.

I tested the setup with pgbench in two scenarios:

1. connecting to DB server directly
2. connecting to DB via pgbouncer

the no of clients was 10 (-c 10) carrying out 1 transactions each
(-t 1). pgbench db was initialised with scaling factor -s 100.

since client count was less there was no queuing of requests in
pgbouncer; i would prefer to say it was in 'passthrough' mode.

the result was:

1. direct        ~ 2000 tps
2. via pgbouncer ~ 1200 tps

-- Experience on deploying to production environment with real world
load/usage pattern --

Pgbouncer was put on the same machine as postgresql, connecting via unix
domain sockets to the server and tcp sockets with the clients.

1. There was a drastic reduction in CPU load, from 30 to 10 ldavg.
2. There were no clients waiting; pool size was 150 and the number of
   active connections was 100-120.
3. Application performance was worse (in spite of 0 clients waiting).

I am still waiting to see the effect of shifting pgbouncer out from the
dbserver to the appserver, but with the pgbench results i am not very
hopeful. I am curious why, in spite of 0 clients waiting, pgbouncer
introduces a drop in tps.

Warm Regds
Rajesh Kumar Mallah.
CTO - tradeindia.com.
Keywords: pgbouncer performance

On Mon, Jul 12, 2010 at 6:11 PM, Kevin Grittner wrote:

> Craig Ringer wrote:
>
> > So rather than asking "should core have a connection pool" perhaps
> > what's needed is to ask "what can an in-core pool do that an
> > external pool cannot do?"
>
> (1) It can prevent the most pessimal performance problems resulting
> from lack of an external connection pool (or a badly configured one)
> by setting a single GUC.  Configuration tools could suggest a good
> value during initial setup.
>
> (2) It can be used without installing and configuring a more
> sophisticated and complex product.
>
> (3) It might reduce latency because it avoids having to receive,
> parse, and resend data in both directions -- eliminating one "hop".
> I know the performance benefit would usually accrue to the external
> connection pooler, but there might be some circumstances where a
> built-in pool could win.
>
> (4) It's one more checkbox which can be ticked off on some RFPs.
>
> That said, I fully agree that if we can include good documentation
> on the external poolers and we can get packagers to include poolers
> in their distribution, that gets us a much bigger benefit.  A
> built-in solution would only be worthwhile if it was simple enough
> and lightweight enough not to be a burden on execution time or
> maintenance.  Maybe that's too big an if.
>
> -Kevin
Re: [PERFORM] Extremely high CPU usage when building tables
Dear Deb,

i feel three distinct steps are present:

1. CREATE TABLE
2. Loading of data into tables
3. Creation of indexes

It is still not clear when you are seeing your system becoming
unresponsive to the other application. Is it during loading of data or
creation of indexes?

1. can you give any idea about how much data you are loading?
   row count or GB of data etc.
2. how many indexes are you creating?

regds
Rajesh Kumar Mallah.
Re: [PERFORM] how to (temporarily) disable/minimize benefits of disk block cache or postgresql shared buffer
Dear Craig,

Thanks for thinking about it. I do not understand why u feel OpenVz is
weird; at the most it's not very popular. But let's not get into that
debate as it's not the proper forum.

From your reply i understand that there is not an easy and clean way of
doing it. Since performance-related profiling requires multiple
iterations, it is not feasible to reboot the machine. I think i will try
to profile my code using new and unique input parameters in each
iteration; this shall roughly serve my purpose.

On Fri, Jul 2, 2010 at 8:30 AM, Craig Ringer wrote:

> On 02/07/10 01:59, Rajesh Kumar Mallah wrote:
>
> > I had set it to 128kb
> > it does not really work, i even tried your next suggestion. I am in
> > a virtualized environment, particularly OpenVz, where
> > echo 3 > /proc/sys/vm/drop_caches
> > does not work inside the virtual container. i did it in the hardware
> > node but it still does not give the desired result.
>
> Yeah, if you're in a weird virtualized environment like that you're
> likely to have problems, because caching can be done at multiple
> levels. In the case of OpenVZ, it's hard to know what the "guest" and
> what the "host" even is sometimes, and I wouldn't trust it to respect
> things like the Linux VM cache management.
>
> You might have to fall back on the classic method: a program that
> tries to allocate as much RAM as it can. On Linux this is EXTREMELY
> unsafe unless you ensure you have vm overcommit disabled (see the
> postgresql docs) because by default Linux systems will never fail a
> memory allocation - instead they'll go looking for a big process to
> kill to free some memory. In theory this should be your memory gobbler
> program, but in reality the OOM killer isn't so predictable.
>
> So: try turning vm overcommit off, then writing (or finding) a simple
> program that keeps on malloc()ing memory until an allocation call
> fails. That should force any caches out, freeing you for another cold
> run.
> Note that this method won't just force out the obvious caches like
> postgresql data files. It also forces out things like caches of
> running binaries. Things will grind to an absolute crawl for a minute
> or two before resuming normal speed, because *everything* has to come
> back from disk at once. The same is true of using
> /proc/sys/vm/drop_caches to drop all caches.
>
> I guess, in the end, nothing really substitutes for a good reboot.
>
> --
> Craig Ringer
>
> Tech-related writing: http://soapyfrogs.blogspot.com/
Re: [PERFORM] how to (temporarily) disable/minimize benefits of disk block cache or postgresql shared buffer
On Thu, Jul 1, 2010 at 10:07 PM, Craig Ringer wrote:

> On 01/07/10 17:41, Rajesh Kumar Mallah wrote:
>
> > Hi,
> >
> > this is not really a performance question, sorry if it's a bit
> > irrelevant to be posted here. We have a development environment and
> > we want to optimize the non-database parts of the application. The
> > problem is that subsequent runs of queries execute very fast and
> > make the performance analysis a trivial problem. We want the
> > subsequent runs of a query to take similar times as the first run
> > so that we can work on optimizing the calling patterns to the
> > database.
>
> You can get rid of PostgreSQL's caches in shared_buffers by restarting
> the PostgreSQL server. I don't know if there's any more convenient
> way. Alternately, just set a really minimal shared_buffers that's just
> enough for your connections so there's not much room for cached data.

I had set it to 128kb; it does not really work. i even tried your next
suggestion. I am in a virtualized environment, particularly OpenVz,
where echo 3 > /proc/sys/vm/drop_caches does not work inside the virtual
container. i did it in the hardware node but it still does not give the
desired result.

regds
Rajesh Kumar Mallah.

> If you are running a Linux server (as you didn't mention what you're
> running on) you can drop the OS disk cache quite easily:
>
> http://linux-mm.org/Drop_Caches
> http://www.linuxinsight.com/proc_sys_vm_drop_caches.html
>
> AFAIK for most other platforms you have to use a tool that gobbles
> memory to force caches out. On Windows, most of those garbage tools
> that claim to "free" memory do this - it's about the only time you'd
> ever want to use one, since they do such horrid things to performance.
>
> --
> Craig Ringer
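For completeness, the usual cold-cache sequence on a non-virtualized
Linux box (it needs root, and as noted above it does not work from
inside an OpenVZ container). This sketch guards the write so it degrades
to a message instead of failing when run unprivileged:

```shell
# Flush dirty pages first so dropping caches is meaningful.
sync

# Drop page cache, dentries and inodes (Linux 2.6.16+); root only.
if [ -w /proc/sys/vm/drop_caches ]; then
    echo 3 > /proc/sys/vm/drop_caches
    echo "kernel caches dropped"
else
    echo "need root to write /proc/sys/vm/drop_caches"
fi

# PostgreSQL's own shared_buffers survive this; the server must also
# be restarted (e.g. with pg_ctl restart) for a fully cold run.
```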
Re: [PERFORM] What is the best way to optimize this query
Dear Sri,

Please post at least the EXPLAIN ANALYZE output. There is also a nice
posting guideline on how to post query optimization questions:

http://wiki.postgresql.org/wiki/SlowQueryQuestions

On Thu, Jul 1, 2010 at 10:49 AM, Srikanth Kata wrote:

> Please tell me what is the best way to optimize this query:
>
> select
> s.*,a.actid,a.phone,d.domid,d.domname,d.domno,a.actno,a.actname,p.descr
> as svcdescr from vwsubsmin s inner join packages p on s.svcno=p.pkgno
> inner join account a on a.actno=s.actno inner join ssgdom d on
> a.domno=d.domno inner join (select subsno from
> getexpiringsubs($1,cast($2 as integer),cast($3 as double precision),
> $4) as (subsno int,expirydt timestamp without time zone,balcpt double
> precision)) as e on s.subsno=e.subsno where s.status <=15 and
> d.domno=$5 order by d.domname,s.expirydt,a.actname
> --
> View this message in context:
> http://old.nabble.com/What-is-the-best-way-to-optimize-this-query-tp29041515p29041515.html
> Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
Re: [PERFORM] Extremely high CPU usage when building tables
Hi,

1. Did you also check vmstat output? From the sar output the i/o wait
   is not clear.
2. i guess you must be populating the database between creating tables
   and creating indexes. creating indexes requires sorting of data,
   which may be cpu intensive; loading/populating the data may saturate
   the i/o bandwidth. I think you should check exactly when the max cpu
   utilisation is taking place.

regds
Rajesh Kumar Mallah.

On Sat, Jun 26, 2010 at 3:55 AM, Deborah Fuentes wrote:

> Hello,
>
> When I run an SQL to create new tables and indexes is when Postgres
> consumes all CPU and impacts other users on the server.
>
> We are running Postgres 8.3.7 on a Sun M5000 with 2 x quad core CPUs
> (16 threads) running Solaris 10.
>
> I've attached the sar data at the time of the run - here's a snippet
> below.
>
> Any ideas would be greatly appreciated.
>
> Thanks!
> Deb
>
> Here, note the run queue, the left column. That is the number of
> processes waiting to run. 97 processes waiting to run at any time with
> only eight CPU cores looks very busy.
> r...@core2 # sar -q 5 500
>
> SunOS core2 5.10 Generic_142900-11 sun4u    06/17/2010
>
> 12:01:50 runq-sz %runocc swpq-sz %swpocc
> 12:01:55     1.8      80     0.0       0
> 12:02:00     1.0      20     0.0       0
> 12:02:05     1.0      20     0.0       0
> 12:02:10     0.0       0     0.0       0
> 12:02:15     0.0       0     0.0       0
> 12:02:21     3.3      50     0.0       0
> 12:02:26     1.0      20     0.0       0
> 12:02:31     1.0      60     0.0       0
> 12:02:36     1.0      20     0.0       0
> 12:02:42    27.0      50     0.0       0
> 12:02:49    32.8      83     0.0       0
> 12:02:55    76.0     100     0.0       0
> 12:03:01    66.1     100     0.0       0
> 12:03:07    43.8     100     0.0       0
> 12:03:13    52.0     100     0.0       0
> 12:03:19    91.2     100     0.0       0
> 12:03:26    97.8      83     0.0       0
> 12:03:33    63.7     100     0.0       0
> 12:03:39    67.4     100     0.0       0
> 12:03:47    41.5     100     0.0       0
> 12:03:53    82.0      83     0.0       0
> 12:03:59    88.7     100     0.0       0
> 12:04:06    87.7      50     0.0       0
> 12:04:12    41.3     100     0.0       0
> 12:04:17    94.3      50     0.0       0
> 12:04:22     1.0      20     0.0       0
> 12:04:27     3.3      60     0.0       0
> 12:04:32     1.0      20     0.0       0
> 12:04:38     0.0       0     0.0       0
[PERFORM] how to (temporarily) disable/minimize benefits of disk block cache or postgresql shared buffer
Hi,

this is not really a performance question; sorry if it's a bit
irrelevant to be posted here. We have a development environment and we
want to optimize the non-database parts of the application. The problem
is that subsequent runs of queries execute very fast and make the
performance analysis a trivial problem. We want the subsequent runs of
a query to take similar times as the first run so that we can work on
optimizing the calling patterns to the database.

regds
Rajesh Kumar Mallah.
Re: [PERFORM] order by slowing down a query by 80 times
> The way to make this go faster is to set up the actually recommended
> infrastructure for full text search, namely create an index on
> (co_name_vec)::tsvector (either directly or using an auxiliary
> tsvector column). If you don't want to maintain such an index, fine,
> but don't expect full text search queries to be quick.
>
>                        regards, tom lane

Dear Tom/List,

co_name_vec is actually the auxiliary tsvector column that is maintained
via an update trigger, and the index that you suggested is there.
Consider this simplified version: when we order by co_name, the index on
co_name_vec is not used; some other index is used.

tradein_clients=> explain analyze SELECT profile_id from
general.profile_master b where 1=1 and co_name_vec @@ to_tsquery
('manufacturer') order by co_name limit 25;

                                  QUERY PLAN
---------------------------------------------------------------------------
 Limit  (cost=0.00..3958.48 rows=25 width=25) (actual time=0.045..19.847 rows=25 loops=1)
   ->  Index Scan using profile_master_co_name on profile_master b  (cost=0.00..1125315.59 rows=7107 width=25) (actual time=0.043..19.818 rows=25 loops=1)
         Filter: ((co_name_vec)::tsvector @@ to_tsquery('manufacturer'::text))
 Total runtime: 19.894 ms
(4 rows)

tradein_clients=> explain analyze SELECT profile_id from
general.profile_master b where 1=1 and co_name_vec @@ to_tsquery
('manufacturer') limit 25;

                                  QUERY PLAN
---------------------------------------------------------------------------
 Limit  (cost=0.00..101.18 rows=25 width=4) (actual time=0.051..0.632 rows=25 loops=1)
   ->  Index Scan using profile_master_co_name_vec on profile_master b  (cost=0.00..28761.89 rows=7107 width=4) (actual time=0.049..0.593 rows=25 loops=1)
         Index Cond: ((co_name_vec)::tsvector @@ to_tsquery('manufacturer'::text))
 Total runtime: 0.666 ms
(4 rows)

tradein_clients=>
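Given the plans above, the planner is walking the co_name index in
sorted order hoping to hit 25 matching rows early, instead of fetching
the matches via the tsvector index and then sorting them. A hedged
workaround sketch for that era of PostgreSQL, using table and column
names taken from the quoted plans (the OFFSET 0 subquery acts as an
optimization fence, a commonly suggested trick rather than anything the
thread itself confirms):

```sql
-- Force the tsvector index to be used first, then sort only the
-- matches; OFFSET 0 stops the planner from pulling the ORDER BY
-- into the subquery.
SELECT profile_id
FROM (SELECT profile_id, co_name
      FROM general.profile_master
      WHERE co_name_vec @@ to_tsquery('manufacturer')
      OFFSET 0) AS m
ORDER BY co_name
LIMIT 25;
```

This trades a small sort of the ~1160 matching rows for avoiding a scan
of the whole co_name index.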
Re: [PERFORM] order by slowing down a query by 80 times
Dear Tom/Kevin/List,

thanks for the insight. i will check the suggestion more closely and
post the results.

regds
Rajesh Kumar Mallah.
Re: [PERFORM] order by slowing down a query by 80 times
On Mon, Jun 28, 2010 at 5:09 PM, Yeb Havinga wrote:

> Rajesh Kumar Mallah wrote:
>
>> Dear List,
>>
>> just by removing the order by co_name the query time reduces
>> dramatically, from ~ 9 sec to 63 ms. Can anyone please help.
>
> The 63 ms query result is probably useless since it returns a limit of
> 25 rows from an unordered result. It is not surprising that this is
> fast.
>
> The pain is here:
>
> Index Scan using profile_master_co_name on profile_master b
> (cost=0.00..1125295.59 rows=6968 width=25) (actual
> time=0.097..9193.154 rows=2212 loops=1)
>   Filter: ((co_name IS NOT NULL) AND
>   ((co_name_vec)::tsvector @@ to_tsquery('manufacturer'::text)))
>
> It looks like seq_scans are disabled, since the index scan has only a
> filter expression but not an index cond.

seq_scan is NOT explicitly disabled. The two queries just differed in
the order by clause.

regds
Rajesh Kumar Mallah.

> regards,
> Yeb Havinga
[PERFORM] order by slowing down a query by 80 times
Dear List,

just by removing the order by co_name the query time reduces
dramatically, from ~ 9 sec to 63 ms. Can anyone please help.

Regds
Rajesh Kumar Mallah.

explain analyze SELECT * from (
  SELECT a.profile_id, a.userid, a.amount, a.category_id, a.catalog_id,
         a.keywords, b.co_name
  from general.catalogs a join general.profile_master using(profile_id)
  where 1=1 and co_name_vec @@ to_tsquery('manufacturer')
        and b.co_name is not null and a.ifmain is true
) as c order by co_name limit 25 offset 0;

 Limit  (cost=0.00..3659.13 rows=25 width=129) (actual time=721.075..9241.105 rows=25 loops=1)
   ->  Nested Loop  (cost=0.00..1215772.28 rows=8307 width=476) (actual time=721.073..9241.050 rows=25 loops=1)
         ->  Nested Loop  (cost=0.00..1208212.37 rows=8307 width=476) (actual time=721.052..9240.037 rows=25 loops=1)
               ->  Nested Loop  (cost=0.00..1204206.26 rows=6968 width=472) (actual time=721.032..9239.516 rows=25 loops=1)
                     ->  Nested Loop  (cost=0.00..1154549.19 rows=6968 width=471) (actual time=721.012..9236.523 rows=25 loops=1)
                           ->  Index Scan using profile_master_co_name on profile_master b  (cost=0.00..1125295.59 rows=6968 width=25) (actual time=0.097..9193.154 rows=2212 loops=1)
                                 Filter: ((co_name IS NOT NULL) AND ((co_name_vec)::tsvector @@ to_tsquery('manufacturer'::text)))
                           ->  Index Scan using catalog_master_profile_id_fkindex on catalog_master  (cost=0.00..4.19 rows=1 width=446) (actual time=0.016..0.016 rows=0 loops=2212)
                                 Index Cond: (catalog_master.profile_id = b.profile_id)
                                 Filter: ((catalog_master.hide IS FALSE) AND ((catalog_master.hosting_status)::text = 'ACTIVE'::text))
                     ->  Index Scan using profile_master_profile_id_pkey on profile_master  (cost=0.00..7.11 rows=1 width=9) (actual time=0.105..0.105 rows=1 loops=25)
                           Index Cond: (profile_master.profile_id = catalog_master.profile_id)
               ->  Index Scan using catalog_categories_pkey_catalog_id_category_id on catalog_categories  (cost=0.00..0.56 rows=1 width=8) (actual time=0.014..0.015 rows=1 loops=25)
                     Index Cond: (catalog_categories.catalog_id = catalog_master.catalog_id)
                     Filter: (catalog_categories.ifmain IS TRUE)
         ->  Index Scan using web_category_master_pkey on web_category_master  (cost=0.00..0.90 rows=1 width=4) (actual time=0.034..0.036 rows=1 loops=25)
               Index Cond: (web_category_master.category_id = catalog_categories.category_id)
               Filter: ((web_category_master.status)::text = 'ACTIVE'::text)
 Total runtime: 9241.304 ms

explain analyze SELECT * from (
  SELECT a.profile_id, a.userid, a.amount, a.category_id, a.catalog_id,
         a.keywords, b.co_name
  from general.catalogs a join general.profile_master b using(profile_id)
  where 1=1 and co_name_vec @@ to_tsquery('manufacturer')
        and b.co_name is not null and a.ifmain is true
) as c limit 25 offset 0;

                                  QUERY PLAN
---------------------------------------------------------------------------
 Limit  (cost=0.00..358.85 rows=25 width=476) (actual time=0.680..63.176 rows=25 loops=1)
   ->  Nested Loop  (cost=0.00..119238.58 rows=8307 width=476) (actual time=0.677..63.139 rows=25 loops=1)
         ->  Nested Loop  (cost=0.00..111678.66 rows=8307 width=476) (actual time=0.649..62.789 rows=25 loops=1)
               ->  Nested Loop  (cost=0.00..107672.56 rows=6968 width=472) (actual time=0.626..62.436 rows=25 loops=1)
                     ->  Nested Loop  (cost=0.00..58015.49 rows=6968 width=471) (actual time=0.606..62.013 rows=25 loops=1)
                           ->  Index Scan using profile_master_co_name_vec on profile_master b  (cost=0.00..28761.89 rows=6968 width=25) (actual time=0.071..50.576 rows=1160 loops=1)
                                 Index Cond: ((co_name_vec)::tsvector @@ to_tsquery('manufacturer'::text))
                                 Filter: (co_name IS NOT NULL)
                           ->  Index Scan using catalog_master_profile_id_fkindex on catalog_master  (cost=0.00..4.19 rows=1 width=446) (actual time=0.008..0.008 rows=0 loops=1160)
                                 Index Cond: (catalog_master.profile_id = b.profile_id)
                                 Filter: ((catalog_master.hide IS FALSE) AND ((catalog_master.hosting_status)::text = 'ACTIVE'::text))
                     ->  Index Scan using profile_master_profile_id_pkey on profile_master  (cost=0.00..7.11 rows=1 width=9) (actual time=0.012..0.012 rows=1 loops=25)
                           Index Cond: (profile_master.profile_id = catalog_master.profile_id)
               ->  Index Scan using catalog_categories_pkey_catalog_id_category_id on catal
Re: [PERFORM] sudden spurt in swap utilization (was:cpu bound postgresql setup.)
Dear List,

A simple (perl) script was made to 'watch' the state transitions of backends. On startup it captures a set of pids for watching and displays a visual representation of their states over the next 30 intervals of 1 second each. The X axis is the interval count, the Y axis is the pid, and the origin is on the top-left. The state can be Active Query, indicated by '*'; Idle, indicated by '.'; or Idle in transaction, indicated by '?'. For my server, below is a random output (during lean hours and on a lean day).

PID  01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
4334 ? ? ? ? * ? ? ? ? ? * ? ? ? ? ? ? ? ?
6904 ? ? . . . * ? . . . . . . ? ? .
6951 ? ? ? . . . . ? ? ? ? ? . . . ? ? ? . . . ? . . . . . ? ? .
7009 ? * ? ? . . . . . . . . . * * . * ? ? . . . * ? ? ? . . . ?
7077 ? . ? . . . * . ? . . . . ? . . . . . . . ? . . ? . . . ? ?
7088 ? . . ? . ? ? ? . . . . . . ? . . ? ? * . . . . . ? . ? . *
7091 ? . . * ? ? ? ? ? ? ? * ? . . ? * . * . . . . . . . . . . .
7093 ? ? . ? . . . . ? . ? * . . . . . . . . . ? ? ? . ? ? ? . .
7112 * * . . . ? ? ? . . . . . . . . ? ? . ? . ? . . ? . . . . .
7135 ? . . * . ? ? ? . ? ? . . . ? . . . . . . . ? . . . ? ? . .
7142 ? . ? . . . . . . * . . . ? . . . . . . . . . . . . . .
7166 ? . ? ? ? * * . ? * . ? . . . ? . ? ? . . . * . . . ? . . .
8202 ? ? . . . * . ? . . . . . . . * ? . . . ? ? . . . . ? ? ? .
8223 ? . . . . . . ?
8237 ? ? ? . ? ? ? ? . . . . . . . ? . . . . . ? . . * ? . . . .
8251 ? . ? . . . . . ? ? . . . * ? . . . ? . . . . . . . . . . .
8278 ? ? . . . . ? . . . . . . . ? . . . . . . ? ? . . * . . . .
8290 ? . .
8294 ? ? . . . . . . . . . . . . ? . . . ? ? . . . . . . . . * *
8303 ? * ? . ? ? ? . ? ? ? . . . . * . . . . . . . . . . . . . .
8306 ? ? . . . ? . . . ? . . . . . . * . . .
8309 * ? ? ? ? . . . ? . . .
8329 ? . * * . . . . . . . * . ? . * . ? . * . * ? . . .

(*) Active Query, (.) Idle, (?) Idle in transaction, (blank) backend over.

Looks like most of the graph space is filled with (.) or (?)
and very few active queries (long-running queries > 1s). On a busy day and busy hour I shall check again and post the results. The script, which depends only on perl, DBI and DBD::Pg, is pasted here: http://pastebin.com/mrjSZfLB

Regds
mallah.

On Sat, Jun 26, 2010 at 3:23 PM, Rajesh Kumar Mallah <mallah.raj...@gmail.com> wrote:
> Dear List,
>
> Today has been good since morning. Although it is a lean day
> for us, the indications are nice. I thank everyone who shared
> the concern. I think the most significant change has been to reduce
> shared_buffers from 10G to 4G; this has led to reduced memory
> usage and some breathing space for the OS.
>
> Although i am yet to incorporate the suggestions from pgtune,
> i think the issue of max_connections needs to be addressed first.
>
> I am investigating application issues and the mechanism that
> puts many backends into 'idle in transaction' mode for significant
> times. I thank Tom for the script he sent. Once that resolves i shall
> check pooling as suggested by Kevin, then eventually max_connections
> can be reduced. I shall also check pgpool and pgbouncer if they are
> helpful in this regard.
>
> I observed that the number of simultaneous connections today (lean day)
> hovers between 1 and 10, occasionally shooting to 15 but never more than
> 20 i would say.
>
> I am happy that i/o waits are negligible and cpu is idling also for a
> while.
>
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b  swpd    free   buff    cache si so  bi   bo    in    cs us sy id wa st
> 22  0 18468  954120  13460 28491772  0  0 568 1558 13645 18355 62 10 27  2  0
> 16  0 18468  208100  13476 28469084  0  0 580  671 14039 17055 67 13 19  1  0
> 10 2 18812 329
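The chart above is easy to reproduce in any language that can poll pg_stat_activity once per second. Below is a minimal Python sketch of the same idea (this is not the pastebin script; the function and row format are illustrative, and the state strings follow the 8.x-era `current_query` convention):

```python
# Map a backend's pg_stat_activity.current_query (PostgreSQL 8.x style)
# to the symbols used in the chart above:
#   '*' active query, '.' idle, '?' idle in transaction, ' ' backend gone.

def state_symbol(current_query):
    """Return the one-character display symbol for a backend state."""
    if current_query is None:
        return ' '                      # backend has exited
    q = current_query.strip()
    if q == '<IDLE>':
        return '.'
    if q == '<IDLE> in transaction':
        return '?'
    return '*'                          # anything else is an active query

def render_row(pid, samples):
    """One chart line: PID followed by one symbol per 1-second interval."""
    return '%-6s %s' % (pid, ' '.join(state_symbol(s) for s in samples))

if __name__ == '__main__':
    # In the real script these samples come from polling pg_stat_activity
    # once per second for 30 intervals.
    print(render_row(4334, ['<IDLE> in transaction', 'select 1', '<IDLE>']))
```

The real script would run `SELECT procpid, current_query FROM pg_stat_activity` each interval and feed the results through `state_symbol`.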
Re: [PERFORM] sudden spurt in swap utilization (was:cpu bound postgresql setup.)
Dear List,

Today has been good since morning. Although it is a lean day for us, the indications are nice. I thank everyone who shared the concern. I think the most significant change has been to reduce shared_buffers from 10G to 4G; this has led to reduced memory usage and some breathing space for the OS.

Although I am yet to incorporate the suggestions from pgtune, I think the issue of max_connections needs to be addressed first.

I am investigating application issues and the mechanism that puts many backends into 'idle in transaction' mode for significant times. I thank Tom for the script he sent. Once that resolves, I shall check pooling as suggested by Kevin; then eventually max_connections can be reduced. I shall also check whether pgpool and pgbouncer are helpful in this regard.

I observed that the number of simultaneous connections today (a lean day) hovers between 1 and 10, occasionally shooting to 15, but never more than 20 I would say.

I am happy that I/O waits are negligible and the CPU is also idling for a while.
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b  swpd    free   buff    cache si so   bi   bo    in    cs us sy id wa st
22  0 18468  954120  13460 28491772  0  0  568 1558 13645 18355 62 10 27  2  0
16  0 18468  208100  13476 28469084  0  0  580  671 14039 17055 67 13 19  1  0
10  2 18812  329032  13400 28356972  0 46  301 1768 13848 17884 68 10 20  1  0
16  2 18812  366596  13416 28361620  0  0  325  535 13957 16649 72 11 16  1  0
50  1 18812  657048  13432 28366548  0  0  416  937 13823 16667 62  9 28  1  0
 6  1 18812  361040  13452 28371908  0  0  323  522 14352 16789 74 12 14  0  0
33  0 18812  162760  12604 28210152  0  0  664 1544 14701 16315 66 11 22  2  0
 5  0 18812  212028  10764 27921800  0  0  552  648 14567 17737 67 10 21  1  0
 6  0 18796  279920  10548 27890388  3  0  359  562 12635 15976 60  9 30  1  0
 8  0 18796  438820  10564 27894440  0  0  289 2144 12234 15770 57  8 34  1  0
 5  0 18796  531800  10580 27901700  0  0  514  394 12169 16005 59  8 32  1  0
17  0 18796  645868  10596 27890704  0  0  423  948 13369 16554 67 10 23  1  0
 9  1 18796 1076540  10612 27898604  0  0  598  403 12703 17363 71 10 18  1  0
 8  0 18796 1666508  10628 27904748  0  0  430 1123 13314 17421 57  9 32  1  0
 9  1 18776 1541444  10644 27913092  1  0  653  954 13194 16822 75 11 12  1  0
 8  0 18776 1526728  10660 27921380  0  0  692  788 13073 16987 74  9 15  1  0
 8  0 18776 1482304  10676 27933176  0  0  966 2029 13017 16651 76 12 11  1  0
21  0 18776 1683260  10700 27937492  0  0  298  663 13110 15796 67 10 23  1  0
18  0 18776 2087664  10716 27943512  0  0  406  622 12399 17072 62  9 28  1  0

> With 300 connections, I think that either of these could lead you to
> experience intermittent bursts of extreme swapping. I'd drop it to
> somewhere in the 16MB to 32MB range until I had a connection pool
> configured such that it was actually keeping the number of active
> connections much lower.
>
> > (*) wal_buffers = 8MB # pgtune wizard 2010-06-25
> > (64kb , via default)
>
> Sure, I'd boost this.
>
> > checkpoint_segments = 16 # pgtune wizard 2010-06-25
> > (30 , specified)
>
> If you have the disk space for the 30 segments, I wouldn't reduce
> it.
> > > shared_buffers = 7680MB # pgtune wizard 2010-06-25 > > (4096 MB , specified) > > This one is perhaps the most sensitive to workload. Anywhere > between 1GB and 8GB might be best for you. Greg Smith has some > great advice on how to tune this for your workload. > > > (*) max_connections = 80 # pgtune wizard 2010-06-25 > > (300 , ;-) specified) > > > > when i reduce max_connections i start getting errors, i will see > > again concurrent connections during business hours. > > That's probably a good number to get to, but you have to reduce the > number of actual connections before you set the limit that low. > > > lot of our connections are in in transaction state > > If any of these stay in that state for more than a minute or two, > you need to address that if you want to get your connection count > under control. If any of them persist for hours or days, you need > to fix it to avoid bloat which can kill performance. > > -Kevin >
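Kevin's warning about per-connection memory can be made concrete with back-of-the-envelope arithmetic. The sketch below uses numbers from this thread (32GB RAM, 300 connections, the various work_mem values discussed); the function is illustrative, and the "one sort per connection" worst case is deliberately simplistic, since a single query can use several work_mem-sized allocations:

```python
# work_mem is granted *per sort/hash, per connection*, so the pessimistic
# memory bound scales with max_connections.  This is why 300 connections
# with work_mem = 512MB can burst into swap on a 32GB box.

def worst_case_work_mem_gb(max_connections, work_mem_mb, ops_per_query=1):
    """Pessimistic bound: every connection runs ops_per_query sorts/hashes."""
    return max_connections * work_mem_mb * ops_per_query / 1024.0

if __name__ == '__main__':
    ram_gb = 32
    for wm_mb in (4096, 512, 32):       # 4GB, 512MB, 32MB from the thread
        need = worst_case_work_mem_gb(300, wm_mb)
        print('work_mem=%5dMB -> worst case %8.1f GB (box has %d GB)'
              % (wm_mb, need, ram_gb))
```

Even the 512MB setting allows a worst case of 150GB on a 32GB machine; only in the 16-32MB range does the bound stay comfortably inside RAM.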
Re: [PERFORM] sudden spurt in swap utilization (was:cpu bound postgresql setup.)
Dear Greg/Kevin/List,

Many thanks for the comments regarding the params. I am, however, only able to run an experiment on production in a certain time window; when that arrives I shall post my observations.

Rajesh Kumar Mallah.
Tradeindia.com - India's Largest B2B eMarketPlace.
[PERFORM] Re: sudden spurt in swap utilization (was:cpu bound postgresql setup.)
Dear Tom,

We have autocommit off in DBI. Any commit or rollback from the persistent mod_perl process immediately issues a begin work; if the mod_perl process is waiting for a request, the database backend remains in 'idle in transaction' state. Unless we modify data in an http request, we issue neither a commit nor a rollback.

On 6/25/10, Tom Molesworth wrote:
> On 25/06/10 16:59, Rajesh Kumar Mallah wrote:
>> when i reduce max_connections i start getting errors, i will see again
>> concurrent connections during business hours. lot of our connections
>> are in <IDLE> in transaction state during business hours.
>> this peculiar behavior of mod_perl servers have been discussed in
>> past i think. dont' remember if there was any resolution.
>
> If connections spend any significant amount of time in <IDLE> in
> transaction state, that might indicate you're not committing/rolling
> back after running queries - can you show an example of the code you're
> using?
>
> e.g. something like:
>
> my $dbh = DBI->connect(...);
> my $sth = $dbh->prepare(q{select ... });
> $sth->fetchall_arrayref;
> $dbh->rollback;
>
> Tom

--
Sent from Gmail for mobile | mobile.google.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
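The AutoCommit => 0 pattern described above can be demonstrated with any DB-API driver. The sketch below uses Python's stdlib sqlite3 purely as a stand-in for DBD::Pg (the table name is made up); the point is the same: with autocommit off, the first write opens a transaction that stays open until the request handler explicitly commits or rolls back, which on PostgreSQL shows up as 'idle in transaction':

```python
import sqlite3

# sqlite3 with its default isolation level behaves like DBI's
# AutoCommit => 0: a DML statement implicitly opens a transaction.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE hits (n INTEGER)')   # DDL runs outside a txn here

conn.execute('INSERT INTO hits VALUES (1)')
print(conn.in_transaction)    # True: the transaction is left open
                              # (the "idle in transaction" analogue)

conn.rollback()               # what the handler should do at end of request
print(conn.in_transaction)    # False: the connection is back to plain idle
```

Issuing a rollback (or commit) at the end of every HTTP request, rather than only when data was modified, is what keeps mod_perl's pooled backends out of the 'idle in transaction' state.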
Re: [PERFORM] Occasional giant spikes in CPU load
Dear Craig,

Also check the possibility of installing sysstat on your system; it goes a long way in collecting system stats. You may consider increasing the frequency of data collection by changing the interval of the cron job manually in /etc/cron.d/; normally it's */10, and you may make it */2 for the time being. The software automatically maintains historical records of the data for 1 month.
Re: [PERFORM] sudden spurt in swap utilization (was:cpu bound postgresql setup.)
Dear List,

pgtune suggests the following (current values are in braces with the reason; (*) indicates a significant difference from the current value):

default_statistics_target = 50  # pgtune wizard 2010-06-25 (current 100 via default)
(*) maintenance_work_mem = 1GB  # pgtune wizard 2010-06-25 (16MB via default)
checkpoint_completion_target = 0.9  # pgtune wizard 2010-06-25 (0.5 via default)
(*) effective_cache_size = 22GB  # pgtune wizard 2010-06-25 (18GB , specified)
work_mem = 192MB  # pgtune wizard 2010-06-25 (256MB , specified)
(*) wal_buffers = 8MB  # pgtune wizard 2010-06-25 (64kB , via default)
checkpoint_segments = 16  # pgtune wizard 2010-06-25 (30 , specified)
shared_buffers = 7680MB  # pgtune wizard 2010-06-25 (4096MB , specified)
(*) max_connections = 80  # pgtune wizard 2010-06-25 (300 , ;-) specified)

When I reduce max_connections I start getting errors; I will look again at concurrent connections during business hours. A lot of our connections are in <IDLE> in transaction state during business hours. This peculiar behavior of mod_perl servers has been discussed in the past, I think; I don't remember if there was any resolution.
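The pgtune list above mixes kB, MB and GB, which makes the (*) entries hard to eyeball. A tiny normalizer makes the comparison mechanical; this is an illustrative helper (the 2x threshold for "significant" is an arbitrary choice, not pgtune's):

```python
# Normalize PostgreSQL memory-setting strings and flag large differences
# between the current value and pgtune's suggestion.

UNITS = {'kb': 1024, 'mb': 1024 ** 2, 'gb': 1024 ** 3}

def to_bytes(setting):
    """'8MB' -> 8388608; accepts kB/MB/GB suffixes as pgtune prints them."""
    num, unit = setting[:-2], setting[-2:].lower()
    return int(num) * UNITS[unit]

def significant(current, suggested, factor=2.0):
    """True when the suggestion differs from the current value by > factor x."""
    a, b = to_bytes(current), to_bytes(suggested)
    return max(a, b) / min(a, b) > factor

if __name__ == '__main__':
    # (name, current, pgtune-suggested) triples from the mail above
    for name, cur, sug in [('maintenance_work_mem', '16MB', '1GB'),
                           ('wal_buffers', '64kB', '8MB'),
                           ('work_mem', '256MB', '192MB')]:
        print(name, '(*)' if significant(cur, sug) else '')
```

Run against the list above, this marks maintenance_work_mem and wal_buffers but not work_mem, matching the (*) annotations.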
Re: [PERFORM] sudden spurt in swap utilization (was:cpu bound postgresql setup.)
I changed shared_buffers from 10G to 4G; swap usage has almost become nil.

# free
             total       used       free     shared    buffers     cached
Mem:      32871276   24575824    8295452          0      11064   22167324
-/+ buffers/cache:    2397436   30473840
Swap:      4192912        352    4192560

I also observed that there was a huge IO wait and load spike initially, which gradually reduced to normal levels. Now things seem to be fine, but the real test shall be during business hours. vmstat output: http://pastebin.com/ygu8gUhS

The iowait now is very respectable (< 10%) and the CPU is idling most of the time.

# vmstat 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b swpd    free   buff    cache si so  bi  bo   in    cs us sy id wa st
 2  1  352 8482444  11336 22299100  1  0 450 208    0     0 50  6 39  5  0
 4  0  352 8393840  11352 22304484  0  0 480 163 9260 12717 32  4 62  3  0
 5  1  352 8474788  11360 22308980  0  0 304 445 8295 12358 28  4 67  2  0
 3  0  352 8370672  11376 22316676  0  0 648 158 8760 13214 38  4 55  3  0
11  0  352 8193824  11392 22323572  0  0 621 577 8800 13163 37  4 56  3  0
 2  0  352 8229012  11408 22326664  0  0 169 405 9588 13696 34  4 61  1  0
 6  1  352 8319176  11424 22333144  0  0 559 170 8830 12929 32  4 61  3  0

I shall also try pgtune in a while.
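The "-/+ buffers/cache" line in the free output is pure arithmetic: buffers and page cache are reclaimable, so they are subtracted from "used" and added to "free". Checking it with the numbers from the output above (all in kB):

```python
# Derive free(1)'s "-/+ buffers/cache" line from the Mem: row.
# Values are the kB figures from the free output in this mail.

total, used, free = 32871276, 24575824, 8295452
buffers, cached = 11064, 22167324

used_real = used - buffers - cached   # memory applications actually hold
free_real = free + buffers + cached   # memory available on demand

print(used_real)   # 2397436, matching the -/+ buffers/cache "used" column
print(free_real)   # 30473840, matching the "free" column
```

This is why dropping shared_buffers from 10G to 4G gives the OS "breathing space": the ~22GB of page cache is reclaimable, while shared_buffers is not.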
Re: [PERFORM] sudden spurt in swap utilization (was:cpu bound postgresql setup.)
Dear List,

Hmmm, lemme test the efficacy of pgtune. I would reduce shared_buffers also.

Regarding concurrent queries: it's now non-business hours, and

SELECT procpid, current_query from pg_stat_activity where current_query not ilike '%idle%' ;

returns just 5-10 rows; I am yet to measure it during business hours.

Warm Regds
Rajesh Kumar Mallah.

On Fri, Jun 25, 2010 at 4:58 PM, Yeb Havinga wrote:
> Rajesh Kumar Mallah wrote:
>>
>> A scary phenomenon is being exhibited by the server, which is the server
>> is slurping all the swap suddenly
>> 8 1 4192912 906164 6100 27873640 0 0 2277 858 13440 16235 63 8 19 10 0
>>
>> I reduced work_mem from 4GB to 2GB to 512MB (now). I clearly remember
>> that this abnormal consumption of swap was NOT there even when work_mem
>> was 4GB. eg during happier times swap utilisation was:
>> http://pastebin.com/bnE1pFZ9
>> the question is whats making postgres slurp the swap? i am posting my
>> current postgresql.conf once again.
>>
>> # cat postgresql.conf | grep -v "^\s*#" | grep -v "^\s*$"
>> listen_addresses = '*' # what IP address(es) to listen on;
>> port = 5432 # (change requires restart)
>> max_connections = 300 # (change requires restart)
>
> Hello Rajesh,
>
> In contrast with e.g. shared_buffers and effective_cache_size, work_mem is
> the amount of memory per 'thing' (e.g. order/group by) that wants some
> working memory, so even a single backend can use several pieces of
> work_mem memory.
>
> Looking at your postgresql.conf, other memory values seem a bit too high
> as well for a 32GB ram server. It is probably a good idea to use pgtune
> (on pgfoundry) to get some reasonable ball park settings for your
> hardware.
>
> regards,
> Yeb Havinga
[PERFORM] sudden spurt in swap utilization (was:cpu bound postgresql setup.)
A scary phenomenon is being exhibited by the server, which is that it is suddenly slurping all the swap. Some of the relevant sar -r output is:

10:30:01 AM  kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
10:40:01 AM     979068  31892208     97.02     10588  28194876   1781568    314872     15.02     66500
10:50:01 AM    1791536  31079740     94.55     10480  27426512   1782848    313592     14.96     43880
11:00:01 AM    4678768  28192508     85.77      9692  27213312   1784888    311552     14.86     33296
11:10:01 AM     179208  32692068     99.45      3180  27569008   1725136    371304     17.71     65444
11:20:01 AM     225604  32645672     99.31      2604  29817192   1693672    402768     19.21     78312  <---
11:30:01 AM     520224  32351052     98.42      1780  26863576         0   2096440    100.00   1585772  <-- within 10mins
11:40:02 AM     483532  32387744     98.53      2672  27220404         0   2096440    100.00     43876
11:50:01 AM     162700  32708576     99.51      3316  27792540         0   2096440    100.00     43708
12:00:01 PM     420176  32451100     98.72      3772  28181316         0   2096440    100.00     43708
12:10:01 PM     331624  32539652     98.99      3236  27857760         0   2096440    100.00         0
12:20:01 PM    1023428  31847848     96.89      4632  27450504         0   2096440    100.00         0
12:30:01 PM     763296  32107980     97.68      4988  28270704         0   2096440    100.00         0
12:40:01 PM     770280  32100996     97.66      5260  28423292         0   2096440    100.00         0

Then i added more swap, making it 4GB from 2GB:

02:10:05 PM    8734144  24137132     73.43      5532  21219972   2096788   2096124     49.99        52
02:12:01 PM    5989044  26882232     81.78      6108  23606680   2096788   2096124     49.99        52
02:14:01 PM    1517724  31353552     95.38      6320  26988280   2096788   2096124     49.99        52
02:16:01 PM     316692  32554584     99.04      6516  28840264   1844856   2348056     56.00    251984
02:18:01 PM     450672  32420604     98.63      7748  27238712         0   4192912    100.00   2096840  <-- all swap gone.
02:20:01 PM     164388  32706888     99.50      7556  27118104         0   4192912    100.00   2096840
02:22:01 PM     848544  32022732     97.42      6212  26718712         0   4192912    100.00   2096840
02:24:01 PM     231332  32639944     99.30      6136  27276720         0   4192912    100.00   2096840
02:26:01 PM     639560  32231716     98.05      5608  27029372         0   4192912    100.00   2096840
02:28:01 PM     868824  32002452     97.36      4648  26253996         0   4192912    100.00   2096840
...
03:04:01 PM     854408  32016868     97.40      4976  27182140         0   4192912    100.00         0
03:06:01 PM    1571904  31299372     95.22      5184  27513232         0   4192912    100.00         0
03:08:02 PM     304600  32566676     99.07      5420  27850780         0   4192912    100.00         0
03:10:01 PM     915352  31955924     97.22      5632  28076320         0   4192912    100.00         0
03:12:01 PM     705132  32166144     97.85      5680  28057444         0   4192912    100.00         0
03:14:01 PM     369516  32501760     98.88      6136  27684364         0   4192912    100.00         0

In vmstat the system does not seem to be swapping:

vmstat 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b    swpd    free   buff    cache si so   bi   bo    in    cs us sy id wa st
24  2 4192912  947796   6036 27785324  1  0  451  208     0     0 50  6 39  5  0
22  3 4192912 1028956   6044 27795728  0  0 1730  555 13445 14736 67 12 17  4  0
24  0 4192912  877508   6052 27806172  0  0 1595 2292 13334 15666 67  9 19  5  0
14  8 4192912  820432   6068 27819756  0  0 2331 1351 13208 16192 66  9 14 11  0
23  1 4192912  925960   6076 27831644  0  0 1932 1584 13144 16291 71  9 14  5  0
 2  3 4192912  895288   6084 27846432  0  0 2496  991 13450 16303 70  9 13  8  0
17  0 4192912  936252   6092 27859868  0  0 2122  826 13438 16233 69  9 17  5  0
 8  1 4192912  906164   6100 27873640  0  0 2277  858 13440 16235 63  8 19 10  0

I reduced work_mem from 4GB to 2GB to 512MB (now). I clearly remember that this abnormal consumption of swap was NOT there even when work_mem was 4GB; e.g. during happier times swap utilisation was: http://pastebin.com/bnE1pFZ9

The question is: what's making postgres slurp the swap? I am posting my current postgresql.conf once again.

# cat postgresql.conf | grep -v "^\s*#" | grep -v "^\s*$"
listen_addresses = '*'   # what IP address(es) to listen on;
port = 5432              # (change requires restart)
max_connections = 300    # (change requires restart)
shared_buffers = 10GB    # min 128kB
work_mem = 512MB         # min 64kB
fsync = on               # turns forced synchronization on or off
synchronous_commit = on  # immediate fsync at commit
checkpoint_segments = 30 # in logfile segments, min 1, 16MB each
archive_mode = on
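The alarming part of the sar output is the jump in %swpused from ~19 to 100 within one 10-minute interval. A small scan over (time, %swpused) pairs finds such cliffs mechanically; this is an illustrative sketch (the 30-point jump threshold is arbitrary), with the data points taken from the sar log above:

```python
# Detect sudden swap-usage cliffs in a sequence of sar -r samples.

def swap_spikes(samples, jump=30.0):
    """Return (t_prev, t_cur, delta) wherever %swpused rises by > jump points."""
    spikes = []
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if p1 - p0 > jump:
            spikes.append((t0, t1, round(p1 - p0, 2)))
    return spikes

if __name__ == '__main__':
    # (time, %swpused) pairs from the 10:40-11:40 window above
    sar = [('11:00', 14.86), ('11:10', 17.71), ('11:20', 19.21),
           ('11:30', 100.00), ('11:40', 100.00)]
    print(swap_spikes(sar))   # flags the 11:20 -> 11:30 interval
```

A scan like this over a day of sar data would pinpoint exactly which collection intervals to correlate with the query log.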
Re: [PERFORM] cpu bound postgresql setup.
>i do not remember well but there is a system view that (i think) >guides at what stage the marginal returns of increasing it >starts disappearing , i had set it a few years back. Sorry the above comment was regarding setting shared_buffers not effective_cache_size. On Thu, Jun 24, 2010 at 10:55 PM, Rajesh Kumar Mallah wrote: > On Thu, Jun 24, 2010 at 8:57 PM, Kevin Grittner > wrote: >> I'm not clear whether you still have a problem, or whether the >> changes you mention solved your issues. I'll comment on potential >> issues that leap out at me. > > It shall require more observation to know if the "problem" is solved. > my "problem" was high load average in the server . We find that > when ldavg is between 10-20 responses of applications were acceptable > ldavg > 40 makes things slower. > > What prompted me to post to list is that the server transitioned from > being IO bound to CPU bound and 90% of syscalls being > lseek(XXX, 0, SEEK_END) = YYY > >> >> Rajesh Kumar Mallah wrote: >> >>> 3. we use xfs and our controller has BBU , we changed barriers=1 >>> to barriers=0 as i learnt that having barriers=1 on xfs and fsync >>> as the sync method, the advantage of BBU is lost unless barriers >>> is = 0 (correct me if my understanding is wrong) >> >> We use noatime,nobarrier in /etc/fstab. I'm not sure where you're >> setting that, but if you have a controller with BBU, you want to set >> it to whichever disables write barriers. > > as per suggestion in discussions on some other thread I set it > in /etc/fstab. > >> >>> max_connections = 300 >> >> As I've previously mentioned, I would use a connection pool, in >> which case this wouldn't need to be that high. > > We do use connection pooling provided to mod_perl server > via Apache::DBI::Cache. If i reduce this i *get* "too many > connections from non-superuser ... " error. Will pgpool - I/II > still applicable in this scenario ? > > >> >>> work_mem = 4GB >> >> That's pretty high. 
That much memory can be used by each active >> connection, potentially for each of several parts of the active >> query on each connection. You should probably set this much lower >> in postgresql.conf and boost it if necessary for individual queries. > > hmmm.. it was 8GB for many months ! > > i shall reduce it further, but will it not result in usage of too many > temp files > and saturate i/o? > > > >> >>> effective_cache_size = 18GB >> >> With 32GB RAM on the machine, I would probably set this higher -- >> somewhere in the 24GB to 30GB range, unless you have specific >> reasons to believe otherwise. It's not that critical, though. > > i do not remember well but there is a system view that (i think) > guides at what stage the marginal returns of increasing it > starts disappearing , i had set it a few years back. > > >> >>> add_missing_from = on >> >> Why? There has been discussion of eliminating this option -- do you >> have queries which rely on the non-standard syntax this enables? > > unfortunately yes. > >> >>> Also i would like to apologize that some of the discussions on >>> this problem inadvertently became private between me & kevin. >> >> Oops. I failed to notice that. Thanks for bringing it back to the >> list. (It's definitely in your best interest to keep it in front of >> all the other folks here, some of whom regularly catch things I miss >> or get wrong.) >> >> If you still do have slow queries, please follow up with details. > > > I have now set log_min_duration_statement = 5000 > and there are few queries that come to logs. > > please comment on the connection pooling aspect. > > Warm Regards > Rajesh Kumar Mallah. > >> >> -Kevin >> > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] cpu bound postgresql setup.
On Thu, Jun 24, 2010 at 8:57 PM, Kevin Grittner wrote: > I'm not clear whether you still have a problem, or whether the > changes you mention solved your issues. I'll comment on potential > issues that leap out at me. It shall require more observation to know if the "problem" is solved. my "problem" was high load average in the server . We find that when ldavg is between 10-20 responses of applications were acceptable ldavg > 40 makes things slower. What prompted me to post to list is that the server transitioned from being IO bound to CPU bound and 90% of syscalls being lseek(XXX, 0, SEEK_END) = YYY > > Rajesh Kumar Mallah wrote: > >> 3. we use xfs and our controller has BBU , we changed barriers=1 >> to barriers=0 as i learnt that having barriers=1 on xfs and fsync >> as the sync method, the advantage of BBU is lost unless barriers >> is = 0 (correct me if my understanding is wrong) > > We use noatime,nobarrier in /etc/fstab. I'm not sure where you're > setting that, but if you have a controller with BBU, you want to set > it to whichever disables write barriers. as per suggestion in discussions on some other thread I set it in /etc/fstab. > >> max_connections = 300 > > As I've previously mentioned, I would use a connection pool, in > which case this wouldn't need to be that high. We do use connection pooling provided to mod_perl server via Apache::DBI::Cache. If i reduce this i *get* "too many connections from non-superuser ... " error. Will pgpool - I/II still applicable in this scenario ? > >> work_mem = 4GB > > That's pretty high. That much memory can be used by each active > connection, potentially for each of several parts of the active > query on each connection. You should probably set this much lower > in postgresql.conf and boost it if necessary for individual queries. hmmm.. it was 8GB for many months ! i shall reduce it further, but will it not result in usage of too many temp files and saturate i/o? 
> >> effective_cache_size = 18GB > > With 32GB RAM on the machine, I would probably set this higher -- > somewhere in the 24GB to 30GB range, unless you have specific > reasons to believe otherwise. It's not that critical, though. i do not remember well but there is a system view that (i think) guides at what stage the marginal returns of increasing it starts disappearing , i had set it a few years back. > >> add_missing_from = on > > Why? There has been discussion of eliminating this option -- do you > have queries which rely on the non-standard syntax this enables? unfortunately yes. > >> Also i would like to apologize that some of the discussions on >> this problem inadvertently became private between me & kevin. > > Oops. I failed to notice that. Thanks for bringing it back to the > list. (It's definitely in your best interest to keep it in front of > all the other folks here, some of whom regularly catch things I miss > or get wrong.) > > If you still do have slow queries, please follow up with details. I have now set log_min_duration_statement = 5000 and there are few queries that come to logs. please comment on the connection pooling aspect. Warm Regards Rajesh Kumar Mallah. > > -Kevin > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
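The lseek(XXX, 0, SEEK_END) calls that dominated the strace are how a backend asks the kernel for a relation file's current size when planning and scanning: seeking to the end of a file returns its length. The same call can be made from Python's stdlib to see what strace was showing (the temp file here stands in for a relation segment):

```python
import os
import tempfile

# lseek(fd, 0, SEEK_END) returns the offset of the end of the file,
# i.e. its size in bytes, without reading any data.  This is the cheap
# "how big is this table file?" syscall seen in the strace output.

fd, path = tempfile.mkstemp()
try:
    os.write(fd, b'x' * 8192)            # pretend this is one 8kB page
    size = os.lseek(fd, 0, os.SEEK_END)  # the call strace was reporting
    print(size)                          # 8192
finally:
    os.close(fd)
    os.remove(path)
```

The call is cheap, but it is still one syscall per relation per use, which is why a CPU-bound server running many short queries shows so many of them.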
Re: [PERFORM] cpu bound postgresql setup.
Dear List,

1. It was found that too many stray queries were getting generated from rogue users and bots; we controlled these using some manual methods.
2. We have made application changes, and some significant changes have been done.
3. We use xfs and our controller has BBU; we changed barriers=1 to barriers=0, as I learnt that with barriers=1 on xfs and fsync as the sync method, the advantage of the BBU is lost unless barriers=0 (correct me if my understanding is wrong).
4. We had implemented partitioning using exclusion constraints; the parent relationship was removed from quite a lot of old partition tables.

our postgresql.conf
--
# cat postgresql.conf | grep -v "^\s*#" | grep -v "^\s*$"
listen_addresses = '*'  # what IP address(es) to listen on;
port = 5432             # (change requires restart)
max_connections = 300   # (change requires restart)
shared_buffers = 10GB   # min 128kB
work_mem = 4GB          # min 64kB
fsync = on              # turns forced synchronization on or off
synchronous_commit = on # immediate fsync at commit
checkpoint_segments = 30  # in logfile segments, min 1, 16MB each
archive_mode = on       # allows archiving to be done
archive_command = '/opt/scripts/archive_wal.sh %p %f '
archive_timeout = 600   # force a logfile segment switch after this
effective_cache_size = 18GB
constraint_exclusion = on  # on, off, or partition
logging_collector = on  # Enable capturing of stderr and csvlog
log_directory = '/var/log/postgresql'  # directory where log files are written,
log_filename = 'postgresql.log'  # log file name pattern,
log_truncate_on_rotation = on  # If on, an existing log file of the
log_rotation_age = 1d   # Automatic rotation of logfiles will
log_error_verbosity = verbose  # terse, default, or verbose messages
log_min_duration_statement = 5000  # -1 is disabled, 0 logs all statements
datestyle = 'iso, mdy'
lc_messages = 'en_US.UTF-8'  # locale for system error message
lc_monetary = 'en_US.UTF-8'  # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'   # locale for number formatting
lc_time = 'en_US.UTF-8'      # locale for time formatting
default_text_search_config = 'pg_catalog.english'
add_missing_from = on
custom_variable_classes = 'general'  # list of custom variable class names
general.report_level = ''
general.disable_audittrail2 = ''
general.employee=''

Also I would like to apologize that some of the discussions on this problem inadvertently became private between me & Kevin.

On Thu, Jun 24, 2010 at 12:10 AM, Rajesh Kumar Mallah wrote:
> It was nice to go through the interesting posting guidelines. i shall
> be analyzing the slow queries more objectively tomorrow during the
> peak hours. I really hope it should be possible to track down the
> problem.
>
> On 6/23/10, Kevin Grittner wrote:
>> Rajesh Kumar Mallah wrote:
>>
>>> did you suggest at some point that number of backends per core
>>> should be preferably 3 ?
>>
>> I've found the number of *active* backends is optimal around (2 *
>> cores) + spindles. You said you had eight cores and eight or ten
>> spindles, so I figure a connection pool limited to somewhere around
>> 24 active connections is ideal. (Depending on how you set up your
>> pool, you may need a higher total number of connections to keep 24
>> active.)
>>
>> -Kevin
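Kevin's rule of thumb quoted above, optimal active backends around (2 * cores) + spindles, is easy to apply to the hardware in this thread (2 quad-core CPUs, 6-8 drives in RAID 10):

```python
# Kevin Grittner's pool-sizing heuristic from the quoted mail:
# optimal number of *active* backends ~= (2 * cores) + spindles.

def pool_size(cores, spindles):
    return 2 * cores + spindles

if __name__ == '__main__':
    for spindles in (6, 8):              # "6 or 8 drives in raid 10"
        print(pool_size(8, spindles))    # 22 and 24 active connections
```

So a pool limited to roughly 22-24 active connections is the target, versus the max_connections = 300 configured above; the pooler then queues the rest rather than letting them all compete for CPU and locks.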
[PERFORM] Re: cpu bound postgresql setup.
Firstly many thanks for responding. I am concerned because the load averages have increased and users are complaining of slowness. I do not change settings freq
On 6/23/10, Kevin Grittner wrote: > Rajesh Kumar Mallah wrote: >> PasteBin for the vmstat output >> http://pastebin.com/mpHCW9gt >> >> On Wed, Jun 23, 2010 at 8:22 PM, Rajesh Kumar Mallah >> wrote: >>> Dear List , >>> >>> I observe that my postgresql (ver 8.4.2) dedicated server has >>> turned cpu bound and there is a high load average in the server > >>> 50 usually. >>> The server has 2 Quad Core CPUs already and there are 6 or 8 >>> drives in raid 10 , there is negligable i/o wait. There is 32GB >>> ram and no swapping. >>> >>> When i strace processes at random i see lot of lseek >>> (XXX,0,SEEK_END) calls which i feel were not that frequent >>> before. can any pointers be got for investigating the high cpu >>> usage by postgresql processes. > > I'm not clear on what problem you are experiencing. Using a lot of > your hardware's capacity isn't a problem in itself -- are you > getting poor response time? Poor throughput? Some other problem? > Is it continuous, or only when certain queries run? > > One thing that is apparent is that you might want to use a > connection pool, or if you're already using one you might want to > configure it to reduce the maximum number of active queries. With > eight cores and eight drives, your best throughput is going to be at > somewhere around 24 active connections, and you appear to be going > to at least twice that. > > If you can provide a copy of your postgresql.conf settings (without > comments) and an EXPLAIN ANALYZE of a slow query, along with the > schema information for the tables used by the query, you'll probably > get useful advice on how to adjust your configuration, indexing, or > query code to improve performance. > > -Kevin > -- Sent from Gmail for mobile | mobile.google.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] cpu bound postgresql setup.
PasteBin for the vmstat output http://pastebin.com/mpHCW9gt

On Wed, Jun 23, 2010 at 8:22 PM, Rajesh Kumar Mallah wrote:
> Dear List ,
>
> I observe that my postgresql (ver 8.4.2) dedicated server has turned cpu
> bound and there is a high load average in the server > 50 usually.
> The server has 2 Quad Core CPUs already and there are 6 or 8 drives in
> raid 10 , there is negligable i/o wait. There is 32GB ram and no swapping.
>
> When i strace processes at random i see lot of lseek (XXX,0,SEEK_END) calls
> which i feel were not that frequent before. can any pointers be got
> for investigating the high cpu usage by postgresql processes.
>
> attached is strace out in strace.txt file (sorry if that was not
> allowed, i am not sure)
>
> vmstat output
>
> # vmstat 10
>
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd    free   buff    cache si so   bi   bo    in    cs us sy id wa st
> 13  2 150876 2694612   4804 24915540  1  0  443  203     0     0 50  6 39  5  0
> 17  1 150868 3580472   4824 24931312  1  0 1395  803 12951 15403 63 11 22  4  0
> 20  5 150868 3369892   4840 24938180  0  0 1948 1827 12691 14542 79 13  6  2  0
>  8  0 150868 2771920   4856 24968016  0  0 2680 1254 13890 14329 72 11 11  5  0
> 18  2 150864 2454008   4872 24995640  0  0 2530  923 13968 15434 63 10 20  7  0
> 45  3 150860 2367760   4888 25011756  0  0 1338 1327 13203 14580 71 11 16  3  0
>  5  6 150860 1949212   4904 25033052  0  0 1727 1981 13960 15079 73 11 12  5  0
> 27  0 150860 1723104   4920 25049588  0  0 1484  794 13199 13676 73 10 13  3  0
> 28  6 150860 1503888   4928 25069724  0  0 1650  981 12625 14867 75  9 14  2  0
>  8  3 150860 1807744   4944 25087404  0  0 1521  791 13110 15421 69  9 18  4  0
>
> Rajesh Kumar Mallah.
> Avid/Loyal-PostgreSQL user for (past 10 years)
Re: [PERFORM] Entry point for Postgresql Performance
Databases are usually IO bound; vmstat results can confirm individual cases and setups. In case the server is IO bound, the entry point should be setting up properly performing IO. RAID10 helps to a great extent in improving IO bandwidth by parallelizing IO operations; the more spindles the better. Write caches also help a great deal by absorbing writes and making commits faster. In my opinion, system-level tools (like vmstat) run at peak load times can be an entry point to understanding the bottlenecks of a particular setup.

If there is swapping you absolutely need to double the RAM (excess RAM gets used for disk block caching); if it is cpu bound, add more cores or higher-speed cpus; if it is io bound, put in better raid arrays and controllers.

regds
mallah.

On Thu, Mar 12, 2009 at 4:22 PM, Nagalingam, Karthikeyan wrote:
> Hi,
> Can you guide me, where is the entry point to get the documentation for
> Postgresql performance tuning, optimization for Postgresql with a storage
> controller?
>
> Your recommendation and suggestion are welcome.
>
> Regards
> Karthikeyan.N
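The triage described above (swapping means more RAM, cpu bound means more cores, io bound means better arrays) can be sketched as a small check over vmstat's cpu columns. The thresholds below are illustrative assumptions, not fixed rules:

```shell
# Hedged sketch: classify one vmstat sample. Arguments are the us/sy/id/wa
# CPU columns; the threshold values here are assumptions for illustration.
classify() {
  us=$1; sy=$2; id=$3; wa=$4
  if [ "$wa" -ge 20 ]; then
    echo "io-bound: look at RAID layout / controller"
  elif [ $(( us + sy )) -ge 80 ]; then
    echo "cpu-bound: add cores or faster CPUs"
  else
    echo "no obvious CPU/IO bottleneck in this sample"
  fi
}
# e.g. one line from `vmstat 10` might give:
classify 79 13 6 2
```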
Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller
There has been an error in the tests: the dataset size was not 2*MEM, it was 0.5*MEM. I shall redo the tests and post the results.
Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller
On Wed, Feb 18, 2009 at 2:27 PM, Grzegorz Jaśkiewicz wrote:
> have you tried hanging a bunch of raid1 to linux's md, and letting it do
> raid0 for you ?

Hmmm, I will have only 3 bunches in that case, as the system has to boot from the first bunch and has only 8 drives; I think reducing spindles will reduce performance. I also have a SATA SAN from which I can boot, but the server would need to be rebuilt in that case too. I (may) give it a shot.

regds
-- mallah.

> I heard plenty of stories where this actually sped up performance. One
> noticeable one is the case of the youtube servers.
Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller
>> Effect of ReadAhead Settings
>> disabled, 256 (default), 512, 1024
>>
>> SEQUENTIAL (writes KB/s, reads KB/s; all tests on sda6)
>> xfs_ra0      414741,  66144
>> xfs_ra256    403647, 545026
>> xfs_ra512    411357, 564769
>> xfs_ra1024   404392, 431168
>>
>> looks like 512 was the best setting for this controller
>
> That's only known for sequential access.
> How did it perform under the random access, or did the numbers not
> change too much?

RANDOM SEEKS /sec
xfs_ra0       6341.0
xfs_ra256    14642.7
xfs_ra512    14415.6
xfs_ra1024   14541.6

The value does not seem to have much effect unless it is totally disabled.

regds
mallah.
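Scott Carey's rule of thumb quoted elsewhere in this thread (for SAS, 256 or 512 * the number of effective spindles, where RAID10 halves the spindle count) can be sketched as follows; the drive count and chosen multiplier are assumptions for this particular box:

```shell
# Hedged sketch: compute a starting readahead value per the SAS heuristic
# quoted in this thread. Drive count and multiplier are assumptions.
drives=8
effective=$(( drives / 2 ))   # RAID10: spindles * 0.5
ra=$(( 512 * effective ))     # value is in 512-byte blocks
echo "try: blockdev --setra $ra /dev/sda"
# verify afterwards with: blockdev --getra /dev/sda
```

Note that the benchmark above found 512 best for this controller, so the heuristic is only a starting point for measurement, not a substitute for it.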
Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller
Detailed bonnie++ figures: http://98.129.214.99/bonnie/report.html

On Wed, Feb 18, 2009 at 1:22 PM, Rajesh Kumar Mallah wrote:
> the raid10 volume was benchmarked again,
> taking into consideration the above points
>
> # fdisk -l /dev/sda
> Disk /dev/sda: 290.9 GB, 290984034304 bytes
> 255 heads, 63 sectors/track, 35376 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
>
>    Device Boot  Start    End     Blocks   Id  System
> /dev/sda1   *       1     12       96358+  83  Linux
> /dev/sda2          13   1317    10482412+  83  Linux
> /dev/sda3        1318   1578     2096482+  83  Linux
> /dev/sda4        1579  35376   271482435    5  Extended
> /dev/sda5        1579   1839     2096451   82  Linux swap / Solaris
> /dev/sda6        1840   7919    48837568+  83  Linux
> /dev/sda7       29297  35376    48837600   83  Linux
>
> CASE               writes   reads  (KB/s)
> ext3 (whole disk)  244194  352093  one partition, whole disk
> xfs (whole disk)   402352  547674
> 25ext3             260132  420905  partition only first 25%
> 25xfs              404291  547672  (/dev/sda6)
> ext3_25            227307  348237  partition specifically last 25%
> xfs25              350661  474481  (/dev/sda7)
>
> Effect of ReadAhead Settings
> disabled, 256 (default), 512, 1024
>
> xfs_ra0     414741,  66144
> xfs_ra256   403647, 545026  all tests on sda6
> xfs_ra512   411357, 564769
> xfs_ra1024  404392, 431168
>
> looks like 512 was the best setting for this controller
>
> Considering these two figures
> xfs25  350661, 474481  (/dev/sda7)
> 25xfs  404291, 547672  (/dev/sda6)
>
> it looks like the beginning of the drives is 15% faster
> than the ending sections; considering this, is it worth
> creating a special tablespace at the beginning of the drives?
>
> if at all done, what kind of data objects should be placed
> towards the beginning: WAL, indexes, frequently updated tables
> or sequences?
>
> regds
> mallah.
>
>> On Tue, Feb 17, 2009 at 9:49 PM, Scott Carey wrote:
>> Generally speaking, you will want to use a partition that is 25% or less the
>> size of the whole disk as well.
>> If it is the whole thing, one file system
>> can place the file you are testing in a very different place on disk and
>> skew results as well.
>>
>> My own tests, using the first 20% of an array for all, showed that xfs with
>> default settings beat out or equalled 'tuned' settings with hardware raid
>> 10, and was far faster than ext3 in sequential transfer rate.
>
> same here.
>
>> If testing STR, you will also want to tune the block device read ahead value
>> (example: /sbin/blockdev -getra /dev/sda6). This has very large impact on
>> sequential transfer performance (and no impact on random access). How large
>> of an impact depends quite a bit on what kernel you're on since the
>> readahead code has been getting better over time and requires less tuning.
>> But it still defaults out-of-the-box to more optimal settings for a single
>> drive than RAID.
>> For SAS, try 256 or 512 * the number of effective spindles (spindles * 0.5
>> for raid 10). For SATA, try 1024 or 2048 * the number of effective
>> spindles. The value is in blocks (512 bytes). There is documentation on
>> the blockdev command, and here is a little write-up I found with a couple
>> web searches:
>> http://portal.itauth.com/2007/11/20/howto-linux-double-your-disk-read-performance-single-command
>>
>> From: pgsql-performance-ow...@postgresql.org
>> [pgsql-performance-ow...@postgresql.org] On Behalf Of Rajesh Kumar Mallah
>> [mallah.raj...@gmail.com]
>> Sent: Tuesday, February 17, 2009 5:25 AM
>> To: Matthew Wakeling
>> Cc: pgsql-performance@postgresql.org
>> Subject: Re: [PERFORM] suggestions for postgresql setup on Dell 2950 ,
>> PERC6i controller
>>
>> On Tue, Feb 17, 2009 at 5:15 PM, Matthew Wakeling wrote:
>>> On Tue, 17 Feb 2009, Rajesh Kumar Mallah wrote:
>>>>
>>>> sda6 --> xfs with default formatting options.
>>>> sda7 --> mkfs.xfs -f -d sunit=128,swidth=512 /dev/sda7
>>>> sda8 --> ext3 (default)
>>>>
Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller
the raid10 volume was benchmarked again, taking into consideration the above points

# fdisk -l /dev/sda
Disk /dev/sda: 290.9 GB, 290984034304 bytes
255 heads, 63 sectors/track, 35376 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot  Start    End     Blocks   Id  System
/dev/sda1   *       1     12       96358+  83  Linux
/dev/sda2          13   1317    10482412+  83  Linux
/dev/sda3        1318   1578     2096482+  83  Linux
/dev/sda4        1579  35376   271482435    5  Extended
/dev/sda5        1579   1839     2096451   82  Linux swap / Solaris
/dev/sda6        1840   7919    48837568+  83  Linux
/dev/sda7       29297  35376    48837600   83  Linux

CASE               writes   reads  (KB/s)
ext3 (whole disk)  244194  352093  one partition, whole disk
xfs (whole disk)   402352  547674
25ext3             260132  420905  partition only first 25%
25xfs              404291  547672  (/dev/sda6)
ext3_25            227307  348237  partition specifically last 25%
xfs25              350661  474481  (/dev/sda7)

Effect of ReadAhead Settings
disabled, 256 (default), 512, 1024

xfs_ra0     414741,  66144
xfs_ra256   403647, 545026  all tests on sda6
xfs_ra512   411357, 564769
xfs_ra1024  404392, 431168

looks like 512 was the best setting for this controller

Considering these two figures
xfs25  350661, 474481  (/dev/sda7)
25xfs  404291, 547672  (/dev/sda6)

it looks like the beginning of the drives is 15% faster
than the ending sections; considering this, is it worth
creating a special tablespace at the beginning of the drives?

if at all done, what kind of data objects should be placed
towards the beginning: WAL, indexes, frequently updated tables
or sequences?

regds
mallah.

>On Tue, Feb 17, 2009 at 9:49 PM, Scott Carey wrote:
> Generally speaking, you will want to use a partition that is 25% or less the
> size of the whole disk as well. If it is the whole thing, one file system
> can place the file you are testing in a very different place on disk and skew
> results as well.
>
> My own tests, using the first 20% of an array for all, showed that xfs with
> default settings beat out or equalled 'tuned' settings with hardware raid
> 10, and was far faster than ext3 in sequential transfer rate.
same here.

> If testing STR, you will also want to tune the block device read ahead value
> (example: /sbin/blockdev -getra /dev/sda6). This has very large impact on
> sequential transfer performance (and no impact on random access). How large
> of an impact depends quite a bit on what kernel you're on since the readahead
> code has been getting better over time and requires less tuning. But it
> still defaults out-of-the-box to more optimal settings for a single drive
> than RAID.
> For SAS, try 256 or 512 * the number of effective spindles (spindles * 0.5
> for raid 10). For SATA, try 1024 or 2048 * the number of effective
> spindles. The value is in blocks (512 bytes). There is documentation on the
> blockdev command, and here is a little write-up I found with a couple web
> searches:
> http://portal.itauth.com/2007/11/20/howto-linux-double-your-disk-read-performance-single-command
>
> ____
> From: pgsql-performance-ow...@postgresql.org
> [pgsql-performance-ow...@postgresql.org] On Behalf Of Rajesh Kumar Mallah
> [mallah.raj...@gmail.com]
> Sent: Tuesday, February 17, 2009 5:25 AM
> To: Matthew Wakeling
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i
> controller
>
> On Tue, Feb 17, 2009 at 5:15 PM, Matthew Wakeling wrote:
>> On Tue, 17 Feb 2009, Rajesh Kumar Mallah wrote:
>>>
>>> sda6 --> xfs with default formatting options.
>>> sda7 --> mkfs.xfs -f -d sunit=128,swidth=512 /dev/sda7
>>> sda8 --> ext3 (default)
>>>
>>> it looks like mkfs.xfs options sunit=128 and swidth=512 did not improve
>>> io throughput as such in bonnie++ tests.
>>>
>>> it looks like ext3 with default options performed worst in my case.
>>
>> Of course, doing comparisons using a setup like that (on separate
>> partitions) will skew the results, because discs' performance differs
>> depending on the portion of the disc being accessed.
>> You should perform the
>> different filesystem tests on the same partition one after the other
>> instead.
>
> point noted. will redo the test on ext3.
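The "special tablespace at the beginning of the drives" idea raised in this thread could be tried roughly as follows; the mount point, tablespace name and table are all hypothetical, and the psql steps are shown as comments since they need a running cluster:

```shell
# Hedged sketch: build the SQL for a tablespace on the fast outer partition.
# /pgfast, fast_ts and hot_table are hypothetical names.
TS_DIR=/pgfast     # e.g. /dev/sda6 (first 25% of the disk) mounted here
SQL="CREATE TABLESPACE fast_ts LOCATION '$TS_DIR';"
echo "$SQL"
# Then, as the postgres superuser (directory must be owned by postgres):
#   psql -c "$SQL"
#   psql -c "ALTER TABLE hot_table SET TABLESPACE fast_ts;"
```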
Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller
On Tue, Feb 17, 2009 at 5:15 PM, Matthew Wakeling wrote: > On Tue, 17 Feb 2009, Rajesh Kumar Mallah wrote: >> >> sda6 --> xfs with default formatting options. >> sda7 --> mkfs.xfs -f -d sunit=128,swidth=512 /dev/sda7 >> sda8 --> ext3 (default) >> >> it looks like mkfs.xfs options sunit=128 and swidth=512 did not improve >> io throughtput as such in bonnie++ tests . >> >> it looks like ext3 with default options performed worst in my case. > > Of course, doing comparisons using a setup like that (on separate > partitions) will skew the results, because discs' performance differs > depending on the portion of the disc being accessed. You should perform the > different filesystem tests on the same partition one after the other > instead. point noted . will redo the test on ext3. > > Matthew > > -- > "We did a risk management review. We concluded that there was no risk > of any management."-- Hugo Mills > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller
The URL of the result is http://98.129.214.99/bonnie/report.html (sorry if this was a repost) On Tue, Feb 17, 2009 at 2:04 AM, Rajesh Kumar Mallah wrote: > BTW > > our Machine got build with 8 15k drives in raid10 , > from bonnie++ results its looks like the machine is > able to do 400 Mbytes/s seq write and 550 Mbytes/s > read. the BB cache is enabled with 256MB > > sda6 --> xfs with default formatting options. > sda7 --> mkfs.xfs -f -d sunit=128,swidth=512 /dev/sda7 > sda8 --> ext3 (default) > > it looks like mkfs.xfs options sunit=128 and swidth=512 did not improve > io throughtput as such in bonnie++ tests . > > it looks like ext3 with default options performed worst in my case. > > regds > -- mallah > > > NOTE: observations made in this post are interpretations by the poster > only which may or may not be indicative of the true suitablity of the > filesystem. > > > > On Mon, Feb 16, 2009 at 7:01 PM, Gregory Stark wrote: >> Arjen van der Meijden writes: >> >>> When we purchased our Perc 5/e with MD1000 filled with 15 15k rpm sas >>> disks, my >>> colleague actually spend some time benchmarking the PERC and a ICP Vortex >>> (basically a overclocked Adaptec) on those drives. Unfortunately he doesn't >>> have too many comparable results, but it basically boiled down to quite good >>> scores for the PERC and a bit less for the ICP Vortex. >>> IOMeter sequential reads are above 300MB/s for the RAID5 and above 240MB/s >>> for >>> a RAID10 (and winbench99 versions range from 400+ to 600+MB/s). >> >> FWIW those are pretty terrible numbers for fifteen 15k rpm drives. They're >> about what you would expect if for a PCI-X card which was bus bandwidth >> limited. A PCI-e card should be able to get about 3x that from the drives. >> >> -- >> Gregory Stark >> EnterpriseDB http://www.enterprisedb.com >> Ask me about EnterpriseDB's RemoteDBA services! 
>> >> -- >> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-performance >> > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller
BTW, our machine got built with 8 15k drives in raid10; from the bonnie++ results it looks like the machine is able to do 400 Mbytes/s seq write and 550 Mbytes/s seq read. The BB cache is enabled with 256MB.

sda6 --> xfs with default formatting options.
sda7 --> mkfs.xfs -f -d sunit=128,swidth=512 /dev/sda7
sda8 --> ext3 (default)

it looks like the mkfs.xfs options sunit=128 and swidth=512 did not improve
io throughput as such in the bonnie++ tests.

it looks like ext3 with default options performed worst in my case.

regds
-- mallah

NOTE: observations made in this post are interpretations by the poster
only, which may or may not be indicative of the true suitability of the
filesystem.

On Mon, Feb 16, 2009 at 7:01 PM, Gregory Stark wrote:
> Arjen van der Meijden writes:
>
>> When we purchased our Perc 5/e with MD1000 filled with 15 15k rpm sas disks,
>> my colleague actually spent some time benchmarking the PERC and an ICP Vortex
>> (basically an overclocked Adaptec) on those drives. Unfortunately he doesn't
>> have too many comparable results, but it basically boiled down to quite good
>> scores for the PERC and a bit less for the ICP Vortex.
>> IOMeter sequential reads are above 300MB/s for the RAID5 and above 240MB/s
>> for a RAID10 (and winbench99 versions range from 400+ to 600+MB/s).
>
> FWIW those are pretty terrible numbers for fifteen 15k rpm drives. They're
> about what you would expect for a PCI-X card which was bus bandwidth
> limited. A PCI-e card should be able to get about 3x that from the drives.
>
> --
> Gregory Stark
> EnterpriseDB http://www.enterprisedb.com
> Ask me about EnterpriseDB's RemoteDBA services!
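Where sunit=128,swidth=512 plausibly come from (an inference, not stated in the thread): sunit is the RAID stripe unit expressed in 512-byte sectors, and swidth is sunit times the number of data-bearing spindles; an 8-drive RAID10 has 4. The stripe size below is an assumption:

```shell
# Hedged sketch: derive the mkfs.xfs geometry from assumed RAID parameters.
stripe_kb=64        # assumed controller stripe size
data_spindles=4     # 8-drive RAID10 -> 4 data-bearing spindles
sunit=$(( stripe_kb * 1024 / 512 ))   # stripe unit in 512-byte sectors
swidth=$(( sunit * data_spindles ))
echo "mkfs.xfs -f -d sunit=$sunit,swidth=$swidth /dev/sda7"
```

If the controller's actual stripe size differs, the computed values change accordingly, which may explain why these particular options made no measurable difference in the bonnie++ runs.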
Re: [PERFORM] scheduling autovacuum at lean hours only.
Its nice to know the evolution of autovacuum and i understand that the suggestion/requirement of "autovacuum at lean hours only" was defeating the whole idea. regds --rajesh kumar mallah. On Fri, Feb 13, 2009 at 11:07 PM, Chris Browne wrote: > mallah.raj...@gmail.com (Rajesh Kumar Mallah) writes: >> why is it not a good idea to give end users control over when they >> want to run it ? > > It's not a particularly good idea to give end users things that they > are likely then to *immediately* use to shoot themselves in the foot. > > Turning off vacuuming "all day" is the sort of thing that is indeed > pretty certain to hurt you when you imagined it was going to help you. > > In particular, if you shut off autovac all day, heavily updated tables > with certain sorts of (pretty common!) update patterns are certain to > "bloat up," to the point that you'll need to do CLUSTER/VACUUM FULL on > them. > > In effect, the practical effect of "autovacuum at lean hours only" is > more reasonably described as "cancel autovacuum and revert to the > elder policy of requiring users to do manual vacuuming." > > It's worth looking at how autovacuum has been evolving over time... > > - When it was introduced, 8.0-ish (maybe it was 8.1 when it became > "official"), it was pretty primitive. > > Autovac was a single process, where you had three controls over > behaviour: > > - You could run autovac, or not; > > - You could exclude specific tables from being processed by autovac > > - There is a capability to vacuum less aggressively by using > delays to reduce autovac I/O usage > > - In 8.3, it was enhanced to add the notion of having multiple vacuum > workers > > There was discussion about having one of those workers restrict > itself to small tables, so that you'd never have the scenario where > the workers were all busy and a small table that needed vacuuming > was left unvacuumed for a long time. It appears that didn't happen, > which seems unfortunate, but that's life... 
> > You should look at all the "knobs" that *are* offered before deciding > a policy that may be harmful to performance. As things stand now, > there are a couple of ways I could see tuning this: > > - You might check on the GUC variables autovacuum_vacuum_cost_delay > and autovacuum_vacuum_cost_limit, which would allow you to restrict > the I/O cost. > > This might allow you to run autovacuum all the time without > adversely affecting performance. > > - You might come up with a list of the *LARGE* tables that you don't > want vacuumed during the day, and set up a cron job that adds/drops > them from the pg_autovacuum table at the appropriate times. > > This is definitely going to be more finicky, and requires a great > deal more awareness of the tables being updated by your > applications. It makes "autovacuum" a whole lot less "automatic." > > There are known enhancements coming up: > > - In 8.4, there is a capability for VACUUM to only process the > portions of the table known to have been altered. > > That ought to be a better answer than *any* of the fiddling > suggested, to date. Right now, a VACUUM on "public.my_huge_table", > a table 18GB in size, will walk through the entire table, even > though there were only a handful of pages where tuples were > invalidated. > > This is almost certainly the single best improvement possible to > resolve your issue; it seems likely to *directly* address the > problem, and has the considerable merit of not requiring much if > any configuration/reconfiguration/scheduling. > -- > (reverse (concatenate 'string "gro.mca" "@" "enworbbc")) > http://linuxdatabases.info/info/ > "what would we do without C? we would have PASAL, BASI, OBOL, and > Ommon Lisp." 
-- #Erik > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] dissimilar drives in Raid10 , does it make difference ?
I have received a Dell Poweredge 2950 MIII with 2 kinds of drives. I can't make out the reason behind it. Does it make any difference in the long run, or in performance? The drives are similar in overall characteristics, but will the minor differences cause any problem?

scsi0 : LSI Logic SAS based MegaRAID driver
  Vendor: SEAGATE  Model: ST973451SS  Rev: SM04  Type: Direct-Access  ANSI SCSI revision: 05
  Vendor: FUJITSU  Model: MBC2073RC   Rev: D506  Type: Direct-Access  ANSI SCSI revision: 05
  Vendor: FUJITSU  Model: MBC2073RC   Rev: D506  Type: Direct-Access  ANSI SCSI revision: 05
  Vendor: SEAGATE  Model: ST973451SS  Rev: SM04  Type: Direct-Access  ANSI SCSI revision: 05
  Vendor: SEAGATE  Model: ST973451SS  Rev: SM04  Type: Direct-Access  ANSI SCSI revision: 05
  Vendor: FUJITSU  Model: MBC2073RC   Rev: D506  Type: Direct-Access  ANSI SCSI revision: 05
  Vendor: SEAGATE  Model: ST973451SS  Rev: SM04  Type: Direct-Access  ANSI SCSI revision: 05
  Vendor: FUJITSU  Model: MBC2073RC   Rev: D506  Type: Direct-Access  ANSI SCSI revision: 05

thanks
regds
-- mallah
Re: [PERFORM] scheduling autovacuum at lean hours only.
On Wed, Feb 11, 2009 at 11:30 PM, Brad Nicholson wrote: > On Wed, 2009-02-11 at 22:57 +0530, Rajesh Kumar Mallah wrote: >> On Wed, Feb 11, 2009 at 10:03 PM, Grzegorz Jaśkiewicz >> wrote: >> > On Wed, Feb 11, 2009 at 2:57 PM, Rajesh Kumar Mallah >> > wrote: >> > >> >>> vacuum_cost_delay = 150 >> >>> vacuum_cost_page_hit = 1 >> >>> vacuum_cost_page_miss = 10 >> >>> vacuum_cost_page_dirty = 20 >> >>> vacuum_cost_limit = 1000 >> >>> autovacuum_vacuum_cost_delay = 300 >> >> >> >> why is it not a good idea to give end users control over when they >> >> want to run it ? >> > >> > Effectively, you have control over autovacuum via these params. >> > You have to remember, that autovacuum doesn't cost much, and it makes >> > planner know more about data. >> > It's not there to clean up databases, as you might imagine - it is >> > there to update stats, and mark pages as free. >> > >> > So make sure you tweak that config fist, because I have a funny >> > feeling that you just think that vacuuming bogs down your machine, and >> > _can_ be turned off without any bad consequences, which is simply not >> > true. >> >> our usage pattern is such that peak activity (indicated by load average) >> during day time is 10 times during night hours. Autovacuum just puts >> more pressure to the system. If less stressing version is used then >> it shall take longer to complete one cycle, which would mean less >> performance for longer time . Less performance queues up queries >> and encourages people to re submit their queries which again >> adds to bogging up the system. > > That's not exactly how it works in practise, if tuned properly. It may > take longer, but it is less intensive while running. > > We had one system that had spikes happening due to the exact case you > described - there were noticeably high IO wait times while certain > tables were being vacuumed. We set the cost delay and the wait times > dropped to the point where it was non-issue. 
I think i can take this route and monitor the io activity during vacuums. thanks everyone for their suggestions. -- mallah. >Vacuums take twice as > long, but there is no measurable impact to the performance. > >> In our case i feel the hardware is bit underscaled as compared to >> load thats why i think running in lean hours is best of both worlds >> no performance sacrifices and intelligent vacuuming. > > That is a different issue altogether. > > Not vacuuming a running system at all during peak hours is not > considered intelligent vacuuming IMHO. There are plenty of use cases > where small, frequent vacuums keep tables under control at a very low > cost. Letting them go for extended periods of time without vacuuming > causes bloat and eventual slowdowns to table access which manifest in > higher IO usage across the board. > > If you really are dead set on vacuuming only at night, you may want to > do a careful analysis of which tables need to be vacuumed and when, and > trigger manual vacuums from cron. > > -- > Brad Nicholson 416-673-4106 > Database Administrator, Afilias Canada Corp. > > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
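Brad's last suggestion (manual vacuums of selected large tables from cron, instead of disabling autovacuum wholesale) could be sketched like this; the database and table names are hypothetical:

```shell
# Hedged sketch: generate a crontab fragment that vacuums chosen large tables
# at 02:00, while autovacuum stays on for everything else. Names are examples.
cat > night-vacuum.cron <<'EOF'
# m h dom mon dow command
0 2 * * * psql -d mydb -c "VACUUM ANALYZE big_table_a; VACUUM ANALYZE big_table_b;"
EOF
cat night-vacuum.cron
```

The fragment would then be installed as the postgres user, e.g. with `crontab night-vacuum.cron`.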
Re: [PERFORM] scheduling autovacuum at lean hours only.
On Wed, Feb 11, 2009 at 10:03 PM, Grzegorz Jaśkiewicz wrote: > On Wed, Feb 11, 2009 at 2:57 PM, Rajesh Kumar Mallah > wrote: > >>> vacuum_cost_delay = 150 >>> vacuum_cost_page_hit = 1 >>> vacuum_cost_page_miss = 10 >>> vacuum_cost_page_dirty = 20 >>> vacuum_cost_limit = 1000 >>> autovacuum_vacuum_cost_delay = 300 >> >> why is it not a good idea to give end users control over when they >> want to run it ? > > Effectively, you have control over autovacuum via these params. > You have to remember, that autovacuum doesn't cost much, and it makes > planner know more about data. > It's not there to clean up databases, as you might imagine - it is > there to update stats, and mark pages as free. > > So make sure you tweak that config fist, because I have a funny > feeling that you just think that vacuuming bogs down your machine, and > _can_ be turned off without any bad consequences, which is simply not > true. our usage pattern is such that peak activity (indicated by load average) during day time is 10 times during night hours. Autovacuum just puts more pressure to the system. If less stressing version is used then it shall take longer to complete one cycle, which would mean less performance for longer time . Less performance queues up queries and encourages people to re submit their queries which again adds to bogging up the system. In our case i feel the hardware is bit underscaled as compared to load thats why i think running in lean hours is best of both worlds no performance sacrifices and intelligent vacuuming. regds -- mallah. > > > -- > GJ > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] scheduling autovacuum at lean hours only.
On Wed, Feb 11, 2009 at 7:11 PM, Guillaume Cottenceau wrote: > Rajesh Kumar Mallah writes: > >> Hi, >> >> Is it possible to configure autovacuum to run only >> during certain hours ? We are forced to keep >> it off because it pops up during the peak >> query hours. > > You'd rather configure the delaying process to not alter too much > performance. Autovacuum is really not targeted at running once a > day - I think it is partly because the old vacuuming was too weak > (because too seldom in many cases) that autovaccum was added. > > A delaying configuration that works nicely for us without > impacting performance much (tested at the time of 8.2 to cause a > +40% response time during autovacuuming, compared to +300% with > more default values): > > vacuum_cost_delay = 150 > vacuum_cost_page_hit = 1 > vacuum_cost_page_miss = 10 > vacuum_cost_page_dirty = 20 > vacuum_cost_limit = 1000 > autovacuum_vacuum_cost_delay = 300 why is it not a good idea to give end users control over when they want to run it ? > > (Disclaimer: IIRC, Alvaro doesn't like these figures at all) > > Of course, these are good for us (bloat is very, very low and > performance impact is not experienced in production), not > necessarily for you. You should conduct your own tests. > > Be sure to also consider http://developer.postgresql.org/~wieck/vacuum_cost/ > > -- > Guillaume Cottenceau > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
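Guillaume's delaying configuration above could be applied by appending the settings to postgresql.conf and reloading. The path below is an example, and the values are his (tested on 8.2), not a general recommendation:

```shell
# Hedged sketch: append the delaying settings quoted above and reload.
CONF=postgresql.conf   # typically $PGDATA/postgresql.conf
cat >> "$CONF" <<'EOF'
vacuum_cost_delay = 150
vacuum_cost_page_hit = 1
vacuum_cost_page_miss = 10
vacuum_cost_page_dirty = 20
vacuum_cost_limit = 1000
autovacuum_vacuum_cost_delay = 300
EOF
# then: pg_ctl reload    (a restart is not needed for these GUCs)
```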
[PERFORM] scheduling autovacuum at lean hours only.
Hi, Is it possible to configure autovacuum to run only during certain hours ? We are forced to keep it off because it pops up during the peak query hours. Regds rajesh kumar mallah. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] please help with the explain analyze plan
hurray! ANALYZING changed the plan.

I was not expecting the plan to change, because the 2006_02 partition is supposed to be dormant; maybe the partition was never analyzed. But the question still remains: why was the time taken orders of magnitude higher under load?

tradein_clients=> explain SELECT count(*) from rfi_partitions.rfis_part_2006_02
                 where generated_date >= 2251 and receiver_uid=1320721;
                                          QUERY PLAN
 Aggregate  (cost=6.44..6.45 rows=1 width=0)
   ->  Index Scan using rfis_part_2006_02_generated_date on rfis_part_2006_02  (cost=0.00..6.43 rows=1 width=0)
         Index Cond: (generated_date >= 2251)
         Filter: (receiver_uid = 1320721)
(4 rows)

tradein_clients=> ANALYZE rfi_partitions.rfis_part_2006_02;
ANALYZE
tradein_clients=> explain SELECT count(*) from rfi_partitions.rfis_part_2006_02
                 where generated_date >= 2251 and receiver_uid=1320721;
                                          QUERY PLAN
 Aggregate  (cost=8.78..8.79 rows=1 width=0)
   ->  Index Scan using rfis_part_2006_02_receiver_uid on rfis_part_2006_02  (cost=0.00..8.77 rows=1 width=0)
         Index Cond: (receiver_uid = 1320721)
         Filter: (generated_date >= 2251)
(4 rows)

tradein_clients=> explain analyze SELECT count(*) from rfi_partitions.rfis_part_2006_02
                 where generated_date >= 2251 and receiver_uid=1320721;
                                          QUERY PLAN
 Aggregate  (cost=8.78..8.79 rows=1 width=0) (actual time=0.045..0.045 rows=1 loops=1)
   ->  Index Scan using rfis_part_2006_02_receiver_uid on rfis_part_2006_02  (cost=0.00..8.77 rows=1 width=0) (actual time=0.042..0.042 rows=0 loops=1)
         Index Cond: (receiver_uid = 1320721)
         Filter: (generated_date >= 2251)
 Total runtime: 0.082 ms
(5 rows)

On Wed, Feb 11, 2009 at 6:07 PM, Rajesh Kumar Mallah wrote:
> thanks for the hint,
>
> now the peak hour is over and the same scan is taking 71 ms in place of 8 ms,
> and the total query time is also acceptable. But it is surprising that
> the scan was taking so long consistently at that point of time. I shall test
> again under similar circumstances tomorrow.
> > Is it possible to enable block level statistics from the psql prompt > for a particular query > and see the results on the psql prompt ? > > explain analyze SELECT count(*) from > rfi_partitions.rfis_part_2006_02 where generated_date >= 2251 and > receiver_uid=1320721 ; > > QUERY PLAN > > Aggregate (cost=6.44..6.45 rows=1 width=0) (actual > time=71.513..71.513 rows=1 loops=1) > -> Index Scan using rfis_part_2006_02_generated_date on > rfis_part_2006_02 (cost=0.00..6.43 rows=1 width=0) (actual > time=71.508..71.508 rows=0 loops=1) > Index Cond: (generated_date >= 2251) > Filter: (receiver_uid = 1320721) > Total runtime: 71.553 ms > (5 rows) > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] please help with the explain analyze plan
thanks for the hint,

now the peak hour is over and the same scan is taking 71 ms in place of 8 ms,
and the total query time is also acceptable. But it is surprising that the
scan was taking so long consistently at that point of time. I shall test
again under similar circumstances tomorrow.

Is it possible to enable block level statistics from the psql prompt
for a particular query and see the results on the psql prompt ?

explain analyze SELECT count(*) from rfi_partitions.rfis_part_2006_02
where generated_date >= 2251 and receiver_uid=1320721 ;

                                 QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=6.44..6.45 rows=1 width=0) (actual time=71.513..71.513 rows=1 loops=1)
   ->  Index Scan using rfis_part_2006_02_generated_date on rfis_part_2006_02
        (cost=0.00..6.43 rows=1 width=0) (actual time=71.508..71.508 rows=0 loops=1)
         Index Cond: (generated_date >= 2251)
         Filter: (receiver_uid = 1320721)
 Total runtime: 71.553 ms
(5 rows)
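On the block-statistics question: one low-tech approach (a sketch, not the only way) is to snapshot the per-table I/O counters in the same session before and after the query; later PostgreSQL releases added EXPLAIN (ANALYZE, BUFFERS), which reports this per query directly:

```sql
-- Requires block-level stats collection to be enabled
-- (stats_block_level = on in 8.2-era servers).
SELECT heap_blks_read, heap_blks_hit, idx_blks_read, idx_blks_hit
FROM pg_statio_user_tables
WHERE relname = 'rfis_part_2006_02';

-- ... run the query under investigation, then re-run the SELECT above;
-- the delta is roughly the blocks touched, modulo concurrent activity.
```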
[PERFORM] please help with the explain analyze plan
Dear friends,

I have EXPLAIN ANALYZE output of two queries:

explain analyze SELECT count(*) from general.rfis
where 1=1 and inquiry_type = 'BUY' and receiver_uid=1320721;
(7 ms)  http://pastebin.com/m5297e03c

explain analyze SELECT count(*) from general.rfis
where 1=1 and inquiry_type = 'BUY' and receiver_uid=1320721
and generated_date >= 2251;
(80 secs)  http://pastebin.com/d1e4bdea7

The table general.rfis is partitioned on generated_date, and the condition
generated_date >= 2251 was added with the intention of limiting the number of
(date-based) partitions searched by the query, using the constraint exclusion
facility. However, as is evident, the query has become very slow as a result
of this condition (even on primed caches).

Can anyone kindly explain why the result was so counter-intuitive? In
particular, where is most of the time (80828.438 ms) spent in the plan
http://pastebin.com/d1e4bdea7 ? (A reference to the actual line number is
appreciated.)

Structure of a typical partition (abridged):

                    Table "rfi_partitions.rfis_part_2009_01"
     Column     |          Type          |           Modifiers
----------------+------------------------+------------------------------------------------
 rfi_id         | integer                | not null default nextval('general.rfis_rfi_id_seq'::regclass)
 sender_uid     | integer                | not null
 receiver_uid   | integer                | not null
 subject        | character varying(100) | not null
 message        | text                   | not null
 inquiry_type   | character varying(50)  | default 'BUY'::character varying
 inquiry_source | character varying(30)  | not null
 generated_date | integer                | not null default general.current_date_id()
Indexes:
    "rfis_part_2009_01_pkey" PRIMARY KEY, btree (rfi_id)
    "rfis_part_2009_01_generated_date" btree (generated_date)
    "rfis_part_2009_01_receiver_uid" btree (receiver_uid) CLUSTER
    "rfis_part_2009_01_sender_uid" btree (sender_uid)
Check constraints:
    "rfis_part_2009_01_generated_date_check" CHECK (generated_date >= 3289 AND generated_date <= 3319)
    "rfis_part_2009_01_rfi_id_check" CHECK (rfi_id >= 12344252 AND rfi_id <= 12681399)
Inherits: rfis

regds
rajesh kumar mallah.
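For reference, constraint exclusion only prunes partitions when it is enabled and the WHERE clause can be compared against the partitions' CHECK constraints; a minimal sketch of verifying the pruning (table and column names follow the ones above):

```sql
-- Constraint exclusion must be on for the planner to skip partitions.
SET constraint_exclusion = on;

-- With a literal comparable to the CHECK constraints, EXPLAIN should
-- list only the partitions whose generated_date ranges can match.
EXPLAIN SELECT count(*)
FROM general.rfis
WHERE generated_date >= 2251
  AND receiver_uid = 1320721;
```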
Re: [PERFORM] query becomes fast on 'SET enable_hashjoin TO off;'
On Tue, Feb 10, 2009 at 9:09 PM, Tom Lane wrote:
> Rajesh Kumar Mallah writes:
>> On Tue, Feb 10, 2009 at 6:36 PM, Robert Haas wrote:
>>> I'm guessing that the problem is that the selectivity estimate for
>>> co_name_vec @@ to_tsquery('plastic&tubes') is not very good, but I'm
>>> not real familiar with full text search, so I'm not sure whether
>>> there's anything sensible you can do about it.
>
> Yeah, the bad selectivity estimate seems to be the entire problem ---
> if that were even slightly closer to reality the planner would've
> preferred the nestloop.
>
> I don't think there's a good solution to this in 8.3, because its
> estimator for @@ is just a stub. There will be a non-toy estimator
> in 8.4, fwiw.

This is an 8.2 server at the moment.

> A possibility that seems a bit less crude than turning off hashjoins
> is to reduce random_page_cost, so as to bias things toward nestloop
> indexscans in general.

Reducing random_page_cost from 4 (the default) to 3 does switch the plan in
favour of the nested loop. Thanks for the suggestion.
SET random_page_cost TO 4;
SET
tradein_clients=> explain select lead_id from general.trade_leads
    join general.profile_master as pm using(profile_id)
    where status ='m' and co_name_vec @@ to_tsquery('plastic&tubes') limit 20;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=4109.11..11127.78 rows=20 width=4)
   ->  Hash Join  (cost=4109.11..90789.72 rows=247 width=4)
         Hash Cond: (trade_leads.profile_id = pm.profile_id)
         ->  Seq Scan on trade_leads  (cost=0.00..85752.52 rows=246832 width=8)
               Filter: ((status)::text = 'm'::text)
         ->  Hash  (cost=4095.68..4095.68 rows=1074 width=4)
               ->  Bitmap Heap Scan on profile_master pm  (cost=40.89..4095.68 rows=1074 width=4)
                     Filter: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery)
                     ->  Bitmap Index Scan on profile_master_co_name_vec  (cost=0.00..40.62 rows=1074 width=0)
                           Index Cond: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery)
(10 rows)

tradein_clients=> SET random_page_cost TO 3;
SET
tradein_clients=> explain select lead_id from general.trade_leads
    join general.profile_master as pm using(profile_id)
    where status ='m' and co_name_vec @@ to_tsquery('plastic&tubes') limit 20;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=0.00..9944.78 rows=20 width=4)
   ->  Nested Loop  (cost=0.00..122818.07 rows=247 width=4)
         ->  Index Scan using profile_master_co_name_vec on profile_master pm  (cost=0.00..3256.28 rows=1074 width=4)
               Index Cond: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery)
               Filter: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery)
         ->  Index Scan using trade_leads_profile_id on trade_leads  (cost=0.00..110.76 rows=45 width=8)
               Index Cond: (trade_leads.profile_id = pm.profile_id)
               Filter: ((status)::text = 'm'::text)
(8 rows)

> regards, tom lane
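If lowering random_page_cost globally feels too blunt, the setting can be scoped more narrowly; a sketch (the role name here is made up for illustration):

```sql
-- Per-session: affects only the current connection.
SET random_page_cost TO 3;

-- Per-role: new sessions of this (hypothetical) role inherit it.
ALTER USER reporting_user SET random_page_cost = 3;

-- Return the current session to the server default.
RESET random_page_cost;
```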
Re: [PERFORM] ERROR: Can't use an undefined value as an ARRAY reference at /usr/lib/perl5/site_perl/5.8.8/Test/Parser/Dbt2.pm line 521.
> Can't use an undefined value as an ARRAY reference at
> /usr/lib/perl5/site_perl/5.8.8/Test/Parser/Dbt2.pm line 521.
>
> Can someone please give inputs to resolve this issue? Any help on this will
> be appreciated.

The relevant code is:

519 sub transactions {
520     my $self = shift;
521     return @{$self->{data}->{transactions}->{transaction}};
522 }

The value in $self->{data}->{transactions}->{transaction} is undefined, so it
cannot be dereferenced as an array. If you just want to escape this error you
may modify the code as:

sub transactions {
    my $self = shift;
    # return an empty list when no transactions were parsed
    if ($self->{data}->{transactions}->{transaction}) {
        return @{$self->{data}->{transactions}->{transaction}};
    }
    return ();
}

However, understanding the root cause (why the parser found no transactions in
the DBT-2 output) is recommended.

> Thanks in advance.
>
> Thanks,
> Rohan
Re: [PERFORM] query becomes fast on 'SET enable_hashjoin TO off;'
Dear Robert,

Thanks for your interest. Our server was too loaded when I posted my last
observation; now the other EXPLAIN ANALYZE can also be run, and I am posting
both results. As you can see, the latter is 55 ms versus 3000 ms.

explain analyze select lead_id from general.trade_leads
    join general.profile_master as pm using(profile_id)
    where status ='m' and co_name_vec @@ to_tsquery('plastic&tubes') limit 20;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=4109.11..11127.78 rows=20 width=4) (actual time=3076.059..3076.059 rows=0 loops=1)
   ->  Hash Join  (cost=4109.11..90789.72 rows=247 width=4) (actual time=3076.057..3076.057 rows=0 loops=1)
         Hash Cond: (trade_leads.profile_id = pm.profile_id)
         ->  Seq Scan on trade_leads  (cost=0.00..85752.52 rows=246832 width=8) (actual time=0.020..2972.446 rows=127371 loops=1)
               Filter: ((status)::text = 'm'::text)
         ->  Hash  (cost=4095.68..4095.68 rows=1074 width=4) (actual time=42.368..42.368 rows=7 loops=1)
               ->  Bitmap Heap Scan on profile_master pm  (cost=40.89..4095.68 rows=1074 width=4) (actual time=42.287..42.360 rows=7 loops=1)
                     Filter: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery)
                     ->  Bitmap Index Scan on profile_master_co_name_vec  (cost=0.00..40.62 rows=1074 width=0) (actual time=42.252..42.252 rows=7 loops=1)
                           Index Cond: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery)
 Total runtime: 3076.121 ms
(11 rows)

tradein_clients=> SET enable_hashjoin TO off;
SET
tradein_clients=> explain analyze select lead_id from general.trade_leads
    join general.profile_master as pm using(profile_id)
    where status ='m' and co_name_vec @@ to_tsquery('plastic&tubes') limit 20;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=3.42..13080.44 rows=20 width=4) (actual time=55.233..55.233 rows=0 loops=1)
   ->  Nested Loop  (cost=3.42..161504.56 rows=247 width=4) (actual time=55.232..55.232 rows=0 loops=1)
         ->  Index Scan using profile_master_co_name_vec on profile_master pm  (cost=0.00..4335.36 rows=1074 width=4) (actual time=16.578..46.175 rows=7 loops=1)
               Index Cond: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery)
               Filter: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery)
         ->  Bitmap Heap Scan on trade_leads  (cost=3.42..145.75 rows=47 width=8) (actual time=1.287..1.287 rows=0 loops=7)
               Recheck Cond: (trade_leads.profile_id = pm.profile_id)
               Filter: ((status)::text = 'm'::text)
               ->  Bitmap Index Scan on trade_leads_profile_id  (cost=0.00..3.41 rows=47 width=0) (actual time=1.285..1.285 rows=0 loops=7)
                     Index Cond: (trade_leads.profile_id = pm.profile_id)
 Total runtime: 55.333 ms
(11 rows)

SELECT SUM(1) FROM general.trade_leads WHERE status = 'm';
  sum
--------
 127371

This constitutes 90% of the total rows.

regds
mallah.

On Tue, Feb 10, 2009 at 6:36 PM, Robert Haas wrote:
> On Tue, Feb 10, 2009 at 5:31 AM, Rajesh Kumar Mallah wrote:
>> I have a query in which two huge tables (A,B) are joined using an indexed
>> column and a search is made on tsvector on some column on B. Very limited
>> rows of B are expected to match the query on tsvector column.
>>
>> With default planner settings the query takes too long ( > 100 secs) , but
>> with hashjoin off it returns almost immediately. The question is , is
>> it advisable to tweak planner settings for specific queries in application ?
>
> The ones that start with "enable_" usually shouldn't be changed.
> They're mostly for debugging and finding problems.
>
>> The plans are as follows.
>
> It's a little hard to figure out what's gone wrong here because you've
> only included EXPLAIN ANALYZE output for one of the plans - the other
> is just regular EXPLAIN. Can you send that, along with the output of
> the following query:
>
> SELECT SUM(1) FROM trade_leads WHERE status = 'm'
>
> I'm guessing that the problem is that the selectivity estimate for
> co_name_vec @@ to_tsquery('plastic&tubes') is not very good, but I'm
> not real familiar with full text search, so I'm not sure whether
> there's anything sensible you can do about it.
>
> ...Robert
[PERFORM] query becomes fast on 'SET enable_hashjoin TO off;'
Hi,

I have a query in which two huge tables (A,B) are joined using an indexed
column and a search is made on a tsvector column of B. Very few rows of B
are expected to match the tsquery. With default planner settings the query
takes too long ( > 100 secs), but with hashjoin off it returns almost
immediately. The question is: is it advisable to tweak planner settings for
specific queries in the application?

The plans are as follows.

1. With default settings

explain select lead_id from general.trade_leads
    join general.profile_master as pm using(profile_id)
    where status ='m' and co_name_vec @@ to_tsquery('plastic&tubes') limit 20;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=4109.11..11127.78 rows=20 width=4)
   ->  Hash Join  (cost=4109.11..90789.72 rows=247 width=4)
         Hash Cond: (trade_leads.profile_id = pm.profile_id)
         ->  Seq Scan on trade_leads  (cost=0.00..85752.52 rows=246832 width=8)
               Filter: ((status)::text = 'm'::text)
         ->  Hash  (cost=4095.68..4095.68 rows=1074 width=4)
               ->  Bitmap Heap Scan on profile_master pm  (cost=40.89..4095.68 rows=1074 width=4)
                     Filter: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery)
                     ->  Bitmap Index Scan on profile_master_co_name_vec  (cost=0.00..40.62 rows=1074 width=0)
                           Index Cond: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery)
(10 rows)

2. With SET enable_hashjoin TO off;

explain analyze select lead_id from general.trade_leads
    join general.profile_master as pm using(profile_id)
    where status ='m' and co_name_vec @@ to_tsquery('plastic&tubes') limit 20;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=3.42..13080.44 rows=20 width=4) (actual time=1530.039..1530.039 rows=0 loops=1)
   ->  Nested Loop  (cost=3.42..161504.56 rows=247 width=4) (actual time=1530.037..1530.037 rows=0 loops=1)
         ->  Index Scan using profile_master_co_name_vec on profile_master pm  (cost=0.00..4335.36 rows=1074 width=4) (actual time=220.821..1014.501 rows=7 loops=1)
               Index Cond: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery)
               Filter: (co_name_vec @@ '''plastic'' & ''tube'''::tsquery)
         ->  Bitmap Heap Scan on trade_leads  (cost=3.42..145.75 rows=47 width=8) (actual time=73.640..73.640 rows=0 loops=7)
               Recheck Cond: (trade_leads.profile_id = pm.profile_id)
               Filter: ((status)::text = 'm'::text)
               ->  Bitmap Index Scan on trade_leads_profile_id  (cost=0.00..3.41 rows=47 width=0) (actual time=73.579..73.579 rows=0 loops=7)
                     Index Cond: (trade_leads.profile_id = pm.profile_id)
 Total runtime: 1530.137 ms

regds
mallah.
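If a per-query override really is needed, it can at least be confined to a single transaction so the rest of the application keeps the defaults; a sketch using the query above:

```sql
BEGIN;
-- SET LOCAL lasts only until COMMIT/ROLLBACK, so other queries on
-- this connection are unaffected.
SET LOCAL enable_hashjoin TO off;
SELECT lead_id
FROM general.trade_leads
JOIN general.profile_master AS pm USING (profile_id)
WHERE status = 'm'
  AND co_name_vec @@ to_tsquery('plastic&tubes')
LIMIT 20;
COMMIT;
```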
Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller
On Thu, Feb 5, 2009 at 6:10 PM, Matt Burke wrote:
> Arjen van der Meijden wrote:
>
>> Afaik the Perc 5/i and /e are more or less rebranded LSI-cards (they're
>> not identical in layout etc), so it would be a bit weird if they
>> performed much less than the similar LSI's wouldn't you think?
>
> I've recently had to replace a PERC4/DC with the exact same card made by
> LSI (320-2) because the PERCs firmware was crippled. Its idea of RAID10
> actually appears to be concatenated RAID1 arrays.
>
> Since replacing it and rebuilding the array on the LSI card, performance
> has been considerably better (14 disk SCSI shelf)
>
>> Areca may be the fastest around right now, but if you'd like to get it
>> all from one supplier, its not too bad to be stuck with Dell's perc 5 or
>> 6 series.
>
> The PERC6 isn't too bad, however it grinds to a halt when the IO queue
> gets large and it has the serious limitation of not supporting more than
> 8 spans, so trying to build a RAID10 array greater than 16 disks is
> pointless if you're not just after the extra capacity.
>
> Are there any reasonable choices for bigger (3+ shelf) direct-connected
> RAID10 arrays, or are hideously expensive SANs the only option? I've
> checked out the latest Areca controllers, but the manual available on
> their website states there's a limitation of 32 disks in an array...

Where exactly is the limitation of 32 drives? The datasheet of the 1680
states support for up to 128 drives using enclosures.

regds
rajesh kumar mallah.
[PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller
Hi,

I am going to get a Dell 2950 with PERC6i with 8 * 73 GB 15K SAS drives +
300 GB EMC SATA SAN storage. I seek suggestions from users sharing their
experience with similar hardware, if any. I have the following specific
concerns.

1. On the list I read that the RAID10 function in PERC5 is not really
   striping but spanning, and does not give a performance boost. Is that
   still true in the case of PERC6i?

2. I am planning a RAID10 array of 8 drives for the entire database
   (including pg_xlog); the controller has a write-back cache (256MB). Is
   that a good idea? Or is it better to have 6 drives for the database (a
   software RAID0 over 3 hardware RAID1 mirrors) and leave 2 drives (RAID1)
   for the OS?

3. Is there any preferred stripe size for RAID0 for postgresql usage?

4. Although I would benchmark (with bonnie++), how would the EMC SATA SAN
   storage compare with locally attached SAS storage for the purpose of
   hosting the data? I am hiring the storage primarily for storing base
   backups and log archives for a PITR implementation, as the rental of a
   separate machine was higher than the SATA SAN.

Regds
mallah.
Re: [PERFORM] setting up raid10 with more than 4 drives
On 5/31/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> On Thu, May 31, 2007 at 01:28:58AM +0530, Rajesh Kumar Mallah wrote:
> > i am still not clear what is the best way of throwing in more
> > disks into the system.
> > does more stripes means more performance (mostly) ?
> > also is there any thumb rule about best stripe size ? (8k,16k,32k...)
>
> It isn't that simple. RAID1 should theoretically give you the best read
> performance. If all you care about is read, then "best performance" would
> be to add more mirrors to your array. For write performance, RAID0 is the
> best. I think this is what you mean by "more stripes". This is where
> RAID 1+0/0+1 come in.
>
> To reconcile the above, your question seems to be: I have a RAID 1+0/0+1
> system. Should I add disks onto the 0 part of the array? Or the 1 part of
> the array? My conclusion to you would be: both, unless you are certain
> that your load is scaled heavily towards read, in which case the 1, or if
> scaled heavily towards write, then 0.

Thanks, this answers my query. All the time I was thinking of the 1+0 only,
failing to observe the 0+1 part in it.

> Then come the other factors. Do you want redundancy? Then you want 1.
> Do you want capacity? Then you want 0.

Ok.

> There is no single answer for most people. For me, stripe size is the last
> decision to make, and may be heavily sensitive to load patterns. This
> suggests a trial and error / benchmarking requirement to determine the
> optimal stripe size for your use.

Thanks.
mallah.

> Cheers,
> mark
http://mark.mielke.cc/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] setting up raid10 with more than 4 drives
Sorry for posting and disappearing.

I am still not clear what is the best way of throwing more disks into the
system. Does more stripes mean more performance (mostly)? Also, is there any
thumb rule about the best stripe size? (8k, 16k, 32k...)

regds
mallah

On 5/30/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> On Wed, May 30, 2007 at 08:51:45AM -0700, Luke Lonergan wrote:
> >
> > This is standard stuff, very well proven: try googling 'self healing zfs'.
> > The first hit on this search is a demo of ZFS detecting corruption of one of
> > the mirror pair using checksums, very cool:
> >
> > http://www.opensolaris.org/os/community/zfs/demos/selfheal/;jsessionid=52508D464883F194061E341F58F4E7E1
> >
> > The bad drive is pointed out directly using the checksum and the data
> > integrity is preserved.
>
> One part is corruption. Another is ordering and consistency. ZFS represents
> both RAID-style storage *and* journal-style file system. I imagine
> consistency and ordering is handled through journalling.
>
> Cheers,
> mark
Re: [PERFORM] setting up raid10 with more than 4 drives
On 5/30/07, Luke Lonergan <[EMAIL PROTECTED]> wrote:
> Stripe of mirrors is preferred to mirror of stripes for the best balance
> of protection and performance.

nooo! I am not asking about raid10 vs raid01. I am considering stripe of
mirrors only. The question is how a larger number of disks is BEST utilized
in terms of IO performance:

1. for adding more mirrored stripes, OR
2. for adding more hard drives to the mirrors.

Say I had 4 drives in raid10 format:

    D1 raid1 D2   --> MD0
    D3 raid1 D4   --> MD1
    MD0 raid0 MD1 --> MDF (final)

Now I get 2 drives, D5 and D6. Then I have 2 options:

1. create a new mirror:

    D5 raid1 D6             --> MD2
    MD0 raid0 MD1 raid0 MD2 --> MDF (final)

OR

2. widen the existing mirrors:

    D1 raid1 D2 raid1 D5 --> MD0
    D3 raid1 D4 raid1 D6 --> MD1
    MD0 raid0 MD1        --> MDF (final)

Thanks, I hope my question is clear now.

Regds
mallah.

> In the stripe of mirrors you can lose up to half of the disks and still
> be operational. In the mirror of stripes, the most you could lose is two
> drives. The performance of the two should be similar - perhaps the seek
> performance would be different for high concurrent use in PG.
>
> - Luke
>
> On 5/29/07 2:14 PM, "Rajesh Kumar Mallah" <[EMAIL PROTECTED]> wrote:
> > hi,
> >
> > this is not really postgresql specific, but any help is appreciated.
> > i have read more spindles the better it is for IO performance.
> >
> > suppose i have 8 drives , should a stripe (raid0) be created on
> > 2 mirrors (raid1) of 4 drives each OR should a stripe on 4 mirrors
> > of 2 drives each be created ?
> >
> > also does single channel or dual channel controllers makes lot
> > of difference in raid10 performance ?
> >
> > regds
> > mallah.
[PERFORM] setting up raid10 with more than 4 drives
hi,

This is not really postgresql specific, but any help is appreciated. I have
read that the more spindles, the better it is for IO performance.

Suppose I have 8 drives: should a stripe (raid0) be created over 2 mirrors
(raid1) of 4 drives each, OR should a stripe over 4 mirrors of 2 drives each
be created?

Also, do single-channel or dual-channel controllers make a lot of difference
in raid10 performance?

regds
mallah.
Re: [PERFORM] New to PostgreSQL, performance considerations
[offtopic] hmm, quite a long thread; below are stats of posting.

Total Messages: 87    Total Participants: 27
--------------------------------------------
 19  Daniel van Ham Colchete
 12  Michael Stone
  9  Ron
  5  Steinar H. Gunderson
  5  Alexander Staubo
  4  Tom Lane
  4  Greg Smith
  3  Luke Lonergan
  3  Christopher Browne
  2  Merlin Moncure
  2  Guido Neitzer
  2  Dave Cramer
  2  Cosimo Streppone
  2  Bucky Jordan
  1  Tatsuo Ishii
  1  Sven Geisler
  1  Shane Ambler
  1  Michael Glaesemann
  1  Mark Kirkwood
  1  Gene
  1  Florian Weimer
  1  David Boreham
  1  Craig A. James
  1  Chris Browne
  1  Brad Nicholson
  1  Bill Moran
  1  Alvaro Herrera
--------------------------------------------
Re: [PERFORM] Insertion to temp table deteriorating over time
On 12/13/06, Steven Flatt <[EMAIL PROTECTED]> wrote:
> Hi,
> Our application is using Postgres 7.4 and I'd like to understand the root
> cause of this problem:
> To speed up overall insert time, our application will write thousands of
> rows, one by one, into a temp table

1. How frequently are you committing the transaction? If you commit less
   frequently it will be faster.

2. If you use COPY instead of INSERT it will be faster. Using COPY is easy
   with DBD::Pg (perl). In versions 8.x I think there have been major speed
   improvements in COPY.

I do not know the root cause of the slowdown, though.

Regds
mallah.
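To make point 2 concrete, a sketch of the difference (the table and file names are made up; COPY streams all rows in one statement instead of parsing one INSERT per row):

```sql
-- Hypothetical temp table for the bulk load.
CREATE TEMP TABLE staging (id integer, val text);

-- Row-at-a-time: one round trip and one parse per row.
INSERT INTO staging VALUES (1, 'a');
INSERT INTO staging VALUES (2, 'b');

-- Bulk: a single COPY statement loads everything.
-- (From psql, \copy staging FROM 'data.tsv' does the same client-side.)
COPY staging FROM '/tmp/data.tsv';
```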
Re: [PERFORM] really quick multiple inserts can use COPY?
> So, my questions:
> Is it possible to use COPY FROM STDIN with JDBC?

Should be. It's at least possible using DBI and DBD::Pg (perl):

my $copy_sth = $dbh->prepare(
    "COPY general.datamining_mailing_lists (query_id,email_key) FROM STDIN;");
$copy_sth->execute();
while (my ($email_key) = $fetch_sth->fetchrow_array()) {
    $dbh->func("$query_id\t$email_key\n", 'putline');
}
$fetch_sth->finish();
$dbh->func("\\.\n", 'putline');
$dbh->func('endcopy');
$copy_sth->finish();

Some JDBC expert would tell better how it's done with JDBC.

> Will it bring performance improvement compared to SELECT UNION solution?

COPY is quite fast.

Regds
mallah.

> many thanks in advance,
> Jens Schipkowski
>
> --
> ** APUS Software GmbH
Re: [PERFORM] Postgresql - Threshold value.
On 12/11/06, Ravindran G - TLS, Chennai. <[EMAIL PROTECTED]> wrote:
> Thanks. I am using Postgres 8.1.4 in windows 2000 and i don't get the
> proper response for threshold.

What is the response you get? Please be specific about the issues.

Also, the footer that comes with your emails is not appreciated by many
people; if possible, please avoid it.

Regds
mallah.
Re: [PERFORM] Postgresql - Threshold value.
On 12/11/06, Ravindran G - TLS, Chennai. <[EMAIL PROTECTED]> wrote:
> Hello,
> How to get Postgresql Threshold value ?. Any commands available ?.

What is meant by threshold value?
Re: [PERFORM] VACUUM FULL does not works.......
We have a view in our database:

CREATE VIEW public.hogs AS
SELECT pg_stat_activity.procpid,
       pg_stat_activity.usename,
       pg_stat_activity.current_query
FROM ONLY pg_stat_activity;

SELECT current_query FROM public.hogs helps us to spot errant queries at
times.

regds
mallah.

On 12/7/06, asif ali <[EMAIL PROTECTED]> wrote:
> Thanks Scott,
> It worked!!! We killed an old idle running transaction, now everything is
> fine..
> Thanks Again
> asif ali
> icrossing inc
>
> Scott Marlowe <[EMAIL PROTECTED]> wrote:
> On Wed, 2006-12-06 at 15:53, asif ali wrote:
> > Thanks Everybody for helping me out.
> > I checked "pg_stat_activity"/pg_locks, but do not see any activity on
> > the table.
> > How to find a old running transaction...
> > I saw this link, but it did not help..
> > http://archives.postgresql.org/pgsql-hackers/2005-02/msg00760.php
>
> Sometimes just using top or ps will show you. On linux you can run top
> and then hit c for show command line and look for ones that are IDLE.
>
> Or, try ps:
>
> ps axw|grep postgres
>
> On my machine normally:
>
> 2408 ?  S  0:00 /usr/local/pgsql/bin/postmaster -p 5432 -D /home/postgres/data
> 2615 ?  S  0:00 postgres: stats buffer process
> 2616 ?  S  0:00 postgres: stats collector process
> 2857 ?  S  0:00 postgres: writer process
> 2858 ?  S  0:00 postgres: stats buffer process
> 2859 ?  S  0:00 postgres: stats collector process
>
> But with an idle transaction:
>
> 2408 ?  S  0:00 /usr/local/pgsql/bin/postmaster -p 5432 -D /home/postgres/data
> 2615 ?  S  0:00 postgres: stats buffer process
> 2616 ?  S  0:00 postgres: stats collector process
> 2857 ?  S  0:00 postgres: writer process
> 2858 ?  S  0:00 postgres: stats buffer process
> 2859 ?  S  0:00 postgres: stats collector process
> 8679 ?  S  0:00 postgres: smarlowe test [local] idle in transaction
>
> Thar she blows!
>
> Also, you can restart the database and vacuum it then too. Of course,
> don't do that during regular business hours...
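A usage sketch of the view above (the '&lt;IDLE&gt;' literal is what 8.x-era pg_stat_activity reports for idle backends; adjust to taste):

```sql
-- Show only the backends actually running something.
SELECT procpid, usename, current_query
FROM public.hogs
WHERE current_query <> '<IDLE>';
```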
Re: [PERFORM] VACUUM FULL does not works.......
On 12/6/06, asif ali <[EMAIL PROTECTED]> wrote:
> Hi,
> I have a "product" table having 350 records. It takes approx 1.8 seconds
> to get all records from this table. I copied this table to a
> "product_temp" table and ran the same query to select all records; it
> took 10ms (much faster).
> I did "VACUUM FULL" on the "product" table but it did not work. I checked
> the file size of these two tables. The "product" table's file size is
> "32mb" and the "product_temp" table's file size is "72k".
> So, it seems that "VACUUM FULL" is not doing anything.
> Please suggest.
> asif ali
> icrossing inc.

It is desirable that the PostgreSQL version be reported in problem
descriptions. Older versions of pgsql had a problem of index bloat. It is
interesting to find out why VACUUM FULL does not work in your case (wait
for the experts), but most probably CLUSTERing the table on the primary key
is going to solve the query performance problem (temporarily).
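A sketch of the suggested workaround (the index name is made up; CLUSTER rewrites the table in index order, which also discards dead-row bloat, and it takes an exclusive lock while it runs):

```sql
-- Rewrite "product" in primary-key order; the rewrite drops the bloat
-- that VACUUM FULL apparently failed to reclaim here.
CLUSTER product_pkey ON product;

-- Compare the on-disk size before/after (available in 8.1+):
SELECT pg_relation_size('product');
```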
Re: [PERFORM] Restart time
On 12/6/06, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Rajesh Kumar Mallah" <[EMAIL PROTECTED]> writes:
> > Startup time of a clean shutdown database is constant. But we still
> > face problem when it comes to shutting down. PostgreSQL waits
> > for clients to finish gracefully. till date i have never been able to
> > shutdown quickly (web application scenerio) and i tend to do
> > pg_ctl -m immediate stop mostly.
>
> RTFM ... you should be using -m fast not -m immediate.
> -m immediate is for emergency situations not routine.
>
> regards, tom lane

Thanks for correcting; -m fast works fine for me. I shall RTFM. :)

Regds
mallah.
Re: [PERFORM] Restart time
On 12/5/06, Tom Lane <[EMAIL PROTECTED]> wrote:
> Jean Arnaud <[EMAIL PROTECTED]> writes:
> > Is there a relation between database size and PostGreSQL restart duration ?
>
> No.
>
> > Does anyone know the behavior of restart time ?
>
> It depends on how many updates were applied since the last checkpoint
> before the crash. If you're talking about startup of a cleanly-shut-down
> database, it should be pretty much constant time.
>
> regards, tom lane

Dear Sir,

Startup time of a cleanly shut down database is constant. But we still face
a problem when it comes to shutting down: PostgreSQL waits for clients to
finish gracefully. To date I have never been able to shut down quickly (web
application scenario), and I tend to do pg_ctl -m immediate stop mostly.
Re: [PERFORM] Stored Procedure Performance
On 4/11/06, Simon Dale <[EMAIL PROTECTED]> wrote: > Hi, > I'm trying to evaluate PostgreSQL as a database that will have to store a high volume of data and access that data frequently. One of the features on our wish list is to be able to use stored procedures to access the data and I was wondering if it is usual for stored procedures to perform slower on PostgreSQL than raw SQL?

No. RETURN NEXT keeps accumulating the data before returning, and I am not sure whether any optimisations have been done to that effect. In general, functions are *NOT* slower than raw SQL. Regds mallah.
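To illustrate the RETURN NEXT point, here is a minimal, hypothetical sketch (the table and function names are invented) contrasting a PL/pgSQL set-returning function, which buffers the whole result set in a tuplestore before the caller sees any rows, with a plain SQL-language function that simple cases can be written as:

```sql
-- Hypothetical table for illustration only.
CREATE TABLE items (id integer, owner_id integer, label text);

-- PL/pgSQL version: RETURN NEXT accumulates every row before the
-- function returns anything to the caller.
CREATE OR REPLACE FUNCTION items_by_owner_plpgsql(integer)
RETURNS SETOF items AS $$
DECLARE
    rec items%ROWTYPE;
BEGIN
    FOR rec IN SELECT * FROM items WHERE owner_id = $1 LOOP
        RETURN NEXT rec;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;

-- SQL-language version: simple enough that the planner can treat it
-- much like the raw query itself.
CREATE OR REPLACE FUNCTION items_by_owner_sql(integer)
RETURNS SETOF items AS $$
    SELECT * FROM items WHERE owner_id = $1;
$$ LANGUAGE sql;

SELECT * FROM items_by_owner_sql(1);
```

For large result sets the accumulation in the PL/pgSQL version is where the perceived slowdown usually comes from, not from the function call itself.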
Re: [PERFORM] Restore performance?
4. fsync can also be turned off while loading a huge dataset, but seek others' comments too (and study the docs), as I am not sure about the reliability. I think it can make a lot of difference.

On 4/10/06, Jesper Krogh <[EMAIL PROTECTED]> wrote:
Rajesh Kumar Mallah wrote:
>> I'd run pg_dump | gzip > sqldump.gz on the old system. That took about
>> 30 hours and gave me a 90GB zipped file. Running
>> cat sqldump.gz | gunzip | psql
>> into the 8.1 database seems to take about the same time. Are there
>> any tricks I can use to speed this dump+restore process up?
>
> was the last restore successful?
> if so why do you want to repeat?

"about the same time" == estimated guess from restoring a few tables. I was running a test run, without disabling updates to the production database; the real run is scheduled for Easter, when there hopefully are no users on the system. So I need to repeat; I'm just trying to get a feeling for how much time I need to allocate for the operation.

> 1. run the new version of postgres on a different port and pipe pg_dump to psql;
> this may save the CPU time of compression, as there is no need for a temporary
> dump file.
>
> pg_dump | /path/to/psql813 -p 54XX newdb

I'll do that. It is a completely different machine anyway.

> 2. use the new version of pg_dump to dump the old database, as the new version
> is supposed to be wiser.

Check.

> 3. make sure you are trapping the restore errors properly:
> psql newdb 2>&1 | cat | tee err works for me.

That's noted.

--
Jesper Krogh, [EMAIL PROTECTED]
Re: [PERFORM] Takes too long to fetch the data from database
What is the query? Use LIMIT or a restricting WHERE clause. regds mallah.

On 4/10/06, soni de <[EMAIL PROTECTED]> wrote: Hello, I have difficulty in fetching the records from the database. The database table contains more than 1 GB of data. Fetching the records takes more than 1 hour, and that is slowing down performance. Please provide some help regarding improving the performance, and how I should run the query so that records are fetched in less time.
Re: [PERFORM] Restore performance?
Sorry for the post; I didn't see the other replies until after posting.
Re: [PERFORM] Restore performance?
On 4/10/06, Jesper Krogh <[EMAIL PROTECTED]> wrote: Hi. I'm currently upgrading a Postgresql 7.3.2 database to 8.1. I'd run pg_dump | gzip > sqldump.gz on the old system. That took about 30 hours and gave me a 90GB zipped file. Running cat sqldump.gz | gunzip | psql into the 8.1 database seems to take about the same time. Are there any tricks I can use to speed this dump+restore process up? The database contains quite a lot of BLOBs, thus the size. Jesper -- ./Jesper Krogh, [EMAIL PROTECTED], Jabber ID: [EMAIL PROTECTED]

Was the last restore successful? If so, why do you want to repeat it? Some tips:

1. Run the new version of postgres on a different port and pipe pg_dump to psql; this may save the CPU time of compression, as there is no need for a temporary dump file:

pg_dump | /path/to/psql813 -p 54XX newdb

2. Use the new version of pg_dump to dump the old database, as the new version is supposed to be wiser.

3. Make sure you are trapping the restore errors properly:

psql newdb 2>&1 | cat | tee err

works for me.
Re: [PERFORM] pls reply ASAP
On 4/9/06, Chethana, Rao (IE10) <[EMAIL PROTECTED]> wrote:
> Hello!
> Kindly go through the following.
> I wanted to know whether the command line arguments (function arguments) -- $1 $2 $3 -- can be used as in the following, like,
> CREATE TYPE TT AS (something, something, ……etc……);
> CREATE OR REPLACE FUNCTION f1(varchar,varchar,varchar,varchar) RETURNS ………..(something).
> BEGIN
> SELECT a1,a2,a3,a4,a5,a6
> FROM (SELECT * FROM T1, T2……WHERE etc… Flag = 0 $1 $2 $3 $4)
> ORDER BY
> …….
> RETURN NEXT ………;
> END LOOP;
> RETURN;
> END;
> ' LANGUAGE 'plpgsql';
> NOTE: The values for $1 $2 $3 $4 will be passed when the function is invoked (called) from the command prompt.
> I tried implementing the above, but this type of usage is not supported. How should I use it?
> I am converting from sprintf "SELECT query stmts" (which use %s %s %s %s ……) to functions.
> Any help will be deeply appreciated. Thank you.
> Kind regards,
> Chethana.

The overall idea expressed is doable. Following are comments:

1. You have to put RETURNS SETOF TT (if you plan to return TT), since you used RETURN NEXT.
2. You have to use SELECT INTO rec in the function, where rec is of rowtype TT.

--- non-technical comments ---

3. It's not a performance question; it should have been addressed to pgsql-sql, I think.
4. It's not good etiquette to address email to someone and mark Cc to a list.

kind regds mallah.
Re: [PERFORM] optimizing db for small table with tons of updates
Dear Kenji, we had a similar issue with a banner impression update system that had high concurrency. We modified the system to use an insert instead of an update of the same row. Performance-wise things are much better, but you have to keep deleting old data. I hope you can extrapolate what I mean and see whether it is applicable to your case. Regds Rajesh Kumar Mallah

On 4/3/06, Kenji Morishige <[EMAIL PROTECTED]> wrote:
> I am using postgresql to be the central database for a variety of tools for
> our testing infrastructure. We have web tools and CLI tools that require access
> to machine configuration and other states for automation. We have one tool that
> uses a table that looks like this:
>
> systest_live=# \d cuty
>                    Table "public.cuty"
>    Column    |           Type           | Modifiers
> -------------+--------------------------+-----------
>  resource_id | integer                  | not null
>  lock_start  | timestamp with time zone |
>  lock_by     | character varying(12)    |
>  frozen      | timestamp with time zone |
>  freeze_end  | timestamp with time zone |
>  freeze_by   | character varying(12)    |
>  state       | character varying(15)    |
> Indexes:
>     "cuty_pkey" PRIMARY KEY, btree (resource_id)
>     "cuty_main_idx" btree (resource_id, lock_start)
> Foreign-key constraints:
>     "cuty_resource_id_fkey" FOREIGN KEY (resource_id) REFERENCES resource(resource_id) ON UPDATE CASCADE ON DELETE CASCADE
>
> Various users run a tool that updates this table to determine if the particular
> resource is available or not. Within the course of a few days, this table can
> be updated up to 200,000 times. There are only about 3500 records in this
> table, but the update and select queries against this table start to slow
> down considerably after a few days. Ideally, this table doesn't even need
> to be stored and written to the filesystem. After I run a vacuum against this
> table, the overall database performance seems to rise again. When the database
> is running with a recent vacuum the average server load is about .40, but after
> this table is updated 200,000+ times, the server load can go up to 5.0.
> Here is a typical update query:
> 2006-04-03 10:53:39 PDT testtool systest_live kyoto.englab.juniper.net(4888)
> LOG: duration: 2263.741 ms statement: UPDATE cuty SET
>         lock_start = NOW(),
>         lock_by = 'tlim'
>         WHERE resource_id='2262' and (lock_start IS NULL OR lock_start <
>         (NOW() - interval '3600 second'))
>
> We used to use MySQL for these tools and we never had any issues, but I believe
> it is due to the transactional nature of Postgres that is adding an overhead
> to this problem. Are there any table options that enable the table contents
> to be maintained in RAM only, or have delayed writes for this particular table?
>
> Thanks in advance,
> Kenji
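The insert-instead-of-update approach suggested in the reply above might look like this sketch (the table and column names are hypothetical): writers append rows instead of fighting over one hot tuple, readers aggregate, and old detail rows are pruned periodically, as the reply notes ("you have to keep deleting old data"):

```sql
-- Hypothetical log table: one row per event, no in-place counter updates.
CREATE TABLE resource_events (
    resource_id integer NOT NULL,
    happened_at timestamp with time zone NOT NULL DEFAULT now()
);

-- Writers only insert, so concurrent sessions never contend on a single
-- row and dead tuples do not pile up on one hot page:
INSERT INTO resource_events (resource_id) VALUES (2262);

-- Readers derive the current count/state by aggregating:
SELECT count(*) FROM resource_events WHERE resource_id = 2262;

-- Periodic maintenance: delete detail rows that are no longer needed.
DELETE FROM resource_events
 WHERE happened_at < now() - interval '1 day';
```

The trade-off is that reads become aggregates instead of single-row lookups, which is why the pattern fits write-heavy, read-light counters best.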
Re: [PERFORM] Slow concurrent update of same row in a given table
On 9/29/05, Gavin Sherry <[EMAIL PROTECTED]> wrote: > On Wed, 28 Sep 2005, Rajesh Kumar Mallah wrote: > > > > > Number of Copies | Update perl Sec > > > > > > > > 1 --> 119 > > > > 2 ---> 59 > > > > 3 ---> 38 > > > > 4 ---> 28 > > > > 5 --> 22 > > > > 6 --> 19 > > > > 7 --> 16 > > > > 8 --> 14 > > > > 9 --> 11 > > > > 10 --> 11 > > > > 11 --> 10 > > > > > > So, 11 instances result in 10 updated rows per second, database wide or > > > per instance? If it is per instance, then 11 * 10 is close to the > > > performance for one connection. > > > > > > Sorry do not understand the difference between "database wide" > > and "per instance" > > Per instance. > > > > > > > > > That being said, when you've got 10 connections fighting over one row, I > > > wouldn't be surprised if you had bad performance. > > > > > > Also, at 119 updates a second, you're more than doubling the table's > > > initial size (dead tuples) each second. How often are you vacuuming and > > > are you using vacuum or vacuum full? > > > > > > Yes I realize the obvious phenomenon now, (and the uselessness of the > > script) > > , we should not consider it a performance degradation. > > > > I am having performance issue in my live database thats why i tried to > > simulate the situation(may the the script was overstresser). > > > > My original problem is that i send 100 000s of emails carrying a > > beacon for tracking readership every tuesday and on wednesday i see > > lot of the said query in pg_stat_activity each of these query update > > the SAME row that corresponds to the dispatch of last day and it is > > then i face the performance problem. > > > > I think i can only post further details next wednesday , please lemme > > know how should i be dealing with the situation if each the updates takes > > 100times more time that normal update duration. > > I see. These problems regularly come up in database design. 
The best thing > you can do is modify your database design/application such that instead of > incrementing a count in a single row, you insert a row into a table, > recording the 'dispatch_id'. Counting the number of rows for a given > dispatch id will give you your count.

Sorry, but I will be accumulating a huge number of rows in a separate table with no extra info, when all I really want is the count. Do you have a better database design in mind? I also encounter the same problem in implementing read counts of articles on sites and in counting banner impressions, where the same row gets updated by multiple processes frequently. Thanks & Regds mallah.

> Thanks, > Gavin
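Gavin's row-per-event idea and the objection above (unbounded row growth) can be reconciled with a periodic rollup. A hedged sketch, reusing the general.stress table from this thread and a hypothetical dispatch_hits detail table:

```sql
-- Hypothetical detail table: one narrow row per beacon hit.
CREATE TABLE dispatch_hits (dispatch_id integer NOT NULL);

-- Each tracking-beacon request appends a row instead of updating the
-- shared counter row, so there is no single-row contention:
INSERT INTO dispatch_hits VALUES (114);

-- Current reading = rolled-up counter + not-yet-rolled-up detail rows:
SELECT s.read_count + coalesce(h.n, 0) AS read_count
  FROM general.stress s
  LEFT JOIN (SELECT dispatch_id, count(*) AS n
               FROM dispatch_hits GROUP BY dispatch_id) h
    ON h.dispatch_id = s.dispatch_id
 WHERE s.dispatch_id = 114;

-- Periodically (say, nightly) fold the detail rows into read_count and
-- clear them, bounding the detail table's size:
BEGIN;
UPDATE general.stress s
   SET read_count = read_count + h.n
  FROM (SELECT dispatch_id, count(*) AS n
          FROM dispatch_hits GROUP BY dispatch_id) h
 WHERE s.dispatch_id = h.dispatch_id;
DELETE FROM dispatch_hits;
COMMIT;
```

The rollup touches each hot row once per batch instead of once per hit, which is what removes the contention seen on Wednesdays.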
Re: [PERFORM] Slow concurrent update of same row in a given table
On 9/28/05, Gavin Sherry <[EMAIL PROTECTED]> wrote: > On Wed, 28 Sep 2005, Rajesh Kumar Mallah wrote: > > > Hi > > > > While doing some stress testing for updates in a small sized table > > we found the following results. We are not too happy about the speed > > of the updates particularly at high concurrency (10 clients). > > > > Initially we get 119 updates / sec but it drops to 10 updates/sec > > as concurrency is increased. > > > > PostgreSQL: 8.0.3 > > --- > > TABLE STRUCTURE: general.stress > > --- > > | dispatch_id | integer | not null | > > | query_id | integer | | > > | generated| timestamp with time zone | | > > | unsubscribes | integer | | > > | read_count | integer | | > > | status | character varying(10)| | > > | bounce_tracking | boolean | | > > | dispatch_hour| integer | | > > | dispatch_date_id | integer | | > > +--+--+---+ > > Indexes: > > "stress_pkey" PRIMARY KEY, btree (dispatch_id) > > > > UPDATE STATEMENT: > > update general.stress set read_count=read_count+1 where dispatch_id=114 > > This means you are updating only one row, correct? Correct. > > > Number of Copies | Update perl Sec > > > > 1 --> 119 > > 2 ---> 59 > > 3 ---> 38 > > 4 ---> 28 > > 5 --> 22 > > 6 --> 19 > > 7 --> 16 > > 8 --> 14 > > 9 --> 11 > > 10 --> 11 > > 11 --> 10 > > So, 11 instances result in 10 updated rows per second, database wide or > per instance? If it is per instance, then 11 * 10 is close to the > performance for one connection. Sorry do not understand the difference between "database wide" and "per instance" > > That being said, when you've got 10 connections fighting over one row, I > wouldn't be surprised if you had bad performance. > > Also, at 119 updates a second, you're more than doubling the table's > initial size (dead tuples) each second. How often are you vacuuming and > are you using vacuum or vacuum full? Yes I realize the obvious phenomenon now, (and the uselessness of the script) , we should not consider it a performance degradation. 
I am having a performance issue in my live database; that's why I tried to simulate the situation (maybe the script was an over-stresser). My original problem is that I send 100,000s of emails carrying a beacon for tracking readership every Tuesday, and on Wednesday I see a lot of the said query in pg_stat_activity. Each of these queries updates the SAME row, the one that corresponds to the previous day's dispatch, and it is then that I face the performance problem. I think I can only post further details next Wednesday; please let me know how I should deal with the situation if each of the updates takes 100 times longer than the normal update duration. Best Regards Mallah.

> Gavin
[PERFORM] Slow concurrent update of same row in a given table
Hi. While doing some stress testing for updates on a small table we found the following results. We are not too happy about the speed of the updates, particularly at high concurrency (10 clients). Initially we get 119 updates/sec, but it drops to 10 updates/sec as concurrency is increased.

PostgreSQL: 8.0.3

TABLE STRUCTURE: general.stress
| dispatch_id      | integer                  | not null |
| query_id         | integer                  |          |
| generated        | timestamp with time zone |          |
| unsubscribes     | integer                  |          |
| read_count       | integer                  |          |
| status           | character varying(10)    |          |
| bounce_tracking  | boolean                  |          |
| dispatch_hour    | integer                  |          |
| dispatch_date_id | integer                  |          |
Indexes:
    "stress_pkey" PRIMARY KEY, btree (dispatch_id)

UPDATE STATEMENT:
update general.stress set read_count=read_count+1 where dispatch_id=114

TOOL USED: Perl/DBI, with prepared statement handles
CONCURRENCY METHOD: executing multiple copies of the same program from different shells (linux environment)
CLIENT SERVER LINK: 10/100 Mbits, LAN

CLIENT CODE: stress.pl

#!/opt/perl/bin/perl -I/usr/local/masonapache/lib/perl
# overview: update the table as fast as possible (while(1){})
# on every 100th commit, print the average update frequency
# of the last 100 updates
use strict;
use Time::HiRes qw(gettimeofday tv_interval);
use Utils;

my $dbh = &Utils::db_connect();
my $sth = $dbh->prepare("update general.stress set read_count=read_count+1 where dispatch_id=114");
my $cnt = 0;
my $t0  = [ gettimeofday ];
while (1) {
    $sth->execute();
    $dbh->commit();
    $cnt++;
    if ($cnt % 100 == 0) {
        my $t1 = [ gettimeofday ];
        my $elapsed = tv_interval($t0, $t1);
        $t0 = $t1;
        printf "Rate: %d updates / sec\n", 100.0/$elapsed;
    }
}
$sth->finish();
$dbh->disconnect();

RESULTS:
Number of Copies | Updates per Sec
 1 --> 119
 2 --> 59
 3 --> 38
 4 --> 28
 5 --> 22
 6 --> 19
 7 --> 16
 8 --> 14
 9 --> 11
10 --> 11
11 --> 10

Note that the table was vacuum analyzed during the tests. Total number of records in table: 93.

Regds Rajesh
Kumar Mallah.
Re: [PERFORM] PostgreSQL and Kernel 2.6.x
V i s h a l Kashyap @ [Sai Hertz And Control Systems] wrote: Dear all, has anyone compiled PostgreSQL with kernel 2.6.x? If YES: 1. Were there any performance gains? Else: 1. Is it possible? 2. What problems would keep us from compiling on kernel 2.6?

We run pgsql on 2.6.6; there was up to a 30% improvement in performance for certain queries. No problems, everything works just fine. Regds Mallah.
Re: [PERFORM] seeking consultant for high performance, complex searching
Have you checked Tsearch2? http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/ is the most feature-rich full text search system available for postgresql. We are also using the same system in the revamped version of our website. Regds Mallah.

Mark Stosberg wrote: Hello, I work for Summersault, LLC. We've been using Postgres since the days of Postgres 6.5. We're focused on building database-driven websites using Perl and Postgres. We are currently seeking help developing a search system that needs to perform complex queries with high performance. Although we have strong skills in Perl and Postgres, we are new to the arena of complex, high-performance search systems. We are seeking to hire a consultant to help with this as part of the revamp of the 1-800-Save-A-Pet.com website. 1-800-Save-A-Pet.com is a not-for-profit organization whose website finds homes for homeless pets, promoting pet adoption and saving thousands of animal lives. Summersault, LLC is a website development firm focused on creating highly customized database-driven websites. The ideal consultant has expert experience with the PostgreSQL RDBMS and the Perl programming language, and is intimately familiar with the architecture and implementation of complex database queries for high-traffic web applications. The consultant should also have a strong background in creating solutions complementary to this work, e.g. assessing hardware requirements, designing a hosting and network infrastructure, and optimizing the algorithm based on real-world feedback. The consultant will work with Summersault developers as a part of a larger application development process. Interested persons or organizations should contact Chris Hardie of Summersault, LLC at [EMAIL PROTECTED] for more information. Thanks! Mark
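For reference, setting up Tsearch2 looked roughly like the sketch below. This is a hedged outline based on the tsearch2 contrib module of that era (after loading its install script into the database), with the table and column names invented:

```sql
-- Assumes contrib/tsearch2 has been installed into the database and that
-- a hypothetical "pets" table with a text "description" column exists.
ALTER TABLE pets ADD COLUMN fts tsvector;
UPDATE pets SET fts = to_tsvector(description);

-- GiST index to make @@ searches fast:
CREATE INDEX pets_fts_idx ON pets USING gist (fts);

-- The bundled trigger keeps the tsvector column current on writes:
CREATE TRIGGER pets_fts_update BEFORE INSERT OR UPDATE ON pets
FOR EACH ROW EXECUTE PROCEDURE tsearch2(fts, description);

-- Query with the match operator:
SELECT * FROM pets WHERE fts @@ to_tsquery('black & labrador');
```

The details (default configuration, dictionaries, ranking) are covered in the Tsearch2 documentation at the URL above.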
Re: [PERFORM] [ SOLVED ] select count(*) very slow on an already
I am running an update on the same table:

update rfis set inquiry_status='APPROVED' where inquiry_status='a';

It has been running for the past 20 mins, and the top output is below. The PID executing the query above is 6712. Can anyone tell me why it is in an uninterruptible sleep, and does that relate to the apparent poor performance? Is it a problem with the disk hardware? I know that at night this query will run reasonably fast. I am running on decent hardware. Regds mallah.

1:41pm up 348 days, 21:10, 1 user, load average: 11.59, 13.69, 11.49
85 processes: 83 sleeping, 1 running, 0 zombie, 1 stopped
CPU0 states: 8.1% user, 2.3% system, 0.0% nice, 89.0% idle
CPU1 states: 3.3% user, 2.3% system, 0.0% nice, 93.2% idle
CPU2 states: 7.4% user, 1.4% system, 0.0% nice, 90.0% idle
CPU3 states: 9.3% user, 7.4% system, 0.0% nice, 82.2% idle
Mem: 2064796K av, 2053964K used, 10832K free, 0K shrd, 22288K buff
Swap: 2048244K av, 88660K used, 1959584K free, 1801532K cached

  PID USER     PRI NI  SIZE  RSS   SHARE STAT %CPU %MEM  TIME   COMMAND
 6712 postgres  16  0 86592  84M  83920 D    11.1  4.1   1:36  postmaster
13103 postgres  15  0 54584  53M  52556 S     3.5  2.6   0:01  postmaster
13034 root      16  0  1072 1072    848 R     2.1  0.0   0:02  top
13064 postgres  15  0 67256  65M  64516 D     2.1  3.2   0:01  postmaster
13088 postgres  16  0 43324  42M  40812 D     2.1  2.0   0:00  postmaster
13076 postgres  15  0 49016  47M  46628 S     1.9  2.3   0:00  postmaster
26931 postgres  15  0 84880  82M  83888 S     1.7  4.1   3:52  postmaster
13107 postgres  15  0 18400  17M  16488 S     1.5  0.8   0:00  postmaster
13068 postgres  15  0 44632  43M  42324 D     1.3  2.1   0:00  postmaster
13074 postgres  15  0 68852  67M  66508 D     1.3  3.3   0:00  postmaster
13108 postgres  15  0 11692  11M  10496 S     1.3  0.5   0:00  postmaster
13075 postgres  15  0 50860  49M  47680 S     1.1  2.4   0:04  postmaster
13066 postgres  15  0 56112  54M  53724 S     0.9  2.7   0:01  postmaster
13109 postgres  15  0 14528  14M  13272 S     0.9  0.7   0:00  postmaster
24454 postgres  15  0  2532 2380   1372 S     0.7  0.1  11:58  postmaster
   12 root      15  0     0    0      0 SW    0.5  0.0 816:30  bdflush
24455 postgres  15  0  1600 1476   1380 S     0.5  0.0   9:11  postmaster
12528 postgres  15  0 84676  82M  79920 S     0.3  4.0   0:02  postmaster
12575 postgres  15  0 76660  74M  75796 D     0.3  3.7   0:09  postmaster
13038 postgres  15  0 48952  47M  46436 D     0.3  2.3   0:00  postmaster
13069 postgres  15  0 57464  56M  54852 S     0.3  2.7   0:00  postmaster
13102 postgres  15  0 17864  17M  16504 D     0.3  0.8   0:00  postmaster

Richard Huxton wrote: On Thursday 15 April 2004 17:19, Rajesh Kumar Mallah wrote:

> Bill Moran wrote:
> > Rajesh Kumar Mallah wrote:
> > > Hi, the problem was solved by reloading the table. The query now takes only 3 seconds. But that is not a solution.
> > If dropping/recreating the table improves things, then we can reasonably assume that the table is pretty active with updates/inserts. Correct?
> Yes, the table results from an import process and undergoes lots of inserts and updates, but that's before the vacuum full operation. The table is not accessed during vacuum. What I want to know is whether there is any way to automate the dumping and reload of a table individually. Will the below be safe and effective?

Shouldn't be necessary, assuming you vacuum (not full) regularly. However, looking back at your original posting, the vacuum output doesn't seem to show any rows that need removing:

# VACUUM full verbose eyp_rfi;
INFO: vacuuming "public.eyp_rfi"
INFO: "eyp_rfi": found 0 removable, 505960 nonremovable row versions in 71987 pages
DETAIL: 0 dead row versions cannot be removed yet.

Since your select count(*) showed 505960 rows, I can't see how dropping/replacing could make a difference on a sequential scan. Since we're not using any indexes, I don't see how it could be related to that.

> begin work;
> create table new_tab AS select * from tab;
> truncate table tab;
> insert into tab select * from new_tab;
> drop table new_tab;
> commit;
> analyze tab;
>
> i have not tried it but plan to do so. but i feel the insert would take ages to update the indexes, if any.

It will have to update them, which will take time.
BTW is there any way to disable checks and triggers on a table temporarily while loading data (is updating reltriggers in pg_class safe?) You can take a look at pg_restore and copy how it does it.
Re: [PERFORM] [ SOLVED ] select count(*) very slow on an already
Bill Moran wrote:

Rajesh Kumar Mallah wrote:
> Hi, the problem was solved by reloading the table. The query now takes only 3 seconds. But that is not a solution.

If dropping/recreating the table improves things, then we can reasonably assume that the table is pretty active with updates/inserts. Correct?

Yes, the table results from an import process and undergoes lots of inserts and updates, but that's before the vacuum full operation. The table is not accessed during vacuum. What I want to know is whether there is any way to automate the dumping and reload of a table individually. Will the below be safe and effective:

begin work;
create table new_tab AS select * from tab;
truncate table tab;
insert into tab select * from new_tab;
drop table new_tab;
commit;
analyze tab;

I have not tried it but plan to do so, but I feel the insert would take ages to update the indexes, if any. BTW, is there any way to disable checks and triggers on a table temporarily while loading data (is updating reltriggers in pg_class safe?)

> The problem is that such phenomena obscure the judgement we use in optimising queries and the database.

Lots of phenomena obscure that ...

True, but there should not be too many. If a query runs slow, we really can't tell if it's a problem with the query itself, the hardware, or dead rows.

> I already did vacuum full on the table but it still did not have that effect on performance. In fact the last figures were after doing a vacuum full.

If the data gets too fragmented, a vacuum may not be enough. Also, read up on the recommendations _against_ vacuum full (recommending only using vacuum on databases). With full, vacuum condenses the database, which may actually hurt performance. A regular vacuum just fixes things up, and may leave unused space lying around. However, this should apparently achieve a balance between usage and vacuum. See the docs; they are much better at describing this than I am.

I understand that simultaneous vacuum and usage mostly deteriorates performance,
but this case is different.

> Can there be any more elegant solution to this problem?

As a guess, look into CLUSTER (a Postgres SQL command). CLUSTER will basically recreate the table while ordering rows based on an index. (This might benefit you in other ways as well.) Don't forget to analyze after cluster. If the problem is caused by frequent updates/inserts, you may find that re-clustering the table on a certain schedule is worthwhile.

I could consider that option also.

Be warned, this suggestion is based on an educated guess; I make no guarantees that it will help your problem. Read the docs on cluster and come to your own conclusions.

Thanks. Regds mallah.
Re: [PERFORM] select count(*) very slow on an already vacuumed table.
Richard Huxton wrote: On Thursday 15 April 2004 08:10, Rajesh Kumar Mallah wrote:

> The problem is that i want to know if i need a hardware upgrade at the moment. Eg i have another table rfis which contains ~ .6 million records. SELECT count(*) from rfis where sender_uid > 0; Time: 117560.635 ms. Which is approximately 4804 records per second. Is it acceptable performance on the hardware below: RAM: 2 GB; DISKS: ultra160, 10K, 18 GB; Processor: 2 * 2.0 GHz Xeon

Hmm - doesn't seem good, does it? If you run it again, is it much faster (since the data should be cached then)? What does "vmstat 10" show while you're running the query? One thing you should have done is read the performance tuning guide at: http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php The default values are very conservative, and you will need to change them.

Hi, thanks for the interest. My configs are not the default ones. I was running iostat while running the query. It looks like one of the disks doesn't go past a read performance of 20,000 KBytes/sec while the other disk goes as high as 40,000. What I am doing currently is loading the table on both disks and comparing the table scan speeds. The performance is definitely better for the newly loaded table on the other disk. The load on the server is 13 because I am simultaneously re-loading the data into the other table.

rt2=# SELECT count(*) from rfis where sender_uid > 0;
 count
--------
 564870
(1 row)
Time: 10288.359 ms

I shall post the comparative details under normal load soon. regds mallah.

> What kind of upgrades should be put on the server for it to become reasonably fast?

If you've only got one disk, then a second disk for OS/logging. Difficult to say more without knowing numbers of users/activity etc.
Re: [PERFORM] [ SOLVED ] select count(*) very slow on an already
Hi, the problem was solved by reloading the table. The query now takes only 3 seconds. But that is not a solution. The problem is that such phenomena obscure the judgement we use in optimising queries and the database. If a query runs slow, we really can't tell if it's a problem with the query itself, the hardware, or dead rows. I already did vacuum full on the table but it still did not have that effect on performance. In fact the last figures were after doing a vacuum full. Can there be a more elegant solution to this problem? Regds Mallah.

Richard Huxton wrote: On Thursday 15 April 2004 08:10, Rajesh Kumar Mallah wrote: > The problem is that i want to know if i need a hardware upgrade at the moment. Eg i have another table rfis which contains ~ .6 million records. SELECT count(*) from rfis where sender_uid > 0; Time: 117560.635 ms. Which is approximately 4804 records per second. Is it acceptable performance on the hardware below: RAM: 2 GB; DISKS: ultra160, 10K, 18 GB; Processor: 2 * 2.0 GHz Xeon. Hmm - doesn't seem good, does it? If you run it again, is it much faster (since the data should be cached then)? What does "vmstat 10" show while you're running the query? One thing you should have done is read the performance tuning guide at: http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php The default values are very conservative, and you will need to change them. > What kind of upgrades should be put on the server for it to become reasonably fast? If you've only got one disk, then a second disk for OS/logging. Difficult to say more without knowing numbers of users/activity etc.
Re: [PERFORM] select count(*) very slow on an already vacuumed table.
The relation size for this table is 1.7 GB:

tradein_clients=# SELECT public.relation_size ('general.rfis');
 relation_size
---------------
 1,762,639,872
(1 row)

Regds mallah.

Rajesh Kumar Mallah wrote: The problem is that i want to know if i need a hardware upgrade at the moment. Eg i have another table rfis which contains ~ .6 million records. SELECT count(*) from rfis where sender_uid > 0; count: 564870. Time: 117560.635 ms. Which is approximately 4804 records per second. Is it acceptable performance on the hardware below: RAM: 2 GB; DISKS: ultra160, 10K, 18 GB; Processor: 2 * 2.0 GHz Xeon. What kind of upgrades should be put on the server for it to become reasonably fast? Regds mallah.

Richard Huxton wrote: On Wednesday 14 April 2004 18:53, Rajesh Kumar Mallah wrote: > Hi, I have .5 million rows in a table. My problem is select count(*) takes ages. VACUUM FULL does not help. Can anyone please tell me how to enhance the performance of the setup? SELECT count(*) from eyp_rfi; If this is the actual query you're running, and you need a guaranteed accurate result, then you only have one option: write a trigger function to update a table_count table with every insert/delete to eyp_rfi. There is loads of info on this (and why it isn't as simple as you might think) in the archives. First though: 1. Is this the actual query, or just a representation? 2. Do you need an accurate figure or just something "near enough"?
Re: [PERFORM] select count(*) very slow on an already vacuumed table.
The problem is that i want to know if i need a hardware upgrade at the moment. E.g. i have another table rfis which contains ~0.6 million records.

SELECT count(*) from rfis where sender_uid > 0;
+--------+
| count  |
+--------+
| 564870 |
+--------+
Time: 117560.635 ms

That is approximately 4804 records per second. Is it acceptable performance on the hardware below?

RAM: 2 GB
DISKS: ultra160, 10K RPM, 18 GB
Processor: 2 * 2.0 GHz Xeon

What kind of upgrades should be put on the server for it to become reasonably fast?

Regds
mallah.

Richard Huxton wrote:

On Wednesday 14 April 2004 18:53, Rajesh Kumar Mallah wrote:

Hi, I have .5 million rows in a table. My problem is that select count(*) takes ages. VACUUM FULL does not help. Can anyone please tell me how to enhance the performance of the setup?

SELECT count(*) from eyp_rfi;

If this is the actual query you're running, and you need a guaranteed accurate result, then you only have one option: write a trigger function to update a table_count table with every insert/delete to eyp_rfi. There is loads of info on this (and why it isn't as simple as you might think) in the archives. First though:
1. Is this the actual query, or just a representation?
2. Do you need an accurate figure or just something "near enough"?
[PERFORM] select count(*) very slow on an already vacuumed table.
Hi, I have .5 million rows in a table. My problem is that select count(*) takes ages. VACUUM FULL does not help. Can anyone please tell me how to enhance the performance of the setup?

Regds
mallah.

postgresql.conf
--
max_fsm_pages = 55099264        # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 5000

tradein_clients=# explain analyze SELECT count(*) from eyp_rfi;
QUERY PLAN
-
Aggregate (cost=78311.37..78311.37 rows=1 width=0) (actual time=42306.902..42306.903 rows=1 loops=1)
  -> Seq Scan on eyp_rfi (cost=0.00..77046.49 rows=505949 width=0) (actual time=0.032..41525.007 rows=505960 loops=1)
Total runtime: 42306.995 ms
(3 rows)

tradein_clients=# SELECT count(*) from eyp_rfi;
 count
 505960
(1 row)

tradein_clients=# VACUUM full verbose eyp_rfi;
INFO: vacuuming "public.eyp_rfi"
INFO: "eyp_rfi": found 0 removable, 505960 nonremovable row versions in 71987 pages
DETAIL: 0 dead row versions cannot be removed yet. Nonremovable row versions range from 186 to 2036 bytes long. There were 42587 unused item pointers. Total free space (including removable row versions) is 21413836 bytes. 0 pages are or will become empty, including 0 at the end of the table. 38693 pages containing 19146684 free bytes are potential move destinations.
CPU 2.62s/0.40u sec elapsed 38.45 sec.
INFO: index "eyp_rfi_date" now contains 505960 row versions in 1197 pages
DETAIL: 0 index row versions were removed. 4 index pages have been deleted, 4 are currently reusable.
CPU 0.05s/0.29u sec elapsed 0.87 sec.
INFO: index "eyp_rfi_receiver_uid" now contains 505960 row versions in 1163 pages
DETAIL: 0 index row versions were removed. 1 index pages have been deleted, 1 are currently reusable.
CPU 0.03s/0.42u sec elapsed 1.33 sec.
INFO: index "eyp_rfi_inhouse" now contains 505960 row versions in 1208 pages
DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable.
CPU 0.04s/0.21u sec elapsed 1.20 sec.
INFO: index "eyp_rfi_rfi_id_key" now contains 505960 row versions in 1201 pages
DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable.
CPU 0.03s/0.33u sec elapsed 0.81 sec.
INFO: index "eyp_rfi_list_id_idx" now contains 505960 row versions in 1133 pages
DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable.
CPU 0.02s/0.43u sec elapsed 1.12 sec.
INFO: index "eyp_rfi_status" now contains 505960 row versions in 1448 pages
DETAIL: 0 index row versions were removed. 4 index pages have been deleted, 4 are currently reusable.
CPU 0.05s/0.22u sec elapsed 1.08 sec.
INFO: index "eyp_rfi_list_id" now contains 505960 row versions in 1133 pages
DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable.
CPU 0.02s/0.43u sec elapsed 1.00 sec.
INFO: index "eyp_rfi_receiver_email" now contains 505960 row versions in 2801 pages
DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable.
CPU 0.16s/0.52u sec elapsed 10.38 sec.
INFO: index "eyp_rfi_subj" now contains 80663 row versions in 463 pages
DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable.
CPU 0.03s/0.14u sec elapsed 3.20 sec.
INFO: index "eyp_rfi_sender" now contains 505960 row versions in 3025 pages
DETAIL: 0 index row versions were removed. 6 index pages have been deleted, 6 are currently reusable.
CPU 0.10s/0.39u sec elapsed 4.99 sec.
INFO: index "eyp_sender_uid_idx" now contains 505960 row versions in 1216 pages
DETAIL: 0 index row versions were removed. 5 index pages have been deleted, 5 are currently reusable.
CPU 0.04s/0.36u sec elapsed 2.61 sec.
INFO: index "eyp_rfi_rec_uid_idx" now contains 505960 row versions in 1166 pages
DETAIL: 0 index row versions were removed. 1 index pages have been deleted, 1 are currently reusable.
CPU 0.05s/0.41u sec elapsed 2.04 sec.
INFO: index "eyp_rfi_index" now contains 505960 row versions in 2051 pages
DETAIL: 0 index row versions were removed. 7 index pages have been deleted, 7 are currently reusable.
CPU 0.10s/0.28u sec elapsed 8.16 sec.
INFO: "eyp_rfi": moved 0 row versions, truncated 71987 to 71987 pages
DETAIL: CPU 2.03s/2.09u sec elapsed 95.24 sec.
INFO: vacuuming "pg_toast.pg_toast_19609"
INFO: "pg_toast_19609": found 0 removable, 105342 nonremovable row versions in 21038 pages
DETAIL: 0 dead row versions cannot be removed yet. Nonremovable row versions range from 37 to 2034 bytes long. There were 145 unused item pointers. Total free space (including removable row versions) is 16551072 bytes. 0 pages are or will become empty, including 0 at the end of the table. 18789 pages containing 16512800 free bytes are potential move destinatio
Re: [PERFORM] select count(*) very slow on an already vacuumed table.
Richard Huxton wrote:

On Wednesday 14 April 2004 18:53, Rajesh Kumar Mallah wrote:

Hi, I have .5 million rows in a table. My problem is that select count(*) takes ages. VACUUM FULL does not help. Can anyone please tell me how to enhance the performance of the setup?

SELECT count(*) from eyp_rfi;

If this is the actual query you're running, and you need a guaranteed accurate result, then you only have one option: write a trigger function to update a table_count table with every insert/delete to eyp_rfi.

It is just an example; in general all the queries that involve eyp_rfi become slow. Reloading the table makes the queries faster.

mallah.

There is loads of info on this (and why it isn't as simple as you might think) in the archives. First though:
1. Is this the actual query, or just a representation?
2. Do you need an accurate figure or just something "near enough"?
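For reference, the trigger approach Richard mentions can be sketched roughly as follows. The table and function names here are invented for illustration, and this is the naive version: a real one also has to deal with concurrency, since every session updating the single counter row will serialize on its row lock (one of the reasons "it isn't as simple as you might think").

```sql
-- A one-row side table holding the current count of eyp_rfi.
CREATE TABLE eyp_rfi_count (n bigint NOT NULL);
INSERT INTO eyp_rfi_count VALUES ((SELECT count(*) FROM eyp_rfi));

-- Keep the counter in sync on every insert/delete
-- (7.4-era quoted function body; dollar-quoting came later).
CREATE OR REPLACE FUNCTION eyp_rfi_count_trig() RETURNS trigger AS '
BEGIN
    IF TG_OP = ''INSERT'' THEN
        UPDATE eyp_rfi_count SET n = n + 1;
    ELSIF TG_OP = ''DELETE'' THEN
        UPDATE eyp_rfi_count SET n = n - 1;
    END IF;
    RETURN NULL;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER eyp_rfi_count_update
    AFTER INSERT OR DELETE ON eyp_rfi
    FOR EACH ROW EXECUTE PROCEDURE eyp_rfi_count_trig();

-- Now the count is a single-row read instead of a seq scan:
SELECT n FROM eyp_rfi_count;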
Re: [PERFORM] Deleting certain duplicates
Shea,Dan [CIS] wrote:

The index is
Indexes: "forecastelement_rwv_idx" btree (region_id, wx_element, valid_time)

-Original Message-
From: Shea,Dan [CIS] [mailto:[EMAIL PROTECTED]
Sent: Monday, April 12, 2004 10:39 AM
To: Postgres Performance
Subject: [PERFORM] Deleting certain duplicates

We have a large database which recently increased dramatically due to a change in our insert program allowing all entries.

PWFPM_DEV=# select relname,relfilenode,reltuples from pg_class where relname = 'forecastelement';
     relname     | relfilenode |  reltuples
-----------------+-------------+-------------
 forecastelement |   361747866 | 4.70567e+08

     Column     |            Type             | Modifiers
----------------+-----------------------------+-----------
 version        | character varying(99)       |
 origin         | character varying(10)       |
 timezone       | character varying(99)       |
 region_id      | character varying(20)       |
 wx_element     | character varying(99)       |
 value          | character varying(99)       |
 flag           | character(3)                |
 units          | character varying(99)       |
 valid_time     | timestamp without time zone |
 issue_time     | timestamp without time zone |
 next_forecast  | timestamp without time zone |
 reception_time | timestamp without time zone |

The program is supposed to check with a select statement that all fields but reception_time are unique, and insert the row only if so. Due to an error in a change, reception_time was included in the select that checks for duplicates. The reception_time is created by the program producing the .dat file to insert, so essentially all duplicate rows were allowed to be inserted.

I tried the delete query below.

PWFPM_DEV=# delete from forecastelement where oid not in (select min(oid) from forecastelement group by version,origin,timezone,region_id,wx_element,value,flag,units,valid_time,issue_time,next_forecast);

It ran for 3 days, creating what I assume is an index in pgsql_tmp for the group by statement. The query ended up failing with "dateERROR: write failed". Well, the long weekend is over and we do not have the luxury of trying this again. So I was thinking maybe of doing the deletion in chunks, perhaps based on reception_time.
It's more of an SQL question though. To deduplicate on the basis of version, origin, timezone, region_id, wx_element, value, flag, units, valid_time, issue_time, next_forecast you could do this:

BEGIN WORK;
CREATE TEMPORARY TABLE temp_table AS
    SELECT DISTINCT ON (version, origin, timezone, region_id, wx_element,
                        value, flag, units, valid_time, issue_time, next_forecast) *
    FROM forecastelement;
TRUNCATE TABLE forecastelement;
-- drop the indexes on forecastelement here
INSERT INTO forecastelement SELECT * FROM temp_table;
COMMIT;
-- recreate the indexes
ANALYZE forecastelement;

Note that DISTINCT ON will keep only one row out of all rows having the same values of the specified columns. Kindly go through the DISTINCT ON manual before trying the queries.

Regds
mallah.

Are there any suggestions for a better way to do this, or for using multiple queries to delete selectively a week at a time based on the reception_time? I would say there are a lot of duplicate entries between mid March and the first week of April.
[PERFORM] Effect of too many columns
Greetings,

Is there any performance penalty to having too many columns in a table, in terms of read and write speeds? In order to keep operational queries simple (avoid joins) we plan to add columns to the main customer dimension table. Adding more columns also means increased concurrency on the table, as more and more applications will access the same table. Any ideas whether it is better to split the table application-wise, or is it OK?

Regds
mallah.
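As a rough illustration of the trade-off being asked about (all table and column names here are invented for the example), the application-wise split keeps a narrow core table plus per-application 1:1 tables sharing the customer key; queries that need columns from both halves then pay for a join:

```sql
-- One wide table: simple queries, but every application contends on it.
CREATE TABLE customer_dim (
    customer_id integer PRIMARY KEY,
    name        varchar(99),
    -- ... many more columns used by different applications ...
    crm_notes   text,
    billing_ref varchar(20)
);

-- Split variant: narrow core table plus per-application 1:1 tables.
CREATE TABLE customer_core (
    customer_id integer PRIMARY KEY,
    name        varchar(99)
);
CREATE TABLE customer_billing (
    customer_id integer PRIMARY KEY REFERENCES customer_core,
    billing_ref varchar(20)
);

-- Queries touching one application's columns stay narrow;
-- queries needing both halves now require a join:
SELECT c.name, b.billing_ref
FROM customer_core c JOIN customer_billing b USING (customer_id);
```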
[PERFORM] slow GIST index creation
Greetings!

Why does creation of GiST indexes take significantly more time than a normal btree index? Can any configuration changes lead to faster index creation?

query:
CREATE INDEX co_name_index_idx ON profiles USING gist (co_name_index public.gist_txtidx_ops);

Regds
mallah.
Re: [PERFORM] COUNT & Pagination
scott.marlowe wrote:

On Tue, 13 Jan 2004, David Shadovitz wrote:

We avert the subsequent execution of count(*) by passing the value of count(*) as a query parameter through the link in page numbers.

Mallah, and others who mentioned caching the record count: Yes, I will certainly do this. I can detect whether the query's filter has been changed, or whether the user is merely paging through the results or sorting* the results. I'd love to completely eliminate the cost of the COUNT(*) query, but I guess that I cannot have everything.

* My HTML table column headers are hyperlinks which re-execute the query, sorting the results by the selected column. The first click does an ASC sort; a second click does a DESC sort.

Another useful trick is to have your script save out the count(*) result in a single-row table with a timestamp, and every time you grab it, check to see if x number of minutes have passed, and if so, update that row with a new count(*).

Greetings! The count(*) can get evaluated with any arbitrary combination in the where clause; how do you plan to store that information? In a typical application, pagination could be required in any number of contexts. I would be interested to know more about this trick and its applicability in such situations.

Offtopic: Does PostgreSQL optimise repeated execution of similar queries, i.e. queries on the same table or set of tables (in a join) with the same where clause, differing only in LIMIT and OFFSET? I don't know much about MySQL; is their "Query Cache" achieving better results in such cases, and do we have anything similar in PostgreSQL? I think the most recently accessed tables anyway get loaded into shared buffers in PostgreSQL, so that they are not read from disk. But is the "Query Cache" really different from this? Can anyone knowing a little more about the workings of MySQL's query cache throw some light?

Regds
Mallah.
You can even have a cron job do it so your own scripts don't incur the cost of the count(*) and delay output to the user.
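One way to reconcile the single-row trick with arbitrary where clauses, as Mallah asks, is to cache one row per distinct filter. This is only a sketch, not anything proposed in the thread: the table and the idea of keying on the normalized filter text are assumptions, and the cache only pays off for filters that actually repeat.

```sql
-- Cache of counts keyed by the (normalized) filter text.
CREATE TABLE count_cache (
    filter_expr text PRIMARY KEY,    -- e.g. 'sender_uid > 0'
    row_count   bigint NOT NULL,
    computed_at timestamp NOT NULL DEFAULT now()
);

-- The application looks up the filter first...
SELECT row_count FROM count_cache
WHERE filter_expr = 'sender_uid > 0'
  AND computed_at > now() - interval '10 minutes';

-- ...and only on a miss (or a stale hit) pays for the real count
-- and stores the result back:
DELETE FROM count_cache WHERE filter_expr = 'sender_uid > 0';
INSERT INTO count_cache (filter_expr, row_count)
    VALUES ('sender_uid > 0',
            (SELECT count(*) FROM rfis WHERE sender_uid > 0));
```

A filter that appears only once per user still costs a full count(*), so this helps paging and sorting (same filter, different LIMIT/OFFSET) rather than ad-hoc searches.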
[PERFORM] VACUUM problems with 7.4
I am sure there is no transaction open on the table banner_stats2, yet VACUUM FULL does not seem to be effective in removing the dead rows. Can anyone please help?

Regds
mallah

tradein_clients=# VACUUM FULL verbose banner_stats2 ;
INFO: vacuuming "public.banner_stats2"
INFO: "banner_stats2": found 0 removable, 741912 nonremovable row versions in 6710 pages
DETAIL: 737900 dead row versions cannot be removed yet. Nonremovable row versions range from 61 to 72 bytes long. There were 120 unused item pointers. Total free space (including removable row versions) is 246672 bytes. 0 pages are or will become empty, including 0 at the end of the table. 557 pages containing 61344 free bytes are potential move destinations.
CPU 0.15s/1.23u sec elapsed 1.38 sec.
INFO: index "banner_stats_pkey" now contains 741912 row versions in 2173 pages
DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable.
CPU 0.03s/0.05u sec elapsed 0.09 sec.
INFO: "banner_stats2": moved 0 row versions, truncated 6710 to 6710 pages
DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
tradein_clients=#
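"dead row versions cannot be removed yet" almost always means some backend still holds a transaction old enough to see those rows; an idle-in-transaction connection, possibly against a different database in the same cluster, is easy to miss. Assuming the stats collector is enabled, a check along these lines (column names as in the 7.x-era pg_stat_activity view) should show the culprit:

```sql
-- Show all backends and what they are doing; look for sessions
-- stuck showing "<IDLE> in transaction" as their current query.
SELECT datname, procpid, usename, current_query
FROM pg_stat_activity;
```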
[PERFORM] problem with select count(*) ..
Ever since i upgraded to 7.4RC2 i am facing a problem with select count(*). In 7.3 the problem was not there: select count(*) from data_bank.profiles used to return almost instantly, but in 7.4:

explain analyze SELECT count(*) from data_bank.profiles;
QUERY PLAN
---
Aggregate (cost=48361.30..48361.30 rows=1 width=0) (actual time=23456.870..23456.871 rows=1 loops=1)
  -> Seq Scan on profiles (cost=0.00..47431.84 rows=371784 width=0) (actual time=12174.999..23262.823 rows=123928 loops=1)
Total runtime: 23458.460 ms
(3 rows)

tradein_clients=#

If i dump and reload, the performance improves and it takes < 1 sec. This is what i have been doing since the upgrade, but it's not a solution. The VACUUM FULL is at the end of a loading batch SQL file which makes lots of inserts, deletes and updates.

Regds
Mallah.

VACUUM FULL VERBOSE ANALYZE data_bank.profiles;
INFO: vacuuming "data_bank.profiles"
INFO: "profiles": found 430524 removable, 371784 nonremovable row versions in 43714 pages
INFO: index "profiles_pincode" now contains 371784 row versions in 3419 pages
INFO: index "profiles_city" now contains 371784 row versions in 3471 pages
INFO: index "profiles_branch" now contains 371784 row versions in 2237 pages
INFO: index "profiles_area_code" now contains 371784 row versions in 2611 pages
INFO: index "profiles_source" now contains 371784 row versions in 3165 pages
INFO: index "co_name_index_idx" now contains 371325 row versions in 3933 pages
INFO: index "address_index_idx" now contains 371490 row versions in 4883 pages
INFO: index "profiles_exp_cat" now contains 154836 row versions in 2181 pages
INFO: index "profiles_imp_cat" now contains 73678 row versions in 1043 pages
INFO: index "profiles_manu_cat" now contains 87124 row versions in 1201 pages
INFO: index "profiles_serv_cat" now contains 19340 row versions in 269 pages
INFO: index "profiles_pid" now contains 371784 row versions in 817 pages
INFO: index "profiles_pending_branch_id" now contains 0 row versions in 1 pages
INFO:
"profiles": moved 0 row versions, truncated 43714 to 43714 pages
INFO: vacuuming "pg_toast.pg_toast_67748379"
INFO: "pg_toast_67748379": found 0 removable, 74 nonremovable row versions in 17 pages
INFO: index "pg_toast_67748379_index" now contains 74 row versions in 2 pages
INFO: "pg_toast_67748379": moved 1 row versions, truncated 17 to 17 pages
INFO: index "pg_toast_67748379_index" now contains 74 row versions in 2 pages
INFO: analyzing "data_bank.profiles"
INFO: "profiles": 43714 pages, 3000 rows sampled, 3634 estimated total rows
VACUUM
Time: 1001525.19 ms