Re: [PERFORM] More cores or higher frequency ?
On Tue, May 23, 2017 at 1:49 PM, Sven R. Kunz wrote:
> On 23.05.2017 22:14, Jarek wrote:
>> I have pool of clients (~30) inserting to database about 50 records per
>> second (in total from all clients) and small number (<10) of clients
>> querying database for those records once per 10s.
>> Other queries are rare and irregular.
>> The biggest table has ~100mln records (older records are purged
>> nightly). Database size is ~13GB.
>> In the near future I'm expecting ~150 clients and 250 inserts per second
>> and more clients querying database.
>> Server is also handling apache with a simple web application written in
>> python.
>> For the same price, I can get 8C/3.2GHz or 14C/2.6GHz. Which one will be
>> better?

I would start by trying a few things on your existing equipment.

If your inserts are coming from individual connections, say via the web app in a connect-insert-disconnect fashion, then pooling can be a huge win. Connection overhead is a bigger factor than you might imagine and I've seen as much as a 10x improvement in small queries when pooling was added.

If the every-10-second queries are running on the recently inserted data then partitioning by time range could substantially improve the speed of inserts, queries and purging. It's pretty easy to do now with pg_partman or similar, and built-in auto-partitioning is coming in version 10.

Fast commit to disk is a win - think SSD or RAID with BBU cache - and with a relatively modest 13GB database you should be able to spec enough RAM to keep everything in memory.

Cheers,
Steve
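For what it's worth, a minimal sketch of the version-10 declarative partitioning mentioned above, assuming a hypothetical records table keyed by insert time (names and ranges are illustrative only; with pg_partman on earlier versions the layout is similar but maintained by the extension):

```sql
-- Hypothetical schema, partitioned by insert time.
CREATE TABLE records (
    id         bigserial,
    client_id  integer,
    created_at timestamptz NOT NULL DEFAULT now(),
    payload    text
) PARTITION BY RANGE (created_at);

CREATE TABLE records_2017_05 PARTITION OF records
    FOR VALUES FROM ('2017-05-01') TO ('2017-06-01');
CREATE TABLE records_2017_06 PARTITION OF records
    FOR VALUES FROM ('2017-06-01') TO ('2017-07-01');

-- The nightly purge of old records becomes a near-instant metadata
-- operation instead of a bulk DELETE plus vacuum:
DROP TABLE records_2017_05;
```

Inserts route to the matching partition automatically, and the frequent queries over recent data only have to touch the newest partition.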
Re: [PERFORM] More cores or higher frequency ?
The answer, as always, is "it depends." Can you give us an overview of your setup? The appropriate setup for small numbers of long-running analytical queries (typically faster CPUs) will be different than a setup for handling numerous simultaneous connections (typically more cores). But CPU is often not the limiting factor. With a better understanding of your needs, people here can offer suggestions for memory, storage, pooling, network, etc.

Cheers,
Steve

On Tue, May 23, 2017 at 11:29 AM, Jarek wrote:
> Hello!
>
> I've a heavily loaded PostgreSQL server, which I want to upgrade, so it
> will handle more traffic. Can I estimate what is better: more cores or
> higher frequency? I expect that pg_stat should give some tips, but
> don't know where to start...
>
> best regards
> Jarek
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance problems with 9.2.15
>> (BTW, I wonder why you are moving only to 9.2 and not something more
>> recent.)
>
> Well, 9.2.15 is what comes bundled with RHEL 7, so I decided to go with
> that to avoid dependency issues. But I could install a fresher version
> from scratch if that would solve my problem.

Generally my first step is to get the latest stable directly from the PostgreSQL Global Development Group, i.e.:

yum install https://download.postgresql.org/pub/repos/yum/9.5/redhat/rhel-7-x86_64/pgdg-redhat95-9.5-2.noarch.rpm

Then I know I'm starting with the latest and greatest and will get critical updates without worrying about any distribution packager delays.

Cheers,
Steve
Re: [PERFORM] New server: SSD/RAID recommendations?
On 07/02/2015 07:01 AM, Wes Vaske (wvaske) wrote:
>> What about a RAID controller? Are RAID controllers even available for
>> PCI-Express SSD drives, or do we have to stick with SATA if we need a
>> battery-backed RAID controller? Or is software RAID sufficient for SSD
>> drives?
>
> Quite a few of the benefits of using a hardware RAID controller are
> irrelevant when using modern SSDs. The great random write performance of
> the drives means the cache on the controller is less useful and the
> drives you're considering (Intel's enterprise grade) will have full power
> protection for in-flight data.

For what it's worth, in my most recent iteration I decided to go with the Intel Enterprise NVMe drives and no RAID. My reasoning was thus:

1. Modern SSDs are so fast that even if you had an infinitely fast RAID card you would still be severely constrained by the limits of SAS/SATA. To get the full speed advantages you have to connect directly into the bus.

2. We don't typically have redundant electronic components in our servers. Sure, we have dual power supplies and dual NICs (though generally to handle external failures) and ECC RAM, but no hot-backup CPU, no redundant RAM banks and... no backup RAID card. Intel enterprise SSDs already have power-fail protection so I don't need a RAID card to give me a BBU. Given the MTBF of good enterprise SSDs I'm left to wonder if placing a RAID card in front merely adds a new point of failure and scheduled-downtime-inducing hands-on maintenance (I'm looking at you, RAID backup battery).

3. I'm streaming to an entire redundant server and doing regular backups anyway, so I'm covered for availability and recovery should the SSD (or anything else in the server) fail.

BTW, here's an article worth reading: https://blog.algolia.com/when-solid-state-drives-are-not-that-solid/

Cheers,
Steve
Re: [PERFORM] Query running slow for only one specific id. (Postgres 9.3) version
On 06/05/2015 10:54 AM, Sheena, Prabhjot wrote:
> Postgresql 9.3 Version
>
> Guys
>           Here is the issue that I'm facing for couple of weeks now. I
> have a table (size 7GB).
>
> If I run this query with this specific registration id it is using the
> wrong execution plan and takes more than a minute to complete. Total
> number of rows for this registration_id is only 414 in this table:
>
> explain analyze SELECT max(last_update_date) AS last_update_date FROM
> btdt_responses WHERE registration_id = 8718704208 AND response != 4;
>
>  Result  (cost=2902.98..2903.01 rows=1 width=0) (actual time=86910.730..86910.731 rows=1 loops=1)
>    InitPlan 1 (returns $0)
>      ->  Limit  (cost=0.57..2902.98 rows=1 width=8) (actual time=86910.725..86910.725 rows=1 loops=1)
>            ->  Index Scan Backward using btdt_responses_n5 on btdt_responses  (cost=0.57..6425932.41 rows=2214 width=8) (actual time=86910.723..86910.723 rows=1 loops=1)
>                  Index Cond: (last_update_date IS NOT NULL)
>                  Filter: ((response <> 4) AND (registration_id = 8718704208::bigint))
>                  Rows Removed by Filter: 52145434
>  Total runtime: 86910.766 ms
>
> Same query with any other registration id will come back in milliseconds:
>
> explain analyze SELECT max(last_update_date) AS last_update_date FROM
> btdt_responses WHERE registration_id = 8688546267 AND response != 4;
>
>  Aggregate  (cost=529.75..529.78 rows=1 width=8) (actual time=19.723..19.723 rows=1 loops=1)
>    ->  Index Scan using btdt_responses_u2 on btdt_responses  (cost=0.57..529.45 rows=119 width=8) (actual time=0.097..19.689 rows=72 loops=1)
>          Index Cond: (registration_id = 8688546267::bigint)
>          Filter: (response <> 4)
>          Rows Removed by Filter: 22
>  Total runtime: 19.769 ms

A couple initial questions:

1. Does the result change if you analyze the table and rerun the query?

2. Are there any non-default settings for statistics collection on your database?

-Steve
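If the default statistics don't capture the skew in registration_id, one hedged follow-up (using the table name from the plans above; the target of 1000 is purely illustrative) is to raise the per-column statistics target and re-analyze so the planner can see the uneven distribution:

```sql
-- Sketch only: raise sampling detail for the skewed column, then
-- refresh the statistics the planner uses.
ALTER TABLE btdt_responses
    ALTER COLUMN registration_id SET STATISTICS 1000;
ANALYZE btdt_responses;
```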
Re: [PERFORM] Re: [GENERAL] Re: Query running slow for only one specific id. (Postgres 9.3) version
On 06/05/2015 12:28 PM, Steve Crawford wrote:
> On 06/05/2015 11:38 AM, Sheena, Prabhjot wrote:
>> When I run vacuum analyze it fixes the problem but after 1 or 2 days
>> the problem comes back
>
> Is autovacuum running and using what settings?
>
> (select name, setting from pg_settings where name ~ 'autovacuum' or name ~ 'statistics';)
>
> Cheers,
> Steve
>
> P.S. The convention on the PostgreSQL mailing lists is to bottom-post,
> not top-post, replies.

And just to confirm, are there any table-specific overrides to the system-wide settings?

Cheers,
Steve
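One quick way to check for such per-table overrides - tables with their own autovacuum or statistics settings carry them in the reloptions column of the standard pg_class catalog:

```sql
-- Tables (and indexes) with per-relation storage/autovacuum overrides
-- show a non-null reloptions array.
SELECT relname, reloptions
FROM pg_class
WHERE reloptions IS NOT NULL;
```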
[PERFORM] Re: [GENERAL] Re: Query running slow for only one specific id. (Postgres 9.3) version
On 06/05/2015 11:38 AM, Sheena, Prabhjot wrote:
> When I run vacuum analyze it fixes the problem but after 1 or 2 days the
> problem comes back

Is autovacuum running and using what settings?

(select name, setting from pg_settings where name ~ 'autovacuum' or name ~ 'statistics';)

Cheers,
Steve

P.S. The convention on the PostgreSQL mailing lists is to bottom-post, not top-post, replies.
[PERFORM] New server optimization advice
I will soon be migrating to some recently acquired hardware and seek input from those who have gone before.

A quick overview: the dataset size is ~100GB (~250-million tuples) with a workload that consists of about 2/3 writes, mostly single-record inserts into various indexed tables, and 1/3 reads. Queries per second peak around 2,000 and our application typically demands fast response - for many of these queries the timeout is set to 2 seconds and the application moves forward and recovers later if that is exceeded.

Although by count they are minimal, every hour there are dozens both of import and of analysis queries involving multiple tables and tens of thousands of records. These queries may take up to a few minutes on our current hardware.

Old hardware is 4-core, 24GB RAM, battery-backed RAID-10 with four 15k drives.

New hardware is quite different: 2x10-core E5-2660v3 @2.6GHz, 128GB DDR4-2133 RAM and 800GB Intel DC P3700 NVMe PCIe SSD. In essence, the dataset will fit in RAM and will be backed by exceedingly fast storage.

This new machine is very different than any we've had before so any current thinking on optimization would be appreciated. Do I leave indexes as is and evaluate which ones to drop later? Any recommendations on distribution and/or kernels (and kernel tuning)? PostgreSQL tuning starting points? Whatever comes to mind.

Thanks,
Steve
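As one possible answer to the "tuning starting points" question, a hedged postgresql.conf sketch for a 128GB machine whose dataset fits in RAM - every value here is an assumption to be benchmarked against the actual workload, not a recommendation:

```
shared_buffers = 16GB            # starting point; very large values need testing
effective_cache_size = 100GB    # planner hint: most of RAM acts as cache
work_mem = 64MB                 # per sort/hash node; watch the total across
                                # ~2,000 qps worth of connections
maintenance_work_mem = 2GB      # speeds index builds and vacuum
random_page_cost = 1.1          # NVMe SSD: random reads cost ~ sequential
```

The write-heavy mix also suggests looking at checkpoint spacing (checkpoint_segments / WAL sizing on the version in question) once a baseline is measured.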
Re: [PERFORM] Turn off Hyperthreading! WAS: 60 core performance with 9.3
On 08/21/2014 03:51 PM, Josh Berkus wrote:
> On 08/21/2014 02:26 PM, Scott Marlowe wrote:
>> I'm running almost the exact same setup in production as a spare. It
>> has 4 of those CPUs, 256G RAM, and is currently set to use HT. Since
>> it's a spare node I might be able to do some testing on it as well.
>> It's running a 3.2 kernel right now. I could probably get a later model
>> kernel on it even.
>
> You know about the IO performance issues with 3.2, yes?

Were those 3.2-only and since fixed, or are there issues persisting in 3.2+? The 12.04 LTS release of Ubuntu Server was 3.2 but 14.04 is 3.13.

Cheers,
Steve
Re: [PERFORM] DB sessions 100 times of DB connections
On 07/03/2014 06:59 PM, Huang, Suya wrote:
> BTW, I'm using the pgbadger report to check for peak connections/sessions.
>
> From: pgsql-performance-ow...@postgresql.org
> [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Huang, Suya
> Sent: Friday, July 04, 2014 11:44 AM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] DB sessions 100 times of DB connections
>
> Hi,
>
> We experienced a DB issue yesterday and after checking the log found that
> the peak sessions is 3000 while the peak DB connections is only around
> 30. The application is having problems pulling data but there are no
> warnings in the DB log as it doesn't exceed max_connections. How could
> this happen? How do sessions/connections work in Postgres?

As handy as pgbadger is, I have found that its max-connections values don't pass the sniff test as it generally shows peak values that exceed the configured number of connections. I haven't dug in to find out why but could conjecture that the fact that log entries are generally truncated to the nearest second could cause this sort of thing.

Unexpected connection buildup is often a side-effect of something else like a large resource-intensive query, a query holding locks that prevent the other connections' queries from completing, or a variety of other things. If you are looking to solve/prevent the undescribed issue, please provide more detail.

-Steve
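For the "more detail" part, a snapshot of what sessions are actually doing at the bad moment is usually the fastest diagnostic. On the 9.2-9.5 catalog layout, something like:

```sql
-- Count sessions by state. Many sessions 'idle in transaction' or with
-- waiting = true (blocked on locks) often explain a sudden buildup.
SELECT state, waiting, count(*)
FROM pg_stat_activity
GROUP BY state, waiting
ORDER BY count(*) DESC;
```

(On 9.6 and later the boolean waiting column was replaced by wait_event_type/wait_event.)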
Re: [PERFORM] Regarding Hardware Tuning
On 12/18/2013 12:12 PM, prashant Pandey wrote:
> Could you tell me each and every hardware parameter and OS parameter the
> performance depends on? I need the complete list of all the required
> parameters and how to extract them on Linux through system calls and
> files. It would be greatly appreciated.
> Thank you and regards.

This is not even possible given the variety of hardware, OS variants, work-loads, database sizes, etc. The answer is really "all of them." Those who specialize in tuning constantly run test cases to tease out clues to performance tuning and even they most likely couldn't answer this.

The closest you are likely to come is to read and reread "PostgreSQL High Performance" which is an invaluable resource.

Cheers,
Steve
Re: [PERFORM] Performance of complicated query
On 05/23/2013 05:21 PM, Jonathan Morra wrote:
> Sorry for the messy query, I'm very new to writing these complex
> queries. I'll try and make it easier to read by using WITH clauses.
> However, just to clarify, the WITH clauses only increase readability and
> not performance in any way, right?

It depends. The planner is a tricky beast and sometimes rewriting a seemingly identical query will result in a much more (or less) efficient plan. A classic case was the difference between:

where foo in (select bar from...)...

vs.

where exists (select 1 from bar where...)...

In an ideal world the planner would figure out that both are the same and optimize accordingly, but there was a point where one was typically more efficient, then it switched to the other being better for the planner. I don't recall the current state.

Casting can be important - sometimes the planner needs a nudge to use an index on, say, a varchar column being compared to, perhaps, a text value or column, in which case casting to the exact data type being indexed can be a big win.

Cheers,
Steve
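To make the classic case concrete, here are the two equivalent forms side by side, written against hypothetical foo/bar tables (which form wins has varied across releases, so testing both against your own data is the only reliable answer):

```sql
-- Semijoin written with IN:
SELECT f.*
FROM foo f
WHERE f.bar_id IN (SELECT b.id FROM bar b WHERE b.active);

-- The same semijoin written with EXISTS:
SELECT f.*
FROM foo f
WHERE EXISTS (SELECT 1 FROM bar b
              WHERE b.id = f.bar_id AND b.active);
```

Both return exactly the rows of foo that have at least one matching active bar row; only the plan the optimizer chooses may differ.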
Re: [PERFORM] Performance of complicated query
On 05/23/2013 10:19 AM, Jonathan Morra wrote:
> I am fairly new to squeezing performance out of Postgres, but I hope
> this mailing list can help me. I have read the instructions found at
> http://wiki.postgresql.org/wiki/Slow_Query_Questions and have tried to
> abide by them the best that I can. I am running "PostgreSQL 9.1.7,
> compiled by Visual C++ build 1500, 64-bit" on an x64 Windows 7
> Professional Service Pack 1 machine with 8 GB of RAM.

I'm not sure under what constraints you are operating but you will find most people on the list will recommend running live systems on Linux/Unix for a variety of reasons.

> CREATE TABLE reads ...
> ALTER TABLE reads OWNER TO postgres;

To avoid future grief you should set up a user (see CREATE ROLE...) for your database that is not the cluster superuser (postgres). I assume you set up a database (see CREATE DATABASE...) for your app. The base databases (postgres, template*) should be used for administrative purposes only.

> ...Ultimately what I want to do is to find a sum of values for each
> patient. The scenario is that each patient is assigned a device and they
> get incremental values on their device. Since these values are
> incremental, if a patient never switches devices the reported value
> should be the last value for a patient. However, if a patient switches
> devices then the reported value should be the sum of the last value for
> each device that the patient was assigned.

I'm afraid I'm a bit confused about what you are after due to the switching between "sum" and "last." It sounds like a patient is issued a device which takes a number of readings. Do you want the sum of those readings for a given patient across all devices they have been issued, the sum of readings for a specific device, the most recent reading for a specific patient regardless of which device was in use for that reading, or the sum of the most recent readings on each device issued to a specific patient?

Are you looking to generate a report across all patients/devices or look up information on a specific patient or device?

Cheers,
Steve
Re: [PERFORM] Performance of complicated query
On 05/23/2013 10:57 AM, Jonathan Morra wrote:
> Ultimately I'm going to deploy this to Heroku on a Linux machine (my
> tests have so far indicated that Heroku is MUCH slower than my machine),
> but I wanted to get it fast on my local machine first. I agree with your
> role partitioning, however, this is only a dev machine.
>
> For the sum vs. last, the idea is that each patient is issued a device
> and reads are recorded. The nature of the reads is that they are
> incremental, so if a patient never changes devices there is no need for
> a sum. However, patients will be changing devices, and the
> patient_devices table records when each patient had a given device. What
> I want to sum up is the total value for a patient regardless of how many
> times they changed devices.

If the reads are always incremented - that is, the read you want is always the largest read - then something along these lines might work well and be more readable (untested code):

-- distill out max value for each device
with device_maxreads as (
    select device_id,
           max(value) as max_read
    from reads
    group by device_id),

-- then sum into a total for each patient
patient_value as (
    select p.patient_id,
           sum(d.max_read) as patient_value
    from device_maxreads d
    join patient_devices p on p.device_id = d.device_id
    group by p.patient_id
)

select
    ...whatever...
from
    ...your tables...
    join patient_value p on p.patient_id = ...
;

If the values increment and decrement, or patients are issued devices at overlapping times (i.e. using two devices at one time), then the query gets more complicated, but "with..." is still a likely usable construct.

Cheers,
Steve
Re: [PERFORM] performance database for backup/restore
On 05/21/2013 06:18 AM, Jeison Bedoya wrote:
> Hi people, i have a database with 400GB running in a server with 128GB
> RAM, and 32 cores, and storage over SAN with fibre channel. The problem
> is when i go to do a backup with pg_dumpall it takes about 5 hours, and
> next i do a restore and it takes about 17 hours. Is that a normal time
> for that process on that machine? Or can i do something to optimize the
> process of backup/restore?

It would help to know what you wish to solve, i.e. setting up a test/dev server, testing disaster-recovery, deploying to a new server, etc. Also, are you dumping to a file then restoring from a file, or dumping to a pipe into the restore?

If you use the custom format in pg_dump *and* are dumping to a file *and* restoring via pg_restore, you can set the -j flag to somewhat fewer than the number of cores (though at 32 cores I can't say where the sweet spot might be) to allow pg_restore to run things like index recreation in parallel to help your restore speed. You can also *temporarily* disable fsync while rebuilding the database - just be sure to turn it back on afterward.

Copying the files is not the recommended method for backups but may work for certain cases. One is when you can shut down the database so the whole directory is quiescent while you copy the files. Also, depending on your SAN features, you *might* be able to do a snapshot of the running PostgreSQL data directory and use that.

> Postgres version 9.2.2 ...

...has a nasty security issue. Upgrade. Now.

Cheers,
Steve
Re: [PERFORM] New server setup
On 03/13/2013 09:15 AM, John Lister wrote:
> On 13/03/2013 15:50, Greg Jaskiewicz wrote:
>> SSDs have much shorter life than spinning drives, so what do you do
>> when one inevitably fails in your system?
>
> Define "much shorter"? I accept they have a limited number of writes,
> but that depends on load. You can actively monitor the drive's health
> level...

What concerns me more than wear is this:

InfoWorld article: http://www.infoworld.com/t/solid-state-drives/test-your-ssds-or-risk-massive-data-loss-researchers-warn-213715

Referenced research paper: https://www.usenix.org/conference/fast13/understanding-robustness-ssds-under-power-fault

Kind of messes with the D in ACID.

Cheers,
Steve
Re: [PERFORM] Slow concurrent processing
On 03/11/2013 08:55 PM, Misa Simic wrote:
> Hi all,
>
> We have one table with a list of records for processing... We loop
> through that table and call one long-running function:
> do_the_math_for_record(record_id) ...snip... but - if we replace
> do_the_math_and_save_results with pg_sleep(1) to simulate a long-running
> function, so processing each row lasts 1 sec - sequential processing
> lasts the expected 300 seconds! Concurrent processing is faster with a
> higher number of concurrent connections - about 30 seconds with 30
> connections (much faster - and expected...). However, if we put back our
> do_the_math_and_save_results - we can't get better results with
> concurrent processing...

Sleep will not have any significant impact on CPU, memory or disk use and thus is not a simulation of actual processing. All you have really shown us so far is a black box. Please provide an overview of your schemas and the type of processing you are attempting on them.

Cheers,
Steve
Re: [PERFORM] Slow concurrent processing
On 03/12/2013 08:06 AM, Misa Simic wrote:
> Thanks Steve
>
> Well, the full story is too complex - but the point was - whatever the
> black box does - it lasts 0.5 to 2 secs per processed record (maybe I
> was wrong but I thought the reason why it takes the time it needs to
> actually do the task - CPU/IO/memory, whatever - is not that important)
> - so I really don't see a difference between: call web service, insert
> row in the table (takes 3 secs) and sleep 3 seconds - insert result in
> the table... if we do the above task for two things sequentially - it
> will last 6 secs... but if we do it concurrently - it should last 3
> secs... (in theory :) )

Not at all - even in theory. Sleep involves little, if any, contention for resources. Real processing does. So if a process requires 100% of available CPU then one process gets it all while many running simultaneously will have to share the available CPU resource and thus each will take longer to complete. Or, if you prefer, think of a file download. If it takes an hour to download a 1GB file it doesn't mean that you can download two 1GB files concurrently in one hour, even if simulating the process by a sleep(3600) suggests it is possible.

I should note, however, that depending on the resource that is limiting your speed there is often room for optimization through simultaneous processing - especially when processes are CPU-bound. Since PostgreSQL associates each back-end with one CPU *core*, you can have a situation where one core is spinning and the others are more-or-less idle. In those cases you may see an improvement by increasing the number of simultaneous processes to somewhere shy of the number of cores.

> I guessed somewhere there is a lock - but it wasn't clear where/why when
> there are no updates - just inserts... But I hadn't known that during
> INSERT a row lock is taken on the referenced tables as well - from the
> FK columns... So I guess now that is the cause of the problem... We will
> see how it goes with inserts into unlogged tables with no FK...

It will almost certainly go faster as you have eliminated integrity and data-safety. This may be acceptable to you (non-real-time crunching of data that can be reloaded from external sources or temporary processing that is ultimately written back to durable storage) but it doesn't mean you have identified the actual cause.

One thing you didn't state: is all this processing taking place in PostgreSQL (i.e. update foo set bar = do_the_math(baz, zap, boom), where do_the_math is PL/pgSQL, PL/Python, ...) or are external processes involved?

Cheers,
Steve
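The FK-related blocking Misa suspects can be demonstrated in two psql sessions. A sketch with hypothetical tables - the point is that each child INSERT must take a shared row-level lock on the referenced parent row for the FK check, so it blocks behind any transaction holding that row exclusively:

```sql
-- Hypothetical schema for illustration.
CREATE TABLE parents  (id int PRIMARY KEY);
CREATE TABLE children (id serial, parent_id int REFERENCES parents (id));
INSERT INTO parents VALUES (1);

-- Session 1: hold an exclusive row lock on parent 1.
BEGIN;
UPDATE parents SET id = id WHERE id = 1;

-- Session 2: blocks until session 1 commits or rolls back,
-- because the FK check needs a share lock on parent row 1.
INSERT INTO children (parent_id) VALUES (1);
```

Plain concurrent inserts referencing the same parent only take share locks and do not block each other, so whether this is the culprit depends on what else the black box does to the referenced rows.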
Re: [PERFORM] Query with limit goes from few ms to hours
On 10/23/2012 11:33 AM, Kevin Grittner wrote:
> henk de wit wrote:
>> Well, what do you know! That did work indeed. Immediately after the
>> ANALYZE on that parent table (taking only a few seconds) a fast plan
>> was created and the query executed in ms again. Silly me, I should have
>> tried that earlier.
>
> Of course, if your autovacuum settings are aggressive enough, you should
> generally not need to run ANALYZE explicitly. You should double-check
> that autovacuum is turned on and configured at least as aggressively as
> the default settings, or you will probably get little surprises like
> this when you least expect them.

The exception I'd make to Kevin's good advice is for cases when a process makes substantial statistics-altering changes to your data (bulk insert/delete/update) immediately followed by a query against the updated table(s). In those cases there is a good possibility that the statistics will not have been automatically updated before the subsequent query is planned, so an explicit ANALYZE between the update and the query can be of value.

Cheers,
Steve
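As a sketch of that pattern (table, column and file names hypothetical):

```sql
-- Bulk load...
COPY measurements FROM '/tmp/batch.csv' WITH (FORMAT csv);

-- ...then refresh statistics explicitly, so the very next query is
-- planned against the new data distribution instead of stale stats:
ANALYZE measurements;

SELECT count(*)
FROM measurements
WHERE taken_on >= now() - interval '1 day';
```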
Re: [PERFORM] Scaling 10 million records in PostgreSQL table
On 10/08/2012 08:26 AM, Navaneethan R wrote:
> Hi all,
>
> I have 10 million records in my postgres table. I am running the
> database in an amazon ec2 medium instance. I need to access the last
> week's data from the table. It takes a huge time to process the simple
> query, so it throws a timeout exception error.
>
> The query is:
>
> select count(*) from dealer_vehicle_details where modified_on between
> '2012-10-01' and '2012-10-08' and dealer_id=270001;
>
> After a lot of time it responds with 1184 as the count. What are the
> ways I should follow to increase the performance of this query? The
> insertion is also going on in parallel since there is daily realtime
> updating. What could be the reason exactly for this lacking performance?

What version of PostgreSQL? You can use "select version();" and note that 9.2 has index-only scans which can result in a substantial performance boost for queries of this type.

What is the structure of your table? You can use "\d+ dealer_vehicle_details" in psql.

Have you tuned PostgreSQL in any way? If so, what?

Cheers,
Steve
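If no suitable index exists, the usual first step is a composite index matching the WHERE clause - sketched here directly from the query above (index name is arbitrary); on 9.2+ such an index also opens the door to an index-only scan for the count:

```sql
-- Matches "dealer_id = ? AND modified_on BETWEEN ? AND ?":
-- equality column first, range column second.
CREATE INDEX dealer_vehicle_details_dealer_modified_idx
    ON dealer_vehicle_details (dealer_id, modified_on);

ANALYZE dealer_vehicle_details;
```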
Re: [PERFORM] Two identical systems, radically different performance
On 10/08/2012 02:45 PM, Craig James wrote:
> This is driving me crazy. A new server, virtually identical to an old
> one, has 50% of the performance with pgbench. I've checked everything I
> can think of.
>
> The setups (call the servers "old" and "new"):
>
> old: 2 x 4-core Intel Xeon E5620
> new: 4 x 4-core Intel Xeon E5606
>
> both:
>   memory: 12 GB DDR EC
>   Disks: 12x500GB disks (Western Digital 7200RPM SATA)
>   2 disks, RAID1: OS (ext4) and postgres xlog (ext2)
>   8 disks, RAID10: $PGDATA

Exact same model of disk, same on-board cache, same RAID-card RAM size, same RAID stripe size, etc.?

Cheers,
Steve
Re: [PERFORM] [ADMIN] Messed up time zones
On 08/03/2012 10:21 AM, Laszlo Nagy wrote:
>> All the above are the exact same point in time merely stated as
>> relevant to each location. Note that given a timestamp with time zone
>> and a zone, PostgreSQL returns a timestamp without time zone (you know
>> the zone since you specified it).
>
> Yes, I know the zone. But I don't know the offset from UTC. Example:
>
> template1=# set timezone to 'UTC';
> SET
> template1=# select ('2011-10-30 01:00:00'::timestamptz) at time zone 'Europe/Budapest';
>       timezone
> ---------------------
>  2011-10-30 02:00:00    -- Is it winter or summer time?
> (1 row)
>
> template1=# select ('2011-10-30 00:00:00'::timestamptz) at time zone 'Europe/Budapest';
>       timezone
> ---------------------
>  2011-10-30 02:00:00    -- Is it winter or summer time? What is the
>                         -- offset from UTC here?
> (1 row)
>
> Can you tell me when it was in UTC? I can not from the given
> information. Can you?

The given information is ambiguous, as are all times during the hour of fall-back everywhere. That leaves developers with a choice: choose an interpretation or throw an error. PostgreSQL chooses to use an interpretation.

It would be nice if there were a specification as to how such ambiguous data should be interpreted. Perhaps someone can point me to one and to any relevant documentation detailing how PostgreSQL handles such data. As it is, you need to be aware of how each part of your system deals with it. For example (using my local time zone), the date command on Linux interprets the ambiguous input as Daylight Saving Time:

date -d '2012-11-04 0130'
Sun Nov  4 01:30:00 PDT 2012

But given the same input, PostgreSQL interprets it as standard time (offset -08):

select '2012-11-04 0130'::timestamptz;
      timestamptz
------------------------
 2012-11-04 01:30:00-08

> What is more:
>
> template1=# select (('2011-10-30 00:00:00'::timestamptz) at time zone 'Europe/Budapest')
>             is distinct from (('2011-10-30 01:00:00'::timestamptz) at time zone 'Europe/Budapest');
>  ?column?
> ----------
>  f
> (1 row)
>
> Yeah, we know what time zone it is in, but we don't know when it was,
> thanks a lot. :-( It would be unambiguous to store the UTC offset along
> with the value. But it is not how it was implemented.

So you took two distinct points in time, threw away some critical information, and are surprised that they are now equal? Then don't do that. It's the equivalent of being surprised that www.microsoft.com is the same as www.apple.com when comparing them on the short hostname only. If you want to know whether two points in time differ, just compare them.

Spending a couple hours reading http://www.postgresql.org/docs/current/static/datatype-datetime.html will be time well spent.

Cheers,
Steve
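Worth noting: the ambiguity disappears entirely when the input carries an explicit UTC offset, which timestamptz input accepts - a quick sketch using the Budapest fall-back hour discussed above:

```sql
SET timezone TO 'Europe/Budapest';

-- The two occurrences of local 02:30 on 2011-10-30, disambiguated
-- by writing the offset explicitly:
SELECT '2011-10-30 02:30:00+02'::timestamptz;  -- still summer time (CEST)
SELECT '2011-10-30 02:30:00+01'::timestamptz;  -- after fall-back (CET)

-- They are distinct instants, one hour apart:
SELECT '2011-10-30 02:30:00+01'::timestamptz
     - '2011-10-30 02:30:00+02'::timestamptz;  -- interval '01:00:00'
```

The information loss only happens when converting to timestamp *without* time zone via AT TIME ZONE; as long as values stay timestamptz, comparisons are between unambiguous instants.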
Re: [PERFORM] PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
On 07/03/2012 08:22 AM, Stanislaw Pankevich wrote:
> PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the
> fastest way to clean each non-empty table and reset unique identifier
> column of empty ones.
>
> I wonder, what is the fastest way to accomplish this kind of task in
> PostgreSQL. I am interested in the fastest solutions ever possible.

It would help if we really understood your use-case. If you want to fully reset your database to a known starting state for test runs, why not just have a base database initialized exactly as you wish, say "test_base", then just drop your test database and create the new database from your template:

drop database test;
create database test template test_base;

This should be very fast but it won't allow you to exclude individual tables.

Are you interested in absolute fastest as a mind-game or is there a specific use requirement, i.e. how fast is fast enough? This is the basic starting point for tuning, hardware selection, etc.

Truncate should be extremely fast but on tables that are as tiny as yours the difference may not be visible to an end-user. I just tried a "delete from" to empty a 10,000-record table and it took 14 milliseconds, so you could do your maximum of 100 tables each containing 10-times your max number of records in less than two seconds.

Regardless of the method you choose, you need to be sure that nobody is accessing the database when you reset it. The drop/create database method will, of course, require and enforce that. Truncate requires an exclusive lock so it may appear to be very slow if it is waiting to get that lock. And even if you don't have locking issues, your reluctance to wrap your reset code in transactions means that a client could be updating some table or tables whenever the reset script isn't actively working on that same table, leading to unexplained weird test results.

Cheers,
Steve
Re: [PERFORM] pgbouncer - massive overhead?
On 06/19/2012 09:00 AM, Strange, John W wrote: Given a baseline postgresql.conf config and a couple DL580 40 core/256GB memory I noticed a large overhead for pgbouncer, has anyone seen this before?

$ pgbench -h `hostname -i` -j 32 -p 4320 -U asgprod -s 500 -c 32 -S -T 60 pgbench_500
Scale option ignored, using pgbench_branches table count = 500
starting vacuum...end.
transaction type: SELECT only
scaling factor: 500
query mode: simple
number of clients: 32
number of threads: 32
duration: 60 s
number of transactions actually processed: 1743073
tps = 29049.88 (including connections establishing)
tps = 29050.308194 (excluding connections establishing)

$ pgbench -h `hostname -i` -j 32 -p 4310 -U asgprod -s 500 -c 32 -S -T 60 pgbench_500
Scale option ignored, using pgbench_branches table count = 500
starting vacuum...end.
transaction type: SELECT only
scaling factor: 500
query mode: simple
number of clients: 32
number of threads: 32
duration: 60 s
number of transactions actually processed: 8692204
tps = 144857.505107 (including connections establishing)
tps = 144880.181341 (excluding connections establishing)

processor  : 39
vendor_id  : GenuineIntel
cpu family : 6
model      : 47
model name : Intel(R) Xeon(R) CPU E7- 4860 @ 2.27GHz

I'm very dubious that the stats are meaningful as run. Were the above stats generated on consecutive runs on the same machine, or was the test database fully returned to baseline between runs and the machine restarted to clear cache? I doubt anyone here would trust the results of a 60-second pgbench run - especially a select-only test on a server that will likely end up with virtually everything ultimately in cache. Make sure each run is started from the same state and run for 30-60 minutes.

Still, you *are* adding a layer between the client and the server. Running the simplest of read-only queries against a fully-cached database on a fast many-core machine is likely to emphasize any latency introduced by pgbouncer.
But it's also not a use-case for which pgbouncer is intended. If you were to add -C so each query required a new client connection a different picture would emerge. Same thing if you had 2000 client connections of which only a handful were running queries at any moment. Cheers, Steve -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle
On 05/24/2012 05:58 AM, Rajesh Kumar. Mallah wrote: Dear Andy , Following the discussion on load average we are now investigating on some other parts of the stack (other than db). Essentially we are bumping up the limits (on appserver) so that more requests goes to the DB server. Which leads to the question: what, other than the db, runs on this machine? Cheers, Steve -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] auto-vacuum vs. full table update
On 04/26/2012 12:49 PM, Craig James wrote: An update to our system means I'm going to be rewriting every row of some large tables (20 million rows by 15 columns). In a situation like this, can auto-vacuum take care of it, or should I plan on vacuum-full/reindex to clean up? If you want to reclaim the space, a vacuum-full/reindex will do it. But you are probably better off using cluster. Way faster and you get new indexes as a by-product. Both methods require an exclusive lock on the table. If you can't afford the downtime, check out pg_reorg (http://pgfoundry.org/projects/reorg/) Cheers, Steve -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
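A sketch of the cluster route, with hypothetical table and index names:

```sql
-- Rewrites the table in index order and rebuilds every index,
-- reclaiming dead space; holds an exclusive lock while it runs.
CLUSTER big_table USING big_table_pkey;
ANALYZE big_table;  -- refresh planner statistics after the rewrite
```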
Re: [PERFORM] scale up (postgresql vs mssql)
On 04/13/2012 08:04 AM, Claudio Freire wrote: ...You might want to try unlogged temp tables, which more closely resemble mssql temp tables. If they are permanent tables used for temporary storage then making them unlogged may be beneficial. But actual temporary tables *are* unlogged and attempting to create an unlogged temporary table will raise an error. Cheers, Steve -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] scale up (postgresql vs mssql)
On 04/13/2012 09:43 AM, Claudio Freire wrote: On Fri, Apr 13, 2012 at 1:36 PM, Steve Crawford scrawf...@pinpointresearch.com wrote: If they are permanent tables used for temporary storage then making them unlogged may be beneficial. But actual temporary tables *are* unlogged and attempting to create an unlogged temporary table will raise an error. Interesting, yes, I was wondering why PG didn't make temp tables unlogged by default. Then, I guess, the docs[0] have to mention it. Especially due to the error condition. Right? [0] http://www.postgresql.org/docs/9.1/static/sql-createtable.html Well, the fact that temporary and unlogged cannot be simultaneously specified *is* documented: CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] table_name But it would probably be worth adding a note under the description of temporary tables that they are, in fact, unlogged. Cheers, Steve -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
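To illustrate the point (the exact error wording may vary by version):

```sql
CREATE TEMP TABLE scratch (id int);       -- ok: temp tables are never WAL-logged
CREATE UNLOGGED TABLE staging (id int);   -- ok: permanent but not WAL-logged
CREATE UNLOGGED TEMP TABLE nope (id int); -- ERROR: the two options cannot be combined
```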
Re: [PERFORM] Linux machine aggressively clearing cache
On 03/30/2012 05:51 PM, Josh Berkus wrote: So this turned out to be a Linux kernel issue. Will document it on www.databasesoup.com. Anytime soon? About to build two PostgreSQL servers and wondering if you have uncovered a kernel version or similar issue to avoid. Cheers, Steve -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Advice sought : new database server
On 03/04/2012 03:50 AM, Michael Friedl wrote: Hey! On 04.03.2012 10:58, Rory Campbell-Lange wrote: 1U chassis with 8 2.5 disk bays 2x Intel Xeon E5630 Quad-Core / 4x 2.53GHz / 12MB cache 8 channel Areca ARC-1880i (PCI Express x8 card) presumably with BBU (can't see it listed at present) 2 x 300GB SAS 2.5 disks for operating system (Possibly also 300GB SATA VelociRaptor/10K RPM/32MB cache RAID 1 4 x 300GB SAS 2.5 storage disks RAID 10 48.0GB DDR3 1333MHz registered ECC (12x 4.0GB modules) Sorry, no answer for your question and a bit offtopic. Why do you take SAS disks for the OS and not much cheaper SATA ones? Here's Intel's (very general) take. Your OS disks may not justify SAS on performance alone but other aspects may sway you. http://www.intel.com/support/motherboards/server/sb/CS-031831.htm Cheers, Steve -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Very long deletion time on a 200 GB database
On 02/23/2012 12:39 AM, Reuven M. Lerner wrote: Hi, everyone. ... ...at one installation, we're quickly running out of disk space. The database is already taking up about 200 GB of space, and is growing by 1 GB or so a day. I've been following the discussion of approaches and tuning for bulk deletes and suddenly wondered if you have checked a couple other basics. Do you know the source of the increases in DB size? Is it due strictly to inserted data or are there lots of updates as well? Is autovacuum running properly? Could you, due to bulk deletes and other issues, be suffering from table- or index-bloat? Heavily bloated tables/indexes will exacerbate both your disk-usage and performance problems. If possible you might try clustering your tables and see what happens to disk usage and bulk-delete performance. Clusters are generally reasonably fast - way faster than VACUUM FULL, though they could still take a while on your very large tables. As a bonus, cluster gives you shiny, new non-bloated indexes. They do require an exclusive lock and they do require sufficient disk-space to build the new, albeit smaller, table/indexes so it may not be an option if you are short on disk-space. You may be able to start by clustering your smaller tables and move toward the larger ones as you free disk-space. Be sure to run ANALYZE on any table that you have CLUSTERed. You might find it useful to make CLUSTER part of your regular maintenance. Cheers, Steve -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Very long deletion time on a 200 GB database
On 02/23/2012 12:39 AM, Reuven M. Lerner wrote: Hi, everyone... This is basically what I'm trying to execute:

DELETE FROM B
WHERE r_id IN (SELECT R.id
               FROM R, B
               WHERE r.end_date < (NOW() - (interval '1 day' * 30))
                 AND r.id = b.r_id);

I don't recall which versions like which approach, but have you tried ...WHERE EXISTS (SELECT... instead of WHERE IN? Depending on the version of PostgreSQL, one or the other may yield a superior result.

(2) I tried to grab the rows that *do* interest me, put them into a temporary table, TRUNCATE the existing table, and then copy the rows back. I only tested that with a 1 GB subset of the data, but that took longer than other options.

Was the 1GB subset the part you were keeping or the part you were deleting? Which part was slow (creating the temp table or copying it back)? Try running EXPLAIN on the SELECT query that creates the temporary table and try to optimize that. Also, when copying the data back, you are probably having to deal with index and foreign-key maintenance. It will probably be faster to drop those, copy the data back, then recreate them.

I know you are a *nix-guy in a Windows org so your options are limited, but word-on-the-street is that for high-performance production use, install PostgreSQL on *nix.

Cheers, Steve -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
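A sketch of the EXISTS form of the same delete, keeping the original tables and the 30-day cutoff (untested against the poster's schema):

```sql
DELETE FROM b
WHERE EXISTS (
    SELECT 1
    FROM r
    WHERE r.id = b.r_id
      AND r.end_date < (now() - interval '1 day' * 30)
);
```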
Re: [PERFORM] set autovacuum=off
On 02/23/2012 09:35 AM, Alessandro Gagliardi wrote: I should have been more clear. I virtually never delete or do updates, but I insert /a lot/. So the table does change quite a bit, but only in one direction. I was unable to disable autovacuum universally (due to the cant_change_runtime_param error) but I was able to disable it on individual tables. Still, I know this is heavy-handed and sub-optimal. I tried set autovacuum_naptime='6min' but got the same 55P02 error. Should/can I set that per table? I did look at autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor but couldn't make sense out of them. (Besides, I'd probably get the same 55P02 error if I tried to change them.)

See: http://www.postgresql.org/docs/current/static/runtime-config-autovacuum.html

The documentation has information like "This parameter can only be set in the postgresql.conf file or on the server command line." that will tell you in advance which settings will fail when you attempt to set them through SQL statements.

But autovacuum is pretty smart about not vacuuming tables until reasonably necessary. And beware that autovacuum is also controlling when to analyze a table. Mass inserts are probably changing the characteristics of your table such that it needs to be analyzed to allow the planner to properly optimize your queries.

Have you identified that vacuum is actually causing a problem? If not, I'd leave it alone. The system tables have a lot of information on table vacuuming and analyzing:

select relname, last_vacuum, last_autovacuum, last_analyze,
       last_autoanalyze, vacuum_count, autovacuum_count,
       analyze_count, autoanalyze_count
from pg_stat_user_tables;

Cheers, Steve
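For the per-table route, the knobs are storage parameters set with ALTER TABLE rather than the server-wide settings; a sketch with a hypothetical table name (this syntax requires 8.4 or later):

```sql
-- Disable autovacuum for a single heavily-inserted table
-- (generally better to tune it than to disable it):
ALTER TABLE events SET (autovacuum_enabled = off);

-- Or just make autovacuum/autoanalyze less eager on that one table:
ALTER TABLE events SET (autovacuum_vacuum_scale_factor  = 0.4,
                        autovacuum_analyze_scale_factor = 0.2);
```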
Re: [PERFORM] set autovacuum=off
On 02/23/2012 10:38 AM, Alessandro Gagliardi wrote: On Thu, Feb 23, 2012 at 10:01 AM, Steve Crawford scrawf...@pinpointresearch.com wrote: The documentation has information like "This parameter can only be set in the postgresql.conf file or on the server command line." that will tell you in advance which settings will fail when you attempt to set them through SQL statements.

Ah. I missed that. Sorry for asking stupid questions.

No problem and not stupid. With the manual running to hundreds of pages, plus information on wikis and mailing-list histories spanning hundreds of thousands of messages, sometimes knowing where to look is 90% of the battle.

But autovacuum is pretty smart about not vacuuming tables until reasonably necessary. And beware that autovacuum is also controlling when to analyze a table. Mass inserts are probably changing the characteristics of your table such that it needs to be analyzed to allow the planner to properly optimize your queries.

Okay, that makes more sense to me; because the stats would be changing quickly, so while vacuuming may not be necessary, analyzing would be. At the same time, I can't afford to analyze if it's causing my inserts to take over 50 ms. Something else I should add: if my selects are slow, that's annoying; but if my inserts are slow, that could be disastrous...

You need to rethink things a bit. Databases can fail in all sorts of ways and can slow down during bursts of activity, data dumps, etc. You may need to investigate some form of intermediate buffering.

...Apparently the last four columns don't exist in my database. As for the first four, that is somewhat illuminating...

Then you are not running a current version of PostgreSQL, so the first step to performance enhancement is to upgrade. (As a general rule - there are occasionally specific cases where performance decreases.)

So if it's not auto-vacuuming that's making my inserts so slow, what is it?
I'm batching my inserts (that didn't seem to help at all actually, but maybe because I had already turned off synchronous_commit anyway).

How are you batching them? Into a temp table that is copied to the main table? As a bunch of insert statements within a single connection (saves probably considerable time due to eliminating multiple connection setups)? With one PREPARE and multiple EXECUTE (to save repeated planning time - I'm not sure this will buy you much for simple inserts, though)? With COPY (much faster as many records are inserted in a single statement, but if one fails, all fail)?

And what is the 50ms limit? Is that an average? Since you are batching, it doesn't sound like you need every statement to complete in 50ms. There is always a tradeoff between overall maximum throughput and maximum allowed latency.

I've gotten rid of a bunch of indices (especially those with low cardinality; I did that around the same time as disabling auto-vacuum, so that could account for the coincidental speed-up).

Yes, inserts require the indexes to be updated, so they can slow down inserts and updates.

I'm not sure what else I could be doing wrong. It's definitely better than it was a few days ago, but I still see LOG: duration: 77.315 ms statement: COMMIT every minute or two.

That's a huge topic ranging from hardware (CPU speed, RAM, spindle count, disk type, battery-backed write caching), OS (you *are* running on some sort of *nix, right?), OS tuning, PG tuning, etc. Fortunately the biggest benefit comes from some basic tuning. I recommend you abandon this thread, as it presupposes a now seemingly incorrect cause of the problem, and start a new one titled something like "Tuning for high insert rate" where you describe the problem you want to solve. See http://wiki.postgresql.org/wiki/Guide_to_reporting_problems for a good guide to the information that will be helpful in diagnosis. Cheers, Steve
Re: [PERFORM] set autovacuum=off
On 02/23/2012 01:07 PM, Alessandro Gagliardi wrote: The second one (a bunch of insert statements within a single connection). As I mentioned above, I was going to try the temp table thing, but that wasn't fast enough. COPY might be my next attempt.

insert into...; insert into...; insert into...; ...

is really (ignoring statement preparation time):

begin; insert into...; commit;
begin; insert into...; commit;
begin; insert into...; commit;
...

It's possible that you might get a nice boost by wrapping the inserts into a transaction:

begin;
insert into...;
insert into...;
insert into...;
...
commit;

This only requires all that disk-intensive stuff that protects your data once at the end instead of 1000 times for your batch of 1000.

COPY is even better. I just ran a quick test by restoring a table on my desktop hacking db (untuned, few-years-old PC, single SATA disk, modest RAM and lots of resource competition). The 22+ million rows restored in 282 seconds, which is a rate somewhat north of 78,000 records/second, or about 0.013 ms/record.

You may want to eliminate that trigger, which only seems to exist to silence errors from uniqueness violations, and copy the incoming data into a temp table, then move the data with a variant of:

INSERT INTO main_table
SELECT ... FROM incoming_table
WHERE NOT EXISTS (SELECT 1 FROM main_table WHERE ...);

Cheers, Steve -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
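Putting the COPY-plus-staging-table idea together as a sketch; the table name, column name, and file path are hypothetical, and the column lists were elided in the original:

```sql
-- Stage the batch in a temp table shaped like the target.
CREATE TEMP TABLE incoming (LIKE main_table INCLUDING DEFAULTS);

-- One statement loads the whole batch (use \copy from psql if the
-- file lives on the client rather than on the database server).
COPY incoming FROM '/tmp/batch.csv' WITH CSV;

-- Move only rows that don't already exist, replacing the
-- error-silencing trigger with an explicit uniqueness check.
INSERT INTO main_table
SELECT i.* FROM incoming i
WHERE NOT EXISTS (SELECT 1 FROM main_table m WHERE m.id = i.id);
```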
Re: [PERFORM] Why so slow?
On 02/17/2012 10:34 AM, Alessandro Gagliardi wrote: Comparing SELECT DISTINCT(user_id) FROM blocks JOIN seen_its USING (user_id) WHERE seen_its.created BETWEEN (now()::date - interval '8 days')::timestamp AND now()::date::timestamp to SELECT DISTINCT(user_id) FROM seen_its WHERE created BETWEEN (now()::date - interval '8 days')::timestamp AND now()::date::timestamp the difference is 100x. ... Though I could figure it out, it would be helpful to actually specify which query is faster and to post the explain of *both* queries. But in general, it is not terribly unusual to find that rewriting a query can lead the planner to generate a superior plan. Trying and testing different ways of writing a query is a standard tuning technique. There are also version-specific issues with some versions of PostgreSQL preferring ...where foo in (select... and others preferring ...where exists (select... If you are planning to ramp up to high volumes it is also *very* important to test and tune using the size of database you plan to have on the hardware you will use in production. You cannot extrapolate from a dev database on an i486 (?!?) machine to a production server with more spindles, different RAID setup, different CPU, more cores, vastly more memory, etc. In the case of your queries, the second one eliminates a join and gives the planner an easy way to optimize using the available indexes so I'm not surprised it's faster. Note: I am guessing that your seen_its table just grows and grows but is rarely, if ever, modified. If it is basically a log-type table it will be a prime candidate for partitioning on date and queries like this will only need to access a couple relatively small child tables instead of one massive one. Cheers, Steve -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
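On the releases of that era, partitioning a log-type table like seen_its means inheritance plus CHECK constraints so constraint exclusion can prune children; a minimal sketch with hypothetical names and dates:

```sql
-- One child per week; the CHECK constraint lets the planner skip
-- children entirely outside the queried date range.
CREATE TABLE seen_its_2012_w07 (
    CHECK (created >= '2012-02-13' AND created < '2012-02-20')
) INHERITS (seen_its);

CREATE INDEX seen_its_2012_w07_created_idx ON seen_its_2012_w07 (created);
```

Inserts are then routed to the right child by a trigger or by the application, and purging old data becomes a fast DROP TABLE of the oldest child rather than a bulk DELETE.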
Re: [PERFORM] table size is bigger than expected
On 08/04/2011 11:56 AM, Jian Shi wrote: Hey, I'm a new user of PostgreSQL. I found one of my tables is taking unexpectedly large space:... I did vaccum, reindex, the size is still the same. Is there anything else that I can do? Did you try CLUSTER? A basic vacuum only identifies space as reusable, it doesn't actually shrink on-disk size. If you have workloads that update or delete a small number of tuples per transaction, the autovacuum process should keep things reasonably under control. But if you run transactions that do bulk updates or deletes, you may need to intervene. The CLUSTER statement will completely rewrite and reindex your table (and will physically reorder the table based on the selected index). Note: CLUSTER requires an exclusive lock on the table. Cheers, Steve
Re: [PERFORM] Speeding up loops in pl/pgsql function
On 05/26/2011 05:36 AM, Merlin Moncure wrote: ... got it:

select decode(regexp_replace('141142143', '([0-9][0-9][0-9])', $q$\\\1$q$ , 'g'), 'escape');
 decode
--------
 abc
(1 row)

merlin

Nice. A word of warning: in 9.0 this returns a hex string:

select decode(regexp_replace('141142143', '([0-9][0-9][0-9])', $q$\\\1$q$ , 'g'), 'escape');
  decode
----------
 \x616263

See http://www.postgresql.org/docs/9.0/static/release-9-0.html: E.5.2.3. Data Types: bytea output now appears in hex format by default (Peter Eisentraut). The server parameter bytea_output can be used to select the traditional output format if needed for compatibility.

Another wrinkle: the function I wrote sort of ignored the bytea issue by using text. But text is subject to character encoding (for both good and bad) while bytea is not, so the ultimate solution will depend on whether the input string is the octal representation of an un-encoded sequence of bytes or represents a string of ASCII/UTF-8/whatever-encoded text.

Cheers, Steve -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Speeding up loops in pl/pgsql function
On 05/25/2011 11:45 AM, Reuven M. Lerner wrote: Hi, Alex. You wrote: Have you tried something like: SELECT encode(regexp_replace('141142143', '(\d{3})', '\\\1', 'g')::bytea, 'escape'); Hmm, forgot about regexp_replace. It might do the trick, but without a full-blown eval that I can run on the replacement side, it'll be a bit more challenging. But that's a good direction to consider, for sure.

The function given didn't work exactly as written for me but it is on the right track. See if this works for you (input validation is left as an exercise for the reader... :)):

create or replace function octal_string_to_text(someoctal text)
returns text as $$
declare
    binstring text;
begin
    execute 'select E''' || regexp_replace($1, E'(\\d{3})', E'\\\\\\1', 'g') || '''' into binstring;
    return binstring;
end
$$ language plpgsql;

Cheers, Steve -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] pg_dumpall affecting performance
On 02/15/2011 10:41 AM, Mark Mikulec wrote: Hello, I was under the impression that pg_dumpall didn't affect database performance when dumping while the db is live. However I have evidence to the contrary now - queries that are run during the pg_dumpall time take 10 to a 100 times longer to execute than normal while pg_dumpall is running. The strange thing is that this started after my database grew by about 25% after a large influx of data due to user load. I'm wonder if there is a tipping point or a config setting I need to change now that the db is larger that is causing all this to happen. Don't know where that impression came from. It is true that you can continue to *use* your database normally while running a dump but you are reading the entire database and either transmitting it over the network or writing it to a local drive so it shouldn't be surprising that performance is impacted. There are tipping points - one big one is when you move from having all your data in RAM to needing to read disk. And it can be a whopper. If all your information, through PG or OS caching is in RAM then your dumps may run very quickly. The moment you cross the point that things don't quite fit you can see a sharp decline. Consider a least-recently-used algorithm and a very simplistic scenario. You read the start data. It isn't cached so you go to disk *and* you put those blocks into cache pushing others than you would need later out of cache. This continues and you potentially end up having to read everything from disk plus incur the overhead of checking and updating the cache. Meanwhile, the data you needed for your query may have been pushed out of cache so there is more contention for disk. Admittedly an over-simplified example but you see the problem. Cheers, Steve -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] BBU Cache vs. spindles
On 10/20/2010 09:45 PM, Scott Marlowe wrote: On Wed, Oct 20, 2010 at 8:25 PM, Joshua D. Drake j...@commandprompt.com wrote: On Wed, 2010-10-20 at 22:13 -0400, Bruce Momjian wrote: Ben Chobot wrote: On Oct 7, 2010, at 4:38 PM, Steve Crawford wrote: I'm weighing options for a new server. In addition to PostgreSQL, this machine will handle some modest Samba and Rsync load. I will have enough RAM so that virtually all disk-read activity will be cached. The average PostgreSQL read activity will be modest - a mix of single-record and fairly large (reporting) result-sets. Writes will be modest as well but will come in brief (1-5 second) bursts of individual inserts. The rate of insert requests will hit 100-200/second for those brief bursts. So... Am I likely to be better off putting $$$ toward battery-backup on the RAID or toward adding a second RAID-set and splitting off the WAL traffic? Or something else?

A BBU is, what, $100 or so? Adding one seems a no-brainer to me. Dedicated WAL spindles are nice and all, but they're still spinning media. RAID card cache is way faster, and while it's best at bursty writes, it sounds like bursty writes are precisely what you have.

Totally agree! BBU first, more spindles second.

Agreed. Note that while you can get incredible burst performance from a battery-backed cache, due to both caching and writing out of order, once the throughput begins to saturate at the speed of the disk array the BBU cache is really only re-ordering, as it will eventually fill up faster than the disks can take the writes, and you'll settle in at some percentage of the max tps you get from a short benchmark run. It's vitally important that once you put a BBU cache in place, you run a very long-running transactional test (pgbench is a simple one to start with) that floods the I/O subsystem so you see what your average throughput is with the WAL and data store getting flooded.
I know on my system pgbench runs of a few minutes can be 3 or 4 times faster than runs that last for the better part of an hour. Thanks for all the replies. This is what I suspected but since I can't just buy one of everything to try, I wanted a sanity-check before spending the $$$. I am not too worried about saturating the controller cache as the current much lower spec machine can handle the sustained load just fine and the bursts are typically only 1-3 seconds long spaced a minute or more apart. Cheers, Steve -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] BBU Cache vs. spindles
I'm weighing options for a new server. In addition to PostgreSQL, this machine will handle some modest Samba and Rsync load. I will have enough RAM so the virtually all disk-read activity will be cached. The average PostgreSQL read activity will be modest - a mix of single-record and fairly large (reporting) result-sets. Writes will be modest as well but will come in brief (1-5 second) bursts of individual inserts. The rate of insert requests will hit 100-200/second for those brief bursts. So... Am I likely to be better off putting $$$ toward battery-backup on the RAID or toward adding a second RAID-set and splitting off the WAL traffic? Or something else? Cheers, Steve -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Occasional giant spikes in CPU load
...Can you correlate the spikes with anything like that? Not that I know of. Just regular web traffic. On the backup server these events happen occasionally even when there is little or no web traffic, and nobody logged in doing maintenance. What, if anything, are you logging in the PostgreSQL logs? Anything interesting, there? Cheers, Steve -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] DB is slow until DB is reloaded
Madison Kelly wrote: Hi all, I've got a fairly small DB... It slows down over time and I can't seem to find a way to get the performance to return without doing a dump and reload of the database... Some questions: Is autovacuum running? This is the most likely suspect. If not, things will bloat and you won't be getting appropriate analyze runs. Speaking of which, what happens if you just run analyze? And as long as you are dumping and reloading anyway, how about version upgrading for bug reduction, performance improvement, and cool new features. Cheers, Steve -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] DB is slow until DB is reloaded
Madison Kelly wrote: Steve Crawford wrote: Madison Kelly wrote: Hi all, I've got a fairly small DB... It slows down over time and I can't seem to find a way to get the performance to return without doing a dump and reload of the database... Yup, I even tried manually running 'VACUUM FULL' and it didn't help.

That's because VACUUM reclaims space (er, actually marks space as available for reuse) while ANALYZE refreshes the statistics that the planner uses.

As for upgrading; a) I am trying to find a way around the dump/reload. I am doing it as a last resort only.

Agreed - it is the last resort. But since you were doing it anyway, I was just suggesting that you could combine it with an upgrade and get more benefits.

b) I want to keep the version in CentOS' repo.

Depends on reasoning. If you absolutely require a fully vanilla particular version of CentOS for some reason then fine. But telling CentOS to use the PostgreSQL Development Group pre-built releases for CentOS is a very easy one-time process (it's what I do on my CentOS machines). From memory (but read to the end for warnings):

Download the setup rpm:
wget http://yum.pgsqlrpms.org/reporpms/8.4/pgdg-centos-8.4-1.noarch.rpm

Install it:
rpm -i pgdg-centos-8.4-1.noarch.rpm

Note: This does not install PostgreSQL - it just updates your repository list to add the repository containing the PostgreSQL binaries.

Now make sure that you get your updates from PostgreSQL, not CentOS: edit /etc/yum.repos.d/CentOS-Base.repo and add "exclude=postgresql*" to the [base] and [updates] sections.

Now you can use yum as normal and you will get PostgreSQL 8.4 and updates thereto rather than using 8.1.

BUT!! I have only done this on new installs. I have not tried it on an already-running machine. As always, test first on a dev machine and do your pre-update dump using the new version of the pg_dump utilities, not the old ones.

Cheers, Steve

I'd not tried simply updating the stats via ANALYZE...
I'll keep an eye on performance and if it starts to slip again, I will run ANALYZE and see if that helps. If there is a way to run ANALYZE against a query that I am missing, please let me know. If you stick with 8.1x, you may want to edit postgresql.conf and change default_statistics_target to 100 if it is still at the previous default of 10. 100 is the new default setting as testing indicates that it tends to yield better query plans with minimal additional overhead. Cheers, Steve -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
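The setting goes in postgresql.conf; the statistics target can also be raised per column when only certain estimates are off. A sketch with hypothetical table and column names:

```sql
-- postgresql.conf:
--   default_statistics_target = 100
-- Per-column override, followed by a re-analyze to take effect:
ALTER TABLE iwt_orders ALTER COLUMN status SET STATISTICS 200;
ANALYZE iwt_orders;
```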
Re: [PERFORM] DB is slow until DB is reloaded
Madison Kelly wrote: I wanted to get ahead of the problem, hence my question here. :) I've set this to run at night ('iwt' being the DB in question):

  su postgres -c 'psql iwt -c "VACUUM ANALYZE VERBOSE"'

And why not the vacuumdb command?:

  su postgres -c 'vacuumdb --analyze --verbose iwt'

But this is duct tape and baling wire. You REALLY need to make sure that autovacuum is running - you are likely to have much better results with less pain. Cheers, Steve
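As a sketch of that nightly stopgap in crontab syntax (for the postgres user; the schedule is an assumption, and autovacuum remains the better answer):

```
# run at 03:15 every night against the 'iwt' database
15 3 * * *  vacuumdb --analyze --quiet iwt
```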
Re: [PERFORM] DB is slow until DB is reloaded
Madison Kelly wrote: You are right, autovacuum is not running after all. From your comment, I am wondering if you'd recommend I turn it on or not?... I see you are considering an upgrade, but FWIW on your 8.1 instance: my remaining 8.1 server has been running for years with it on. Read up on it at: http://www.postgresql.org/docs/8.1/static/maintenance.html#AUTOVACUUM Basically you need to turn on some stats collection so autovacuum can determine when to run (in postgresql.conf):

  stats_start_collector = on
  stats_row_level = on

And you need to enable autovacuum (in postgresql.conf):

  autovacuum = on
  autovacuum_naptime = 300   # time between autovacuum runs, in secs

Then you can tune it if you need to, but at least it will be looking for things that are vacuum-worthy every 5 minutes. Cheers, Steve
Re: [PERFORM] RAID card recommendation
Greg Smith wrote: Jochen Erwied wrote: - Promise Technology Supertrak ES4650 + additional BBU - Adaptec RAID 5405 SGL/256 SATA/SAS + additional BBU - Adaptec RAID 5405Z SGL/512 SATA/SAS I've never seen a Promise controller that had a Linux driver you would want to rely on under any circumstances... Easier to just buy from a company that has always cared about good Linux support, like 3ware. +1 I haven't tried Promise recently, but last time I did I determined that they got the name because they Promise the Linux driver for your card will be available real-soon-now. Actually got strung along for a couple of months before calling my supplier and telling him to swap it out for a 3ware. The 3ware just works. I currently have a couple dozen Linux servers, including some PostgreSQL machines, running the 3ware cards. Cheers, Steve
Re: [PERFORM] database size growing continously
Peter Meszaros wrote: Hi All, I use postgresql 8.3.7 as a huge queue. There is a very simple table with six columns and two indices, and about 6 million records are written into it every day, continuously committed every 10 seconds from 8 clients. The table stores approximately 120 million records, because a cron job daily deletes those that are older than 20 days. Autovacuum is on and every setting is at the factory default except some unrelated ones (listen address, authorization). But my database is growing, characteristically ~600MByte/day, but sometimes much slower (e.g. 10MB, or even 0!!!)... Can you try running against 8.4.1? I believe there are a number of improvements that should help in your case. For one thing, the max_fsm_pages and max_fsm_relations knobs are gone - it happens automagically. I believe there are some substantial improvements in space reuse, along with numerous improvements not directly related to your question. Cheers, Steve
Re: [PERFORM] database size growing continously
Peter Meszaros wrote: Hi All, I use postgresql 8.3.7 as a huge queue. There is a very simple table with six columns and two indices, and about 6 million records are written into it every day, continuously committed every 10 seconds from 8 clients. The table stores approximately 120 million records, because a cron job daily deletes those that are older than 20 days. You may be an ideal candidate for table partitioning - this is frequently used for rotating-log table maintenance. Use a parent table and 20 child tables. Create a new child every day and drop the 20-day-old table. Table drops are far faster and lower-impact than a DELETE from a 120-million-row table. Index bloat is limited to one day of inserts and will be eliminated in 20 days. No deletes means no vacuum requirement on the affected tables. Individual child tables are limited to about 6 million records each (one day of inserts). A clever backup scheme can ignore prior days' static child tables (and you could keep historical data dumps off-line for later use if desired). Read up on it here: http://www.postgresql.org/docs/8.4/interactive/ddl-partitioning.html Cheers, Steve
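A minimal sketch of the daily child-table scheme under 8.3/8.4-style inheritance partitioning (table and column names are invented; a trigger or the application routes inserts to the current day's child):

```sql
-- parent holds no data; children inherit its structure
CREATE TABLE queue (
    id      bigint,
    payload text,
    created timestamp NOT NULL
);

-- nightly job: create tomorrow's child with a CHECK for constraint exclusion
CREATE TABLE queue_20091104 (
    CHECK (created >= '2009-11-04' AND created < '2009-11-05')
) INHERITS (queue);

-- nightly job: purge the 20-day-old child - a cheap catalog operation,
-- not a multi-million-row DELETE
DROP TABLE queue_20091015;
```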
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
My additional comments: t...@fuzzy.cz wrote: ... For a future upgrade, what are the basic steps? 0. Create a test database - work out bugs and performance issues before going live. 1. create database ...cluster. You only need to create the individual database if the options you select for the dump do not create the database(s). 2. dump the data from the old database ...using the dump tools from the *new* version. With several cores, you might want to consider using the binary dump options in pg_dump if you want to use the new parallel restore feature in pg_restore, with a possible dramatic increase in restore speed (benchmarks I've seen suggest that with 8 cores you may even see an almost 8x restore speedup, so it's worth the effort). The manual suggests that setting --jobs to the number of cores on the server is a good first approximation. See the -Fc option in pg_dump and the --jobs option in pg_restore for details. Cheers, Steve
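A hedged sketch of the dump/restore commands involved (database names, file name, and job count are placeholders):

```
# dump in custom format - required for pg_restore's parallel mode
pg_dump -Fc -f olddb.dump olddb

# 8.4 restore with one worker per core, e.g. 8
pg_restore --jobs=8 -d newdb olddb.dump
```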
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
Kevin Grittner wrote: Steve Crawford scrawf...@pinpointresearch.com wrote: benchmarks I've seen suggest that with 8 cores you may even see an almost 8x restore speedup I'm curious what sort of data in what environment showed that ratio. I was going from memory of a presentation I watched. Reports on the web have shown anything from a 3x increase using 8 cores to other, non-detailed reports of up to 8x improvement. If you have one big table, don't expect much if any improvement. If you have lots of smaller tables/indexes then parallel restore will probably benefit you. This is all based on the not-atypical assumption that your restore will be CPU-bound. I don't think parallel restore will be of much use beyond the point where you hit IO limits. Cheers, Steve
Re: [PERFORM] query slow only after reboot
Wei Yan wrote: Hi: Our queries are extremely slow only after a db server reboot, not after restarting the postgres db only. The difference is about 10 mins vs. 8 secs. Not acceptable. I have browsed around, set the postgres db parameters as suggested. Still the same. Any suggestion on how to tackle the problem? What OS, amount of RAM, DB size...? If the queries get progressively faster as you use the system, then slow again after a reboot, my initial guess would be that you are getting more and more disk-cache hits the longer you use the system. Cheers, Steve
Re: [PERFORM] speeding up table creation
Rainer Mager wrote: I have an interesting performance improvement need. As part of the automatic test suite we run in our development environment, we re-initialize our test database a number of times in order to ensure it is clean before running a test. We currently do this by dropping the public schema and then recreating our tables (roughly 30 tables total). After that we do normal inserts, etc., but never with very much data. My question is, what settings can we tweak to improve performance in this scenario? Specifically, if there was a way to tell Postgres to keep all operations in memory, that would probably be ideal. What is the test part? In other words, do you start with a known initial database with all empty tables then run the tests, or is part of the test itself the creation of those tables? How much data is in the initial database if the tables aren't empty? Creating 30 empty tables should take a trivial amount of time. Also, are there other schemas than public? A couple of ideas/comments: You cannot keep the data in memory (that is, you can't disable writing to the disk). But since you don't care about data loss, you could turn off fsync in postgresql.conf. From a test perspective you should be fine - it will only be an issue in the event of a crash, and then you can just restart with a fresh load. Remember, however, that any performance benchmarks won't translate to production use (of course they don't translate if you are using a ramdisk anyway). Note that the system tables are updated whenever you add/delete/modify tables. Make sure they are being vacuumed or your performance will slowly degrade. My approach is to create a database exactly as you want it to be at the start of your tests (fully vacuumed and all) and then use it as a template to create the testdb each time.
Then you can just (with appropriate connection options) run "dropdb thetestdb" followed by "createdb --template=thetestdbtemplate thetestdb", which is substantially faster than deleting and recreating tables - especially if they contain much data. Cheers, Steve
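The same cycle in SQL, run from a maintenance connection (names follow the thread's examples):

```sql
DROP DATABASE IF EXISTS thetestdb;
CREATE DATABASE thetestdb TEMPLATE thetestdbtemplate;
```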
Re: [PERFORM] 2GB or not 2GB
Josh Berkus wrote: Folks, Subsequent to my presentation of the new annotated.conf at pgCon last week,... Available online yet? At?... Cheers, Steve
Re: [PERFORM] Where do a novice do to make it run faster?
A B wrote: So, it is time to improve performance, it is running too slow. AFAIK (as a novice) there are a few general areas: 1) hardware 2) rewriting my queries and table structures 3) using more predefined queries 4) tweaking parameters in the db conf files Of these points: 1) is nothing I can do about right now, but in the future perhaps. 2) will be quite hard right now since there is more code than time. 3) almost like 2 but perhaps more do-able with the current constraints. 4) This seems to be the easiest one to start with... So what should I do/read concerning point 4? If you have other good suggestions I'd be very interested in that. Go back to step zero - gather information that would be helpful in giving advice. For starters:

- What hardware do you currently have?
- What OS and version of PG?
- How big is the database?
- What is the nature of the workload (small queries or data-mining, how many simultaneous clients, transaction rate, etc.)?
- Is PG sharing the machine with other workloads?

Then edit your postgresql.conf file to gather data (see http://www.postgresql.org/docs/8.3/interactive/monitoring-stats.html). With stat collection enabled, you can often find some low-hanging fruit like indexes that aren't used (look in pg_stat_user_indexes) - sometimes because the query didn't cast something in the where-clause correctly. Also look at http://www.postgresql.org/docs/8.3/interactive/runtime-config-logging.html - especially the log_min_duration_statement setting, to find long-running queries. You will probably need to try different settings and watch the log. Logging impacts performance, so don't just set it to log everything and forget. You need to play with it. Don't discount step 2 - you may find you can rewrite one inefficient but frequent query. Or add a useful index on the server. Cheers, Steve
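Concretely, the two starting points mentioned might look like this (the 250 ms threshold is an example value, not a recommendation):

```sql
-- in postgresql.conf (8.3): log statements slower than 250 ms
--   log_min_duration_statement = 250

-- with stats collection on, look for indexes that are never used:
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
ORDER BY idx_scan;
```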
Re: [PERFORM] update 600000 rows
[EMAIL PROTECTED] wrote: Hello, i have a python script to update 600,000 rows in one table from a csv file in my postgres database and it takes me 5 hours to do the transaction... Let's see if I guessed correctly. Your Python script is stepping through a 600,000-row file and updating information in a table (of unknown rows/columns) by making 600,000 individual updates, all wrapped in a big transaction. If correct, that means you are doing 600,000/(3,600 * 5) = 33 queries/second. If this is correct, I'd first investigate simply loading the csv data into a temporary table, creating appropriate indexes, and running a single query to update your other table. First, when i run htop i see that the memory used is never more than 150 MB. I don't understand in this case why setting the shmall and shmmax kernel parameters to 16 GB of memory (the server has 32 GB) speeds up the transaction a lot compared to a shmall and shmmax of (only) 2 GB ?! Are you saying that you did this and the performance improved, or are you wondering if it would? The script is run with only one transaction and pauses at moments to give postgres time to write data to disk. This doesn't make sense. If the transaction completes successfully then PostgreSQL has committed the data to disk (unless you have done something non-standard and not recommended like turning off fsync). If you are adding pauses between updates, don't do that - it will only slow you down. If the full transaction doesn't complete, all updates will be thrown away anyway, and if it does complete then they were committed. If the data were written at the end of the transaction, would the performance be better? I want production data regularly written to disk to prevent losing data, but is there any benefit to writing temporary data to disk in the middle of a transaction??? See above. Actual disk IO is handled by the server. PostgreSQL is good at the D in ACID. If your transaction completes, the data has been written to disk.
Guaranteed. Cheers, Steve ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
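The staging-table approach suggested above can be sketched with Python's built-in sqlite3 module standing in for the real server (the schema and CSV contents are invented for illustration; with PostgreSQL you would bulk-load via COPY): load all the rows in one batch, then run a single set-based UPDATE instead of 600,000 individual statements.

```python
import sqlite3

# In-memory database stands in for the real server; schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("CREATE TABLE staging (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO target VALUES (?, ?)",
                 [(i, "old") for i in range(1, 6)])

# Step 1: bulk-load the CSV rows into the staging table in one batch
# (in PostgreSQL, use COPY instead of executemany).
csv_rows = [(2, "new-2"), (4, "new-4")]
conn.executemany("INSERT INTO staging VALUES (?, ?)", csv_rows)

# Step 2: one set-based UPDATE replaces row-by-row updates.
conn.execute("""
    UPDATE target
       SET val = (SELECT s.val FROM staging s WHERE s.id = target.id)
     WHERE id IN (SELECT id FROM staging)
""")
conn.commit()

print(conn.execute("SELECT id, val FROM target ORDER BY id").fetchall())
```

The server does one pass over both tables instead of parsing, planning, and executing 600,000 separate statements, which is where the 5 hours were going.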
Re: [PERFORM] Hardware for PostgreSQL
Ketema wrote: I am trying to build a very robust DB server that will support 1000+ concurrent users (already have seen a max of 237; no pooling being used). I have read so many articles now that I am just saturated. I have a general idea but would like feedback from others. Describe it a bit better. 1,000 users or 1,000 simultaneous connections? I.e., do you have a front-end where someone logs on, gets a connection, and keeps it for the duration, or is it a web-type app where each request might connect-query-disconnect? If the latter, are your connections persistent? How many queries/second do you expect? How complex are the queries (retrieve a single record or data-mining)? Read-only or lots of updates? Do the read-queries need to be done every time or are they candidates for caching? RAM? The more the merrier, right? Generally true. But once you reach the point that everything can fit in RAM, more is just wasted $$$. And, according to my reading, there are cases where more RAM can hurt - basically if you manage to create a situation where your queries are large enough to just flush the cache, so you don't benefit from caching but are hurt by spending time checking the cache for the data. Who has built the biggest baddest Pg server out there and what do you use? Not me. Someone just showed me live system-monitoring data on one of his several PG machines. That one was clocking multi-thousand TPS on a server (Sun??) with 128GB RAM. That much RAM makes top look amusing. Several of the social-networking sites are using PG - generally spreading the load over several (dozens of) servers. They also make heavy use of pooling and caching - think dedicated memcached servers offering a combined pool of several TB of RAM. For pooling, pgbouncer seems to have a good reputation. Tests on my current production server show it shaving a few ms off every connect-query-disconnect cycle. Connects are fairly fast in PG, but that delay becomes a significant issue under heavy load.
Test pooling carefully, though. If you blindly run everything through your pooler instead of just selected apps, you can end up with unexpected problems when one client changes a backend setting like "set statement_timeout to 5". If the next client assigned to that backend connection runs a long-duration analysis query, it is likely to fail. Cheers, Steve
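A concrete sketch of that failure mode (the error text is what PostgreSQL emits on a statement timeout; server_reset_query is pgbouncer's mechanism for avoiding it):

```sql
-- client A, through the pooler:
SET statement_timeout = '5s';
-- client A "disconnects"; the pooler hands the same backend to client B,
-- whose long analysis query now aborts:
--   ERROR:  canceling statement due to statement timeout
-- pgbouncer's server_reset_query (e.g. DISCARD ALL) resets such settings
```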
Re: [PERFORM] Hardware for PostgreSQL
Magnus Hagander wrote: Ow Mun Heng wrote: You're likely better off (performance-wise) putting it on the same disk as the database itself if that one has better RAID, for example. I'm thinking along the lines of: since nothing much writes to the OS disk, I should (keyword) be safe. Unless it's *always* in the cache (not so likely), reads will also move the heads... And if you aren't mounted noatime, reads will also cause a write. Cheers, Steve
Re: [PERFORM] [GENERAL] PostgreSQL Performance Tuning
Shohab Abdullah wrote: Dear, We are facing a performance tuning problem while using PostgreSQL Database over the network on a linux OS. Our Database consists of more than 500 tables with an average of 10K records per table, with an average of 20 users accessing the database simultaneously over the network. Each table has indexes and we are querying the database using Hibernate. The biggest problem is with insertion, updating and fetching of records, ie the database performance is very slow. It takes a long time to respond in the above scenario. Please provide me with the tuning of the database. I am attaching my *postgresql.conf* file for the reference of our current configuration Have you changed _anything_ from the defaults? The defaults are set so PG will run on as many installations as practical. They are not set for performance - that is specific to your equipment, your data, and how you need to handle the data. Assuming the record sizes aren't huge, that's not a very large data set nor number of users. Look at these for starters: http://www.varlena.com/GeneralBits/Tidbits/perf.html http://www.varlena.com/GeneralBits/Tidbits/annotated_conf_e.html You might try setting the logging parameters to log queries longer than x (where x is in milliseconds - you will have to decide the appropriate value for "too long") and start looking into those first. Make sure that you are running analyze if it is not being run by autovacuum. Use EXPLAIN on your queries to see how they are being planned - as a first pass, assume that on any reasonably sized table the words "sequential scan" mean "fix this". Note that you may have to cast variables in a query to match the variable in an index in order for the planner to figure out that it can use the index. Read the guidelines, then take an educated stab at some settings and see how they work - other than turning off fsync, there's not much in postgresql.conf that will put your data at risk.
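For example, the casting point can be seen with EXPLAIN (the table and column are invented for illustration; the exact behavior depends on your version):

```sql
-- an index on an int2 column may be ignored when the comparison value
-- is a different type:
EXPLAIN SELECT * FROM widgets WHERE code = 42::int8;  -- may show Seq Scan
EXPLAIN SELECT * FROM widgets WHERE code = 42::int2;  -- can show Index Scan
```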
Cheers, Steve