[PERFORM] Query logging time, not values
Hi,

I have a database and ~150 clients non-stop writing quite big pieces of text to it. I have a performance problem, so I tried increasing the log level to see which queries take the most time. The Log section of my postgresql.conf is:

log_destination = 'stderr'
logging_collector = on
log_rotation_size = 1GB
log_connections = on
log_line_prefix = '%m %p %u %d %r '
log_lock_waits = on
log_statement = 'ddl'
log_temp_files = 4096

With that I get the query times plus the query parameter values, which makes my log extremely big. How can I set the logging parameters to log each query and its duration, but without the parameter values?

Thanks,
Ludwik

--
Ludwik Dyląg
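For reference, a minimal sketch of how the duration-logging settings interact (semantics assumed from the PostgreSQL 8.x documentation; verify against your version):

```
log_statement = 'none'                # no unconditional statement logging
log_min_duration_statement = 1000     # one line, with duration, per statement taking >= 1 s
log_duration = off                    # otherwise every statement gets its own duration line
```

Note that the statement text logged by `log_min_duration_statement` still contains any literal values embedded in the SQL itself; as far as I know there is no setting here that logs the query shape with the values stripped out.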
Re: [PERFORM] Best suiting OS
On Mon, 5 Oct 2009, Jean-Michel Pouré wrote: Go for Debian: * It is a free community, very active. * It is guaranteed to be upgradable. * Very easy to administrate via apt-get. http://www.debian.org/News/2009/20091007 If you like Debian, but want to use FreeBSD, now you can have both. Choose Debian SID or testing, which will provide the latest fixes. I disagree. If you want a stable server, choose Debian stable, which was last released in February. It gets all the relevant fixes, just like any other supported stable distribution - just no new major versions of software. The next stable release is scheduled for the new year. If you want the latest and greatest, then you can use Debian testing. Matthew -- The surest protection against temptation is cowardice. -- Mark Twain -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] position in DDL of columns used in indexes
In other SQL engines that I've used, it is recommended that the columns that are used in various indexes be placed at the beginning of a row, since at some point (depending on the engine and/or page size) wide rows could end up on other pages. From a performance standpoint on large tables this makes a big difference. Is the same true with Postgres? Should I try to make sure that my indexes fit in the first 8192 bytes?

Best Regards

--
Michael Gould, Managing Partner
Intermodal Software Solutions, LLC
904.226.0978
904.592.5250 fax
Re: [PERFORM] position in DDL of columns used in indexes
On Thu, 8 Oct 2009, Michael Gould wrote:
> In other SQL engines that I've used, it is recommended that the columns that are used in various indexes be placed at the beginning of a row, since at some point (depending on the engine and/or page size) wide rows could end up on other pages. From a performance standpoint on large tables this makes a big difference. Is the same true with Postgres? Should I try to make sure that my indexes fit in the first 8192 bytes?

Interesting question. AFAIK (I'm not an expert, someone correct me): Postgres does not split rows across multiple pages, so this should never be a concern. When a row is too big for a page, Postgres will select the larger of the columns from the row and compress them. If that fails to bring the row size down, then Postgres will select the larger columns and move them out to a separate storage area, leaving just references in the actual row. Therefore, the order of columns should not matter.

Moreover, whether a column is used in an index should not make any difference. The index stores the values too, right? Postgres will look up in the index, and then fetch the rows, in two separate operations.

Matthew

--
Let's say I go into a field and I hear "baa baa baa". Now, how do I work out whether that was "baa" followed by "baa baa", or if it was "baa baa" followed by "baa"? - Computer Science Lecturer
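As an aside, the compress-then-move-out-of-line behaviour described above is decided per column, and the strategy each column uses can be inspected in the catalogs. A sketch (the table name is whatever table you are curious about):

```sql
-- attstorage: 'p' = plain (inline, uncompressed), 'm' = main (compressible, kept inline),
-- 'x' = extended (compressible, may move out of line), 'e' = external (out of line, uncompressed)
SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'mytable'::regclass
  AND attnum > 0;
```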
Re: [PERFORM] position in DDL of columns used in indexes
Matthew Wakeling writes: > Postgres does not split rows across multiple pages, so this should never > be a concern. When a row is too big for a page, Postgres will select the > larger of the columns from the row and compress them. If that fails to > bring the row size down, then Postgres will select the larger columns and > remove them to a separate storage area, and leave just the references in > the actual row. Therefore, the order of columns should not matter. > Moreover, whether a row is used in an index should not make any > difference. The index stores the values too, right? Postgres will look up > in the index, and then fetch the rows, in two separate operations. Yeah. There can be a small performance advantage to putting the more frequently accessed columns first (so you don't have to skip over other columns to get to them). This has nothing directly to do with whether they are indexed, though. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
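To illustrate Tom's point, a hypothetical layout (table and column names invented for this sketch) that puts the small, frequently read columns ahead of the wide ones:

```sql
CREATE TABLE orders (
    order_id    integer   NOT NULL,  -- fixed width, read by almost every query
    customer_id integer   NOT NULL,  -- fixed width, frequently filtered on
    created_at  timestamp NOT NULL,
    notes       text                 -- variable width; placed last so the executor
                                     -- never has to step over it to reach the
                                     -- columns above
);
```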
[PERFORM] Partitioned Tables and ORDER BY
We have been using partitioning for some time with great success. Up until now our usage has not included ordering, and now that we are trying to use an ORDER BY against an indexed column, a rather significant shortcoming seems to be kicking in.

Parent table (have cut all but 4 columns to make it easier to post about):

CREATE TABLE people (
    person_id  character varying(36) NOT NULL,
    list_id    character varying(36) NOT NULL,
    first_name character varying(255),
    last_name  character varying(255),
    CONSTRAINT people_pkey PRIMARY KEY (person_id, list_id)
);

A partition looks like this:

CREATE TABLE people_list1 (
    -- inherited columns omitted
    CONSTRAINT people_list1_list_id_check CHECK (list_id::text = 'the_unique_list_id'::text)
) INHERITS (people);

Both the parent and the children have indexes on all 4 columns mentioned above. The parent table is completely empty.

If I run this query directly against the partition, performance is excellent:

select * from people_list1 order by first_name asc limit 50;

The explain analyze output:

Limit  (cost=0.00..4.97 rows=50 width=34315) (actual time=49.616..522.464 rows=50 loops=1)
  ->  Index Scan using idx_people_first_name_list1 on people_list1  (cost=0.00..849746.98 rows=8544854 width=34315) (actual time=49.614..522.424 rows=50 loops=1)
Total runtime: 522.773 ms

If I run this query against the parent, performance is terrible:

select * from people where list_id = 'the_unique_list_id' order by first_name asc limit 50;

The explain analyze output:

Limit  (cost=726844.88..726845.01 rows=50 width=37739) (actual time=149864.869..149864.884 rows=50 loops=1)
  ->  Sort  (cost=726844.88..748207.02 rows=8544855 width=37739) (actual time=149864.868..149864.876 rows=50 loops=1)
        Sort Key: public.people.first_name
        Sort Method: top-N heapsort  Memory: 50kB
        ->  Result  (cost=0.00..442990.94 rows=8544855 width=37739) (actual time=0.081..125837.332 rows=8545138 loops=1)
              ->  Append  (cost=0.00..442990.94 rows=8544855 width=37739) (actual time=0.079..03.743 rows=8545138 loops=1)
                    ->  Index Scan using people_pkey on people  (cost=0.00..4.27 rows=1 width=37739) (actual time=0.008..0.008 rows=0 loops=1)
                          Index Cond: ((list_id)::text = 'the_unique_list_id'::text)
                    ->  Seq Scan on people_list1 people  (cost=0.00..442986.67 rows=8544854 width=34315) (actual time=0.068..109781.308 rows=8545138 loops=1)
                          Filter: ((list_id)::text = 'the_unique_list_id'::text)
Total runtime: 149865.411 ms

Just to show that partitions are set up correctly, this query also has excellent performance:

select * from people where list_id = 'the_unique_list_id' and first_name = 'JOE';

Here is the explain analyze for that:

Result  (cost=0.00..963.76 rows=482 width=37739) (actual time=6.031..25.394 rows=2319 loops=1)
  ->  Append  (cost=0.00..963.76 rows=482 width=37739) (actual time=6.029..21.340 rows=2319 loops=1)
        ->  Index Scan using idx_people_first_name on people  (cost=0.00..4.27 rows=1 width=37739) (actual time=0.010..0.010 rows=0 loops=1)
              Index Cond: ((first_name)::text = 'JOE'::text)
              Filter: ((list_id)::text = 'the_unique_list_id'::text)
        ->  Bitmap Heap Scan on people_list1 people  (cost=8.47..959.49 rows=481 width=34315) (actual time=6.018..20.968 rows=2319 loops=1)
              Recheck Cond: ((first_name)::text = 'JOE'::text)
              Filter: ((list_id)::text = 'the_unique_list_id'::text)
              ->  Bitmap Index Scan on idx_people_first_name_list1  (cost=0.00..8.35 rows=481 width=0) (actual time=5.566..5.566 rows=2319 loops=1)
                    Index Cond: ((first_name)::text = 'JOE'::text)
Total runtime: 25.991 ms

This is Postgres 8.3.7 on the 2.6.28 kernel with constraint_exclusion on. Our partitions are in the 8 - 15 million row range.

I realize one option is to hit the partition directly instead of hitting the parent table with the check constraint in the WHERE clause, but up until now we have been able to avoid needing partition-awareness in our code. Perhaps we have hit upon something that will require breaking that cleanliness, but wanted to see if there were any workarounds.
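One workaround sketch, not taken from the thread (`people_list2` stands in for however many partitions exist): apply the ORDER BY and LIMIT to each child explicitly and merge the bounded results, so each branch can use its index and at most 50 rows per partition ever reach the final sort:

```sql
SELECT *
FROM (
    (SELECT * FROM people_list1 ORDER BY first_name ASC LIMIT 50)
    UNION ALL
    (SELECT * FROM people_list2 ORDER BY first_name ASC LIMIT 50)
) AS merged
ORDER BY first_name ASC
LIMIT 50;
```

This does give up some of the partition-unawareness the post wants to preserve, since the child table names appear in the SQL.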
[PERFORM] concurrent reindex issues
Hey all, it's been a while, but I'm running into some issues with a concurrent index build. I always get this error during the concurrent index creation:

2009-10-07 22:18:02 PDT admissionclsdb postgres 10.13.200.70(46706) ERROR: deadlock detected
2009-10-07 22:18:02 PDT admissionclsdb postgres 10.13.200.70(46706) DETAIL: Process 20939 waits for ShareLock on virtual transaction 16/43817381; blocked by process 1874.
Process 1874 waits for ExclusiveLock on relation 17428 of database 16384; blocked by process 20939.
2009-10-07 22:18:02 PDT admissionclsdb postgres 10.13.200.70(46706) STATEMENT: CREATE INDEX CONCURRENTLY prc_temp_idx_impressions_log_date2 ON tracking.impressions USING btree (log_date) TABLESPACE trackingindexspace

This happens every time, so it's not an occasional deadlock. We even turned off all applications that insert into the database and it still fails. We tried restarting the database as well. Also, when looking at active connections there is no process 1874. So I'm at a loss. This first started happening on my slave DB (Slony replication), but it is now happening on my master, which is odd. Any ideas?

postgres 8.3.4, Linux system.
Re: [PERFORM] dump time increase by 1h with new kernel
Hi Everyone On Fri, Oct 02, 2009 at 12:58:12PM -0700, Justin Pryzby wrote: > When we upgraded from linux-2.6.24 to linux-2.6.27, our pg_dump > duration increased by 20% from 5 hours to 6. My first attempt at On Sat, Oct 03, 2009 at 11:31:11PM -0600, Scott Marlowe wrote: > between these two to tell them exactly which one(s) causes the > regression(s). That and how you compiled them or where they came from These are both ubuntu kernels, and there's none in-between available from their repository for testing. I could compile myself, but it would have to include all the ubuntu patches (apparmor in particular).. On Thu, Oct 08, 2009 at 01:14:52AM -0400, Greg Smith wrote: > report this in kernel land is "is it still slow on 2.6.[last > stable|head]?". I could try *newer* kernels, but not from newer ubuntu releases. This machine is running ubuntu 8.04 with select packages from 8.10. However it's running postgres 8.2, which isn't included with either of those releases.. I tried dumping with 8.3 pg_dump, which had only minimal effect. > If you can try both kernel versions, the other thing you really > should do is collect data from "vmstat 1" during the pg_dump period. > It would help narrow what area is slower. I have sar running on the machine, and came up with this: 07 and 30 are days of the month, 7 is last night, 30 is from September. pg_dump starts around 9pm. tail -15 gets us to about 10pm. On sep 30, the machine was running 2.6.24 and the dump ran until after 2am. Since we rebooted, it runs .27 and the dump runs until after 4am. So the last column shows higher rate values for every metric on the 30th (under .24) except for intr. 
for a in user system iowait cswch tps rtps wtps intr; do
  for b in 07 30; do
    eval t$b='`sadf /var/log/sysstat/sa$b -- -A |grep -wi "$a" |tail -15 |awk "{sum+=\\$NF}END{print sum/NR}"`'
  done
  printf "%-6s %4.4s %4.4s %5.5s\n" $a $t30 $t07 `calc $t30/$t07`
done

         s30    o07   s30/o07
user     13.9   6.85  ~2.03
system   0.56   0.37  ~1.52
iowait   0.61   0.52  ~1.16
cswch    873.   672.  ~1.29
intr     121.   396.  ~0.30
tps      412.   346.  ~1.19
rtps     147.   143.  ~1.02
wtps     264.   202.  ~1.30

Not sure if sar can provide other data included by vmstat: IO merged in/out, {,soft}irq ticks?

Thanks,
Justin
Re: [PERFORM] dump time increase by 1h with new kernel
On Thu, 2009-10-08 at 10:44 -0700, Justin T Pryzby wrote: > Hi Everyone Did your scheduler change between the kernel versions? > Not sure if sar can provide other data included by vmstat: IO merged > in/out, {,soft}irq ticks? > > Thanks, > Justin > -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564 Consulting, Training, Support, Custom Development, Engineering If the world pushes look it in the eye and GRR. Then push back harder. - Salamander -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] concurrent reindex issues
Tory M Blue writes: > *2009-10-07 22:18:02 PDT admissionclsdb postgres 10.13.200.70(46706) ERROR: > deadlock detected* > *2009-10-07 22:18:02 PDT admissionclsdb postgres 10.13.200.70(46706) > DETAIL: Process 20939 waits for ShareLock on virtual transaction > 16/43817381; blocked by process 1874.* > *Process 1874 waits for ExclusiveLock on relation 17428 of database > 16384; blocked by process 20939.* > *2009-10-07 22:18:02 PDT admissionclsdb postgres 10.13.200.70(46706) > STATEMENT: CREATE INDEX CONCURRENTLY prc_temp_idx_impressions_log_date2 ON > tracking.impressions USING btree (log_date) TABLESPACE trackingindexspace* Hmm. I suppose that 20939 was running the CREATE INDEX CONCURRENTLY, and what it's trying to do with the ShareLock on a VXID is wait for some other transaction to terminate so that it can safely complete the index creation (because the index might be invalid from the point of view of that other transaction). But the other transaction is waiting for ExclusiveLock on what I assume is the table being indexed (did you check what relation that OID is?). AFAIK there are no built-in operations that take ExclusiveLock on user tables, which means that 1874 would have had to be issuing an explicit LOCK TABLE tracking.impressions IN EXCLUSIVE MODE command. Perhaps that will help you track down what it was. > So I'm at a lost, this first started happening in my slave DB (Slon > replication), but it is now happening on my master which is odd. I wouldn't be too surprised if the LOCK is coming from some Slony operation or other. You might want to ask the slony hackers about it. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
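For what it's worth, the OIDs in the deadlock DETAIL can be resolved with a couple of catalog lookups (a sketch; run the first one while connected to the affected database):

```sql
-- Which relation is 17428? regclass formats the OID as a table name.
SELECT 17428::regclass;

-- Which database is 16384?
SELECT datname FROM pg_database WHERE oid = 16384;
```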
Re: [PERFORM] concurrent reindex issues
On Thu, Oct 8, 2009 at 11:55 AM, Tom Lane wrote: > I wouldn't be too surprised if the LOCK is coming from some Slony > operation or other. You might want to ask the slony hackers about it. I've had issues like this. Shutting down the slon daemons before running such commands would usually allow them to finish, at the cost of replication falling behind while it runs. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] dump time increase by 1h with new kernel
On Thu, Oct 08, 2009 at 10:49:37AM -0700, Joshua D. Drake wrote: > On Thu, 2009-10-08 at 10:44 -0700, Justin T Pryzby wrote: > > Hi Everyone > Did your scheduler change between the kernel versions? No, it's deadline for both. Justin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Explain Analyze returns faster than psql or JDBC calls.
Hi there,

We are running Postgres 8.3.7, and we have a problem with EXPLAIN ANALYZE that we haven't seen before. We ran EXPLAIN ANALYZE on a query:

Nested Loop  (cost=1256.32..2097.31 rows=198 width=1120) (actual time=12.958..20.846 rows=494 loops=1)
  ->  HashAggregate  (cost=1256.32..1256.92 rows=198 width=4) (actual time=12.936..13.720 rows=494 loops=1)
        ->  Limit  (cost=1255.53..1255.63 rows=198 width=20) (actual time=9.841..11.901 rows=500 loops=1)
              ->  Sort  (cost=1255.53..1255.63 rows=198 width=20) (actual time=9.838..10.588 rows=500 loops=1)
                    Sort Key: ((abs((ri_metadata.latitude - 44.0247062::double precision)) + abs((ri_metadata.longitude - (-88.5426136)::double precision
                    Sort Method: quicksort  Memory: 52kB
                    ->  Bitmap Heap Scan on ri_metadata  (cost=385.54..1254.02 rows=198 width=20) (actual time=4.638..8.558 rows=595 loops=1)
                          Recheck Cond: ((latitude > 43.6687062::double precision) AND (latitude < 44.3807062::double precision) AND (longitude > (-88.8986136)::double precision) AND (longitude < (-88.1866136)::double precision))
                          Filter: (category_id = ANY ('{3,274,4,1,2,275,7,278,8,277,5,280,6,279,11,9,10,15,285,16,14,19,18,17,24,23,21,266,25,32,31,30,29,40,34,48,41,44,313,54,53,55,50,52,62,61,63,302,58,57,59,71,341,69,67,338,68,337,65,66,339,79,352,77,78,74,85,83,324,81,334,335,336,372,376,373,374,363,122,368,127,128,356,355,360,359,118,358,357,140,139,138,137,143,142,141,132,130,129,135,134,191,185,186,187,188,183,202,200,193,219}'::integer[]))
                          ->  Bitmap Index Scan on ri_metadata_latitude_longitude_category_id_idx  (cost=0.00..385.53 rows=462 width=0) (actual time=4.533..4.533 rows=1316 loops=1)
                                Index Cond: ((latitude > 43.6687062::double precision) AND (latitude < 44.3807062::double precision) AND (longitude > (-88.8986136)::double precision) AND (longitude < (-88.1866136)::double precision))
  ->  Index Scan using ri_result_result_id_idx on ri_result  (cost=0.00..4.24 rows=1 width=1120) (actual time=0.006..0.008 rows=1 loops=494)
        Index Cond: (ri_result.result_id = ri_metadata.result_id)
Total runtime: 21.658 ms
(14 rows)

It takes only 21 ms.

Then we ran the same query in psql (on localhost) with timing on:

select * from ri_result
where result_id in (
    select result_id from ri_metadata
    WHERE category_id IN (3,274,4,1,2,275,7,278,8,277,5,280,6,279,11,9,10,15,285,16,14,19,18,17,24,23,21,266,25,32,31,30,29,40,34,48,41,44,313,54,53,55,50,52,62,61,63,302,58,57,59,71,341,69,67,338,68,337,65,66,339,79,352,77,78,74,85,83,324,81,334,335,336,372,376,373,374,363,122,368,127,128,356,355,360,359,118,358,357,140,139,138,137,143,142,141,132,130,129,135,134,191,185,186,187,188,183,202,200,193,219)
      AND latitude > 43.66870619995 AND latitude < 44.3807062
      AND longitude > -88.8986135999 AND longitude < -88.1866136
    order by abs(latitude - 44.0247062) + abs(longitude - -88.5426136)
    limit 500
);

Time: 2611.491 ms

The longer runtime from the psql console is corroborated when we run the same queries via JDBC. How can EXPLAIN ANALYZE return significantly faster than the other means? (I understand that some data is returned from the query to the client in psql, or to a ResultSet in JDBC, as opposed to the small chunk of data that makes up a plan, but still...) By the way, I ran the EXPLAIN ANALYZE first and then ran the query, to avoid any caching.

Our system: Ubuntu 8.04.3 64 bit, 8GB RAM, 2 GHz single core, running as a VM on an ESX server. The database is read-only.
ri_metadata has 1473864 rows, 200MB.

                               Table "public.ri_metadata"
     Column     |            Type             |                      Modifiers
----------------+-----------------------------+------------------------------------------------------
 ri_metadata_id | integer                     | not null default nextval('ri_metadata_ri_metadata_id_seq'::regclass)
 result_id      | integer                     | not null
 start_time     | timestamp without time zone | not null
 end_time       | timestamp without time zone | not null
 category_id    | integer                     | not null
 category_name  | text                        | not null
 location_id    | integer                     |
 longitude      | double precision            |
 latitude       | double precision            |
 city           | text                        |
 state          | text                        |
 zip            | text                        |
 street_address | text                        |
Indexes:
    "ri_metadata_pkey" PRIMARY KEY, btree (ri_metadata_id)
    "ri_metadata_category_id_idx" btree (category_id)
    "ri_metadata_category_id_state" btree (category_id, state)
    "ri_metadata_end_time_idx" btree (end_time)
    "ri_metadata_latitude_idx" btree (latitude)
    "ri_metadata_latitude_longitud
Re: [PERFORM] dump time increase by 1h with new kernel
Justin T Pryzby wrote: > On Thu, Oct 08, 2009 at 10:49:37AM -0700, Joshua D. Drake wrote: >> Did your scheduler change between the kernel versions? > No, it's deadline for both. How about write barriers? I had a kernel upgrade which turned them on for xfs, with unfortunate performance impacts. The xfs docs explicitly recommend disabling it if you have a battery backed cache in your RAID controller. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] dump time increase by 1h with new kernel
On Thu, Oct 08, 2009 at 03:37:39PM -0500, Kevin Grittner wrote: > Justin T Pryzby wrote: > > On Thu, Oct 08, 2009 at 10:49:37AM -0700, Joshua D. Drake wrote: > >> Did your scheduler change between the kernel versions? > > No, it's deadline for both. > > How about write barriers? I had a kernel upgrade which turned them on Doesn't seem to be that either :( [ 55.120073] Filesystem "dm-0": Disabling barriers, trial barrier write failed crb2-db2(254, 0) /dev/mapper/crb2-db2 on /media/database Justin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Explain Analyze returns faster than psql or JDBC calls.
G B writes: > How can explain-analyze return significantly much faster than other means? If the returned data is large or takes a lot of work to convert to text, this could happen, since EXPLAIN ANALYZE doesn't bother to format the tuples for display. How big are those bytea columns, on average? regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
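One quick way to test Tom's hypothesis is to measure how wide the returned rows actually are. A sketch (`ri_result` is the table from the original post; `pg_column_size` works the same whether the wide columns are bytea or text):

```sql
-- Average and maximum on-disk size of whole ri_result rows:
SELECT avg(pg_column_size(r.*))::bigint AS avg_row_bytes,
       max(pg_column_size(r.*))         AS max_row_bytes
FROM ri_result AS r;
```

If the average row is large, the 2.6 s seen in psql is mostly detoasting, text conversion, and transfer of ~500 wide rows, which EXPLAIN ANALYZE skips.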