[PERFORM] Query logging time, not values

2009-10-08 Thread Ludwik Dylag
Hi. I have a database and ~150 clients writing quite big pieces of text to it
non-stop.
I have a performance problem, so I tried to increase the log level so I could
see which queries take the most time.
My postgresql.conf (Log section) is:

log_destination = 'stderr'
logging_collector = on
log_rotation_size = 1GB
log_connections = on
log_line_prefix = '%m %p %u %d %r '
log_lock_waits = on
log_statement = 'ddl'
log_temp_files = 4096

And I got the query times plus the query parameter values, which makes my log
extremely big.
How can I set the logging parameters to write the query and its duration, but
without the parameter values?
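
One possible direction (a sketch only, not from this thread, and the 250 ms
threshold is just an illustration) is to stop logging full statements outright
and instead log only statements that exceed a duration threshold, together
with that duration:

log_statement = 'ddl'                # keep logging DDL only, not every statement
log_min_duration_statement = 250     # log statement text plus duration for
                                     # anything slower than 250 ms

Depending on the server version and on how the clients send their queries,
bound parameter values may still appear in DETAIL lines for statements logged
this way.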

Thanks
Ludwik





-- 
Ludwik Dyląg


Re: [PERFORM] Best suiting OS

2009-10-08 Thread Matthew Wakeling

On Mon, 5 Oct 2009, Jean-Michel Pouré wrote:

Go for Debian:
* It is a free community, and very active.
* It is guaranteed to be upgradable.
* Very easy to administer via apt-get.


http://www.debian.org/News/2009/20091007

If you like Debian, but want to use FreeBSD, now you can have both.


Choose Debian SID or testing, which will provide the latest fixes.


I disagree. If you want a stable server, choose Debian stable, which was 
last released in February. It gets all the relevant fixes, just like any 
other supported stable distribution - just no new major versions of 
software. The next stable release is scheduled for the new year.


If you want the latest and greatest, then you can use Debian testing.

Matthew

--
The surest protection against temptation is cowardice.
 -- Mark Twain


[PERFORM] position in DDL of columns used in indexes

2009-10-08 Thread Michael Gould
In other SQL engines that I've used, it is recommended that the columns that
are used in various indexes be placed at the beginning of a row, since at
some point (depending on the engine and/or page size) wide rows could end up
on other pages.  From a performance standpoint this makes a big difference on
large tables.  Is the same true with Postgres?  Should I try to make sure
that my indexed columns fit in the first 8192 bytes?
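
For reference (a quick check, not part of the original question): the page
size in question can be confirmed from SQL, and is 8192 bytes on a default
build.

SHOW block_size;   -- reports the database page size; 8192 on a standard build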


 


Best Regards


--
Michael Gould, Managing Partner
Intermodal Software Solutions, LLC
904.226.0978
904.592.5250 fax



Re: [PERFORM] position in DDL of columns used in indexes

2009-10-08 Thread Matthew Wakeling

On Thu, 8 Oct 2009, Michael Gould wrote:

In other SQL engines that I've used, it is recommended that the columns that
are used in various indexes be placed at the beginning of a row, since at some
point (depending on the engine and/or page size) wide rows could end up on
other pages.  From a performance standpoint this makes a big difference on
large tables.  Is the same true with Postgres?  Should I try to make sure that
my indexed columns fit in the first 8192 bytes?


Interesting question. AFAIK (I'm not an expert, someone correct me):

Postgres does not split rows across multiple pages, so this should never 
be a concern. When a row is too big for a page, Postgres will select the 
larger of the columns from the row and compress them. If that fails to 
bring the row size down, then Postgres will select the larger columns and 
remove them to a separate storage area, and leave just the references in 
the actual row. Therefore, the order of columns should not matter.
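
As an illustration (a sketch, not from the original reply; 'mytable' is a
placeholder name), the storage strategy Postgres has chosen for each column
can be inspected in pg_attribute:

-- attstorage: 'p' = plain (always in line), 'm' = main (compress, keep in line),
--             'x' = extended (compress and/or move out of line), 'e' = external
SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'mytable'::regclass
  AND attnum > 0
ORDER BY attnum;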


Moreover, whether a column is used in an index should not make any
difference. The index stores the values too, right? Postgres will look up
in the index, and then fetch the rows, in two separate operations.


Matthew

--
Let's say I go into a field and I hear "baa baa baa". Now, how do I work
out whether that was "baa" followed by "baa baa", or if it was "baa baa"
followed by "baa"?
- Computer Science Lecturer


Re: [PERFORM] position in DDL of columns used in indexes

2009-10-08 Thread Tom Lane
Matthew Wakeling  writes:
> Postgres does not split rows across multiple pages, so this should never 
> be a concern. When a row is too big for a page, Postgres will select the 
> larger of the columns from the row and compress them. If that fails to 
> bring the row size down, then Postgres will select the larger columns and 
> remove them to a separate storage area, and leave just the references in 
> the actual row. Therefore, the order of columns should not matter.

> Moreover, whether a column is used in an index should not make any
> difference. The index stores the values too, right? Postgres will look up 
> in the index, and then fetch the rows, in two separate operations.

Yeah.  There can be a small performance advantage to putting the more
frequently accessed columns first (so you don't have to skip over other
columns to get to them).  This has nothing directly to do with whether
they are indexed, though.
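
A hypothetical layout illustrating that point (an example only, not a
recommendation from the thread): keep small, frequently read fixed-width
columns ahead of wide variable-width ones, so reading the hot columns does
not require stepping over the wide values first.

CREATE TABLE example_orders (
    order_id   integer   NOT NULL,   -- read on nearly every query
    status     smallint  NOT NULL,   -- read on nearly every query
    created_at timestamp NOT NULL,
    notes      text                  -- wide, rarely read
);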

regards, tom lane



[PERFORM] Partitioned Tables and ORDER BY

2009-10-08 Thread Joe Uhl
We have been using partitioning for some time with great success.  Up
until now our usage has not included ordering, and now that we are trying
to use an ORDER BY against an indexed column, a rather significant
shortcoming seems to be kicking in.


Parent table (I have cut all but 4 columns to make it easier to post about):
CREATE TABLE people
(
  person_id character varying(36) NOT NULL,
  list_id character varying(36) NOT NULL,
  first_name character varying(255),
  last_name character varying(255),
  CONSTRAINT people_pkey PRIMARY KEY (person_id, list_id)
);

A partition looks like this:
CREATE TABLE people_list1
(
  -- inherited columns omitted
  CONSTRAINT people_list1_list_id_check CHECK (list_id::text = 'the_unique_list_id'::text)
)
INHERITS (people);

Both the parent and the children have indexes on all 4 columns mentioned 
above.  The parent table is completely empty.
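
For reference, the indexes that appear by name in the plans below were
presumably created roughly like this (a reconstruction from the index names;
the actual DDL is not in the original message):

CREATE INDEX idx_people_first_name ON people (first_name);
CREATE INDEX idx_people_first_name_list1 ON people_list1 (first_name);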


If I run this query directly against the partition, performance is
excellent:

select * from people_list1 order by first_name asc limit 50;

The explain analyze output:
 Limit  (cost=0.00..4.97 rows=50 width=34315) (actual time=49.616..522.464 rows=50 loops=1)
   ->  Index Scan using idx_people_first_name_list1 on people_list1  (cost=0.00..849746.98 rows=8544854 width=34315) (actual time=49.614..522.424 rows=50 loops=1)
 Total runtime: 522.773 ms

If I run this query against the parent, performance is terrible:
select * from people where list_id = 'the_unique_list_id' order by first_name asc limit 50;


The explain analyze output:
 Limit  (cost=726844.88..726845.01 rows=50 width=37739) (actual time=149864.869..149864.884 rows=50 loops=1)
   ->  Sort  (cost=726844.88..748207.02 rows=8544855 width=37739) (actual time=149864.868..149864.876 rows=50 loops=1)
         Sort Key: public.people.first_name
         Sort Method:  top-N heapsort  Memory: 50kB
         ->  Result  (cost=0.00..442990.94 rows=8544855 width=37739) (actual time=0.081..125837.332 rows=8545138 loops=1)
               ->  Append  (cost=0.00..442990.94 rows=8544855 width=37739) (actual time=0.079..03.743 rows=8545138 loops=1)
                     ->  Index Scan using people_pkey on people  (cost=0.00..4.27 rows=1 width=37739) (actual time=0.008..0.008 rows=0 loops=1)
                           Index Cond: ((list_id)::text = 'the_unique_list_id'::text)
                     ->  Seq Scan on people_list1 people  (cost=0.00..442986.67 rows=8544854 width=34315) (actual time=0.068..109781.308 rows=8545138 loops=1)
                           Filter: ((list_id)::text = 'the_unique_list_id'::text)
 Total runtime: 149865.411 ms

Just to show that partitions are set up correctly, this query also has
excellent performance:
select * from people where list_id = 'the_unique_list_id' and first_name = 'JOE';


Here is the explain analyze for that:
 Result  (cost=0.00..963.76 rows=482 width=37739) (actual time=6.031..25.394 rows=2319 loops=1)
   ->  Append  (cost=0.00..963.76 rows=482 width=37739) (actual time=6.029..21.340 rows=2319 loops=1)
         ->  Index Scan using idx_people_first_name on people  (cost=0.00..4.27 rows=1 width=37739) (actual time=0.010..0.010 rows=0 loops=1)
               Index Cond: ((first_name)::text = 'JOE'::text)
               Filter: ((list_id)::text = 'the_unique_list_id'::text)
         ->  Bitmap Heap Scan on people_list1 people  (cost=8.47..959.49 rows=481 width=34315) (actual time=6.018..20.968 rows=2319 loops=1)
               Recheck Cond: ((first_name)::text = 'JOE'::text)
               Filter: ((list_id)::text = 'the_unique_list_id'::text)
               ->  Bitmap Index Scan on idx_people_first_name_list1  (cost=0.00..8.35 rows=481 width=0) (actual time=5.566..5.566 rows=2319 loops=1)
                     Index Cond: ((first_name)::text = 'JOE'::text)
 Total runtime: 25.991 ms


This is Postgres 8.3.7 on the 2.6.28 kernel with constraint_exclusion 
on.  Our partitions are in the 8 - 15 million row range.


I realize one option is to hit the partition directly instead of hitting 
the parent table with the check constraint in the WHERE clause, but up 
until now we have been able to avoid needing partition-awareness in our 
code.  Perhaps we have hit upon something that will require breaking 
that cleanliness but wanted to see if there were any workarounds.
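
One workaround pattern sometimes used for ordered LIMIT queries over
partitions on this version (a sketch only, not from the thread; people_list2
is a hypothetical second partition): take the top 50 from each partition with
its own index scan, then sort and limit the small combined set.

select * from (
    (select * from people_list1 order by first_name asc limit 50)
    union all
    (select * from people_list2 order by first_name asc limit 50)
) as candidates
order by first_name asc
limit 50;

This still requires naming the partitions, which is exactly the
partition-awareness being avoided, but it keeps the per-partition index scans
and only sorts at most 50 rows per partition.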




[PERFORM] concurrent reindex issues

2009-10-08 Thread Tory M Blue
Hey all, it's been a bit, but I'm running into some issues with my
concurrent index builds.


I always get this error during a concurrent index build:



2009-10-07 22:18:02 PDT admissionclsdb postgres 10.13.200.70(46706) ERROR:  deadlock detected

2009-10-07 22:18:02 PDT admissionclsdb postgres 10.13.200.70(46706) DETAIL:  Process 20939 waits for ShareLock on virtual transaction 16/43817381; blocked by process 1874.

Process 1874 waits for ExclusiveLock on relation 17428 of database 16384; blocked by process 20939.

2009-10-07 22:18:02 PDT admissionclsdb postgres 10.13.200.70(46706) STATEMENT:  CREATE INDEX CONCURRENTLY prc_temp_idx_impressions_log_date2 ON tracking.impressions USING btree (log_date) TABLESPACE trackingindexspace



This happens all the time, so it's not the occasional deadlock. We even
turned off all applications that insert into the database and it still
fails.

Tried restarting the database as well.

Also when looking at active connections there is no process 1874.


So I'm at a loss; this first started happening in my slave DB (Slon
replication), but it is now happening on my master, which is odd.


Any idea?

postgres 8.3.4

Linux system.
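
One way to see what the blocking process is holding and waiting on while the
CREATE INDEX CONCURRENTLY is stuck is to join pg_locks to pg_stat_activity (a
sketch, not from the thread; the procpid and current_query column names are
as in 8.3):

SELECT l.pid, l.locktype, l.relation::regclass AS relation, l.mode, l.granted,
       a.usename, a.xact_start, a.current_query
FROM pg_locks l
LEFT JOIN pg_stat_activity a ON a.procpid = l.pid
ORDER BY l.granted, l.pid;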


Re: [PERFORM] dump time increase by 1h with new kernel

2009-10-08 Thread Justin T Pryzby
Hi Everyone

On Fri, Oct 02, 2009 at 12:58:12PM -0700, Justin Pryzby wrote:
> When we upgraded from linux-2.6.24 to linux-2.6.27, our pg_dump
> duration increased by 20% from 5 hours to 6.  My first attempt at

On Sat, Oct 03, 2009 at 11:31:11PM -0600, Scott Marlowe wrote:
> between these two to tell them exactly which one(s) causes the
> regression(s).  That and how you compiled them or where they came from
These are both Ubuntu kernels, and there are none in between available
from their repository for testing.  I could compile one myself, but it
would have to include all the Ubuntu patches (apparmor in
particular).

On Thu, Oct 08, 2009 at 01:14:52AM -0400, Greg Smith wrote:
> report this in kernel land is "is it still slow on 2.6.[last
> stable|head]?".
I could try *newer* kernels, but not from newer Ubuntu releases.  This
machine is running Ubuntu 8.04 with select packages from 8.10.
However, it's running Postgres 8.2, which isn't included with either of
those releases.  I tried dumping with the 8.3 pg_dump, which had only
minimal effect.

> If you can try both kernel versions, the other thing you really
> should do is collect data from "vmstat 1" during the pg_dump period.
> It would help narrow what area is slower.
I have sar running on the machine, and came up with this:

07 and 30 are days of the month, 7 is last night, 30 is from
September.  pg_dump starts around 9pm.  tail -15 gets us to about
10pm.  On sep 30, the machine was running 2.6.24 and the dump ran
until after 2am.  Since we rebooted, it runs .27 and the dump runs
until after 4am.  So the last column shows higher rate values for
every metric on the 30th (under .24) except for intr.

for a in user system iowait cswch tps rtps wtps intr; do for b in 07 30; do eval t$b='`sadf /var/log/sysstat/sa$b -- -A |grep -wi "$a" |tail -15 |awk "{sum+=\\$NF}END{print sum/NR}"`'; done; printf "%-6s %4.4s %4.4s %5.5s\n" $a $t30 $t07 `calc $t30/$t07`; done
        s30   o07   s30/o07
user    13.9  6.85  ~2.03
system  0.56  0.37  ~1.52
iowait  0.61  0.52  ~1.16
cswch   873.  672.  ~1.29
intr    121.  396.  ~0.30
tps     412.  346.  ~1.19
rtps    147.  143.  ~1.02
wtps    264.  202.  ~1.30

Not sure if sar can provide other data included by vmstat: IO merged
in/out, {,soft}irq ticks?

Thanks,
Justin



Re: [PERFORM] dump time increase by 1h with new kernel

2009-10-08 Thread Joshua D. Drake
On Thu, 2009-10-08 at 10:44 -0700, Justin T Pryzby wrote:
> Hi Everyone


Did your scheduler change between the kernel versions? 

> Not sure if sar can provide other data included by vmstat: IO merged
> in/out, {,soft}irq ticks?
> 
> Thanks,
> Justin
> 
-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
If the world pushes look it in the eye and GRR. Then push back harder. - 
Salamander




Re: [PERFORM] concurrent reindex issues

2009-10-08 Thread Tom Lane
Tory M Blue  writes:
> 2009-10-07 22:18:02 PDT admissionclsdb postgres 10.13.200.70(46706) ERROR:  deadlock detected

> 2009-10-07 22:18:02 PDT admissionclsdb postgres 10.13.200.70(46706) DETAIL:  Process 20939 waits for ShareLock on virtual transaction 16/43817381; blocked by process 1874.

> Process 1874 waits for ExclusiveLock on relation 17428 of database 16384; blocked by process 20939.

> 2009-10-07 22:18:02 PDT admissionclsdb postgres 10.13.200.70(46706) STATEMENT:  CREATE INDEX CONCURRENTLY prc_temp_idx_impressions_log_date2 ON tracking.impressions USING btree (log_date) TABLESPACE trackingindexspace

Hmm.  I suppose that 20939 was running the CREATE INDEX CONCURRENTLY,
and what it's trying to do with the ShareLock on a VXID is wait for some
other transaction to terminate so that it can safely complete the index
creation (because the index might be invalid from the point of view of
that other transaction).  But the other transaction is waiting for
ExclusiveLock on what I assume is the table being indexed (did you check
what relation that OID is?).
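
Checking that is a one-liner (an illustration, not part of the original
reply); run it in the database with OID 16384, the one named in the DETAIL
line:

SELECT 17428::regclass;   -- resolves the OID from the deadlock message to a relation name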

AFAIK there are no built-in operations that take ExclusiveLock on user
tables, which means that 1874 would have had to be issuing an explicit
LOCK TABLE tracking.impressions IN EXCLUSIVE MODE
command.  Perhaps that will help you track down what it was.

> So I'm at a loss; this first started happening in my slave DB (Slon
> replication), but it is now happening on my master, which is odd.

I wouldn't be too surprised if the LOCK is coming from some Slony
operation or other.  You might want to ask the slony hackers about it.

regards, tom lane



Re: [PERFORM] concurrent reindex issues

2009-10-08 Thread Scott Marlowe
On Thu, Oct 8, 2009 at 11:55 AM, Tom Lane  wrote:
> I wouldn't be too surprised if the LOCK is coming from some Slony
> operation or other.  You might want to ask the slony hackers about it.

I've had issues like this.  Shutting down the slon daemons before
running such commands would usually allow them to finish, at the cost
of replication falling behind while it runs.



Re: [PERFORM] dump time increase by 1h with new kernel

2009-10-08 Thread Justin T Pryzby
On Thu, Oct 08, 2009 at 10:49:37AM -0700, Joshua D. Drake wrote:
> On Thu, 2009-10-08 at 10:44 -0700, Justin T Pryzby wrote:
> > Hi Everyone
> Did your scheduler change between the kernel versions? 
No, it's deadline for both.

Justin



[PERFORM] Explain Analyze returns faster than psql or JDBC calls.

2009-10-08 Thread G B
Hi there

We are running Postgres 8.3.7 (system details below).
We have a problem with EXPLAIN ANALYZE that we haven't seen before.

We run an EXPLAIN ANALYZE on a query:

 Nested Loop  (cost=1256.32..2097.31 rows=198 width=1120) (actual time=12.958..20.846 rows=494 loops=1)
   ->  HashAggregate  (cost=1256.32..1256.92 rows=198 width=4) (actual time=12.936..13.720 rows=494 loops=1)
         ->  Limit  (cost=1255.53..1255.63 rows=198 width=20) (actual time=9.841..11.901 rows=500 loops=1)
               ->  Sort  (cost=1255.53..1255.63 rows=198 width=20) (actual time=9.838..10.588 rows=500 loops=1)
                     Sort Key: ((abs((ri_metadata.latitude - 44.0247062::double precision)) + abs((ri_metadata.longitude - (-88.5426136)::double precision))))
                     Sort Method:  quicksort  Memory: 52kB
                     ->  Bitmap Heap Scan on ri_metadata  (cost=385.54..1254.02 rows=198 width=20) (actual time=4.638..8.558 rows=595 loops=1)
                           Recheck Cond: ((latitude > 43.6687062::double precision) AND (latitude < 44.3807062::double precision) AND (longitude > (-88.8986136)::double precision) AND (longitude < (-88.1866136)::double precision))
                           Filter: (category_id = ANY ('{3,274,4,1,2,275,7,278,8,277,5,280,6,279,11,9,10,15,285,16,14,19,18,17,24,23,21,266,25,32,31,30,29,40,34,48,41,44,313,54,53,55,50,52,62,61,63,302,58,57,59,71,341,69,67,338,68,337,65,66,339,79,352,77,78,74,85,83,324,81,334,335,336,372,376,373,374,363,122,368,127,128,356,355,360,359,118,358,357,140,139,138,137,143,142,141,132,130,129,135,134,191,185,186,187,188,183,202,200,193,219}'::integer[]))
                           ->  Bitmap Index Scan on ri_metadata_latitude_longitude_category_id_idx  (cost=0.00..385.53 rows=462 width=0) (actual time=4.533..4.533 rows=1316 loops=1)
                                 Index Cond: ((latitude > 43.6687062::double precision) AND (latitude < 44.3807062::double precision) AND (longitude > (-88.8986136)::double precision) AND (longitude < (-88.1866136)::double precision))
   ->  Index Scan using ri_result_result_id_idx on ri_result  (cost=0.00..4.24 rows=1 width=1120) (actual time=0.006..0.008 rows=1 loops=494)
         Index Cond: (ri_result.result_id = ri_metadata.result_id)
 Total runtime: 21.658 ms
(14 rows)

It takes only 21 ms. Then we run the same query in psql (on localhost)
with timing on:

select * from ri_result where result_id in
 (select result_id from ri_metadata
  WHERE category_id IN (3,274,4,1,2,275,7,278,8,277,5,280,6,279,11,9,10,15,285,16,14,19,18,17,24,23,21,266,25,32,31,30,29,40,34,48,41,44,313,54,53,55,50,52,62,61,63,302,58,57,59,71,341,69,67,338,68,337,65,66,339,79,352,77,78,74,85,83,324,81,334,335,336,372,376,373,374,363,122,368,127,128,356,355,360,359,118,358,357,140,139,138,137,143,142,141,132,130,129,135,134,191,185,186,187,188,183,202,200,193,219)
    AND latitude > 43.66870619995 AND latitude < 44.3807062
    AND longitude > -88.8986135999 AND longitude < -88.1866136
  order by abs(latitude - 44.0247062) + abs(longitude - -88.5426136)
  limit 500)
;
Time: 2611.491 ms

The longer runtime from the psql console is corroborated when we run the same
queries via JDBC.
How can EXPLAIN ANALYZE return significantly faster than the other means?
(I understand that some data is returned from the query to the client, either
on the psql console or into a ResultSet in JDBC, as opposed to just the small
chunk of data in the form of a plan, but still...)
By the way, I run the EXPLAIN ANALYZE first and then run the query, to avoid
any caching.

Our system: Ubuntu 8.04.3 64-bit, 8GB RAM, 2 GHz single core, running as a VM
on an ESX server.  The database is read-only.



ri_metadata has 1473864 rows, 200MB,
 Table "public.ri_metadata"
 Column |Type |
Modifiers
+-+--
 ri_metadata_id | integer | not null default
nextval('ri_metadata_ri_metadata_id_seq'::regclass)
 result_id  | integer | not null
 start_time | timestamp without time zone | not null
 end_time   | timestamp without time zone | not null
 category_id| integer | not null
 category_name  | text| not null
 location_id| integer |
 longitude  | double precision|
 latitude   | double precision|
 city   | text|
 state  | text|
 zip| text|
 street_address | text|
Indexes:
"ri_metadata_pkey" PRIMARY KEY, btree (ri_metadata_id)
"ri_metadata_category_id_idx" btree (category_id)
"ri_metadata_category_id_state" btree (category_id, state)
"ri_metadata_end_time_idx" btree (end_time)
"ri_metadata_latitude_idx" btree (latitude)
"ri_metadata_latitude_longitud

Re: [PERFORM] dump time increase by 1h with new kernel

2009-10-08 Thread Kevin Grittner
Justin T Pryzby  wrote:
> On Thu, Oct 08, 2009 at 10:49:37AM -0700, Joshua D. Drake wrote:
>> Did your scheduler change between the kernel versions? 
> No, it's deadline for both.
 
How about write barriers?  I had a kernel upgrade which turned them on
for xfs, with unfortunate performance impacts.  The xfs docs
explicitly recommend disabling them if you have a battery-backed cache
in your RAID controller.
 
-Kevin



Re: [PERFORM] dump time increase by 1h with new kernel

2009-10-08 Thread Justin T Pryzby
On Thu, Oct 08, 2009 at 03:37:39PM -0500, Kevin Grittner wrote:
> Justin T Pryzby  wrote:
> > On Thu, Oct 08, 2009 at 10:49:37AM -0700, Joshua D. Drake wrote:
> >> Did your scheduler change between the kernel versions? 
> > No, it's deadline for both.
>  
> How about write barriers?  I had a kernel upgrade which turned them on
Doesn't seem to be that either :(

[   55.120073] Filesystem "dm-0": Disabling barriers, trial barrier write failed
crb2-db2(254, 0)
/dev/mapper/crb2-db2 on /media/database

Justin



Re: [PERFORM] Explain Analyze returns faster than psql or JDBC calls.

2009-10-08 Thread Tom Lane
G B  writes:
> How can EXPLAIN ANALYZE return significantly faster than the other means?

If the returned data is large or takes a lot of work to convert to text,
this could happen, since EXPLAIN ANALYZE doesn't bother to format the
tuples for display.  How big are those bytea columns, on average?
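
One way to test that explanation (a sketch, not from the original exchange;
the category list is abbreviated here purely for illustration): wrap the
query in a count(*) so the server does essentially the same work but returns
a single row, and compare its \timing result with that of the full query.

select count(*)
from (
    select r.*
    from ri_result r
    where r.result_id in (
        select m.result_id
        from ri_metadata m
        where m.category_id in (3, 274, 4)    -- abbreviated list, illustration only
          and m.latitude  > 43.6687062  and m.latitude  < 44.3807062
          and m.longitude > -88.8986136 and m.longitude < -88.1866136
        order by abs(m.latitude - 44.0247062) + abs(m.longitude - (-88.5426136))
        limit 500
    )
) as t;

If this runs in roughly the EXPLAIN ANALYZE time while the original query
takes seconds, the extra time is going into converting and transferring the
wide rows, not into executing the plan.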

regards, tom lane
