[PERFORM] How to use indexes for GROUP BY

2011-01-24 Thread Dimi Paun
Hi folks,

I have a table like so:

create table tagRecord (
uid varchar(60) primary key,
[bunch of other fields]
location  varchar(32),
creationTS timestamp
);
create index idx_tagdata_loc_creationTS on tagRecord(location, creationTS);

The number of distinct values in location is small (e.g. 2).

I want to simply get the latest creationTS for each location,
but that seems to result in a full table scan:

tts_server_db=# explain analyze select location, max(creationTS) from tagrecord group by location;
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=5330.53..5330.55 rows=2 width=18) (actual time=286.161..286.165 rows=3 loops=1)
   ->  Seq Scan on tagrecord  (cost=0.00..4771.35 rows=111835 width=18) (actual time=0.059..119.828 rows=111739 loops=1)
 Total runtime: 286.222 ms


Now that I have idx_tagdata_loc_creationTS, it seemed to me that the planner
should be able to use it to quickly find the max creationTS for each location.

Any way I can make this more efficient?

BTW, I am using postgresql-server-8.1.22-1.el5_5.1
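
One rewrite worth trying as a workaround (a sketch only, untested here; it assumes
the tagRecord schema above, and helps only if 8.1's min/max index optimization turns
each inner max() into a backward probe of idx_tagdata_loc_creationTS):

  -- Sketch: get the distinct locations, then the max per location via a
  -- correlated subquery. With only a handful of locations the per-location
  -- max() lookups are cheap, though the DISTINCT step may still scan;
  -- a small separate table of locations would avoid even that.
  SELECT l.location,
         (SELECT max(t.creationTS)
            FROM tagRecord t
           WHERE t.location = l.location) AS latest
    FROM (SELECT DISTINCT location FROM tagRecord) AS l;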

-- 
Dimi Paun d...@lattica.com
Lattica, Inc.




Re: [PERFORM] How to use indexes for GROUP BY

2011-01-24 Thread Dimi Paun
On Mon, 2011-01-24 at 12:33 -0700, Scott Marlowe wrote:
> As another poster observed, you're running an ancient version of pgsql
> from a performance perspective.  Upgrading to 8.4 or 9.0 would make a
> huge difference in overall performance, not just with one or two
> queries.

Thanks for the tips.

I'll try to first upgrade, and I'll report back if that doesn't help :)

-- 
Dimi Paun d...@lattica.com
Lattica, Inc.




[PERFORM] What is postmaster doing?

2010-10-20 Thread Dimi Paun
Folks,

I am running into a problem with the postmaster: from time to time, it
runs for a long time. E.g., from top:

23425 postgres  20   0 22008  10m  10m R 99.9  0.5  21:45.87 postmaster

I'd like to know what it is doing. How can I figure out which
statement is causing the problem?

Is there a way I can log all SQL statements to a file, together with the
time each one took to execute?

-- 
Dimi Paun d...@lattica.com
Lattica, Inc.




Re: [PERFORM] What is postmaster doing?

2010-10-20 Thread Dimi Paun
On Wed, 2010-10-20 at 15:24 -0400, Reid Thompson wrote:
> This is controlled by settings in the postgresql.conf file.
> See the appropriate doc page for your version:
> http://www.postgresql.org/docs/8.2/static/runtime-config-logging.html

Thanks for the link Reid, this seems to be doing what I need.

Too bad I couldn't figure out what was going on when I was experiencing
the high load, but now that I have the logging enabled, it shouldn't be
a problem to figure things out.
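
For reference, the relevant knobs look roughly like this (a sketch only;
parameter names are from the 8.x logging docs linked above, values are
illustrative rather than recommendations):

  # postgresql.conf sketch: log any statement that runs longer than 500 ms,
  # together with its duration, to the server log.
  log_min_duration_statement = 500      # milliseconds; 0 logs every statement
  log_line_prefix = '%t [%p] '          # prepend timestamp and backend PID

  # Or, to log everything plus per-statement durations:
  #log_statement = 'all'
  #log_duration = on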

-- 
Dimi Paun d...@lattica.com
Lattica, Inc.




Re: [PERFORM] What is postmaster doing?

2010-10-20 Thread Dimi Paun
On Wed, 2010-10-20 at 16:26 -0400, Tom Lane wrote:
>> It seems strange that the postmaster is eating 99% cpu. Is there a
>> chance that it's flooded with connection attempts?

Maybe, I'll try to figure that one out next time it happens.

> It's probably a backend process, not the postmaster --- I suspect the
> OP is using a version of ps that only tells you the original process
> name by default.  ps auxww or ps -ef (depending on platform)
> is likely to be more informative.  Looking into pg_stat_activity,
> even more so.

I'm running CentOS 5.5, using procps-3.2.7-16.el5. I cannot check
more at this point as postmaster seems to have finished whatever it
was doing, but I'll try to investigate better next time.
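
For next time, a quick look along the lines Tom suggests would be something
like this (a sketch; on 8.x the columns are procpid and current_query, which
were renamed in later releases):

  -- Sketch: show each backend and the statement it is currently running.
  -- (On 8.x the query text only shows up if stats_command_string = on.)
  SELECT procpid, usename, query_start, current_query
    FROM pg_stat_activity
   ORDER BY query_start;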

-- 
Dimi Paun d...@lattica.com
Lattica, Inc.




Re: [PERFORM] What is postmaster doing?

2010-10-20 Thread Dimi Paun
On Wed, 2010-10-20 at 16:45 -0400, Tom Lane wrote:
> Hm, what ps options did you use?  I'm having a hard time reproducing
> your display format on Fedora 13 (procps-3.2.8-7.fc13.x86_64).

Sorry, it wasn't ps output, it was a line from top(1).
My top header says:

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND


23425 postgres  20   0 22008  10m  10m R 99.9  0.5  21:45.87 postmaster

-- 
Dimi Paun d...@lattica.com
Lattica, Inc.




[PERFORM] DISTINCT vs. GROUP BY

2010-02-09 Thread Dimi Paun
From what I've read on the net, these two should be very similar
and should generate equivalent plans in cases like this:

SELECT DISTINCT x FROM mytable
SELECT x FROM mytable GROUP BY x

However, in my case (postgresql-server-8.1.18-2.el5_4.1),
they generated different results with quite different
execution times (73ms vs 40ms for DISTINCT and GROUP BY
respectively):

tts_server_db=# EXPLAIN ANALYZE select userdata from tagrecord where clientRmaInId = 'CPC-RMA-00110' group by userdata;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=775.68..775.69 rows=1 width=146) (actual time=40.058..40.058 rows=0 loops=1)
   ->  Bitmap Heap Scan on tagrecord  (cost=4.00..774.96 rows=286 width=146) (actual time=40.055..40.055 rows=0 loops=1)
         Recheck Cond: ((clientrmainid)::text = 'CPC-RMA-00110'::text)
         ->  Bitmap Index Scan on idx_tagdata_clientrmainid  (cost=0.00..4.00 rows=286 width=0) (actual time=40.050..40.050 rows=0 loops=1)
               Index Cond: ((clientrmainid)::text = 'CPC-RMA-00110'::text)
 Total runtime: 40.121 ms

tts_server_db=# EXPLAIN ANALYZE select distinct userdata from tagrecord where clientRmaInId = 'CPC-RMA-00109';
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=786.63..788.06 rows=1 width=146) (actual time=73.018..73.018 rows=0 loops=1)
   ->  Sort  (cost=786.63..787.34 rows=286 width=146) (actual time=73.016..73.016 rows=0 loops=1)
         Sort Key: userdata
         ->  Bitmap Heap Scan on tagrecord  (cost=4.00..774.96 rows=286 width=146) (actual time=72.940..72.940 rows=0 loops=1)
               Recheck Cond: ((clientrmainid)::text = 'CPC-RMA-00109'::text)
               ->  Bitmap Index Scan on idx_tagdata_clientrmainid  (cost=0.00..4.00 rows=286 width=0) (actual time=72.936..72.936 rows=0 loops=1)
                     Index Cond: ((clientrmainid)::text = 'CPC-RMA-00109'::text)
 Total runtime: 73.144 ms

What gives?

-- 
Dimi Paun d...@lattica.com
Lattica, Inc.




Re: [PERFORM] DISTINCT vs. GROUP BY

2010-02-09 Thread Dimi Paun
On Tue, 2010-02-09 at 17:38 -0500, Tom Lane wrote:
> The results certainly ought to be the same (although perhaps not with
> the same ordering) --- if they aren't, please provide a reproducible
> test case.

The results are the same, this is not a problem.

> As for efficiency, though, 8.1 didn't understand how to use hash
> aggregation for DISTINCT.  Less-obsolete versions do know how to do
> that.

Indeed, this seems to be the issue:

tts_server_db=# EXPLAIN ANALYZE select userdata from tagrecord where clientRmaInId = 'CPC-RMA-00110' group by userdata;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=253.34..253.50 rows=16 width=15) (actual time=0.094..0.094 rows=0 loops=1)
   ->  Index Scan using idx_tagdata_clientrmainid on tagrecord  (cost=0.00..252.85 rows=195 width=15) (actual time=0.091..0.091 rows=0 loops=1)
         Index Cond: ((clientrmainid)::text = 'CPC-RMA-00110'::text)
 Total runtime: 0.146 ms
(4 rows)

tts_server_db=# EXPLAIN ANALYZE select distinct userdata from tagrecord where clientRmaInId = 'CPC-RMA-00110';
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=260.27..261.25 rows=16 width=15) (actual time=0.115..0.115 rows=0 loops=1)
   ->  Sort  (cost=260.27..260.76 rows=195 width=15) (actual time=0.113..0.113 rows=0 loops=1)
         Sort Key: userdata
         ->  Index Scan using idx_tagdata_clientrmainid on tagrecord  (cost=0.00..252.85 rows=195 width=15) (actual time=0.105..0.105 rows=0 loops=1)
               Index Cond: ((clientrmainid)::text = 'CPC-RMA-00110'::text)
 Total runtime: 0.151 ms
(6 rows)

For now we are stuck with 8.1, so the easiest fix for us is to use GROUP BY.
Since this is fixed in later versions, I guess there's not much to see here... :)

Thanks for the quick reply!

-- 
Dimi Paun d...@lattica.com
Lattica, Inc.




[PERFORM] Bad performance on simple query

2008-11-17 Thread Dimi Paun
Hi folks,

I have a simple table that keeps track of a user's access history.
It has a few fields, but the important ones are:
  - ownerId: the user's ID, an int8
  - accessTS: the timestamp of the record

The table right now is small, only 1942 records.
The user I test with (10015) has only 89 entries.

What I want is to get the last 5 accesses of a user:
   SELECT * FROM triphistory WHERE ownerId = 10015 ORDER BY accessTS DESC LIMIT 5

If I create a composite index *and* analyze:
   create index IDX_TRIP_HISTORY_OWNER_ACCESS_TS on tripHistory (ownerId, accessTS);
   ANALYZE triphistory;

It takes 0.091s (!):
perpedes_db=# EXPLAIN ANALYZE SELECT * FROM triphistory WHERE ownerId = 10015 ORDER BY accessTS DESC LIMIT 5;
                                                                        QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..7.99 rows=5 width=106) (actual time=0.024..0.042 rows=5 loops=1)
   ->  Index Scan Backward using idx_trip_history_owner_access_ts on triphistory  (cost=0.00..142.20 rows=89 width=106) (actual time=0.021..0.034 rows=5 loops=1)
         Index Cond: (ownerid = 10015)
 Total runtime: 0.091 ms
(4 rows)


BTW, this is after several runs of the query; shouldn't all this stuff be in
memory?

This is not a fast machine, but this seems rather excessive, no? 

-- 
Dimi Paun [EMAIL PROTECTED]
Lattica, Inc.




Re: [PERFORM] Bad performance on simple query

2008-11-17 Thread Dimi Paun

On Mon, 2008-11-17 at 09:53 -0700, Scott Marlowe wrote:
 
> Are you saying it's excessive you need the compound query?  Cause
> that's running in 91 microseconds as pointed out by Alan.

Of course, my bad. I read that as 91ms <blush/>.

Confusion came from the fact that pgadminIII reports the query
taking 20-40ms, so I read the 0.091 as seconds not ms.

-- 
Dimi Paun [EMAIL PROTECTED]
Lattica, Inc.




Re: [PERFORM] Bad performance on simple query

2008-11-17 Thread Dimi Paun

On Mon, 2008-11-17 at 10:16 -0700, Scott Marlowe wrote:
> Ahhh.  Keep in mind that if you just run the query, pgadminIII will
> tell you how long it took to run AND return all the data across the
> network, so it will definitely take longer then.  But most of that's
> network io wait so it's not a real issue unless you're saturating your
> network.

But that is brutal -- there's no way it can take 20ms for a request 
across an unloaded network.

Moreover, I got something like this:

            pgadminIII | pgsql
w/o index:  45ms       | 0.620ms
w/ index:   20ms       | 0.091ms

But now when I try to replicate this, I get 45ms in both cases. This is
very misleading...

-- 
Dimi Paun [EMAIL PROTECTED]
Lattica, Inc.




Re: [PERFORM] Bad performance on simple query

2008-11-17 Thread Dimi Paun

On Mon, 2008-11-17 at 10:40 -0700, Scott Marlowe wrote:
> I'm guessing a fair bit of that time is pgadminIII prettifying the
> output for you, etc.  I.e. it's not all transfer time.  Hard to say
> without hooking some kind of profiler in pgadminIII.  Is psql running
> local and pgadminIII remotely?  Or are they both remote?  If both psql
> and pgadminIII are remote (i.e. same basic circumstances) then it's
> got to be a difference in the client causing the extra time.  OR is
> this output of explain analyze?

With \timing on I get basically the same output (local vs remote)
in psql (0.668ms vs. 0.760ms). More like it.


WTH is pgadminIII reporting?!?
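
For anyone trying to untangle the two numbers, this is roughly how to separate
server time from client overhead in psql (a sketch; the table and query are the
ones from earlier in the thread):

  -- \timing prints the round-trip time as seen by the client, while
  -- EXPLAIN ANALYZE's "Total runtime" is time spent inside the server.
  -- Any large gap between the two is client/network/rendering overhead.
  \timing
  EXPLAIN ANALYZE SELECT * FROM triphistory
   WHERE ownerId = 10015 ORDER BY accessTS DESC LIMIT 5;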

-- 
Dimi Paun [EMAIL PROTECTED]
Lattica, Inc.




[PERFORM] Severe performance problems for simple query

2008-04-07 Thread Dimi Paun
# 0-1 credits
 
 # - Background writer -
 
@@ -141,13 +141,13 @@
#   fsync_writethrough
#   open_sync
 #full_page_writes = on # recover from partial page writes
-#wal_buffers = 8   # min 4, 8KB each
+wal_buffers = 128  # min 4, 8KB each
 #commit_delay = 0  # range 0-10, in microseconds
 #commit_siblings = 5   # range 1-1000
 
 # - Checkpoints -
 
-#checkpoint_segments = 3   # in logfile segments, min 1, 16MB each
+checkpoint_segments = 192  # in logfile segments, min 1, 16MB each
 #checkpoint_timeout = 300  # range 30-3600, in seconds
 #checkpoint_warning = 30   # in seconds, 0 is off
 
@@ -175,12 +175,12 @@
 
 # - Planner Cost Constants -
 
-#effective_cache_size = 1000   # typically 8KB each
-#random_page_cost = 4  # units are one sequential page fetch 
+effective_cache_size = 393216  # typically 8KB each
+random_page_cost = 2   # units are one sequential page fetch 
# cost
-#cpu_tuple_cost = 0.01 # (same)
-#cpu_index_tuple_cost = 0.001  # (same)
-#cpu_operator_cost = 0.0025# (same)
+cpu_tuple_cost = 0.002 # (same)
+cpu_index_tuple_cost = 0.0002  # (same)
+cpu_operator_cost = 0.0005 # (same)
 
 # - Genetic Query Optimizer -
 
@@ -329,10 +329,10 @@
 
 # - Query/Index Statistics Collector -
 
-#stats_start_collector = on
-#stats_command_string = off
-#stats_block_level = off
-#stats_row_level = off
+stats_start_collector = on
+stats_command_string = on
+stats_block_level = on
+stats_row_level = on
 #stats_reset_on_server_start = off
 
 
@@ -340,8 +340,8 @@
 # AUTOVACUUM PARAMETERS
 #---
 
-#autovacuum = off  # enable autovacuum subprocess?
+autovacuum = on# enable autovacuum subprocess?
 #autovacuum_naptime = 60   # time between autovacuum runs, in secs
 #autovacuum_vacuum_threshold = 1000# min # of tuple updates before
# vacuum
 #autovacuum_analyze_threshold = 500# min # of tuple updates before 
@@ -400,7 +400,7 @@
 #---
 
 #deadlock_timeout = 1000   # in milliseconds
-#max_locks_per_transaction = 64# min 10
+max_locks_per_transaction = 512# min 10
 # note: each lock table slot uses ~220 bytes of shared memory, and there are
 # max_locks_per_transaction * (max_connections + max_prepared_transactions)
 # lock table slots.


[2] Actual schema for the table:
create table ipligenceIpAddress
(
ipFrom int8 not null default 0,
ipTo int8 not null default 0,
countryCode varchar(10) not null,
countryName varchar(255) not null,
continentCode varchar(10) not null,
continentName varchar(255) not null,
timeZone varchar(10) not null,
regionCode varchar(10) not null,
regionName varchar(255) not null,
owner varchar(255) not null,
cityName varchar(255) not null,
countyName varchar(255) not null,
latitude float8 not null,
longitude float8 not null,
createdTS timestamp with time zone default current_timestamp,
primary key(ipFrom, ipTo)
);

-- 
Dimi Paun [EMAIL PROTECTED]
Lattica, Inc.




Re: [PERFORM] Severe performance problems for simple query

2008-04-07 Thread Dimi Paun

On Mon, 2008-04-07 at 17:32 +0100, Heikki Linnakangas wrote:
> If I understood the original post correctly, the ipFrom and ipTo
> columns actually split a single linear ip address space into
> non-overlapping chunks. Something like this:
>
> ipFrom  ipTo
> 1       10
> 10      20
> 20      50
> 50      60
> ...
 

Indeed.

> In that case, a regular index on (ipFrom, ipTo) should work just fine,
> and that's what he's got. Actually, an index on just ipFrom would
> probably work just as well.

No, it doesn't:

perpedes_db=# CREATE INDEX temp1 ON static.ipligenceipaddress (ipFrom);
CREATE INDEX
perpedes_db=# explain ANALYZE select * from static.ipligenceipaddress where ipfrom <= 2130706433 and 2130706433 <= ipto limit 1;
                                                                   QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.07 rows=1 width=145) (actual time=1519.526..1519.527 rows=1 loops=1)
   ->  Index Scan using temp1 on ipligenceipaddress  (cost=0.00..84796.50 rows=1209308 width=145) (actual time=1519.524..1519.524 rows=1 loops=1)
         Index Cond: (ipfrom <= 2130706433)
         Filter: (2130706433 <= ipto)
 Total runtime: 1519.562 ms
(5 rows)

This is huge, I'd say...

> The problem is that the planner doesn't know about that special
> relationship between ipFrom and ipTo. Perhaps it could be hinted by
> explicitly specifying AND ipTo > ipFrom in the query?

Unfortunately, it still does a seq scan:

perpedes_db=# SET enable_seqscan = on;
SET
perpedes_db=# explain ANALYZE select * from static.ipligenceipaddress where ipfrom <= 2130706433 and 2130706433 <= ipto AND ipTo > ipFrom limit 1;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.35 rows=1 width=145) (actual time=1245.293..1245.294 rows=1 loops=1)
   ->  Seq Scan on ipligenceipaddress  (cost=0.00..142343.80 rows=403103 width=145) (actual time=1245.290..1245.290 rows=1 loops=1)
         Filter: ((ipfrom <= 2130706433) AND (2130706433 <= ipto) AND (ipto > ipfrom))
 Total runtime: 1245.335 ms
(4 rows)


> I don't know why the single index lookup took > 300ms, though. That
> does seem high to me.

That is my feeling too. I would have expected execution times an order of
magnitude faster; the DB runs on fairly decent hardware...
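
One rewrite that often helps for strictly non-overlapping ranges like these
(a sketch only, not tested against this data set): since the ranges don't
overlap, the matching row, if any, is simply the last range that starts at or
below the address, which a descending scan on ipFrom can find with a single
index probe:

  -- Sketch (assumes the ipligenceIpAddress schema above and an index whose
  -- leading column is ipFrom, e.g. the primary key or the "temp1" index):
  SELECT *
    FROM (SELECT *
            FROM static.ipligenceipaddress
           WHERE ipFrom <= 2130706433
           ORDER BY ipFrom DESC
           LIMIT 1) AS candidate
   WHERE ipTo >= 2130706433;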

-- 
Dimi Paun [EMAIL PROTECTED]
Lattica, Inc.

