[PERFORM] Forcing more aggressive index scans for BITMAP AND

2008-04-07 Thread Ow Mun Heng
Just wondering if there's a special tweak I can do to force more usage
of indexes to do BITMAP ANDs?

I have a table like

A int
B int
C int
D int
E int
F int
g int

where A/B/C/D/E are indexed

There are ~20 million rows in the table.

Queries are something like this:

select * from table 
where A=X
and B = Y
and C = Z
and D = AA
and E = BB

The query plan will only pick 2 indexes to do the bitmap AND.
I'm not sure how to tweak the config for it to use more indexes.

Box is a Celeron 1.7 w/ 768MB RAM, with shared_buffers at 250MB and
effective_cache_size at 350MB.
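
For reference, a single multicolumn index covering all five equality columns is
usually a better fit than ANDing several single-column bitmaps; the planner only
combines extra bitmaps when it estimates the additional index scans pay for
themselves. A hedged sketch, with a hypothetical table name:

CREATE INDEX my_table_abcde_idx ON my_table (a, b, c, d, e);

-- the original query can then be answered by a single index scan:
SELECT * FROM my_table
 WHERE a = 1 AND b = 2 AND c = 3 AND d = 4 AND e = 5;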



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Forcing more aggressive index scans for BITMAP AND

2008-04-07 Thread Matthew

On Mon, 7 Apr 2008, Ow Mun Heng wrote:

Just wondering if there's a special tweak I can do to force more usage
of indexes to do BITMAP ANDs?


There's no need to post this again. You have already had a couple of 
useful answers.


Matthew

--
All of this sounds mildly turgid and messy and confusing... but what the
heck. That's what programming's all about, really
   -- Computer Science Lecturer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Looking for bottleneck during load test

2008-04-07 Thread Hell, Robert
We have a PostgreSQL 8.2.7 database (~230 GB) running on a machine with 8 Intel 
Xeon Cores and 32 GB RAM (64-bit Linux 2.6.18). Data is stored on an EMC² 
CLARiiON on RAID 1/0 (8 x 146 GB 15k rpm). 
When we do random I/O with a small test tool (reading random 8k blocks from big 
files in 200 threads) on the disk we retrieve data with about 25 MB/s.

For testing purposes, a test set of about 700.000 queries (logged during a 
problem situation) is executed against the database in 180 concurrent 
threads.
Some of the queries are very small and fast - other ones read more than 5 
blocks. All queries are selects (using cursors) - there is only read activity 
on the database.

By setting tuple_fraction for cursors to 0.0 instead of 0.1 
(http://archives.postgresql.org/pgsql-performance/2008-04/msg00018.php) we 
reduced reads during the test (pg_statio_all_tables):
- 8.2.7
  reads from disk: 4.395.276, reads from cache: 471.575.925
- 8.2.7 cursor_tuple_fraction=0.0
  Reads from disk: 3.406.164, reads from cache: 37.924.625

But the duration of the test was only reduced by 18 % (from 110 minutes to 90 
minutes).

When running the test with tuple_fraction=0.0 we observe the following on the 
server:
- avg read from disk is at 7 MB/s
- when we start the random I/O tool during the test we again read data with 
about 25 MB/s from disk (for me it seems that disk isn't the bottleneck)
- cpu time is divided between idle and iowait - user and system cpu are 
practically zero
- there are from 5000 to 1 context switches per second

I can't see a bottleneck here. Does anyone have an explanation for this behavior?
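
For reference, a generic way to watch whether the buffer cache is keeping up
during such a run (a sketch, not part of the original report; it reads the same
pg_statio view quoted above):

SELECT sum(heap_blks_read) AS blocks_from_disk,
       sum(heap_blks_hit)  AS blocks_from_cache,
       round(100.0 * sum(heap_blks_hit)
             / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0), 2) AS hit_pct
  FROM pg_statio_all_tables;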

Regards,
Robert


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Looking for bottleneck during load test

2008-04-07 Thread Ivan Voras
Hell, Robert wrote:
 We have a PostgreSQL 8.2.7 database (~230 GB) running on a machine with 8 
 Intel Xeon Cores and 32 GB RAM (64-bit Linux 2.6.18). Data is stored on an 
 EMC² CLARiiON on RAID 1/0 (8 x 146 GB 15k rpm). 
 When we do random I/O with a small test tool (reading random 8k blocks from 
 big files in 200 threads) on the disk we retrieve data with about 25 MB/s.

How do you test random IO? Do you use this utility:
http://arctic.org/~dean/randomio/ ?

If not, try using it, with same parameters. It might be that the latency
is destroying your performance. Do you use NFS or are you accessing the
storage as SAN?


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [SOLVED] [PERFORM] Query plan excluding index on view

2008-04-07 Thread Matt Klinker
Removing the constants definitely did take care of the issue on 8.3 (still the
same query plan on 8.1).  Thanks for your help in getting this resolved, and
sorry again for not including all relevant information in my initial request.

On Fri, Apr 4, 2008 at 10:20 PM, Tom Lane [EMAIL PROTECTED] wrote:

 Matt Klinker [EMAIL PROTECTED] writes:
  --Joined View:
  CREATE OR REPLACE VIEW directory_listing AS
   SELECT school.id, school.name, school.description, 119075291 AS
  listing_type_fid
 FROM school
  UNION ALL
   SELECT company.id, company.name, company.description, 119074833 AS
  listing_type_fid
 FROM company;

 Ah, there's the problem :-(.  Can you get rid of the constants here?
 The planner's currently not smart about UNION ALL subqueries unless
 their SELECT lists contain just simple column references.

 (Yes, fixing that is on the todo list, but don't hold your breath...
 it'll be 8.4 material at the earliest.)

regards, tom lane
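
One way to get rid of those constants, sketched here as an untested aside rather
than anything from the thread, is to make listing_type_fid a real column on each
base table so the UNION ALL branches contain only simple column references:

ALTER TABLE school  ADD COLUMN listing_type_fid integer DEFAULT 119075291;
ALTER TABLE company ADD COLUMN listing_type_fid integer DEFAULT 119074833;

CREATE OR REPLACE VIEW directory_listing AS
 SELECT school.id, school.name, school.description, school.listing_type_fid
   FROM school
  UNION ALL
 SELECT company.id, company.name, company.description, company.listing_type_fid
   FROM company;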



[PERFORM] Severe performance problems for simple query

2008-04-07 Thread Dimi Paun
Hi folks,

Here is the executive summary:
  * RHEL5 (postgresql 8.1, .conf tweaked for performance [1])
  * 2x Intel E5410 @ 2.33GHz (8 cores), 8GB RAM, 15KRPM SAS disks
  * 4.9 million records in a table (IP address info)
  * composite primary key: primary key(ipFrom, ipTo)
  * ipFrom/ipTo are int8 (see below for full schema info [2])
  * bad performance on queries of the form:
select * from ipTable where ipFrom <= val and val <= ipTo

Part of the problem is that most of the time PostgreSQL decides to
use seq scans on the table, resulting in queries taking many seconds
(sometimes 3, 7, 20 sec). We did ANALYZE and enabled statistics, and
that sometimes fixes the problem temporarily, but overnight (without
the database being used), it reverts to seq scans. For example:

perpedes_db=# explain ANALYZE select * from static.ipligenceipaddress where 
ipfrom <= 2130706433 and 2130706433 <= ipto;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Seq Scan on ipligenceipaddress  (cost=0.00..139903.80 rows=1209530 width=145) 
(actual time=1233.628..2100.891 rows=1 loops=1)
   Filter: ((ipfrom <= 2130706433) AND (2130706433 <= ipto))
 Total runtime: 2100.928 ms
(3 rows)



Moreover, even when it is using the index, it is not all that fast:
perpedes_db=# SET enable_seqscan = off;
SET
perpedes_db=# EXPLAIN ANALYZE select * from static.ipligenceipaddress where 
3507360727 between ipfrom and ipto;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Scan using ipligenceipaddress_pkey on ipligenceipaddress  
(cost=0.00..148143.67 rows=806199 width=146) (actual time=351.316..351.320 
rows=1 loops=1)
   Index Cond: ((3507360727::bigint >= ipfrom) AND (3507360727::bigint <= ipto))
 Total runtime: 351.355 ms
(3 rows)


So, my questions are:
  * did we miss any obvious settings?
  * why does it decide all of a sudden to do seq scans?
  * adding a limit 1 sometimes causes the query to be even slower, 
when in fact it should have helped the DB to return faster, no?
  * in the ideal case, what execution times should I be expecting?
Is ~400ms reasonable? I would have hoped this to be 40ms... 
  * AFAICT, the (ipFrom, ipTo) intervals should be mutually exclusive,
so the result should be at most one row. Can this info help the
DB do a faster query? If so, how can I express that?
  * the DB takes tens of minutes to do an ANALYZE on this table,
which doesn't happen with the default configuration. Any idea
how I can fix that?

Thank you!
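
On the question of why the planner flips back to seq scans overnight: one knob
worth trying, offered here as a hedged aside rather than something raised in the
thread, is a larger per-column statistics target so ANALYZE produces steadier
estimates for the two range columns:

ALTER TABLE static.ipligenceipaddress ALTER COLUMN ipfrom SET STATISTICS 1000;
ALTER TABLE static.ipligenceipaddress ALTER COLUMN ipto   SET STATISTICS 1000;
ANALYZE static.ipligenceipaddress;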


[1] Changes from standard config:
--- /var/lib/pgsql/data/postgresql.conf.orig2008-03-21 11:51:45.0 
-0400
+++ /var/lib/pgsql/data/postgresql.conf 2008-03-21 21:04:38.0 -0400
@@ -90,19 +90,19 @@
 
 # - Memory -
 
-shared_buffers = 1000  # min 16 or max_connections*2, 8KB each
-#temp_buffers = 1000   # min 100, 8KB each
-#max_prepared_transactions = 5 # can be 0 or more
+shared_buffers = 5 # min 16 or max_connections*2, 8KB each
+temp_buffers = 1   # min 100, 8KB each
+max_prepared_transactions = 100# can be 0 or more
 # note: increasing max_prepared_transactions costs ~600 bytes of shared memory
 # per transaction slot, plus lock space (see max_locks_per_transaction).
-#work_mem = 1024   # min 64, size in KB
-#maintenance_work_mem = 16384  # min 1024, size in KB
+work_mem = 2048# min 64, size in KB
+maintenance_work_mem = 131072  # min 1024, size in KB
 #max_stack_depth = 2048# min 100, size in KB
 
 # - Free Space Map -
 
-#max_fsm_pages = 2 # min max_fsm_relations*16, 6 bytes each
-#max_fsm_relations = 1000  # min 100, ~70 bytes each
+max_fsm_pages = 20 # min max_fsm_relations*16, 6 bytes each
+max_fsm_relations = 1  # min 100, ~70 bytes each
 
 # - Kernel Resource Usage -
 
@@ -111,11 +111,11 @@
 
 # - Cost-Based Vacuum Delay -
 
-#vacuum_cost_delay = 0 # 0-1000 milliseconds
-#vacuum_cost_page_hit = 1  # 0-1 credits
+vacuum_cost_delay = 200# 0-1000 milliseconds
+vacuum_cost_page_hit = 6   # 0-1 credits
 #vacuum_cost_page_miss = 10# 0-1 credits
 #vacuum_cost_page_dirty = 20   # 0-1 credits
-#vacuum_cost_limit = 200   # 0-1 credits
+vacuum_cost_limit = 100 

Re: [PERFORM] Partitioned tables - planner wont use indexes

2008-04-07 Thread Bricklen Anderson

kevin kempter wrote:
One of the things we need to query is the min date from the master table 
- we may explore alternatives for this particular query, however even if 
we fix this query I think we have a fundamental issue with the use of 
indexes (actually the non-use) by the planner.


We had a similar requirement, so I've been using a function that loops 
over the child tables, and queries for the min date from each. If all 
you need is the date, you can try a function call. Here is a modified 
version of what I've been using:


CREATE OR REPLACE function get_min_date() RETURNS DATE as $_$
DECLARE
x RECORD;
min_date DATE;
min_date_tmp DATE;
qry TEXT;
BEGIN
/* can also test MIN() aggregate, rather than ORDER BY/LIMIT */
FOR x IN EXECUTE 'select tablename from pg_tables where tablename 
like ''part_20%''' loop
qry := 'SELECT logdate FROM '||x.tablename||' ORDER BY logdate 
LIMIT 1';

EXECUTE qry INTO min_date_tmp;
IF (min_date IS NULL OR (min_date_tmp IS NOT NULL AND 
min_date_tmp < min_date)) THEN

min_date := min_date_tmp;
END IF;
END LOOP;
RETURN min_date;
END;
$_$ language plpgsql immutable;

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Severe performance problems for simple query

2008-04-07 Thread Matthew

On Mon, 7 Apr 2008, Dimi Paun wrote:

 * bad performance on queries of the form:
   select * from ipTable where ipFrom <= val and val <= ipTo


This type of query is very hard for a normal B-tree index to answer. For 
example, say val is half-way between min and max values. If you have an 
index on ipFrom, it will be able to restrict the entries to about half of 
them, which is no real benefit over a sequential scan. Likewise, an index 
on ipTo will be able to restrict the entries to half of them, with no 
benefit. The intersection of these two halves may be just one entry, but 
finding that out is non-trivial. An index bitmap scan would do it if you 
can persuade Postgres to do that, but really you want an R-tree index on 
the two columns, like I have requested in the past.


You can achieve that to some degree by using Postgres' geometric indexes, 
but it's ugly. Note that the following is completely untested and may not 
work with int8 values.


Firstly, you need to create the index. The index will contain fake boxes 
that stretch from ipFrom to ipTo.


CREATE INDEX index_name ON table_name ((box '((ipFrom, 0), (ipTo, 1))'))

Then, instead of querying simply for fromIp and toIp, query on whether the 
fake box overlaps with a point representing val.


SELECT blah FROM table_name
  WHERE (box '((ipFrom, 0), (ipTo, 2))') @> (point '(val, 1)');

Or alternatively you could adapt the seg GiST index to int8 values.
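
Since column references cannot appear inside a quoted box literal, a working
variant of the sketch above would probably need the box(point, point) constructor
in an expression index on the table from this thread; equally untested, and it
assumes a version recent enough to have the @> operator. Probing with a
degenerate box keeps the comparison inside the GiST box operator class:

CREATE INDEX ipligence_range_box ON static.ipligenceipaddress
    USING gist (box(point(ipfrom, 0), point(ipto, 2)));

SELECT * FROM static.ipligenceipaddress
 WHERE box(point(ipfrom, 0), point(ipto, 2))
       @> box(point(2130706433, 1), point(2130706433, 1));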

Hope you get this sorted out - it's something I'll have to do at some 
point soon too.


Matthew

--
I wouldn't be so paranoid if you weren't all out to get me!!

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Severe performance problems for simple query

2008-04-07 Thread Matthew

On Mon, 7 Apr 2008, Dimi Paun wrote:

 * bad performance on queries of the form:
   select * from ipTable where ipFrom <= val and val <= ipTo


Oh yes, if you can guarantee that no two entries overlap at all, then 
there is a simpler way. Just create a B-tree index on ipFrom as usual, 
sort by ipFrom, and LIMIT to the first result:


SELECT blah FROM table_name
  WHERE ipFrom <= 42 ORDER BY ipFrom DESC LIMIT 1

This should run *very* quickly. However, if any entries overlap at all 
then you will get incorrect results.
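
If there is any doubt about that guarantee, a one-off self-join can look for
overlapping ranges first; a sketch against the table from this thread (slow on
4.9M rows, but it only needs to run once):

SELECT a.ipfrom AS a_from, a.ipto AS a_to,
       b.ipfrom AS b_from, b.ipto AS b_to
  FROM static.ipligenceipaddress a
  JOIN static.ipligenceipaddress b
    ON a.ipfrom < b.ipfrom       -- a starts first...
   AND a.ipto  >= b.ipfrom       -- ...but reaches into b
 LIMIT 10;                       -- any rows returned are overlapping pairs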


Matthew

--
I'm always interested when [cold callers] try to flog conservatories.
Anyone who can actually attach a conservatory to a fourth floor flat
stands a marginally better than average chance of winning my custom.
(Seen on Usenet)

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Severe performance problems for simple query

2008-04-07 Thread Heikki Linnakangas

Matthew wrote:

On Mon, 7 Apr 2008, Dimi Paun wrote:

 * bad performance on queries of the form:
   select * from ipTable where ipFrom <= val and val <= ipTo


This type of query is very hard for a normal B-tree index to answer. For 
example, say val is half-way between min and max values. If you have an 
index on ipFrom, it will be able to restrict the entries to about half 
of them, which is no real benefit over a sequential scan. Likewise, an 
index on ipTo will be able to restrict the entries to half of them, with 
no benefit. The intersection of these two halves may be just one entry, 
but finding that out is non-trivial. An index bitmap scan would do it if 
you can persuade Postgres to do that, but really you want an R-tree 
index on the two columns, like I have requested in the past.


If I understood the original post correctly, the ipFrom and ipTo columns 
actually split a single linear ip address space into non-overlapping 
chunks. Something like this:


ipFrom  ipTo
1   10
10  20
20  50
50  60
...

In that case, a regular index on (ipFrom, ipTo) should work just fine, 
and that's what he's got. Actually, an index on just ipFrom would 
probably work just as well. The problem is that the planner doesn't know 
about that special relationship between ipFrom and ipTo. Perhaps it 
could be hinted by explicitly specifying AND ipTo > ipFrom in the query?


I don't know why the single index lookup took > 300ms, though. That does 
seem high to me.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Severe performance problems for simple query

2008-04-07 Thread Dimi Paun

On Mon, 2008-04-07 at 17:32 +0100, Heikki Linnakangas wrote:
 If I understood the original post correctly, the ipFrom and ipTo
 columns actually split a single linear ip address space into
 non-overlapping  chunks. Something like this:
 
 ipFrom  ipTo
 1   10
 10  20
 20  50
 50  60
 ...
 

Indeed.

 In that case, a regular index on (ipFrom, ipTo) should work just fine,
 and that's what he's got. Actually, an index on just ipFrom would
 probably work just as well. 

No, it doesn't:

perpedes_db=# CREATE INDEX temp1 ON static.ipligenceipaddress (ipFrom);
CREATE INDEX
perpedes_db=# explain ANALYZE select * from static.ipligenceipaddress where 
ipfrom <= 2130706433 and 2130706433 <= ipto limit 1;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.07 rows=1 width=145) (actual time=1519.526..1519.527 
rows=1 loops=1)
   ->  Index Scan using temp1 on ipligenceipaddress  (cost=0.00..84796.50 
rows=1209308 width=145) (actual time=1519.524..1519.524 rows=1 loops=1)
         Index Cond: (ipfrom <= 2130706433)
         Filter: (2130706433 <= ipto)
 Total runtime: 1519.562 ms
(5 rows)

This is huge, I'd say...

 The problem is that the planner doesn't know  about that special
 relationship between ipFrom and ipTo. Perhaps it could be hinted by
explicitly specifying AND ipTo > ipFrom in the query?

Unfortunately, it still does a seq scan:

perpedes_db=# SET enable_seqscan = on;
SET
perpedes_db=# explain ANALYZE select * from static.ipligenceipaddress where 
ipfrom <= 2130706433 and 2130706433 <= ipto AND ipTo > ipFrom limit 1;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.35 rows=1 width=145) (actual time=1245.293..1245.294 
rows=1 loops=1)
   ->  Seq Scan on ipligenceipaddress  (cost=0.00..142343.80 rows=403103 
width=145) (actual time=1245.290..1245.290 rows=1 loops=1)
         Filter: ((ipfrom <= 2130706433) AND (2130706433 <= ipto) AND (ipto > 
ipfrom))
 Total runtime: 1245.335 ms
(4 rows)


I don't know why the single index lookup took > 300ms, though. That
 does seem high to me.

That is my feeling. I would have expected order of magnitude faster
execution times, the DB runs on fairly decent hardware...

-- 
Dimi Paun [EMAIL PROTECTED]
Lattica, Inc.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Severe performance problems for simple query

2008-04-07 Thread Matthew

On Mon, 7 Apr 2008, Heikki Linnakangas wrote:
In that case, a regular index on (ipFrom, ipTo) should work just fine, and 
that's what he's got. Actually, an index on just ipFrom would probably work 
just as well. The problem is that the planner doesn't know about that special 
relationship between ipFrom and ipTo. Perhaps it could be hinted by 
explicitly specifying AND ipTo > ipFrom in the query?


Actually, the problem is that the database doesn't know that the entries 
don't overlap. For all it knows, you could have data like this:


0   10
10  20
20  30
... ten million rows later
10030   10040
10040   10050
0   10050

So say you wanted to search for the value of 50,000,000. The index on 
ipFrom would select five million rows, all of which then have to be 
filtered by the constraint on ipTo. Likewise, an index on ipTo would 
return five million rows, all of which then have to be filtered by the 
constraint on ipFrom. If you just read the index and took the closest 
entry to the value, then you would miss out on the last entry which 
overlaps with the whole range. An R-tree on both fields will correctly 
find the small set of entries that are relevant.


It would be very cool to be able to create an R-tree index that would just 
make the original query run fast without needing alteration. I had a look 
at this a while back, but it is not currently possible in GiST, because 
only one field is handed to the index at a time. So all the current R-tree 
implementations require that you generate an object containing the two 
values, like the box, and then index that.


Something for 8.4?

Matthew

--
$ rm core
Segmentation Fault (core dumped)

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] performance using table partitions in Postgres 8.2.6

2008-04-07 Thread Janet Jacobsen
Hi.  I am looking for information about using table partitions in Postgres,
in particular w.r.t. performance when querying a partitioned table.

I implemented table partitioning following the documentation, which
is quite good and easy to follow (Chapter 5.9).  I am doing some
testing, so at this point my test dataset has 1M records; in the table,
each row will have an autoincrement integer index, two numerics,
one integer (DMID), and one smallint.

I created five master tables and created a different number of
partitions for each one (2, 4, 8, 16, 32).  I am using a range partition
for the integer, DMID, which represents a file index.  The values of
DMID range from 0 to 180360.  I also created an index on DMID.
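
For reference, the setup described above corresponds roughly to the following, a
hedged sketch in the documentation's style with hypothetical table and column
names:

CREATE TABLE filemaster (
    id    serial,
    val1  numeric,
    val2  numeric,
    dmid  integer,
    flag  smallint
);

-- one child table per DMID range (0..180360 split into N pieces)
CREATE TABLE filemaster_p00 (
    CHECK (dmid >= 0 AND dmid < 5637)
) INHERITS (filemaster);
CREATE INDEX filemaster_p00_dmid ON filemaster_p00 (dmid);

-- partitions are only skipped when constraint exclusion is on and the
-- WHERE clause compares DMID with constants the planner can evaluate:
SET constraint_exclusion = on;
SELECT * FROM filemaster WHERE dmid = 1234;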

I don't understand the timing results that I am getting.  I got these
times by averaging the results of querying the database from within
a loop in a Perl script:

no. of partitions  constraint_exclusion off  constraint_exclusion on
 2                 0.597 ms                  0.427 ms
 4                 0.653 ms                  0.414 ms
 8                 0.673 ms                  0.654 ms
16                 1.068 ms                  1.014 ms
32                 2.301 ms                  1.537 ms

I expected that the query time would decrease as the number of
partitions increases, but that's not what I am seeing.  I get better results
(0.29 ms) if I simply index DMID and don't use the partitions.

When I run explain analyze on a query, the results (with and without
constraint_exclusion set) indicate that fewer partitions are being scanned
when constraint_exclusion is set to on.

I am testing table partitioning in Postgres against table partitioning
using MySQL.  The results for MySQL make sense: more partitions,
faster query times.

The underlying application is a file index.  It is expected that groups
of files in selected ranges of DMID values will be accessed more
often, but this is not the key implementation issue.

This is basically a write once, read often database.  We expect that
the database will grow to 50M records in a few years, and I thought
that using range partitions for the DMID value might decrease the
query time.

Should I be using many more partitions?  Am I expecting too much in
terms of performance when using partitions?  Do these results point to
some obvious implementation error?

Thank you for any help/suggestions you can give.

Janet Jacobsen

-- 
Janet Jacobsen
NERSC Analytics/HPCRD Visualization Group
Lawrence Berkeley National Laboratory


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance using table partitions in Postgres 8.2.6

2008-04-07 Thread Albert Cervera Areny
A Dilluns 07 Abril 2008, Janet Jacobsen va escriure:
 no. of partitions  constraint_exclusion off  constraint_exclusion on
  2                 0.597 ms                  0.427 ms
  4                 0.653 ms                  0.414 ms
  8                 0.673 ms                  0.654 ms
 16                 1.068 ms                  1.014 ms
 32                 2.301 ms                  1.537 ms

 I expected that the query time would decrease as the number of
 partitions increases, but that's not what I am seeing.  I get better
 results (0.29 ms) if I simply index DMID and don't use the partitions.

I see really small times here, so the overhead that partitioning imposes 
probably isn't worth it yet. Maybe with 50M rows it'll help; you could try 
feeding those 50M tuples and testing again.
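
A quick way to generate that kind of volume for a re-test, sketched against the
hypothetical layout shown earlier in the thread (it assumes rows are routed to
the right child, for example by inserting into each child directly or via an
insert trigger):

INSERT INTO filemaster (val1, val2, dmid, flag)
SELECT random(), random(),
       (random() * 180360)::int,   -- spread DMID over its full range
       1
  FROM generate_series(1, 50000000);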

-- 
Albert Cervera Areny
Dept. Informàtica Sedifa, S.L.

Av. Can Bordoll, 149
08202 - Sabadell (Barcelona)
Tel. 93 715 51 11
Fax. 93 715 51 12





 

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Performance with temporary table

2008-04-07 Thread samantha mahindrakar
Hi
I have written a program that imputes (or rather corrects) data within
my database.
I am using a temporary table into which I put data from another partitioned
table. I then query this table to get the desired data. But the thing
is, this temporary table has to be created for every record that I need
to correct, and there are thousands of such records that need to be
corrected.
So the program necessarily creates a temporary table every time it has
to correct a record. However, this table is dropped after each record
is corrected.
The program works fine, but it runs for a very long time... or rather, it
runs for days.
I am particularly finding that it takes more time during this statement:

NOTICE:  theQuery in createtablevolumelaneshist CREATE TEMPORARY TABLE
predictiontable(lane_id, measurement_start, speed, volume, occupancy) AS
SELECT lane_id, measurement_start, speed, volume, occupancy
FROM samantha.lane_data_I_495 WHERE
lane_id IN (1317) AND
measurement_start BETWEEN '2007-11-18 09:25:00' AND '2007-11-19 01:39:06'

I am not sure if I can use a cursor to replicate the functionality of
the temp table. Is the performance bad because of the creation and
deletion of the temp table?
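
One pattern that often removes the per-record create/drop overhead, offered as a
sketch rather than something from this thread (the column types are guesses), is
to create the temporary table once per session and simply refill it for each
record:

CREATE TEMPORARY TABLE predictiontable (
    lane_id            integer,
    measurement_start  timestamp,
    speed              real,
    volume             real,
    occupancy          real
);

-- per record: clear and refill instead of create and drop
TRUNCATE predictiontable;
INSERT INTO predictiontable (lane_id, measurement_start, speed, volume, occupancy)
SELECT lane_id, measurement_start, speed, volume, occupancy
  FROM samantha.lane_data_I_495
 WHERE lane_id IN (1317)
   AND measurement_start BETWEEN '2007-11-18 09:25:00' AND '2007-11-19 01:39:06';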


Thanks
Samantha

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] what worked: performance improvements for geo-spatial searching on FreeBSD

2008-04-07 Thread Mark Stosberg

The http://www.1-800-save-a-pet.com site is hosted with FreeBSD and
PostgreSQL and has a geo-spatial search as a central feature.

One thing that made a substantial performance improvement was switching
from the geo_distance() search in the earthdistance contrib to the
cube-based geo-spatial calculations, also available in contrib/.
In our case, the slight loss of precision between the two methods didn't
matter.
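
For anyone making the same switch, the cube-based form looks roughly like this;
a sketch only, with made-up table, column and radius values, and on older
releases the cube contains operator is spelled @ rather than @>:

-- all rows within ~16 km of a search point; earth_box() is a bounding box,
-- so earth_distance() rechecks the exact radius
SELECT id, name
  FROM locations
 WHERE earth_box(ll_to_earth(40.0, -75.0), 16000) @> ll_to_earth(latitude, longitude)
   AND earth_distance(ll_to_earth(40.0, -75.0),
                      ll_to_earth(latitude, longitude)) < 16000;

-- served by a functional GiST index on the cube representation:
CREATE INDEX locations_earth_idx
    ON locations USING gist (ll_to_earth(latitude, longitude));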

The other thing that made a noticeable performance improvement was
upgrading our servers from FreeBSD 4.x or 5.x (old, I know!) to FreeBSD
6.2 or later. We also upgraded these systems from PostgreSQL 8.1 to 8.2
at the same time.  I assume the upgrade to 8.2 must be responsible at
least in part for the performance gains.

The result of these two rounds of updates is that our overall CPU
capacity in the cluster seems to be double or triple what it was before.

As the site grows we continue to be very happy with the performance,
features and stability of PostgreSQL.

   Mark


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] recommendations for web/db connection pooling or DBD::Gofer reviews

2008-04-07 Thread Mark Stosberg


When traffic to our PostgreSQL-backed website spikes, the first resource
we see being exhausted is the DB slots on the master server (currently
set to about 400).

I expect that as new Apache/mod_perl children are being put to use, they
are creating new database connections.

I'm interested in recommendations to funnel more of that traffic through
 fewer DB slots, if that's possible. (We could also consider increasing
the handles available, since the DB server has some CPU and memory to
spare).

I'm particularly interested in review of DBD::Gofer, which seems like it
would help with this in our Perl application:
http://search.cpan.org/dist/DBI/lib/DBD/Gofer.pm

I realize it has limitations, like no transactions, but I think we
would still be able to use it selectively in our application.

   Mark


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] recommendations for web/db connection pooling or DBD::Gofer reviews

2008-04-07 Thread Joshua D. Drake
On Mon, 07 Apr 2008 14:36:00 -0400
Mark Stosberg [EMAIL PROTECTED] wrote:

 
 I'm particularly interested in review of DBD::Gofer, which seems like
 it would help with this in our Perl application:
 http://search.cpan.org/dist/DBI/lib/DBD/Gofer.pm
 
 I realize it has limitations, like no transactions, but I think we
 would still be able to use it selectively in our application.

I would stick to proven postgresql technologies such as pgbouncer.

Sincerely,

Joshua D. Drake


 
 Mark
 
 


-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] what worked: performance improvements for geo-spatial searching on FreeBSD

2008-04-07 Thread Oleg Bartunov

Mark,

Do you know about our sky segmentation code Q3C?
See details at http://www.sai.msu.su/~megera/wiki/SkyPixelization
We use it for billions of objects in the database and are quite happy with it.

Oleg

On Mon, 7 Apr 2008, Mark Stosberg wrote:


The http://www.1-800-save-a-pet.com site is hosted with FreeBSD and
PostgreSQL and has a geo-spatial search as a central feature.

One thing that made a substantial performance improvement was switching
from the geo_distance() search in the earthdistance contrib to the
cube-based geo-spatial calculations, also available in contrib/.
In our case, the slight loss of precision between the two methods didn't
matter.

The other thing that made a noticeable performance improvement was
upgrading our servers from FreeBSD 4.x or 5.x (old, I know!) to FreeBSD
6.2 or later. We also upgraded these systems from PostgreSQL 8.1 to 8.2
at the same time.  I assume the upgrade to 8.2 must be responsible at
least in part for the performance gains.

The result of these two rounds of updates is that our overall CPU
capacity in the cluster seems to be double or triple what it was before.

As the site grows we continue to be very happy with the performance,
features and stability of PostgreSQL.

  Mark





Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Severe performance problems for simple query

2008-04-07 Thread Florian Weimer
* Dimi Paun:

   * 4.9 million records in a table (IP address info)

You should use the ip4r type for that.
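
A minimal sketch of what the ip4r approach can look like, assuming the ip4r
contrib module is installed (table and column names are made up):

CREATE TABLE ip_ranges (
    iprange  ip4r PRIMARY KEY,
    info     text
);
CREATE INDEX ip_ranges_gist ON ip_ranges USING gist (iprange);

-- "which range contains this address?" can use the GiST index:
SELECT * FROM ip_ranges WHERE iprange >>= '127.0.0.1';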

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] per-review of PgBouncer / Slony design

2008-04-07 Thread Mark Stosberg



I would stick to proven postgresql technologies such as pgbouncer.


Thanks for the fast recommendation, Joshua.  I'll consider it.

Our application is Slony-replicated web/db project with two slaves.

Does this design seem sensible?

- Run one pgbouncer server on the master, with settings to
  service the master and both slaves.

- We already handle balancing traffic between the slaves separately, so 
that can remain unchanged.


- Use Session Pooling for both the master and the slaves. In theory, 
the slaves should just be doing transaction-less SELECT statements, so a 
more aggressive setting might be possible, but I believe there might be 
a leak in the logic where we create a temporary table on the slave in 
one case.


- Redirect all application connections through pgbouncer

###

From graphs we keep, we can see that the slaves currently use a max of 
about 64 connections...they are far from maxing out what's possible. So 
I was trying to think through if made sense to bother using the 
pgBouncer layer with them. I through of two potential reasons to still 
use it:
 - In the event of a major traffic spike on the web servers, pgbouncer 
would keep the number of db slots under control.
 - Potentially there's a performance gain in having PgBouncer hold the 
connections open.


Does that analysis seem correct?

For the master's pool size, I thought I would just choose a number 
that's a little larger than the daily max number of DB slots in use.


   Mark


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Looking for bottleneck during load test

2008-04-07 Thread Hell, Robert
I tried several other tools for random IO (including a self-written one which 
does random lseek and read).

This tool, started during one of our tests, achieves 2 iops (8k each).
Started alone, I get about 1,500 iops with an avg latency of 100 ms.

We are using SAN (EMC CLARiiON CX 300) - are those ~7 MB/s really our 
bottleneck?
Any other tuning ideas?

Regards,
Robert

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ivan Voras
Sent: Montag, 07. April 2008 14:38
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Looking for bottleneck during load test

Hell, Robert wrote:
 We have a PostgreSQL 8.2.7 database (~230 GB) running on a machine with 8 
 Intel Xeon Cores and 32 GB RAM (64-bit Linux 2.6.18). Data is stored on an 
 EMC² CLARiiON on RAID 1/0 (8 x 146 GB 15k rpm). 
 When we do random I/O with a small test tool (reading random 8k blocks from 
 big files in 200 threads) on the disk we retrieve data with about 25 MB/s.

How do you test random IO? Do you use this utility:
http://arctic.org/~dean/randomio/ ?

If not, try using it, with same parameters. It might be that the latency
is destroying your performance. Do you use NFS or are you accessing the
storage as SAN?


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Looking for bottleneck during load test

2008-04-07 Thread Bill Moran
In response to Hell, Robert [EMAIL PROTECTED]:

 I tried different other tools for random IO (including a self written one 
 which does random lseek and read).
 
 This tool, started during one of our tests, achieves 2 iops (8k each).
 Started alone I get something about 1,500 iops with an avg latency of 100 ms.
 
 We are using SAN (EMC CLARiiON CX 300) - are those ~7 MB/s really our 
 bottleneck?
 Any other tuning ideas?

You know, with all the performance problems people have been bringing up
with regard to SANs, I'm putting SAN in the same category as RAID-5 ...


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ivan Voras
 Sent: Montag, 07. April 2008 14:38
 To: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Looking for bottleneck during load test
 
 Hell, Robert wrote:
  We have a PostgreSQL 8.2.7 database (~230 GB) running on a machine with 8 
  Intel Xeon Cores and 32 GB RAM (64-bit Linux 2.6.18). Data is stored on an 
  EMC² CLARiiON on RAID 1/0 (8 x 146 GB 15k rpm). 
  When we do random I/O with a small test tool (reading random 8k blocks from 
  big files in 200 threads) on the disk we retrieve data with about 25 MB/s.
 
 How do you test random IO? Do you use this utility:
 http://arctic.org/~dean/randomio/ ?
 
 If not, try using it, with same parameters. It might be that the latency
 is destroying your performance. Do you use NFS or are you accessing the
 storage as SAN?
 
 
 -- 
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance
 
 -- 
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance


-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

[EMAIL PROTECTED]
Phone: 412-422-3463x4023




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Looking for bottleneck during load test

2008-04-07 Thread Greg Smith

On Mon, 7 Apr 2008, Bill Moran wrote:


You know, with all the performance problems people have been bringing up
with regard to SANs, I'm putting SAN in the same category as RAID-5 ...


Not really fair, because unlike RAID5 it's at least *possible* to get good 
write performance out of a SAN.  Just harder than most people think it is.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Looking for bottleneck during load test

2008-04-07 Thread Bill Moran
In response to Greg Smith [EMAIL PROTECTED]:

 On Mon, 7 Apr 2008, Bill Moran wrote:
 
  You know, with all the performance problems people have been bringing up
  with regard to SANs, I'm putting SAN in the same category as RAID-5 ...
 
 Not really fair, because unlike RAID5 it's at least *possible* to get good 
 write performance out of a SAN.  Just harder than most people think it is.

*shrug*  in theory it's possible to get good write performance out of
RAID-5 as well, with fast enough disks and enough cache ...

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

[EMAIL PROTECTED]
Phone: 412-422-3463x4023

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] bulk insert performance problem

2008-04-07 Thread Christian Bourque
Hi,

I have a performance problem with a script that does massive bulk
insert in 6 tables. When the script starts the performance is really
good but will degrade minute after minute and take almost a day to
finish!

I tried almost everything suggested on this list: changed our external
raid array from raid 5 to raid 10, tweaked postgresql.conf to the best
of my knowledge, moved pg_xlog to a different array, and dropped the
tables before running the script. But the performance gain was
negligible even after all these changes...

IMHO the hardware that we use should be up to the task: Dell PowerEdge
6850, 4 x 3.0Ghz Dual Core Xeon, 8GB RAM, 3 x 300GB SAS 10K in raid 5
for / and 6 x 300GB SAS 10K in raid 10 (MD1000) for PG data, the data
filesystem is ext3 mounted with noatime and data=writeback. Running on
openSUSE 10.3 with PostgreSQL 8.2.7. The server is dedicated for
PostgreSQL...

We tested the same script and schema with Oracle 10g on the same
machine and it took only 2.5h to complete!

What I don't understand is that with Oracle the performance seems
always consistent but with PG it deteriorates over time...

Any idea? Is there any other improvements I could do?

Thanks

Christian

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] bulk insert performance problem

2008-04-07 Thread Craig Ringer

Christian Bourque wrote:

Hi,

I have a performance problem with a script that does massive bulk
insert in 6 tables. When the script starts the performance is really
good but will degrade minute after minute and take almost a day to
finish!
  
Would I be correct in guessing that there are foreign key relationships 
between those tables, and that there are significant numbers of indexes 
in use?


The foreign key checking costs will go up as the tables grow, and AFAIK 
the indexes get a bit more expensive to maintain too.


If possible you should probably drop your foreign key relationships and 
drop your indexes, insert your data, then re-create the indexes and 
foreign keys. The foreign keys will be rechecked when you recreate them, 
and it's *vastly* faster to do it that way. Similarly, building an index 
from scratch is quite a bit faster than progressively adding to it. Of 
course, dropping the indices is only useful if you aren't querying the 
tables as you build them.
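
Concretely, that usually takes a shape like this (a sketch with hypothetical 
table, index and constraint names):

-- before the load
ALTER TABLE child_table DROP CONSTRAINT child_table_parent_fk;
DROP INDEX child_table_value_idx;

-- ... run the bulk insert / COPY here ...

-- after the load: rebuild from scratch, which also re-validates the FK
CREATE INDEX child_table_value_idx ON child_table (value_col);
ALTER TABLE child_table
    ADD CONSTRAINT child_table_parent_fk
    FOREIGN KEY (parent_id) REFERENCES parent_table (id);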


Also, if you're loading data using stored procedures you should avoid 
the use of exception blocks. I had some major problems with my bulk data 
conversion code due to overuse of exception blocks creating large 
numbers of subtransactions behind the scenes and slowing everything to a 
crawl.


--
Craig Ringer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] bulk insert performance problem

2008-04-07 Thread Chris

Craig Ringer wrote:

Christian Bourque wrote:

Hi,

I have a performance problem with a script that does massive bulk
insert in 6 tables. When the script starts the performance is really
good but will degrade minute after minute and take almost a day to
finish!
  
Would I be correct in guessing that there are foreign key relationships 
between those tables, and that there are significant numbers of indexes 
in use?


The foreign key checking costs will go up as the tables grow, and AFAIK 
the indexes get a bit more expensive to maintain too.


If possible you should probably drop your foreign key relationships and 
drop your indexes, insert your data, then re-create the indexes and 
foreign keys. The foreign keys will be rechecked when you recreate them, 
and it's *vastly* faster to do it that way. Similarly, building an index 
from scratch is quite a bit faster than progressively adding to it. Of 
course, dropping the indices is only useful if you aren't querying the 
tables as you build them.


If you are, add analyze commands through the import, eg every 10,000 
rows. Then your checks should be a bit faster.


The other suggestion would be to do block commits:

begin;
do stuff for 5000 rows;
commit;

repeat until finished.
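
Spelled out, one pass of that loop might look like this (table and columns are 
hypothetical, and the generate_series stands in for one 5000-row slice of the 
real data):

BEGIN;
INSERT INTO target_table (id, payload)
SELECT g, 'row ' || g
  FROM generate_series(1, 5000) AS g;
COMMIT;
ANALYZE target_table;   -- keep stats fresh so FK and index lookups stay cheap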

--
Postgresql  php tutorials
http://www.designmagick.com/

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] bulk data loading

2008-04-07 Thread Potluri Srikanth

Hi all,

I need to do a bulk data load of around 704GB (log file size), at present in 8 hrs (1 am - 9 am). The data file size may increase 3 to 5 times in the future.

Using COPY it takes 96 hrs to finish the task.
What is the best way to do it?

HARDWARE: SUN THUMPER/ RAID10
OS : SOLARIS 10.
DB: Greenplum/Postgres
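
One commonly used shape for this, independent of the Greenplum specifics and 
offered only as a hedged sketch (file paths and names are made up): pre-split 
the input, run one COPY per session in parallel, and create indexes only after 
the load.

-- session 1:
COPY big_table FROM '/staging/chunk_01.dat' WITH CSV;
-- session 2, running at the same time:
COPY big_table FROM '/staging/chunk_02.dat' WITH CSV;
-- ... more sessions for the remaining chunks ...

-- only once all chunks are loaded:
CREATE INDEX big_table_key_idx ON big_table (key_col);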


Regards,

Srikanth k Potluri

+63 9177444783(philippines) 








Re: [PERFORM] bulk insert performance problem

2008-04-07 Thread bitaoxiao
I use 1 rows, with big blobs.


2008-04-08 



bitaoxiao 



From: Chris 
Sent: 2008-04-08 11:35:57 
To: Christian Bourque 
Cc: pgsql-performance@postgresql.org 
Subject: Re: [PERFORM] bulk insert performance problem 
 
Craig Ringer wrote:
 Christian Bourque wrote:
 Hi,

 I have a performance problem with a script that does massive bulk
 insert in 6 tables. When the script starts the performance is really
 good but will degrade minute after minute and take almost a day to
 finish!
   
 Would I be correct in guessing that there are foreign key relationships 
 between those tables, and that there are significant numbers of indexes 
 in use?
 
 The foreign key checking costs will go up as the tables grow, and AFAIK 
 the indexes get a bit more expensive to maintain too.
 
 If possible you should probably drop your foreign key relationships and 
 drop your indexes, insert your data, then re-create the indexes and 
 foreign keys. The foreign keys will be rechecked when you recreate them, 
 and it's *vastly* faster to do it that way. Similarly, building an index 
 from scratch is quite a bit faster than progressively adding to it. Of 
 course, dropping the indices is only useful if you aren't querying the 
 tables as you build them.
If you are, add analyze commands through the import, eg every 10,000 
rows. Then your checks should be a bit faster.
The other suggestion would be to do block commits:
begin;
do stuff for 5000 rows;
commit;
repeat until finished.
-- 
Postgresql  php tutorials
http://www.designmagick.com/
-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance