Re: [PERFORM] Remote access to Postgresql slow

2012-09-15 Thread Andrew Barnham
Is your network link between server and client across the public internet?

You need to check bandwidth and latency characteristics of your network.

A simple test: run the following on the server host, then run it again on the
client host.

time psql [connect details] -c 'select now()'

I access a postgresql database across the public internet (by tunnelling port
5432 across compressed ssh sessions). In my case latency is a significant
penalty.  Locally the time for the above is 10ms, but remotely it is about 35
times slower (350ms).

You may need to install wireshark or similar and monitor client traffic in
order to figure out the network penalty.  Maybe your app goes back and
forward to postgres multiple times; does lots of chatter. If so then
latency cost becomes very significant. You want to try to minimise the
number of postgresql calls; retrieve more data with fewer SQL operations.
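To make the chatter point concrete, here is a small back-of-the-envelope sketch. The 10ms/350ms round-trip figures are the ones quoted above; the row count of 200 is made up purely for illustration.

```python
# Sketch: why per-call latency dominates a chatty workload.
# Latency figures (10 ms local, 350 ms remote) are from the thread above;
# the 200-query workload is a hypothetical example.

def total_time_ms(round_trips: int, latency_ms: float) -> float:
    """Time spent purely on network round trips, ignoring query work."""
    return round_trips * latency_ms

# An app that fetches 200 rows one query at a time vs. one batched query:
chatty_local   = total_time_ms(200, 10)    # 2,000 ms
chatty_remote  = total_time_ms(200, 350)   # 70,000 ms
batched_remote = total_time_ms(1, 350)     # 350 ms
```

The same workload that feels fine locally becomes 70 seconds of pure latency remotely, which is why batching queries matters far more than raw bandwidth here.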



On Fri, Sep 14, 2012 at 7:02 PM, Manoj Agarwal m...@ockham.be wrote:


 Hi,

 I have a Centos 6.2 Virtual machine that contains a Postgresql version 8.4
 database. The application installed in this Virtual machine uses this
 database, which is local to the Virtual machine.  I have tried to offload
 the database by installing it on a remote Virtual machine, on another
 server, and tried to connect to it from my local Virtual machine. The
 application configuration remains the same; only the database is offloaded to a
 remote Virtual machine on another server, and the connection parameters have
 changed.  The connection is all fine and the application can access the
 remote database.

 I have observed that Postgresql is responding extremely slowly.  What
 should I do to improve its performance?

 Please suggest.


 Kind Regards,
 Manoj Agarwal




[PERFORM] Query planner suggestion, for indexes with similar but not exact ordering.

2011-11-14 Thread Andrew Barnham
Hi all.  I've been using postgres for years, and lurking on this newsgroup for
a short while now to gain the benefit of your expertise and experience and
learn how to get the most out of postgresql.

I do a fair bit of work on tables using composite keys.  I have discovered
a couple of things lately that may or may not be worth looking at in terms
of query planner.

Consider my archetypal table, based on real data:

table master
  franchise   smallint,
  partnumber  varchar

It is a spare parts list, combining part numbers from multiple suppliers
(franchise). Part numbers are typically unique but sometimes there are
duplicates.  I have use cases which concern both finding parts for a
specific franchise and finding parts system wide. My table has the following
stats:
* Number of records: 2,343,569
* Number of unique partnumber values: 2,130,379  (i.e. a given partnumber
has on average 1.1 records; a partnumber is used by 1.1 suppliers. The
partnumber with the most records has 8 records.)
* Number of unique suppliers: 35

Now consider the following query: its purpose is to render the next 20 rows at
an arbitrary position, the position being after the record matching
franchise=10, partnumber='1' in partnumber then franchise order.

select * from master where partnum>='1' and (partnum>'1' or franchise>10)
order by partnum,franchise limit 20;
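The WHERE clause above is the standard keyset-pagination predicate: it is equivalent to the row-wise comparison (partnum, franchise) > ('1', 10). A small Python sketch (the rows and page size here are invented for illustration) shows the same "next page" logic via tuple comparison:

```python
# Sketch of the keyset-pagination predicate, using Python tuple comparison.
# (p, f) > (lp, lf)  <=>  p > lp OR (p = lp AND f > lf),
# which matches: partnum >= lp AND (partnum > lp OR franchise > lf).
def next_page(rows, last_seen, limit=20):
    ordered = sorted(rows)                     # order by partnum, franchise
    return [r for r in ordered if r > last_seen][:limit]

# Illustrative (partnum, franchise) rows, not from the actual table:
rows = [("0999", 5), ("1", 3), ("1", 10), ("1", 12), ("2", 1), ("3", 7)]

print(next_page(rows, ("1", 10), 3))  # [('1', 12), ('2', 1), ('3', 7)]
```

With a composite index on (partnum, franchise) the database can evaluate this predicate by seeking directly to the position after the last-seen row, which is why the composite-index case below is fast.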

Now if I have a composite index on partnum + franchise, this query performs
the way you would expect, and very quickly.

But if I have an index on partnum only, the system seqscans master and
yields poor performance, i.e.:

 Limit  (cost=143060.23..143060.28 rows=20 width=93) (actual time=2307.986..2307.998 rows=20 loops=1)
   ->  Sort  (cost=143060.23..148570.14 rows=2203967 width=93) (actual time=2307.982..2307.986 rows=20 loops=1)
         Sort Key: partnum, franchise
         Sort Method:  top-N heapsort  Memory: 19kB
         ->  Seq Scan on master  (cost=0.00..84413.46 rows=2203967 width=93) (actual time=0.019..1457.001 rows=2226792 loops=1)
               Filter: (((partnum)::text >= '1'::text) AND (((partnum)::text > '1'::text) OR (franchise > 10)))
 Total runtime: 2308.118 ms


I wonder if it is possible, and worthwhile, to set up the query planner to
recognize that, because of the stats I indicate above, a sort by
partnum is almost exactly the same as a sort by partnum+franchise, and that
doing an index scan on the partnum index and sorting the results in memory
would be dramatically faster.  The sort buffer only needs to be very small;
it will grow to at most 8 records in my example above.  The scan walks the
partnum index, and as long as partnum stays the same it sorts that
small segment; as soon as partnum increments while walking the index,
the buffer zeros out again for the next sort group.
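The buffering scheme described above can be sketched as a generator: rows arrive already ordered by partnum (as from an index scan on partnum alone), and each small run of equal partnums is sorted by franchise in memory. The rows below are invented for illustration. (PostgreSQL 13 later added an "Incremental Sort" plan node along exactly these lines.)

```python
from itertools import groupby, islice
from operator import itemgetter

def incremental_sort(rows_by_partnum, limit):
    """Sketch of the proposed small-buffer sort.
    rows_by_partnum: (partnum, franchise) tuples already ordered by partnum.
    Each run of equal partnums is sorted by franchise; the in-memory buffer
    never exceeds the largest group (at most 8 rows in the stats above)."""
    def gen():
        for _, group in groupby(rows_by_partnum, key=itemgetter(0)):
            # Sort only this small group, then stream it out.
            yield from sorted(group, key=itemgetter(1))
    return list(islice(gen(), limit))

# Illustrative rows, sorted by partnum only (franchise order is arbitrary):
rows = [("1", 10), ("1", 3), ("2", 7), ("2", 1), ("3", 5)]
print(incremental_sort(rows, 4))  # [('1', 3), ('1', 10), ('2', 1), ('2', 7)]
```

Because `islice` stops pulling from the generator once the limit is reached, only the groups needed for the first page are ever sorted, which is what makes this so much cheaper than a full top-N sort over the whole table.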

Artificially simulating this in SQL (it only works with foreknowledge of the
max count of records for a given part, i.e. +8) shows the dramatic theoretical
performance gain over the above.

explain analyze select * from (select * from master where partnum>='1'
order by partnum limit 20+8) x where partnum>'1' or franchise>10 order by
partnum,franchise limit 20;
=
 Limit  (cost=77.71..77.75 rows=16 width=230) (actual time=0.511..0.555 rows=20 loops=1)
   ->  Sort  (cost=77.71..77.75 rows=16 width=230) (actual time=0.507..0.524 rows=20 loops=1)
         Sort Key: x.partnum, x.franchise
         Sort Method:  quicksort  Memory: 21kB
         ->  Subquery Scan x  (cost=0.00..77.39 rows=16 width=230) (actual time=0.195..0.367 rows=28 loops=1)
               Filter: (((x.partnum)::text > '1'::text) OR (x.franchise > 10))
               ->  Limit  (cost=0.00..76.97 rows=28 width=93) (actual time=0.180..0.282 rows=28 loops=1)
                     ->  Index Scan using master_searchpartkey on master  (cost=0.00..6134000.35 rows=2231481 width=93) (actual time=0.178..0.240 rows=28 loops=1)
                           Index Cond: ((partnum)::text >= '1'::text)
 Total runtime: 0.695 ms

Of course I could just make sure I create indexes that match my order-by
fields perfectly, which is exactly what I am doing right now.  But I
thought it might be worth considering allowing some sort of in-memory
sort to be overlaid on an index when the statistics indicate that the
index order is very nearly the required sort order.

Andrew