Re: [PERFORM] PostgreSQL 9.3.2 Performance tuning for 32 GB server

2014-02-02 Thread Tom Kincaid
On Fri, Jan 31, 2014 at 8:55 AM, Julien Cigar jci...@ulb.ac.be wrote:

 On Sat, Jan 25, 2014 at 12:02:59PM +0530, RAMAKRISHNAN KANDASAMY wrote:
  Hi All,
 
   I have configured the below parameters for a 32 GB server. Is this correct?
 
  shared_buffers = 6GB

 going over 2GB probably doesn't help


That is true on a 32-bit system. On a 64-bit system with 32GB of RAM, there
is potentially a lot of value to be gained by having shared_buffers
significantly higher than 2GB.
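
As a rough illustration of that point (a sketch only; the figures are
assumptions to be validated against the real workload, not recommendations
from this thread), a dedicated 64-bit box with 32GB of RAM often starts from
something like:

shared_buffers = 6GB          # roughly 20% of RAM
effective_cache_size = 24GB   # what shared_buffers plus the OS cache can realistically hold

and is then measured before shared_buffers is pushed any higher.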



 It's considered a bad habit to change the cost settings, but I often
 raise the default cpu_tuple_cost to 0.08 (instead of 0.01) too.

  --
  --Regards
  RAMAKRISHNAN KANDASAMY

 --
 No trees were killed in the creation of this message.
 However, many electrons were terribly inconvenienced.






-- 
Thomas John


Re: [PERFORM] PostgreSQL 9.3.2 Performance tuning for 32 GB server

2014-01-31 Thread Julien Cigar
On Sat, Jan 25, 2014 at 12:02:59PM +0530, RAMAKRISHNAN KANDASAMY wrote:
 Hi All,
 
 I have configured the below parameters for a 32 GB server. Is this correct?
 
 shared_buffers = 6GB

going over 2GB probably doesn't help

 work_mem = 24MB maintenance_work_mem = 250MB

work_mem depends a lot on your queries and the number of clients, but
with 32GB of RAM setting a default work_mem of 128MB would probably not
hurt. Your maintenance_work_mem is too low; raise it to 2GB.

 effective_cache_size = 16GB

if it's a dedicated server you can raise it to 24GB

 shared_preload_libraries = 'pg_stat_statements' pg_stat_statements.max =
 1 pg_stat_statements.track = all
 wal_buffers = 8MB
 checkpoint_segments = 32

depends on your load; 10 is reasonable for light loads, and 50 or 100 isn't
uncommon for heavier ones. Keep in mind that every increase
of 30 will cost you about 1 gigabyte of disk space in pg_xlog and an extra
~2-5 minutes (depending on your I/O) of recovery time after a crash.
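
To make the disk-space figure concrete, a rough worked example (using the
usual rule of thumb that pg_xlog holds up to about
(2 + checkpoint_completion_target) * checkpoint_segments + 1 segments of 16MB
each): with checkpoint_segments = 32 and checkpoint_completion_target = 0.9
that is roughly 94 files, i.e. about 1.5GB of pg_xlog in the worst case.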

 checkpoint_completion_target = 0.9
 
 

It's considered a bad habit to change the cost settings, but I often
raise the default cpu_tuple_cost to 0.08 (instead of 0.01) too.
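
If someone does want to experiment with that, the change can be tried
per-session before touching postgresql.conf; a minimal sketch (pg_class is
used here only as a stand-in for a real query):

SET cpu_tuple_cost = 0.08;
EXPLAIN ANALYZE SELECT count(*) FROM pg_class;
-- compare plan and timing against a session with the default 0.01

and only make it global once the plans actually improve.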

 -- 
 --Regards
 RAMAKRISHNAN KANDASAMY

-- 
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.




[PERFORM] PostgreSQL 9.3.2 Performance tuning for 32 GB server

2014-01-29 Thread RAMAKRISHNAN KANDASAMY
Hi All,

I have configured the below parameters for a 32 GB server. Is this correct?

shared_buffers = 6GB
work_mem = 24MB maintenance_work_mem = 250MB
effective_cache_size = 16GB
shared_preload_libraries = 'pg_stat_statements' pg_stat_statements.max =
1 pg_stat_statements.track = all
wal_buffers = 8MB
checkpoint_segments = 32
checkpoint_completion_target = 0.9
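
For reference, with pg_stat_statements loaded as above, the most expensive
statements can be pulled out with a query along these lines (a sketch; the
exact column set varies a little between versions):

SELECT query, calls, total_time, rows
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;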


-- 
--Regards
RAMAKRISHNAN KANDASAMY


[PERFORM] PostgreSQL 9.3.2 Performance issues

2014-01-24 Thread fburgess
We have 64GB of Memory on RHEL 6.4

shared_buffers = 8GB
work_mem = 64MB
maintenance_work_mem = 1GB
effective_cache_size = 48GB

I found this list of recommended parameters for memory management in PostgreSQL.

About shared_buffers:
* Below 2GB, set to 20% of total system memory.
* Below 32GB, set to 25% of total system memory.
* Above 32GB, set to 8GB.

About work_mem, this parameter can cause a huge speed-up if set properly, however it can use that amount of memory per planning node. Here are some recommendations to set it up:
* Start low: 32-64MB.
* Look for 'temporary file' lines in logs.
* Set to 2-3x the largest temp file.

About maintenance_work_mem, some recommendations were:
* 10% of system memory, up to 1GB.
* Maybe even higher if you are having VACUUM problems.

About effective_cache_size, guidelines suggested:
* Set to the amount of file system cache available.
* If you don't know, set it to 50% of total system memory.

We have real time 24/7 data ingest processes running on our 9.3.2 database, 7TB in size.

Do these settings look correct for 9.3.2?

thanks
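
A hedged note on the 'temporary file' step above: those lines only appear in
the log if temp-file logging is enabled, e.g. in postgresql.conf:

log_temp_files = 0    # log every temporary file together with its size

after which the largest reported sizes give the 2-3x target for work_mem.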



Re: [PERFORM] PostgreSQL 9.3.2 Performance issues

2014-01-24 Thread Kevin Grittner
fburg...@radiantblue.com fburg...@radiantblue.com wrote:

 We have 64GB of Memory on RHEL 6.4

 shared_buffers = 8GB
 work_mem = 64MB
 maintenance_work_mem = 1GB
 effective_cache_size = 48GB

 Do these settings look correct for 9.3.2?

Maybe.

What is your max_connections setting?

I find that a good place to start with work_mem is to ignore the
factors you quoted, and to set it somewhere near 25% of machine RAM
divided by max_connections.  It might be possible to go up from
there, but monitor closely for peaks of activity which cause enough
memory allocation to flush the OS cache and cause high disk read
rates, killing performance until the cache repopulates.  That can
take a while since the high disk read rates slow queries, causing
more of them to compete, leading to higher total work_mem
allocations, and thus preventing recovery from the performance
degradation.  In other words, setting this too high leads to
unstable performance.  It looks better than a lower setting until
too many users hit Enter at about the same time, causing
performance to collapse for a while.
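
As a worked example of that starting point (max_connections = 200 is purely
an assumed figure here, since it was not given): 25% of 64GB is 16GB, and
16GB / 200 connections is roughly 80MB, so the quoted work_mem of 64MB is
already close to that heuristic, while something like 512MB would only be
safe for a handful of specially configured sessions.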

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PERFORM] PostgreSQL 9.2.3 performance problem caused Exclusive locks

2013-05-18 Thread jonranes
I am having the same exact problems.  I reduced shared buffers as that seems
to have done the trick for now in this thread.  If things improve I'll post
back and confirm.







Re: [PERFORM] PostgreSQL 9.2.3 performance problem caused Exclusive locks

2013-03-14 Thread Emre Hasegeli
On Thu, 14 Mar 2013 06:53:55 +0200, Jeff Janes jeff.ja...@gmail.com  
wrote:



On Friday, March 8, 2013, Emre Hasegeli wrote:

PostgreSQL writes several logs like the following during the problem, which
I never saw before 9.2.3:

LOG:  process 4793 acquired ExclusiveLock on extension of relation 305605
of database 16396 after 2348.675 ms

The key here is not that it is an ExclusiveLock, but rather that it is the
relation extension lock.  I don't think the extension lock is ever held
across user-code, or transaction boundaries, or anything like that.  It is
held over some small IOs.  So if it blocked on that for over 2 seconds, you
almost surely have some serious IO congestion.

And this particular message is probably more a symptom of that congestion
than anything else.

You said you rolled back to 9.2.2 and the stalling is still there.  Are you
still seeing the log message, or are you now seeing silent stalls?  Did
you roll back all other changes that were made at the same time as the
upgrade to 9.2.3 (kernel versions, filesystem changes/versions, etc.)?


I did not try a different kernel or file system. It was not because of
9.2.3; the same problem occurred in both 9.2.2 and 9.2.3. Increasing
max_connections makes it worse. It lasted almost 15 minutes the last time.

There was not much disk utilization while it was happening; top showed most
of the CPU usage in the %sy column, and there was no IO wait.

I saw allocstalls increasing in atop. There were a lot of slow insert
statements in the logs besides the ExclusiveLock waits.

We were using 64 GiB of shared buffers. RhodiumToad suggested reducing it
on the IRC channel. It has not happened since then.

It was a real problem for us, and I could not find anything related to it. I
cannot let it happen again in the production environment, but I would be
happy to share more details if it would help you fix it.




Re: [PERFORM] PostgreSQL 9.2.3 performance problem caused Exclusive locks

2013-03-13 Thread Joshua Berkus
Emre,

  LOG:  process 4793 acquired ExclusiveLock on extension of relation
  305605 of database 16396 after 2348.675 ms

The reason you're seeing that message is that you have log_lock_waits turned on.

That message says that some process waited for 2.3 seconds to get a lock for 
expanding the size of relation 16396/305605, which is most likely an index. 
This is probably due to changes in your application, or an increase in 
concurrent write activity.
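
For what it's worth, the relation behind such a message can be checked
directly; connected to database 16396, something like this (a sketch) shows
whether 305605 is a table or an index:

SELECT relname, relkind FROM pg_class WHERE oid = 305605;

(relkind 'r' means a table, 'i' an index).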

--Josh





[PERFORM] PostgreSQL 9.2.3 performance problem caused Exclusive locks

2013-03-13 Thread Jeff Janes
On Friday, March 8, 2013, Emre Hasegeli wrote:

PostgreSQL writes several logs like the following during the problem, which I
 never saw before 9.2.3:

 LOG:  process 4793 acquired ExclusiveLock on extension of relation 305605
 of database 16396 after 2348.675 ms


The key here is not that it is an ExclusiveLock, but rather that it is the
relation extension lock.  I don't think the extension lock is ever held
across user-code, or transaction boundaries, or anything like that.  It is
held over some small IOs.  So if it blocked on that for over 2 seconds, you
almost surely have some serious IO congestion.

And this particular message is probably more a symptom of that congestion
than anything else.

You said you rolled back to 9.2.2 and the stalling is still there.  Are you
still seeing the log message, or are you now seeing silent stalls?  Did
you roll back all other changes that were made at the same time as the
upgrade to 9.2.3 (kernel versions, filesystem changes/versions, etc.)?

Cheers,

Jeff


Re: [PERFORM] PostgreSQL 9.2.3 performance problem caused Exclusive locks

2013-03-12 Thread Emre Hasegeli

2013-03-08 13:27:16 +0200 Emre Hasegeli emre.haseg...@tart.com.tr:


PostgreSQL writes several logs like the following during the problem, which I
never saw before 9.2.3:

LOG:  process 4793 acquired ExclusiveLock on extension of relation  
305605 of database 16396 after 2348.675 ms


I tried

* to downgrade to 9.2.2
* to disable autovacuum
* to disable synchronous commit
* to write less on the big tables
* to increase checkpoint segments
* to increase max connections
* to move pg_xlog to sepe

None of them helped to avoid the downtimes. I could not find anything related
to it. Do you have any idea? Have you ever experienced something like this?




Re: [PERFORM] PostgreSQL 9.2.3 performance problem caused Exclusive locks

2013-03-08 Thread Emre Hasegeli

On Fri, 08 Mar 2013 20:39:45 +0200, AI Rumman rumman...@gmail.com wrote:


Knowing your problem, I read the docs and found this:

Fix performance problems with autovacuum truncation in busy workloads (Jan
Wieck)

Truncation of empty pages at the end of a table requires exclusive lock,
but autovacuum was coded to fail (and release the table lock) when there
are conflicting lock requests. Under load, it is easily possible that
truncation would never occur, resulting in table bloat. Fix by performing a
partial truncation, releasing the lock, then attempting to re-acquire the
lock and continue
(http://www.postgresql.org/docs/9.2/static/release-9-2-3.html).

This fix also greatly reduces the average time before autovacuum releases
the lock after a conflicting request arrives.

This could be a reason for your locking.


Yes, I saw this. It is commit b19e4250b45e91c9cbdd18d35ea6391ab5961c8d by
Jan Wieck. He also seems worried about this patch in the commit message. Do
you think this is the exact reason for the problem?

I have downgraded to 9.2.2, decreased autovacuum_max_workers to 2 from 10,
and increased max_connections to 500 from 200 in the meantime. There have
not been any ExclusiveLocks since then.




Re: [PERFORM] Postgresql Replication Performance

2012-01-04 Thread Saurabh
Thank you all for the valuable information. We have now decided to go
with streaming replication. I did the setup on a machine and it is
working well. Now I have to implement automatic failover. Please
share a solution for the same.

Saurabh Gupta



Re: [PERFORM] Postgresql Replication Performance

2012-01-04 Thread Aleksej Trofimov

On 01/04/2012 12:41 PM, Saurabh wrote:

Thank you all for the valuable information. We have now decided to go
with streaming replication. I did the setup on a machine and it is
working well. Now I have to implement automatic failover. Please
share a solution for the same.

Saurabh Gupta

You can use pgpool-II for automatic failover and connection caching. This
article is good enough:

http://pgpool.projects.postgresql.org/contrib_docs/simple_sr_setting/index.html

Also do not forget to configure Postgres max_connections = (pgpool)
num_init_children * max_pool if you'll use the connection cache.
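
A minimal sketch of that sizing rule, with placeholder values only (the
backend usually also needs a few extra slots for superuser connections):

# pgpool.conf
num_init_children = 32
max_pool = 4

# postgresql.conf on the backend
max_connections = 130    # at least num_init_children * max_pool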


--
Best regards

Aleksej Trofimov





Re: [PERFORM] Postgresql Replication Performance

2011-12-29 Thread Merlin Moncure
On Thu, Dec 29, 2011 at 3:33 AM, sgupta saurabh@gmail.com wrote:
 I am doing a POC on PostgreSQL replication. I am using the latest version of
 PostgreSQL, i.e. 9.1. There are multiple replication solutions available on
 the market (PGCluster, Pgpool-II, Slony-I). PostgreSQL also provides built-in
 replication solutions (streaming replication, warm standby and hot standby).
 I am confused about which solution is best for the financial application for
 which I am doing the POC. The application will write around 160 million
 records with a row size of 2.5 KB into the database. My question is, for the
 following scenarios, which replication solution will be suitable:

 If I would require replication for backup purposes only
 If I would require to scale the reads
 If I would require High Availability and Consistency

 Also it will be very helpful if you can share your performance results or
 experience with PostgreSQL replication solutions.

The built in HS/SR integrates with the postgres engine (over the WAL
system) at a very low level and is going to be generally faster and
more robust.  More importantly, it has a very low administrative
overhead -- the underlying mechanism of log shipping has been tweaked
and refined continually since PITR was released in 8.0.  Once you've
done it a few times, it's a five minute procedure to replicate a
database (not counting, heh, the base database copy).

The main disadvantage of HS/SR is inflexibility: you get an exact
replica of a database cluster.  Slony (which is a trigger based
system) and pgpool (which is statement replication) can do a lot of
funky things that hs/sr can't do -- so they definitely fill a niche
depending on what your requirements are.

merlin



Re: [PERFORM] Postgresql Replication Performance

2011-12-29 Thread Aleksej Trofimov

On 12/29/2011 11:33 AM, sgupta wrote:

I am doing a POC on PostgreSQL replication. I am using the latest version of
PostgreSQL, i.e. 9.1. There are multiple replication solutions available on
the market (PGCluster, Pgpool-II, Slony-I). PostgreSQL also provides built-in
replication solutions (streaming replication, warm standby and hot standby).
I am confused about which solution is best for the financial application for
which I am doing the POC. The application will write around 160 million
records with a row size of 2.5 KB into the database. My question is, for the
following scenarios, which replication solution will be suitable:

If I would require replication for backup purposes only
If I would require to scale the reads
If I would require High Availability and Consistency

Also it will be very helpful if you can share your performance results or
experience with PostgreSQL replication solutions.

Thanks




Which replication solution will be suitable depends on your needs and
database architecture.
Starting with PGCluster, I can only say that PGCluster was last released in
2005, so you cannot use it with Postgres 9.0 =)
Slony-I is a good solution if you want cascading replication from slave to
slave, or if you want to replicate only several parts of your database
(because Slony performs table-level replication).
Pgpool-II is an advanced load balancer and pooling solution which also has
replication support. Pgpool-II is a query-based replication utility, which
performs queries on several database servers. If you are looking for
performance and stability I do not recommend using pgpool as replication
software.
Postgres streaming replication is WAL-based replication, so using this type
of replication you will have absolutely identical database servers, which is
the best choice for HA and scaling reads. This choice also has practically no
performance impact, because it adds no latency to the database layer.


You can also read about the differences between Slony and streaming
replication here:
http://scanningpages.wordpress.com/2010/10/09/9-0-streaming-replication-vs-slony/
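
If streaming replication is chosen, a quick sanity check of the lag can be
done with something like this (a sketch; column names are from the 9.1-era
catalogs):

-- on the primary
SELECT client_addr, state, sent_location, replay_location
FROM pg_stat_replication;

-- on the standby
SELECT now() - pg_last_xact_replay_timestamp() AS replication_delay;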



--
Best regards

Aleksej Trofimov



Re: [PERFORM] Postgresql Replication Performance

2011-12-29 Thread Aleksej Trofimov

On 12/29/2011 05:00 PM, Claudio Freire wrote:

Second, is that WAL activity on streaming replication or WAL shipping
is documented to contain more data than on non-replicated setups. What
is not clear is how much more data. This not only affects our network
bandwidth estimations, but also I/O load on the master server, slowing
writes (and some reads that cannot happen on the slave).
Our database has about 2MB/s of writes on the WAL array; we had about 160
IOPS on average when replication was switched off, and 165-170 IOPS with
replication. This, I think, could be explained by statistical error, so we
have not experienced any additional I/O load on our master server since
replication was configured.



--
Best regards

Aleksej Trofimov




[PERFORM] Postgresql Replication Performance

2011-12-29 Thread sgupta
I am doing a POC on PostgreSQL replication. I am using the latest version of
PostgreSQL, i.e. 9.1. There are multiple replication solutions available on
the market (PGCluster, Pgpool-II, Slony-I). PostgreSQL also provides built-in
replication solutions (streaming replication, warm standby and hot standby).
I am confused about which solution is best for the financial application for
which I am doing the POC. The application will write around 160 million
records with a row size of 2.5 KB into the database. My question is, for the
following scenarios, which replication solution will be suitable:

If I would require replication for backup purposes only
If I would require to scale the reads
If I would require High Availability and Consistency

Also it will be very helpful if you can share your performance results or
experience with PostgreSQL replication solutions.

Thanks




Re: [PERFORM] Postgresql Replication Performance

2011-12-29 Thread Claudio Freire
On Thu, Dec 29, 2011 at 11:33 AM, Aleksej Trofimov
aleksej.trofi...@ruptela.lt wrote:
 Postgres Streaming replication is WAL based replication, so using this type
 of replication you will have absolutely identical database servers, what is
 best choice for HA and scaling reads. Also this choice is not practically
 affecting performance, because it is not adding any latency to database
 layer.

Let me chime in, because I'm in a similar situation. I'm preparing a
POC WAL-replicated environment, and testing up until now has been
inconclusive since we lack the kind of hardware in our test
environment. I know I should require it, testing on similar hardware
is the only way to get reliable results, but getting the budget
approved would take way too long, and right now we're in a hurry to
scale reads.

So getting the hardware is not an option, my option is asking those
who have the experience :-)

I gather WAL replication introduces only a few possible bottlenecks.

First, network bandwidth between master and slaves, and my app does
write a lot - our monitoring tools show, today, an average of 1MB/s
writes on the WAL array, with peaks exceeding 8MB/s, which can easily
saturate our lowly 100Mb/s links. No worries, we can upgrade to 1Gb/s
links.

Second, is that WAL activity on streaming replication or WAL shipping
is documented to contain more data than on non-replicated setups. What
is not clear is how much more data. This not only affects our network
bandwidth estimations, but also I/O load on the master server, slowing
writes (and some reads that cannot happen on the slave).

So, my question is, in your experience, how much of an increase in WAL
activity can be expected?
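
One cheap way to put a number on it for a given workload (a sketch; run it on
the master before and after a fixed test interval) is to sample the WAL
position:

SELECT pg_current_xlog_location();
-- run the representative write workload for a fixed interval, then:
SELECT pg_current_xlog_location();

The difference between the two positions is the WAL volume generated in that
interval; comparing the figure with wal_level = minimal versus hot_standby
gives an idea of how much extra data replication adds.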



Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-08-05 Thread Merlin Moncure
On Tue, Aug 4, 2009 at 4:40 PM, Tom Lanet...@sss.pgh.pa.us wrote:
 Scott Carey sc...@richrelevance.com writes:
 There are a handful of other compression algorithms very similar to LZO in
 performance / compression level under various licenses.
 LZO is just the best known and most widely used.

 And after we get done with the license question, we need to ask about
 patents.  The compression area is just a minefield of patents.  gzip is
 known to avoid all older patents (and would be pretty solid prior art
 against newer ones).  I'm far less confident about lesser-known systems.

I did a little bit of research.  LZO and friends are variants of LZW.
The main LZW patent died in 2003, and AFAIK there have been no patent
enforcement cases brought against LZO or its cousins (LZO dates to
1996).  OK, I'm no attorney, etc., but the internet seems to believe
that the algorithms are patent-free.  LZO is quite widely used, in
both open source and some relatively high-profile commercial projects.

I downloaded the libraries and did some tests.
2.5 G sql dump:

compression time:
zlib: 4m 1s
lzo: 17s
fastlz: 28.8s
liblzf: 26.7s

compression size:
zlib: 609M 75%
lzo: 948M 62%
fastlz: 936M 62.5%
liblzf: 916M 63.5%

A couple of quick notes: liblzf produces (possibly) architecture
dependent archives according to its header, and fastlz is not declared
'stable' according to its website.

merlin



Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-08-05 Thread Scott Carey



On 8/5/09 7:12 AM, Merlin Moncure mmonc...@gmail.com wrote:

 On Tue, Aug 4, 2009 at 4:40 PM, Tom Lanet...@sss.pgh.pa.us wrote:
 Scott Carey sc...@richrelevance.com writes:
 There are a handful of other compression algorithms very similar to LZO in
 performance / compression level under various licenses.
 LZO is just the best known and most widely used.
 
 And after we get done with the license question, we need to ask about
 patents.  The compression area is just a minefield of patents.  gzip is
 known to avoid all older patents (and would be pretty solid prior art
 against newer ones).  I'm far less confident about lesser-known systems.
 
 I did a little bit of research.  LZO and friends are variants of LZW.
 The main LZW patent died in 2003, and AFAIK there has been no patent
 enforcement cases brought against LZO or it's cousins (LZO dates to
 1996).  OK, I'm no attorney, etc, but the internet seems to believe
 that the algorithms are patent free.  LZO is quite widely used, in
 both open source and some relatively high profile commercial projects.
 

That doesn't sound right to me. LZW is patent-protected in a few ways, and
is an LZ78 scheme.

LZO, zlib, and the others here are LZ77 schemes which avoid the LZW patents.
There are some other patents in the territory with respect to how the hash
lookups are done for the LZ77 'sliding window' approach.  Most notably,
using a tree is patented, and a couple other (obvious) tricks that are
generally avoided anyway for any algorithms that are trying to be fast
rather than produce the highest compression.

http://en.wikipedia.org/wiki/Lossless_data_compression#Historical_legal_issues
http://en.wikipedia.org/wiki/LZ77_and_LZ78
http://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch
http://www.faqs.org/faqs/compression-faq/part1/section-7.html
http://www.ross.net/compression/patents.html

Note, US patents are either 17 years after grant, or 20 years after filing.
A very large chunk of those in this space have expired, but a few were
filed/granted in the early 90's -- though those are generally more specific
and easy to avoid.  Or very obvious duplicates of previous patents.

More notably, one of these, if interpreted broadly, would apply to zlib as
well (Gibson and Graybill) but the patent mentions LZRW1, and any broader
scope would have prior art conflicts with ones that are now long expired.
It's 17 years after grant on that one, but not 20 years after filing.




 I downloaded the libraries and did some tests.
 2.5 G sql dump:
 
 compression time:
 zlib: 4m 1s
 lzo: 17s
 fastlz: 28.8s
 liblzf: 26.7s
 
 compression size:
 zlib: 609M 75%
 lzo: 948M 62%
 fastlz: 936M 62.5%
 liblzf: 916M 63.5%
 

Interesting how that conflicts with some other benchmarks out there (where
LZO and the others are about the same).  But, they're all an order of
magnitude faster than gzip/zlib.


 A couple of quick notes: liblzf produces (possibly) architecture
 dependent archives according to its header, and fastlz is not declared
 'stable' according to its website.
 


 merlin
 




Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-08-04 Thread Merlin Moncure
On Mon, Aug 3, 2009 at 5:30 PM, PFCli...@peufeu.com wrote:

 lzo is much, much, (much) faster than zlib.  Note, I've tried several

 decompression speed is even more awesome...

 times to contact the author to get clarification on licensing terms
 and have been unable to get a response.

 lzop and the LZO library are distributed under the terms of the GNU General
 Public License (GPL).
 source : http://www.lzop.org/lzop_man.php

yeah...I have another project I'm working on that is closed source,
plus I was curious if something could be worked out for pg...lzo seems
ideal for database usage. The author is MIA or too busy hacking to
answer his email :-).

merlin



Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-08-04 Thread Scott Carey

On 8/4/09 8:30 AM, Alvaro Herrera alvhe...@commandprompt.com wrote:

 Merlin Moncure wrote:
 On Mon, Aug 3, 2009 at 5:30 PM, PFCli...@peufeu.com wrote:
 
 lzo is much, much, (much) faster than zlib.  Note, I've tried several
 
 decompression speed is even more awesome...
 
 times to contact the author to get clarification on licensing terms
 and have been unable to get a response.
 
 lzop and the LZO library are distributed under the terms of the GNU General
 Public License (GPL).
 source : http://www.lzop.org/lzop_man.php
 
 yeah...I have another project I'm working on that is closed source,
 plus I was curious if something could be worked out for pg...lzo seems
 ideal for database usage.
 
 I think this was already discussed here.  It turns out that a specific
 exception for PG wouldn't be acceptable because of the multiple
 commercial derivates.  LZO would have to become BSD, which presumably
 the author just doesn't want to do.
 
 Maybe we could have a --enable-lzo switch similar to what we do with
 readline.  Of course, a non-LZO-enabled build would not be able to read
 a dump from such a build.  (We could also consider LZO for TOAST
 compression).
 

There are a handful of other compression algorithms very similar to LZO in
performance / compression level under various licenses.

LZO is just the best known and most widely used.

http://www.fastlz.org/  (MIT)
http://www.quicklz.com/  (GPL again)
http://oldhome.schmorp.de/marc/liblzf.html  (BSD -ish)

ZFS uses LZJB (CDDL) source code here:
http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/compress.c
(a good read for one of the most simple LZ compression algorithms in terms
of lines of code -- about 100 lines)

Fastlz, with its MIT license, is probably the most obvious choice.

 --
 Alvaro Herrerahttp://www.CommandPrompt.com/
 PostgreSQL Replication, Consulting, Custom Development, 24x7 support
 
 




Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-08-04 Thread Tom Lane
Scott Carey sc...@richrelevance.com writes:
 There are a handful of other compression algorithms very similar to LZO in
 performance / compression level under various licenses.
 LZO is just the best known and most widely used.

And after we get done with the license question, we need to ask about
patents.  The compression area is just a minefield of patents.  gzip is
known to avoid all older patents (and would be pretty solid prior art
against newer ones).  I'm far less confident about lesser-known systems.

regards, tom lane



Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-08-03 Thread Scott Carey

On 7/31/09 4:01 PM, PFC li...@peufeu.com wrote:

 On Fri, 31 Jul 2009 19:04:52 +0200, Tom Lane t...@sss.pgh.pa.us wrote:
 
 Greg Stark gsst...@mit.edu writes:
 On Thu, Jul 30, 2009 at 11:30 PM, Tom Lanet...@sss.pgh.pa.us wrote:
 I did some tracing and verified that pg_dump passes data to deflate()
 one table row at a time.  I'm not sure about the performance
 implications of that, but it does seem like it might be something to
 look into.
 
 I suspect if this was a problem the zlib people would have added
 internal buffering ages ago. I find it hard to believe we're not the
 first application to use it this way.
 
 I dug into this a bit more.  zlib *does* have internal buffering --- it
 has to, because it needs a minimum lookahead of several hundred bytes
 to ensure that compression works properly.  The per-call overhead of
 deflate() looks a bit higher than one could wish when submitting short
 chunks, but oprofile shows that pg_dump -Fc breaks down about like
 this:
 
 During dump (size of dump is 2.6 GB),
 
 No Compression :
 - postgres at 70-100% CPU and pg_dump at something like 10-20%
 - dual core is useful (a bit...)
 - dump size 2.6G
 - dump time 2m25.288s
 
 Compression Level 1 :
 - postgres at 70-100% CPU and pg_dump at 20%-100%
 - dual core is definitely useful
 - dump size 544MB
 - dump time 2m33.337s
 
 Since this box is mostly idle right now, eating CPU for compression is no
 problem...
 

I get very different (contradictory) behavior.  Server with fast RAID, 32GB
RAM, 2 x 4 core 3.16Ghz Xeon 54xx CPUs.  CentOS 5.2
8.3.6
No disk wait time during any test.  One test beforehand was used to prime
the disk cache.
100% CPU in the below means one core fully used.  800% means the system is
fully loaded.

pg_dump > file  (on a subset of the DB with lots of tables with small
tuples)
6m 27s, 4.9GB;  12.9MB/sec
50% CPU in postgres, 50% CPU in pg_dump

pg_dump -Fc > file.gz
9m6s, output is 768M  (6.53x compression); 9.18MB/sec
30% CPU in postgres, 70% CPU in pg_dump

pg_dump | gzip > file.2.gz
6m22s, 13MB/sec. 
50% CPU in postgres, 50% Cpu in pg_dump, 50% cpu in gzip

The default (5) compression level was used.

So, when using pg_dump alone, I could not get significantly more than one
core of CPU (all on the same box).  No matter how I tried, pg_dump plus the
postgres process dumping data always totaled about 102% -- it would
fluctuate in top, give or take 15% at times, but the two always stayed very
close (within 3%) to this total.

Piping the whole thing to gzip gets some speedup.  This indicates that
perhaps the implementation or use of gzip is inappropriate on pg_dump's side
or the library version is older or slower.  Alternatively, the use of gzip
inside pg_dump fails to pipeline CPU usage as well as piping it does, as
the above shows 50% more CPU utilization when piping.

I can do the same test with a single table that is 10GB later (which does
dump much faster than 13MB/sec and has rows that average about 500 bytes in
size).  But overall I have found pg_dump's performance sorely lacking, and
this is a data risk in the big picture.  Postgres is very good about not
losing data, but that only goes up to the limits of the hardware and OS,
which is not good enough.  Because of long disaster recovery times and poor
replication/contingency features, it is a fairly unsafe place for data once
it gets beyond a certain size and a BC (business continuity) plan requires
minimal downtime.

 Adding an option to use LZO instead of gzip could be useful...
 
 Compressing the uncompressed 2.6GB dump :
 
 - gzip -1 :
 
 - compressed size : 565 MB
 - compression throughput : 28.5 MB/s
 - decompression throughput : 74 MB/s
 
 - LZO -1 :
 - compressed size : 696M
 - compression throughput : 86 MB/s
 - decompression throughput : 247 MB/s
 
 Conclusion : LZO could help for fast disks (RAID) or slow disks on a
 CPU-starved server...
 

LZO would be a great option, it is very fast, especially decompression.
With gzip, one rarely gains by going below gzip -3 or above gzip -6.


 




Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-08-03 Thread Tom Lane
Scott Carey sc...@richrelevance.com writes:
 I get very different (contradictory) behavior.  Server with fast RAID, 32GB
 RAM, 2 x 4 core 3.16Ghz Xeon 54xx CPUs.  CentOS 5.2
 8.3.6
 No disk wait time during any test.  One test beforehand was used to prime
 the disk cache.
 100% CPU in the below means one core fully used.  800% means the system is
 fully loaded.

 pg_dump  file  (on a subset of the DB with lots of tables with small
 tuples)
 6m 27s, 4.9GB;  12.9MB/sec
 50% CPU in postgres, 50% CPU in pg_dump

 pg_dump -Fc  file.gz
 9m6s, output is 768M  (6.53x compression); 9.18MB/sec
 30% CPU in postgres, 70% CPU in pg_dump

 pg_dump | gzip  file.2.gz
 6m22s, 13MB/sec. 
 50% CPU in postgres, 50% Cpu in pg_dump, 50% cpu in gzip

I don't see anything very contradictory here.  What you're demonstrating
is that it's nice to be able to throw a third CPU at the compression
part of the problem.  That's likely to remain true if we shift to a
different compression algorithm.  I suspect if you substituted lzo for
gzip in the third case, the picture wouldn't change very much.

regards, tom lane



Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-08-03 Thread Scott Carey



On 8/3/09 11:56 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Scott Carey sc...@richrelevance.com writes:
 I get very different (contradictory) behavior.  Server with fast RAID, 32GB
 RAM, 2 x 4 core 3.16Ghz Xeon 54xx CPUs.  CentOS 5.2
 8.3.6
 No disk wait time during any test.  One test beforehand was used to prime
 the disk cache.
 100% CPU in the below means one core fully used.  800% means the system is
 fully loaded.
 
 pg_dump  file  (on a subset of the DB with lots of tables with small
 tuples)
 6m 27s, 4.9GB;  12.9MB/sec
 50% CPU in postgres, 50% CPU in pg_dump
 
 pg_dump -Fc  file.gz
 9m6s, output is 768M  (6.53x compression); 9.18MB/sec
 30% CPU in postgres, 70% CPU in pg_dump
 
 pg_dump | gzip  file.2.gz
 6m22s, 13MB/sec.
 50% CPU in postgres, 50% Cpu in pg_dump, 50% cpu in gzip
 
 I don't see anything very contradictory here.

The other poster got nearly 2 CPUs of work from just pg_dump + postgres.
That contradicts my results (but could be due to data differences or
postgres version differences).
In the other use case, compression was not slower, but just used more CPU
(also contradicting my results).


 What you're demonstrating
 is that it's nice to be able to throw a third CPU at the compression
 part of the problem.

No, 1.5 CPUs.  Even full use of a second core would be great.

I'm also demonstrating that there is some artificial bottleneck somewhere
preventing postgres and pg_dump from operating concurrently.  Instead, one
waits while the other does work.

Your claim earlier in this thread was that there was already pipelined work
being done due to pg_dump + postgresql -- which seems to be true for the
other test case but not mine.

As a consequence, adding compression throttles the postgres process even
though the compression hasn't caused 100% CPU (or close) on any task
involved.

  That's likely to remain true if we shift to a
 different compression algorithm.  I suspect if you substituted lzo for
 gzip in the third case, the picture wouldn't change very much.
 

That is exactly the point.  LZO would be nice (and help mitigate this
problem), but it doesn't solve the real problem here.  Pg_dump is slow and
artificially throttles without even getting 100% CPU from itself or
postgres.

The problem still remains:  dumping with -Fc can be significantly slower
than raw piped to a compression utility, even if no task is CPU or I/O
bound. Dumping and piping to gzip is faster.  But parallel restore won't
work without custom or raw format.



 regards, tom lane
 




Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-08-03 Thread PFC


I get very different (contradictory) behavior.  Server with fast RAID,  
32GB

RAM, 2 x 4 core 3.16Ghz Xeon 54xx CPUs.  CentOS 5.2
8.3.6


That's a very different setup from my (much less powerful) box, so that
would explain it...



No disk wait time during any test.  One test beforehand was used to prime
the disk cache.
100% CPU in the below means one core fully used.  800% means the system  
is

fully loaded.

pg_dump  file  (on a subset of the DB with lots of tables with small
tuples)
6m 27s, 4.9GB;  12.9MB/sec
50% CPU in postgres, 50% CPU in pg_dump


	If there is no disk wait time, then why do you get 50/50 and not 100/100  
or at least 1 core maxed out ? That's interesting...


COPY annonces TO '/dev/null';
COPY 413526
Temps : 13871,093 ms

\copy annonces to '/dev/null'
Temps : 14037,946 ms

time pg_dump -Fc -t annonces -U annonces --compress=0 annonces > /dev/null
real0m14.596s
user0m0.700s
sys 0m0.372s

	In all 3 cases postgres maxes out one core (I've repeated the test until  
all data was cached, so there is no disk access at all in vmstat).

Size of dump is 312MB.






Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-08-03 Thread Merlin Moncure
On Mon, Aug 3, 2009 at 2:56 PM, Tom Lanet...@sss.pgh.pa.us wrote:
 I don't see anything very contradictory here.  What you're demonstrating
 is that it's nice to be able to throw a third CPU at the compression
 part of the problem.  That's likely to remain true if we shift to a
 different compression algorithm.  I suspect if you substituted lzo for
 gzip in the third case, the picture wouldn't change very much.

lzo is much, much, (much) faster than zlib.  Note, I've tried several
times to contact the author to get clarification on licensing terms
and have been unable to get a response.

[r...@devdb merlin]# time lzop -c dump.sql > /dev/null

real0m16.683s
user0m15.573s
sys 0m0.939s
[r...@devdb merlin]# time gzip -c dump.sql > /dev/null

real3m43.090s
user3m41.471s
sys 0m1.036s

merlin



Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-08-03 Thread PFC



lzo is much, much, (much) faster than zlib.  Note, I've tried several


decompression speed is even more awesome...


times to contact the author to get clarification on licensing terms
and have been unable to get a response.


lzop and the LZO library are distributed under the terms of the GNU  
General Public License (GPL).

source : http://www.lzop.org/lzop_man.php



Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-31 Thread Stefan Kaltenbrunner

Scott Carey wrote:



On 7/30/09 11:24 AM, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote:


Kevin Grittner wrote:

Tom Lane t...@sss.pgh.pa.us wrote:

Kevin Grittner kevin.gritt...@wicourts.gov writes:

Since the dump to custom format ran longer than the full pg_dump
piped directly to psql would have taken, the overall time to use
this technique is clearly longer for our databases on our hardware.

Hmmm ... AFAIR there isn't a good reason for dump to custom format
to take longer than plain text dump, except for applying
compression.  Maybe -Z0 would be worth testing?  Or is the problem
that you have to write the data to a disk file rather than just
piping it?

I did some checking with the DBA who normally copies these around for
development and test environments.  He confirmed that when the source
and target are on the same machine, a pg_dump piped to psql takes
about two hours.  If he pipes across the network, it runs more like
three hours.

My pg_dump to custom format ran for six hours.  The single-transaction
restore from that dump file took two hours, with both on the same
machine.  I can confirm with benchmarks, but this guy generally knows
what he's talking about (and we do create a lot of development and
test databases this way).

Either the compression is tripling the dump time, or there is
something inefficient about how pg_dump writes to the disk.

seems about right - compression in pg_dump -Fc is a serious bottleneck,
and unless we can significantly speed it up or make it use multiple
cores (either for the dump itself - which would be awesome - or for the
compression) I would recommend not using it at all.



That's not an option when a dump compressed is 200GB and uncompressed is
1.3TB, for example.


Yeah, that was not meant as "don't use compression at all" but rather as
"use a different way to compress than what pg_dump provides internally".



Stefan



Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-31 Thread Rauan Maemirov
That's true. I tried to lower work_mem from 192MB to 64MB, and it caused
a total slowdown.
By the way, are there any performance tips for tuning joins? I noticed
that my joins on 8.4 slowed down; on 8.3 they were a bit faster.

2009/7/31 Scott Marlowe scott.marl...@gmail.com:
 On Thu, Jul 30, 2009 at 10:10 PM, Greg Smithgsm...@gregsmith.com wrote:
 On Thu, 30 Jul 2009, Rauan Maemirov wrote:

 maintenance_work_mem = 1GB
 work_mem = 192MB
 shared_buffers = 7680MB
 max_connections = 80
 My box is Nehalem 2xQuad 2.8 with RAM 32Gb

 While it looks like you sorted out your issue downthread, I wanted to point
 out that your setting for work_mem could be dangerously high here and
 contribute to problems

 The real danger here is that you can set up your pg server to fail
 ONLY under heavy load, when it runs out of memory and goes into a swap
 storm.  So, without proper load testing and profiling, you may not
 know you're headed for danger until your server goes unresponsive
 midday at the most critical of times.  And restarting it will just
 lead to the same failure again as the clients all reconnect and pummel
 your server.

 Meanwhile, going from 192 to 16MB might result in a total slowdown
 measured in a fraction of a percentage overall, and prevent this kind
 of failure.

 If there's one single request you can justify big work_mem for then
 set it for just that one query.  It's not uncommon to have a reporting
 user limited to a few connections and with alter user reportinguser
 set work_mem='512MB'; so that it can run fast but not deplete your
 server's resources on accident during heavy load.
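
Spelled out (reportinguser is only a placeholder role name), that looks like:

ALTER USER reportinguser SET work_mem = '512MB';

-- or, scoped to a single transaction containing the one heavy query:
BEGIN;
SET LOCAL work_mem = '512MB';
-- run the big sort/hash query here
COMMIT;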




Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-31 Thread Merlin Moncure
On Thu, Jul 30, 2009 at 10:07 AM, Rauan Maemirovra...@maemirov.com wrote:
 Unfortunately had to downgrade back to 8.3. Now having troubles with
 that and still solving them.

 For future upgrade, what is the basic steps?

Was the database analyzed recently?
 Hm... there was smth like auto analyzer in serverlog when i started it
 first time, but i didn't mention that.
 Should I analyze whole db? How to do it?

 And how should I change _cost variables?

 I/O was very high. at first memory usage grew up and then began to full swap.

There is at least one known case of a memory leak in 8.4.0.  Possibly you
got hit by that or another early-adopter bug.  I think in your case
it's probably too soon to have upgraded... with 10k connections even
minor annoyances can be really nasty.

I'd wait a few months and in the meantime stage your app on a
testing database and double-check the important query plans to make
sure there are no big performance regressions.  Each version of pg has
a couple, for various reasons.

merlin



Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-31 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 On Thu, Jul 30, 2009 at 11:30 PM, Tom Lanet...@sss.pgh.pa.us wrote:
 I did some tracing and verified that pg_dump passes data to deflate()
 one table row at a time.  I'm not sure about the performance
 implications of that, but it does seem like it might be something to
 look into.

 I suspect if this was a problem the zlib people would have added
 internal buffering ages ago. I find it hard to believe we're not the
 first application to use it this way.

I dug into this a bit more.  zlib *does* have internal buffering --- it
has to, because it needs a minimum lookahead of several hundred bytes
to ensure that compression works properly.  The per-call overhead of
deflate() looks a bit higher than one could wish when submitting short
chunks, but oprofile shows that pg_dump -Fc breaks down about like
this:

samples  %image name   symbol name
1103922  74.7760  libz.so.1.2.3longest_match
215433   14.5927  libz.so.1.2.3deflate_slow
55368 3.7504  libz.so.1.2.3compress_block
41715 2.8256  libz.so.1.2.3fill_window
17535 1.1878  libc-2.9.so  memcpy
13663 0.9255  libz.so.1.2.3adler32
4613  0.3125  libc-2.9.so  _int_malloc
2942  0.1993  libc-2.9.so  free
2552  0.1729  libc-2.9.so  malloc
2155  0.1460  libz.so.1.2.3pqdownheap
2128  0.1441  libc-2.9.so  _int_free
1702  0.1153  libz.so.1.2.3deflate
1648  0.1116  libc-2.9.so  mempcpy

longest_match is the core lookahead routine and is not going to be
affected by submission sizes, because it isn't called unless adequate
data (ie, the longest possible match length) is available in zlib's
internal buffer.  It's possible that doing more buffering on our end
would reduce the deflate_slow component somewhat, but it looks like
the most we could hope to get that way is in the range of 10% speedup.
So I'm wondering if anyone can provide concrete evidence of large
wins from buffering zlib's input.

regards, tom lane



Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-31 Thread PFC

On Fri, 31 Jul 2009 19:04:52 +0200, Tom Lane t...@sss.pgh.pa.us wrote:


Greg Stark gsst...@mit.edu writes:

On Thu, Jul 30, 2009 at 11:30 PM, Tom Lanet...@sss.pgh.pa.us wrote:

I did some tracing and verified that pg_dump passes data to deflate()
one table row at a time.  I'm not sure about the performance
implications of that, but it does seem like it might be something to
look into.



I suspect if this was a problem the zlib people would have added
internal buffering ages ago. I find it hard to believe we're not the
first application to use it this way.


I dug into this a bit more.  zlib *does* have internal buffering --- it
has to, because it needs a minimum lookahead of several hundred bytes
to ensure that compression works properly.  The per-call overhead of
deflate() looks a bit higher than one could wish when submitting short
chunks, but oprofile shows that pg_dump -Fc breaks down about like
this:


During dump (size of dump is 2.6 GB),

No Compression :
- postgres at 70-100% CPU and pg_dump at something like 10-20%
- dual core is useful (a bit...)
- dump size 2.6G
- dump time 2m25.288s

Compression Level 1 :
- postgres at 70-100% CPU and pg_dump at 20%-100%
- dual core is definitely useful
- dump size 544MB
- dump time 2m33.337s

Since this box is mostly idle right now, eating CPU for compression is no  
problem...


Adding an option to use LZO instead of gzip could be useful...

Compressing the uncompressed 2.6GB dump :

- gzip -1 :

- compressed size : 565 MB
- compression throughput : 28.5 MB/s
- decompression throughput : 74 MB/s

- LZO -1 :
- compressed size : 696M
- compression throughput : 86 MB/s
- decompression throughput : 247 MB/s

Conclusion : LZO could help for fast disks (RAID) or slow disks on a  
CPU-starved server...




[PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Rauan Maemirov
Hi, list. I've just upgraded pgsql from 8.3 to 8.4. I've used pgtune
before and everything worked fine for me.

And now I have ~93% CPU load. Here are the changed values of the config:

default_statistics_target = 50
maintenance_work_mem = 1GB
constraint_exclusion = on
checkpoint_completion_target = 0.9
effective_cache_size = 22GB
work_mem = 192MB
wal_buffers = 8MB
checkpoint_segments = 16
shared_buffers = 7680MB
max_connections = 80


My box is a Nehalem 2xQuad 2.8 with 32GB of RAM, and there's only
postgresql working on it.

For connection pooling I'm using pgbouncer's latest version with
pool_size 20 (used 30 before, but now lowered) and 10k connections.

Which parameters should I give more attention to?



Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread tv
 Hi, list. I've just upgraded pgsql from 8.3 to 8.4. I've used pgtune
 before and everything worked fine for me.

 And now i have ~93% cpu load. Here's changed values of config:

 default_statistics_target = 50
 maintenance_work_mem = 1GB
 constraint_exclusion = on
 checkpoint_completion_target = 0.9
 effective_cache_size = 22GB
 work_mem = 192MB
 wal_buffers = 8MB
 checkpoint_segments = 16
 shared_buffers = 7680MB
 max_connections = 80


 My box is Nehalem 2xQuad 2.8 with RAM 32Gb, and there's only
 postgresql working on it.

 For connection pooling i'm using pgbouncer's latest version with
 pool_size 20 (used 30 before, but now lowered) and 10k connections.

 What parameters i should give more attention on?


All the values seem quite reasonable to me. What about the _cost variables?

I guess one or more queries are evaluated using a different execution
plan, probably sequential scan instead of index scan, hash join instead of
merge join, or something like that.

Try to log the slow statements - see log_min_duration_statement. That
might give you the slow queries (although not necessarily the ones causing
the problems), and you can analyze them.
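
For example, in postgresql.conf (the threshold is just an illustrative value):

log_min_duration_statement = 500    # log every statement running longer than 500 ms

and then run EXPLAIN ANALYZE on the statements that show up in the log.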

What is the general I/O activity? Is there a lot of data read/written to
the disks, is there a lot of I/O wait?

regards
Tomas

PS: Was the database analyzed recently?




Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread tv
 Unfortunately had to downgrade back to 8.3. Now having troubles with
 that and still solving them.

 For future upgrade, what is the basic steps?

1. create database
2. dump the data from the old database
3. load the data into the new database
4. analyze etc. (I prefer to do this manually at the beginning)
5. check that everything is working (that the correct execution plans are
used, etc.)

You may even run (2) and (3) at once - use a pipe instead of a file.
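
For instance, steps (2) and (3) combined into one pipe (a sketch, assuming the
old cluster listens on port 5432, the new one on 5433, and the empty target
database already exists):

pg_dump -p 5432 mydb | psql -p 5433 mydb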


Was the database analyzed recently?
 Hm... there was smth like auto analyzer in serverlog when i started it
 first time, but i didn't mention that.
 Should I analyze whole db? How to do it?

Just execute 'ANALYZE' and the whole database will be analyzed; when
the autovacuum daemon is running this should be performed automatically (I
guess - check pg_stat_user_tables, there's information about the last
manual/automatic vacuum and/or analyze).
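
For instance:

SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY relname;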

 And how should I change _cost variables?

It looks like you haven't modified those variables, so don't change them.

 I/O was very high. at first memory usage grew up and then began to full
 swap.

OK, this seems to be the cause. What were the original values of the
config variables? If you've lowered the work_mem and you need to sort a
lot of data, this may be a problem. What amounts of data are you working
with? If the data were not analyzed recently, the execution plans will be
inefficient and this may be the result.

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Rauan Maemirov
Unfortunately I had to downgrade back to 8.3. Now I'm having troubles with
that and am still working through them.

For a future upgrade, what are the basic steps?

Was the database analyzed recently?
Hm... there was something like the auto-analyzer in the server log when I
started it the first time, but I didn't pay attention to it.
Should I analyze the whole db? How do I do that?

And how should I change the _cost variables?

I/O was very high. At first memory usage grew and then the box started
filling swap.

2009/7/30  t...@fuzzy.cz:
 Hi, list. I've just upgraded pgsql from 8.3 to 8.4. I've used pgtune
 before and everything worked fine for me.

 And now i have ~93% cpu load. Here's changed values of config:

 default_statistics_target = 50
 maintenance_work_mem = 1GB
 constraint_exclusion = on
 checkpoint_completion_target = 0.9
 effective_cache_size = 22GB
 work_mem = 192MB
 wal_buffers = 8MB
 checkpoint_segments = 16
 shared_buffers = 7680MB
 max_connections = 80


 My box is Nehalem 2xQuad 2.8 with RAM 32Gb, and there's only
 postgresql working on it.

 For connection pooling i'm using pgbouncer's latest version with
 pool_size 20 (used 30 before, but now lowered) and 10k connections.

 What parameters i should give more attention on?


 All the values seem quite reasonable to me. What about the _cost variables?

 I guess one or more queries are evaluated using a different execution
 plan, probably sequential scan instead of index scan, hash join instead of
 merge join, or something like that.

 Try to log the slow statements - see log_min_duration_statement. That
 might give you the slow queries (although not necessarily the ones causing
 the problems), and you can analyze them.

 What is the general I/O activity? Is there a lot of data read/written to
 the disks, is there a lot of I/O wait?

 regards
 Tomas

 PS: Was the database analyzed recently?



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Rauan Maemirov
 OK, this seems to be the cause. What were the original values of the
 config variables? If you've lowered the work_mem and you need to sort a
 lot of data, this may be a problem. What amounts of data are you working
 with? If the data were not analyzed recently, the execution plans will be
 inefficient and this may be the result.

The reason is that I'm using the same config I used before, and it worked
perfectly. work_mem is 192MB.
I tried ANALYZE, but it didn't change anything.

As for the amount of data... the backup alone is over ~2.2GB.
I tried EXPLAIN ANALYZE on the slow queries that I get from the server
log, but that didn't change anything either.


2009/7/30  t...@fuzzy.cz:
 Unfortunately had to downgrade back to 8.3. Now having troubles with
 that and still solving them.

 For future upgrade, what is the basic steps?

 1. create database
 2. dump the data from the old database
 3. load the data into the new database
 4. analyze etc. (I prefer to do this manually at the beginning)
 5. check that everything is working (that the correct execution plans are
 used, etc.)

 You may even run the (2) and (3) at once - use pipe instead of a file.


Was the database analyzed recently?
 Hm... there was smth like auto analyzer in serverlog when i started it
 first time, but i didn't mention that.
 Should I analyze whole db? How to do it?

 Just execute 'ANALYZE' and the whole database will be analyzed, but when
 the autovacuum daemon is running this should be performed automatically (I
 guess - check the pg_stat_user_tables, there's information about last
 manual/automatic vacuuming and/or analysis).

 And how should I change _cost variables?

 I haven't noticed you've not modified those variables, so don't change them.

 I/O was very high. at first memory usage grew up and then began to full
 swap.

 OK, this seems to be the cause. What were the original values of the
 config variables? If you've lowered the work_mem and you need to sort a
 lot of data, this may be a problem. What amounts of data are you working
 with? If the data were not analyzed recently, the execution plans will be
 inefficient and this may be the result.

 regards
 Tomas



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Steve Crawford

My additional comments:

t...@fuzzy.cz wrote:

...
For future upgrade, what is the basic steps?



  
0. Create test database - work out bugs and performance issues before 
going live.

1. create database
  
...cluster. You only need to create the individual database if the 
options you select for the dump do not create the database(s).

2. dump the data from the old database
  
...using the dump tools from the *new* version. With several cores, you 
might want to consider using the binary dump options in pg_dump if you 
want to use the new parallel restore feature in pg_restore with a 
possible dramatic increase in restore speed (benchmarks I've seen 
suggest that with 8 cores you may even see an almost 8x restore speedup 
so it's worth the effort). The manual suggests that setting --jobs to 
the number of cores on the server is a good first approximation. See the 
-Fc option of pg_dump and the --jobs option of pg_restore for details.
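
A rough sketch of that workflow (the database name and the job count are
placeholders - set --jobs to roughly the number of cores, as above):

  pg_dump -Fc -f mydb.dump mydb          # custom-format dump, new version's pg_dump
  pg_restore -d mydb --jobs=8 mydb.dump  # parallel restore into the new cluster

Note that parallel restore needs a seekable archive file, so this part
cannot be driven from a pipe.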


Cheers,
Steve



Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Kevin Grittner
Steve Crawford scrawf...@pinpointresearch.com wrote: 
 
 benchmarks I've seen suggest that with 8 cores you may even see an
 almost 8x restore speedup
 
I'm curious what sort of data in what environment showed that ratio.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Steve Crawford

Kevin Grittner wrote:
Steve Crawford scrawf...@pinpointresearch.com wrote: 
 
  

benchmarks I've seen suggest that with 8 cores you may even see an
almost 8x restore speedup

 
I'm curious what sort of data in what environment showed that ratio.
 
  
I was going from memory of a presentation I watched. Reports on the web 
have shown anything from a 3x increase using 8 cores to other, less 
detailed, reports of up to 8x improvement.
table, don't expect much if any improvement. If you have lots of smaller 
tables/indexes then parallel restore will probably benefit you. This is 
all based on the not-atypical assumption that your restore will be CPU 
bound. I don't think parallel restore will be much use beyond the point 
you hit IO limits.


Cheers,
Steve



Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Matthew Wakeling

On Thu, 30 Jul 2009, Kevin Grittner wrote:

Steve Crawford scrawf...@pinpointresearch.com wrote:

benchmarks I've seen suggest that with 8 cores you may even see an
almost 8x restore speedup


I'm curious what sort of data in what environment showed that ratio.


It depends on a lot of things. However, certainly for index creation, 
tests on servers over here have indicated that running four CREATE INDEX 
statements at the same time runs four times as fast, assuming the table 
fits in maintenance_work_mem.
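
A sketch of that approach from the shell; the database, table and column
names below are made up purely for illustration:

  psql -d mydb -c 'CREATE INDEX orders_customer_idx ON orders (customer_id);' &
  psql -d mydb -c 'CREATE INDEX orders_date_idx ON orders (order_date);' &
  psql -d mydb -c 'CREATE INDEX items_order_idx ON order_items (order_id);' &
  psql -d mydb -c 'CREATE INDEX items_product_idx ON order_items (product_id);' &
  wait   # block until all four background index builds finish

Each session uses its own maintenance_work_mem, so keep the total memory
footprint in mind when deciding how many to run at once.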


Matthew

--
I have an inferiority complex. But it's not a very good one.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Kevin Grittner
Matthew Wakeling matt...@flymine.org wrote: 
 
 tests on servers over here have indicated that running four CREATE
 INDEX statements at the time runs four times as fast, assuming the
 table fits in maintenance_work_mem.
 
I'm benchmarking a patch to the parallel restore, and just out of
curiosity I've been comparing the multi-job approach, with various
numbers of jobs, to a restore within a single database transaction;
and I'm seeing (on serious production-quality servers) the parallel
restore run in 55% to 75% of the time of a restore running off the
same dump file using the -1 switch.  The 16 processor machine got the
best results, running with anywhere from 12 to 20 jobs.  The 2
processor machine got the lesser benefit, running with 2 to 4 jobs. 
(The exact number of jobs really didn't make a difference big enough
to emerge from the noise.)
 
I've got 431 user tables with 578 indexes in a database which, freshly
restored, is 70GB.  (That's 91GB with the fragmentation and reasonable
dead space we have in production.)  Real production data; nothing
synthetic.
 
Since the dump to custom format ran longer than the full pg_dump
piped directly to psql would have taken, the overall time to use this
technique is clearly longer for our databases on our hardware.  I'm
sure there are cases where people don't have the option to pipe things
through, or that there may sometime be a big enough savings in the
multiple jobs to pay off, even without overlapping the dump and
restore, and with the necessity to write and read the data an extra
time; but there are clearly situations where the piped approach is
faster.
 
We may want to try to characterize the conditions under which each is
a win, so we can better target our advice
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Since the dump to custom format ran longer than the full pg_dump
 piped directly to psql would have taken, the overall time to use this
 technique is clearly longer for our databases on our hardware.

Hmmm ... AFAIR there isn't a good reason for dump to custom format to
take longer than plain text dump, except for applying compression.
Maybe -Z0 would be worth testing?  Or is the problem that you have to
write the data to a disk file rather than just piping it?

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote: 
 
 Hmmm ... AFAIR there isn't a good reason for dump to custom format
 to take longer than plain text dump, except for applying
 compression.  Maybe -Z0 would be worth testing?  Or is the problem
 that you have to write the data to a disk file rather than just
 piping it?
 
I'm not sure without benchmarking that.  I was writing to the same
RAID as the database I was dumping, so contention was probably a
significant issue.  But it would be interesting to compare different
permutations to see what impact each has alone and in combination.
 
I'm OK with setting up a benchmark run each night for a while, to
shake out what I can, on this and the artificial cases.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote: 
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Since the dump to custom format ran longer than the full pg_dump
 piped directly to psql would have taken, the overall time to use
 this technique is clearly longer for our databases on our hardware.
 
 Hmmm ... AFAIR there isn't a good reason for dump to custom format
 to take longer than plain text dump, except for applying
 compression.  Maybe -Z0 would be worth testing?  Or is the problem
 that you have to write the data to a disk file rather than just
 piping it?
 
I did some checking with the DBA who normally copies these around for
development and test environments.  He confirmed that when the source
and target are on the same machine, a pg_dump piped to psql takes
about two hours.  If he pipes across the network, it runs more like
three hours.
 
My pg_dump to custom format ran for six hours.  The single-transaction
restore from that dump file took two hours, with both on the same
machine.  I can confirm with benchmarks, but this guy generally knows
what he's talking about (and we do create a lot of development and
test databases this way).
 
Either the compression is tripling the dump time, or there is
something inefficient about how pg_dump writes to the disk.
 
All of this is on a RAID 5 array with 5 drives using xfs with
noatime,nobarrier and a 256MB BBU controller.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Stefan Kaltenbrunner

Tom Lane wrote:

Kevin Grittner kevin.gritt...@wicourts.gov writes:

Since the dump to custom format ran longer than the full pg_dump
piped directly to psql would have taken, the overall time to use this
technique is clearly longer for our databases on our hardware.


Hmmm ... AFAIR there isn't a good reason for dump to custom format to
take longer than plain text dump, except for applying compression.
Maybe -Z0 would be worth testing?  Or is the problem that you have to
write the data to a disk file rather than just piping it?


I always dump with -Z0 (and compress afterwards, or even in a pipe to keep 
two cores busy) because otherwise custom dump times are simply ridiculous.
However, Kevin is onto something here - on the typical 4-8 core boxes I 
tested I managed to get around a cores/2 speedup for the restore, which 
means that for a pure upgrade, or testing similar to what Kevin is doing, 
custom dumps + parallel restore might result in no win or even a loss.
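
Roughly what the -Z0 + external compression trick looks like (the database
name and file names are placeholders):

  # uncompressed custom dump; the external gzip keeps a second core busy
  pg_dump -Fc -Z0 mydb | gzip > mydb.dump.gz

  # parallel restore needs a plain, seekable archive, so decompress first
  gunzip mydb.dump.gz
  pg_restore -d mydb -j 8 mydb.dump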


On one of our datasets I did some benchmarking a while ago (for those who 
attended Bruce's pg_migrator talk @pgcon these are the same numbers):



* 150GB Database (on-disk - ~100GB as a plain text dump)

time to dump (-Z0):  120min
time to restore(single threaded):   180min
time to restore(-j 16): 59min

however, the problem is that this does not actually mean that parallel 
restore shaves ~120min off the dump/restore time, because you get the 
following real runtimes:


plain text dump + single threaded restore in a pipe: 188min
custom dump to file + parallel restore: 179min


this is without compression; with the default (compressed) custom dump, 
parallel restore is way slower than the simple approach on reasonable 
hardware.



Stefan

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Stefan Kaltenbrunner

Kevin Grittner wrote:
Tom Lane t...@sss.pgh.pa.us wrote: 

Kevin Grittner kevin.gritt...@wicourts.gov writes:

Since the dump to custom format ran longer than the full pg_dump
piped directly to psql would have taken, the overall time to use
this technique is clearly longer for our databases on our hardware.

Hmmm ... AFAIR there isn't a good reason for dump to custom format
to take longer than plain text dump, except for applying
compression.  Maybe -Z0 would be worth testing?  Or is the problem
that you have to write the data to a disk file rather than just
piping it?
 
I did some checking with the DBA who normally copies these around for

development and test environments.  He confirmed that when the source
and target are on the same machine, a pg_dump piped to psql takes
about two hours.  If he pipes across the network, it runs more like
three hours.
 
My pg_dump to custom format ran for six hours.  The single-transaction

restore from that dump file took two hours, with both on the same
machine.  I can confirm with benchmarks, but this guy generally knows
what he's talking about (and we do create a lot of development and
test databases this way).
 
Either the compression is tripling the dump time, or there is

something inefficient about how pg_dump writes to the disk.


seems about right - compression in pg_dump -Fc is a serious bottleneck, 
and unless we can significantly speed it up or make it use multiple 
cores (either for the dump itself - which would be awesome - or just for 
the compression) I would recommend not using it at all.



Stefan

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Scott Carey


On 7/30/09 11:14 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote:

 Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Since the dump to custom format ran longer than the full pg_dump
 piped directly to psql would have taken, the overall time to use
 this technique is clearly longer for our databases on our hardware.
 
 Hmmm ... AFAIR there isn't a good reason for dump to custom format
 to take longer than plain text dump, except for applying
 compression.  Maybe -Z0 would be worth testing?  Or is the problem
 that you have to write the data to a disk file rather than just
 piping it?
 
 I did some checking with the DBA who normally copies these around for
 development and test environments.  He confirmed that when the source
 and target are on the same machine, a pg_dump piped to psql takes
 about two hours.  If he pipes across the network, it runs more like
 three hours.
 
 My pg_dump to custom format ran for six hours.  The single-transaction
 restore from that dump file took two hours, with both on the same
 machine.  I can confirm with benchmarks, but this guy generally knows
 what he's talking about (and we do create a lot of development and
 test databases this way).
 
 Either the compression is tripling the dump time, or there is
 something inefficient about how pg_dump writes to the disk.
 
 All of this is on a RAID 5 array with 5 drives using xfs with
 noatime,nobarrier and a 256MB BBU controller.
 

Of course Compression has a HUGE effect if your I/O system is half-decent.
Max GZIP compression speed with the newest Intel CPU's is something like
50MB/sec (it is data dependant, obviously -- it is usually closer to
30MB/sec).  Max gzip decompression ranges from 50 to 150MB/sec (it can get
really high only if the ratio is extremely large, like if you compress a
repeating sequence of 256 bytes).

The new parallel restore is nice and all, but we're still limited by the
week it takes to dump the whole thing compressed.   Parallel restore is a
lot faster when restoring compressed dumps though, even without any indexes
to make, since all that decompression is CPU hungry.

 -Kevin
 
 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance
 


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Kevin Grittner
Scott Carey sc...@richrelevance.com wrote:
 
 Max GZIP compression speed with the newest Intel CPU's is something
 like 50MB/sec (it is data dependant, obviously -- it is usually
 closer to 30MB/sec).
 
Applying 30MB/sec to the 70GB accounts for 40 minutes.  If those
numbers are good, there's something else at play here.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Scott Carey

On 7/30/09 11:14 AM, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote:

 Tom Lane wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Since the dump to custom format ran longer than the full pg_dump
 piped directly to psql would have taken, the overall time to use this
 technique is clearly longer for our databases on our hardware.
 
 Hmmm ... AFAIR there isn't a good reason for dump to custom format to
 take longer than plain text dump, except for applying compression.
 Maybe -Z0 would be worth testing?  Or is the problem that you have to
 write the data to a disk file rather than just piping it?
 
 I always dump with -Z0(and compress afterwards or even in a pipe to get
 two cores busy) because otherwise custom dump times are simply ridiculous.
 However Kevin is on something here - on the typical 4-8 core box I
 tested I managed to an around cores/2 speedup for the restore which
 means that for a pure upgrade or testing similiar to what kevin is doing
 custom dumps + parallel restore might result in no win or even a loss.
 
 On on of our datasets I did some benchmarking a while ago (for those who
 attended bruce pg_migrator talk @pgcon these are same numbers):
 
 
 * 150GB Database (on-disk - ~100GB as a plain text dump)
 
 time to dump(-C0):  120min
 time to restore(single threaded):   180min
 time to restore(-j 16): 59min


Note also that with ext3 and XFS (untuned) parallel restore = HORRIBLY
FRAGMENTED tables, to the point of sequential scans being rather slow.  At
least, they're mostly just interleaved with each other so there is little
seeking backwards, but still... Beware.

XFS with allocsize=64m or so interleaves them in reasonably large chunks
though and prevents significant fragmentation.

 
 however the problem is that this does not actually mean that parallel
 restore shaves you ~120min in dump/restore time because you get the
 following real runtimes:
 
 plain text dump + single threaded restore in a pipe: 188min
 custom dump to file + parallel restore: 179min

On the other hand, I find that the use case where one DB is dumped to a
backup, and then this backup is restored on several others -- that parallel
restore is extremely useful there.

Dump needs to be parallelized or at least pipelined to use more cores.  COPY
on one thread, compression on another?

One trick with a dump, which works only if you have tables or schemas that
can safely be dumped in different transactions, is to dump different
slices of the DB concurrently by hand.  This makes a huge difference when
that is possible.
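
A sketch of that manual slicing, assuming the data is split across schemas
that can be dumped independently (the schema names are placeholders, and
you give up the single-snapshot consistency of one pg_dump run):

  pg_dump -Fc -n schema_a -f schema_a.dump mydb &
  pg_dump -Fc -n schema_b -f schema_b.dump mydb &
  pg_dump -Fc -n schema_c -f schema_c.dump mydb &
  wait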

 
 
 this is without compression, with the default custom dump + parallel
 restore is way slower than the simple approach on reasonable hardware.
 
 
 Stefan
 
 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance
 


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Scott Carey



On 7/30/09 11:24 AM, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote:

 Kevin Grittner wrote:
 Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Since the dump to custom format ran longer than the full pg_dump
 piped directly to psql would have taken, the overall time to use
 this technique is clearly longer for our databases on our hardware.
 Hmmm ... AFAIR there isn't a good reason for dump to custom format
 to take longer than plain text dump, except for applying
 compression.  Maybe -Z0 would be worth testing?  Or is the problem
 that you have to write the data to a disk file rather than just
 piping it?
 
 I did some checking with the DBA who normally copies these around for
 development and test environments.  He confirmed that when the source
 and target are on the same machine, a pg_dump piped to psql takes
 about two hours.  If he pipes across the network, it runs more like
 three hours.
 
 My pg_dump to custom format ran for six hours.  The single-transaction
 restore from that dump file took two hours, with both on the same
 machine.  I can confirm with benchmarks, but this guy generally knows
 what he's talking about (and we do create a lot of development and
 test databases this way).
 
 Either the compression is tripling the dump time, or there is
 something inefficient about how pg_dump writes to the disk.
 
 seems about right - compression in pg_dump -Fc is a serious bottleneck
 and unless can significantly speed it up or make it use of multiple
 cores (either for the dump itself - which would be awsome - or for the
 compression) I would recommend to not use it at all.
 

That's not an option when the compressed dump is 200GB and the uncompressed
one is 1.3TB, for example.


 
 Stefan
 
 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance
 


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Arjen van der Meijden

On 30-7-2009 20:46 Scott Carey wrote:

Of course Compression has a HUGE effect if your I/O system is half-decent.
Max GZIP compression speed with the newest Intel CPU's is something like
50MB/sec (it is data dependant, obviously -- it is usually closer to
30MB/sec).  Max gzip decompression ranges from 50 to 150MB/sec (it can get
really high only if the ratio is extremely large, like if you compress a
repeating sequence of 256 bytes).


I just ran some quick numbers on our lightly loaded Nehalem X5570 (2.93+ 
GHz depending on turbo mode). I compressed a 192MB text file I had at 
hand using gzip -1, -2, -3, -6 and -9 and sent the output to /dev/null. 
The file was in the kernel's file cache the whole time and I ran each 
test 3 times.


Gzip -1 reached 54MB/s, -2 got 47MB/s, -3 got 32MB/s, -6 got 18MB/s and 
-9 got to 12MB/s. Just running cat on the file did 6400MB/s (i.e. it 
took 0.030 seconds to copy the file from memory to nowhere).

The resulting files were respectively 69MB, 66MB, 64MB, 59MB and 58MB.

Gunzip on the -1 file took 1.66 seconds, i.e. it read data at 41MB/s and 
wrote it to /dev/null at 115MB/s. The -9 file took 1.46s, so it read 
40MB/s and wrote 131MB/s.
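
A sketch of how to reproduce this kind of measurement on your own hardware
(the file name is a placeholder):

  # warm the cache, then time compression at several levels
  cat bigfile.txt > /dev/null
  for level in 1 2 3 6 9; do
    time gzip -$level -c bigfile.txt > /dev/null
  done

  # time decompression of the level-1 output
  gzip -1 -c bigfile.txt > bigfile.gz
  time gunzip -c bigfile.gz > /dev/null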


Best regards,

Arjen

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Tom Lane
Scott Carey sc...@richrelevance.com writes:
 Dump needs to be parallelized or at least pipelined to use more cores.  COPY
 on one thread, compression on another?

We already do that (since compression happens on the pg_dump side).

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Scott Carey



On 7/30/09 11:58 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote:

 Scott Carey sc...@richrelevance.com wrote:
 
 Max GZIP compression speed with the newest Intel CPU's is something
 like 50MB/sec (it is data dependant, obviously -- it is usually
 closer to 30MB/sec).
 
 Applying 30MB/sec to the 70GB accounts for 40 minutes.  If those
 numbers are good, there's something else at play here.

It is rather data dependent; try gzip on the command line as a test on some
data.  On a random tarball on my Nehalem system, I just got a 23MB/sec
compression rate on an incompressible file.
Decompression with gunzip was 145MB/sec.

On a text file that I manually created with randomly placed repeating
segments that compresses 200:1, compression was 115MB/sec (bytes in per
sec), and decompression (bytes out per sec) was 265MB/sec.

The array in this machine will do 800MB/sec reads/sec with 'dd' and
700MB/sec writes.

One core has no chance.


Now, what needs to be known with pg_dump is not just how fast the
compression can go (assuming it's gzip) but also what the duty cycle of
the compression is.  If it is single threaded, there is all the network and
disk time to subtract from this, as well as all the CPU time that pg_dump
spends outside of compression.

 -Kevin
 


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Scott Carey

On 7/30/09 1:15 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Scott Carey sc...@richrelevance.com writes:
 Dump needs to be parallelized or at least pipelined to use more cores.  COPY
 on one thread, compression on another?
 
 We already do that (since compression happens on the pg_dump side).
 
 regards, tom lane
 

Well, that isn't what I meant.  pg_dump uses CPU outside of compression
doing various things.  If that CPU time is 10% as much as the compression,
then splitting them up would yield a ~10% gain when CPU bound.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Kevin Grittner
Scott Carey sc...@richrelevance.com wrote:
 
 Now, what needs to be known with the pg_dump is not just how fast
 compression can go (assuming its gzip) but also what the duty cycle
 time of the compression is.  If it is single threaded, there is all
 the network and disk time to cut out of this, as well as all the CPU
 time that pg_dump does without compression.
 
Well, I established a couple messages back on this thread that pg_dump
piped to psql to a database on the same machine writes the 70GB
database to disk in two hours, while pg_dump to a custom format file
at default compression on the same machine writes the 50GB file in six
hours.  No network involved, less disk space written.  I'll try it
tonight at -Z0.
 
One thing I've been wondering about is what, exactly, is compressed in
custom format.  Is it like a .tar.gz file, where the compression is a
layer over the top, or are individual entries compressed?  If the
latter, what's the overhead on setting up each compression stream?  Is
there some minimum size before that kicks in?  (I know, I should go
check the code myself.  Maybe in a bit.  Of course, if someone already
knows, it would be quicker)
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Scott Carey

On 7/30/09 1:58 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote:

 Scott Carey sc...@richrelevance.com wrote:
 
 Now, what needs to be known with the pg_dump is not just how fast
 compression can go (assuming its gzip) but also what the duty cycle
 time of the compression is.  If it is single threaded, there is all
 the network and disk time to cut out of this, as well as all the CPU
 time that pg_dump does without compression.
 
 Well, I established a couple messages back on this thread that pg_dump
 piped to psql to a database on the same machine writes the 70GB
 database to disk in two hours, while pg_dump to a custom format file
 at default compression on the same machine writes the 50GB file in six
 hours.  No network involved, less disk space written.  I'll try it
 tonight at -Z0.

So, I'm not sure what the pg_dump custom format overhead is once you
subtract the compression -- there is probably some overhead in that
format beyond the compression itself.

-Z1 might be interesting too, but obviously it takes some time.  Interesting
that your uncompressed case is only 40% larger.  For me, the compressed dump
is in the range of 20% the size of the uncompressed one.


 
 One thing I've been wondering about is what, exactly, is compressed in
 custom format.  Is it like a .tar.gz file, where the compression is a
 layer over the top, or are individual entries compressed?

It is instructive to open up a compressed custom format file in 'less' or
another text viewer.

Basically, it is the same as the uncompressed dump with all the DDL
uncompressed, but the binary chunks compressed.  It would seem (educated
guess, looking at the raw file, and not the code) that the table data is
compressed and the DDL points to an index in the file where the compressed
blob for the copy lives.


 If the
 latter, what's the overhead on setting up each compression stream?  Is
 there some minimum size before that kicks in?  (I know, I should go
 check the code myself.  Maybe in a bit.  Of course, if someone already
 knows, it would be quicker)

Gzip does have some quirky performance behavior depending on the chunk size
of data you stream into it.

 
 -Kevin
 


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Kevin Grittner
Scott Carey sc...@richrelevance.com wrote:
 
 Gzip does have some quirky performance behavior depending on the
 chunk size of data you stream into it.
 
Yeah, I've run into that before.  If we're sending each individual
datum to a gzip function rather than waiting until we've got a
decent-size buffer, that could explain it.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 One thing I've been wondering about is what, exactly, is compressed in
 custom format.  Is it like a .tar.gz file, where the compression is a
 layer over the top, or are individual entries compressed?

Individual entries.  Eyeball examination of a dump file shows that we
only compress table-data entries, and don't for example waste time
firing up the compressor to process a function body.  It's possible
that it'd be worth trying to have some lower limit on the amount of
data in a table before we bother to compress it, but I bet that it
wouldn't make any difference on your databases ...

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Tom Lane
Scott Carey sc...@richrelevance.com writes:
 Gzip does have some quirky performance behavior depending on the chunk size
 of data you stream into it.

Can you enlarge on that comment?  I'm not sure that pg_dump is aware
that there's anything to worry about there.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote: 
 Scott Carey sc...@richrelevance.com writes:
 Gzip does have some quirky performance behavior depending on the
 chunk size of data you stream into it.
 
 Can you enlarge on that comment?  I'm not sure that pg_dump is aware
 that there's anything to worry about there.
 
If the library used here is anything like the native library used by
Java, it'd be worth putting a buffer layer ahead of the calls to gzip,
so it isn't dealing with each individual value as a separate call.  I
seem to remember running into that issue in Java, where throwing a
BufferedOutputStream in there fixed the performance issue.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Rauan Maemirov
Hey guys, thanks for the help. I solved the problem. The cause was a
bad query that I had accidentally committed right after upgrading.
PostgreSQL 8.4 is perfect! Analyze works like a charm, and MUCH better
than in 8.3.

2009/7/31 Kevin Grittner kevin.gritt...@wicourts.gov:
 Tom Lane t...@sss.pgh.pa.us wrote:
 Scott Carey sc...@richrelevance.com writes:
 Gzip does have some quirky performance behavior depending on the
 chunk size of data you stream into it.

 Can you enlarge on that comment?  I'm not sure that pg_dump is aware
 that there's anything to worry about there.

 If the library used here is anything like the native library used by
 Java, it'd be worth putting a buffer layer ahead of the calls to gzip,
 so it isn't dealing with each individual value as a separate call.  I
 seem to remember running into that issue in Java, where throwing a
 BufferedOutputStream in there fixed the performance issue.

 -Kevin

 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Scott Carey

On 7/30/09 2:53 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Scott Carey sc...@richrelevance.com writes:
 Gzip does have some quirky performance behavior depending on the chunk size
 of data you stream into it.
 
 Can you enlarge on that comment?  I'm not sure that pg_dump is aware
 that there's anything to worry about there.
 
 regards, tom lane
 

For example, one of the things that gzip does is calculate the CRC of the
item being compressed.  Calculating that incrementally is less efficient
than doing it in bulk.
For whatever reason, some other internals of gzip tend to perform much
better when submitting, say, 4k or 8k or 16k chunks rather than little bits
at a time.  But I'm sure some of that also depends on what library you're
using, since they all vary somewhat.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Tom Lane
Scott Carey sc...@richrelevance.com writes:
 On 7/30/09 2:53 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Scott Carey sc...@richrelevance.com writes:
 Gzip does have some quirky performance behavior depending on the chunk size
 of data you stream into it.
 
 Can you enlarge on that comment?  I'm not sure that pg_dump is aware
 that there's anything to worry about there.

 For whatever reason, some other internals of gzip tend to perform much
 better if submitting say, 4k or 8k or 16k chunks rather than little bits at
 a time.  But I'm sure some of that also depends on what library you're using
 since they all vary somewhat.

AFAIK there is only one widely-used implementation of zlib, and it
hasn't changed much in a long time.

I did some tracing and verified that pg_dump passes data to deflate()
one table row at a time.  I'm not sure about the performance
implications of that, but it does seem like it might be something to
look into.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Greg Stark
On Thu, Jul 30, 2009 at 11:30 PM, Tom Lanet...@sss.pgh.pa.us wrote:
 I did some tracing and verified that pg_dump passes data to deflate()
 one table row at a time.  I'm not sure about the performance
 implications of that, but it does seem like it might be something to
 look into.

I suspect if this were a problem the zlib people would have added
internal buffering ages ago. I find it hard to believe we're the
first application to use it this way.

I suppose it wouldn't be the first time a problem like this went
unfixed though. Is the zlib software actively maintained or was your
earlier comment implying it's currently an orphaned codebase?


-- 
greg
http://mit.edu/~gsstark/resume.pdf

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Kevin Grittner
Greg Stark gsst...@mit.edu wrote: 
 
 I suspect if this was a problem the zlib people would have added
 internal buffering ages ago. I find it hard to believe we're not the
 first application to use it this way.
 
I think that most uses of this library are on entire files or streams.
They may have felt that adding another layer of buffering would just
hurt performance for the typical use case, and anyone using it in some
other way could always use their own buffering layer.  In Java adding
that layer took 30 characters of code, so it didn't make a very big
impression on me -- it took a while to even remember I'd had to do it.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Greg Smith

On Thu, 30 Jul 2009, Rauan Maemirov wrote:


maintenance_work_mem = 1GB
work_mem = 192MB
shared_buffers = 7680MB
max_connections = 80
My box is Nehalem 2xQuad 2.8 with RAM 32Gb


While it looks like you sorted out your issue downthread, I wanted to 
point out that your setting for work_mem could be dangerously high here 
and contribute to problems with running out of memory or using swap.  If 
each of your 80 clients was doing a sort at the same time, you'd be using 
80 * 192MB + 7680MB = 23040MB (roughly 22.5GB) of RAM just for the server.  
The problem is that each client could do multiple sorts, so usage might go 
even higher.  Unless you have a big data warehouse setup, more common 
work_mem settings are in the 16-64MB range rather than going this high.  
Just something to keep an eye on if you find a lot of memory is being used 
by the database processes.  I really need to refine the pgtune model to 
more carefully account for this particular problem; it's a bit too 
aggressive here for people who aren't proactively watching the server's 
RAM after changing the settings.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Scott Marlowe
On Thu, Jul 30, 2009 at 10:10 PM, Greg Smithgsm...@gregsmith.com wrote:
 On Thu, 30 Jul 2009, Rauan Maemirov wrote:

 maintenance_work_mem = 1GB
 work_mem = 192MB
 shared_buffers = 7680MB
 max_connections = 80
 My box is Nehalem 2xQuad 2.8 with RAM 32Gb

 While it looks like you sorted out your issue downthread, I wanted to point
 out that your setting for work_mem could be dangerously high here and
 contribute to problems

The real danger here is that you can set up your pg server to fail
ONLY under heavy load, when it runs out of memory and goes into a swap
storm.  So, without proper load testing and profiling, you may not
know you're headed for danger until your server goes unresponsive
midday at the most critical of times.  And restarting it will just
lead to the same failure again as the clients all reconnect and pummel
your server.

Meanwhile, going from 192 to 16MB might result in a total slowdown
measured in a fraction of a percentage overall, and prevent this kind
of failure.

If there's a single request you can justify a big work_mem for, then
set it for just that one query.  It's not uncommon to have a reporting
user limited to a few connections and with alter user reportinguser
set work_mem='512MB'; so that it can run fast but not deplete your
server's resources by accident during heavy load.
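
Spelled out, the two usual ways to scope a large work_mem narrowly (the
role name and values are just placeholders):

  -- per role: applies to every new session of that role
  ALTER USER reportinguser SET work_mem = '512MB';

  -- per transaction: only for the one expensive query
  BEGIN;
  SET LOCAL work_mem = '512MB';
  -- ... run the big reporting query here ...
  COMMIT;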

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL OR performance

2008-11-17 Thread Віталій Тимчишин
2008/11/15 Tom Lane [EMAIL PROTECTED]

 Віталій Тимчишин [EMAIL PROTECTED] writes:
  I am not. I can't see how materialize can multiply number of rows it gets
  from sort by 100.

 Is it the right-hand input of a merge join?  If so you're looking at
 mark/restore rescans, ie, repeated fetches of the same tuples.  There
 must be a huge number of duplicate join keys in that relation to make
 for such an increase though.  Normally the planner avoids putting a
 table with lots of duplicates as the RHS of a merge, but if it doesn't
 have good statistics for the join key then it might not realize the
 problem.


OK, thanks for clearing up some of the mystery.
But, returning to the original question: does anyone know why it chooses the
plan from *OR-plan.txt* instead of *union-plan.txt*? The former is
cost=4588.13..960900482668.95, the latter is cost=266348.42..272953.14
according to the statistics postgres has, so I suppose the planner would
select it if it could evaluate it.


Re: [PERFORM] PostgreSQL OR performance

2008-11-15 Thread Віталій Тимчишин
Sorry, for delayed response - It was very busy week.

2008/11/7 David Wilson [EMAIL PROTECTED]

 On Fri, Nov 7, 2008 at 4:14 AM, Віталій Тимчишин [EMAIL PROTECTED] wrote:
  Merge Join  (cost=518771.07..62884559.80 rows=1386158171 width=32)
 (actual
  time=30292.802..755751.242 rows=34749 loops=1)

 Have you tried increasing the default_statistics_target? The planner
 is expecting 1.3 billion rows to be produced from a query that's only
 actually producting 35k, which probably indicates some very bad
 statistics.


The planner seems to think that every second pair from the company-company
join will match this join expression (1386158171 ~ 52648^2 / 2).
That is not true.
Anyway, I've tried setting default_statistics_target to 1000, then analyze.
Nothing changed.

At the same time, the materialize step produces 242
 million rows when the planner only expects to produce 2.3, indicating
 a similar problem in the opposite direction. This probably means that
 the planner is choosing plans that would be optimal if it was making
 good guesses but are decidedly sub-optimal for your actual data.


That is even stranger, because the materialize step must produce exactly the
rows it takes from the sort, that is 2316503, so I don't see how table scan +
sort + materialize can multiply the number of rows by 100.


Re: [PERFORM] PostgreSQL OR performance

2008-11-15 Thread Tom Lane
Віталій Тимчишин [EMAIL PROTECTED] writes:
 I am not. I can't see how materialize can multiply number of rows it gets
 from sort by 100.

Is it the right-hand input of a merge join?  If so you're looking at
mark/restore rescans, ie, repeated fetches of the same tuples.  There
must be a huge number of duplicate join keys in that relation to make
for such an increase though.  Normally the planner avoids putting a
table with lots of duplicates as the RHS of a merge, but if it doesn't
have good statistics for the join key then it might not realize the
problem.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL OR performance

2008-11-07 Thread David Wilson
On Fri, Nov 7, 2008 at 4:14 AM, Віталій Тимчишин [EMAIL PROTECTED] wrote:
 Merge Join  (cost=518771.07..62884559.80 rows=1386158171 width=32) (actual
 time=30292.802..755751.242 rows=34749 loops=1)

Have you tried increasing the default_statistics_target? The planner
is expecting 1.3 billion rows to be produced from a query that's only
actually producting 35k, which probably indicates some very bad
statistics. At the same time, the materialize step produces 242
million rows when the planner only expects to produce 2.3, indicating
a similar problem in the opposite direction. This probably means that
the planner is choosing plans that would be optimal if it was making
good guesses but are decidedly sub-optimal for your actual data.



-- 
- David T. Wilson
[EMAIL PROTECTED]

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL OR performance

2008-11-06 Thread Helio Campos Mello de Andrade
From what I see in your OR-plan.txt you are doing too many sequential
scans. Create some indexes on those tables using the fields that you query
on and it may help you.

Note: if you already have lots of indexes on your tables, it may be a good
time to re-think your strategy, because it's not working.
Tips:
  1 - Create indexes on the fields that you use in the query, if it is your
most important query (see the sketch below). If other queries are used as
well, please post them here and we can help you design a better plan.
  2 - You could give us the hardware configuration and the postgresql
configuration file and we can see what is going on.
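
As a concrete illustration only - the column choices below are guesses
based on the query quoted further down in this thread (run_id, status and
name on production.company), not a recommendation:

  CREATE INDEX company_run_id_idx ON production.company (run_id);
  CREATE INDEX company_status_idx ON production.company (status);
  CREATE INDEX company_name_idx   ON production.company (name);

  ANALYZE production.company;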

Regards

On Thu, Nov 6, 2008 at 8:46 AM, Віталій Тимчишин [EMAIL PROTECTED] wrote:


 My main message is that I can see this in many queries and many times. But
 OK, I can present exact example.

 2008/11/5 Jeff Davis [EMAIL PROTECTED]

 On Wed, 2008-11-05 at 13:12 +0200, Віталій Тимчишин wrote:
  For a long time already I can see very poor OR performance in
  postgres.
  If one have query like select something from table where condition1
  or condition2 it may take ages to execute while
  select something from table where condition1 and select something
  from table where condition2 are executed very fast and
  select something from table where condition1 and not condition2 union
  all select something from table where condition2 gives required
  results fast
 

 What version are you using?


 Server version 8.3.3




 Have you run VACUUM ANALYZE?


 I have autovacuum, but for this example I did vacuum analyze of the whole
 DB.
 The real-life query (autogenerated) looks like the next:
 select t0.id as pk1,t1.id as pk2 ,t0.run_id as f1_run_id,t1.run_id as
 f2_run_id
 from tmpv_unproc_null_production_company_dup_cons_company as t0, (select *
 from production.company where run_id in (select id from production.run where
 name='test')) as t1
 where
 t0.name = t1.name
 or
 (t0.name,t1.name) in (select s1.name, s2.name from atom_match inner join
 atoms_string s1 on atom_match.atom1_id = s1.id  inner join atoms_string s2
 on atom_match.atom2_id = s2.id where s1.atom_type_id = -1 and
 match_function_id = 2)

 with tmpv_unproc_null_production_company_dup_cons_company:

 create temporary view tmpv_unproc_null_production_company_dup_cons_company
 as select * from production.company where 1=1 and status='unprocessed' and
 run_id in (select id from production.run where name='test')



 Next, do:

 EXPLAIN ANALYZE select something from table where condition1 or
 condition2;


 without analyze is in OR-plan.txt
 Also plans for only condition1, only condition2 and union is attached



 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance




-- 
Helio Campos Mello de Andrade


Re: [PERFORM] PostgreSQL OR performance

2008-11-06 Thread Віталій Тимчишин
My main point is that I can see this in many queries, many times. But
OK, I can present an exact example.

2008/11/5 Jeff Davis [EMAIL PROTECTED]

 On Wed, 2008-11-05 at 13:12 +0200, Віталій Тимчишин wrote:
  For a long time already I can see very poor OR performance in
  postgres.
  If one have query like select something from table where condition1
  or condition2 it may take ages to execute while
  select something from table where condition1 and select something
  from table where condition2 are executed very fast and
  select something from table where condition1 and not condition2 union
  all select something from table where condition2 gives required
  results fast
 

 What version are you using?


Server version 8.3.3




 Have you run VACUUM ANALYZE?


I have autovacuum, but for this example I did vacuum analyze of the whole
DB.
The real-life query (autogenerated) looks like the next:
select t0.id as pk1,t1.id as pk2 ,t0.run_id as f1_run_id,t1.run_id as
f2_run_id
from tmpv_unproc_null_production_company_dup_cons_company as t0, (select *
from production.company where run_id in (select id from production.run where
name='test')) as t1
where
t0.name = t1.name
or
(t0.name,t1.name) in (select s1.name, s2.name from atom_match inner join
atoms_string s1 on atom_match.atom1_id = s1.id  inner join atoms_string s2
on atom_match.atom2_id = s2.id where s1.atom_type_id = -1 and
match_function_id = 2)

with tmpv_unproc_null_production_company_dup_cons_company:

create temporary view tmpv_unproc_null_production_company_dup_cons_company
as select * from production.company where 1=1 and status='unprocessed' and
run_id in (select id from production.run where name='test')



 Next, do:

 EXPLAIN ANALYZE select something from table where condition1 or
 condition2;


The plan without ANALYZE is in OR-plan.txt.
Plans for only condition1, only condition2 and the union variant are attached as well.
Nested Loop  (cost=4588.13..960900482668.95 rows=1386158171 width=32)
  Join Filter: (((production.company.name)::text = (production.company.name)::text) OR (subplan))
  ->  Hash IN Join  (cost=1.56..73814.22 rows=52648 width=30)
        Hash Cond: ((production.company.run_id)::bigint = production.run.id)
        ->  Seq Scan on company  (cost=0.00..64599.29 rows=2316503 width=30)
              Filter: ((status)::text = 'unprocessed'::text)
        ->  Hash  (cost=1.55..1.55 rows=1 width=8)
              ->  Seq Scan on run  (cost=0.00..1.55 rows=1 width=8)
                    Filter: ((name)::text = 'test'::text)
  ->  Nested Loop  (cost=1183.27..39219.67 rows=52648 width=30)
        ->  HashAggregate  (cost=1.55..1.56 rows=1 width=8)
              ->  Seq Scan on run  (cost=0.00..1.55 rows=1 width=8)
                    Filter: ((name)::text = 'test'::text)
        ->  Bitmap Heap Scan on company  (cost=1181.72..38435.51 rows=62608 width=30)
              Recheck Cond: ((production.company.run_id)::bigint = production.run.id)
              ->  Bitmap Index Scan on comp_run  (cost=0.00..1166.07 rows=62608 width=0)
                    Index Cond: ((production.company.run_id)::bigint = production.run.id)
  SubPlan
    ->  Materialize  (cost=3403.29..4005.74 rows=35745 width=28)
          ->  Hash Join  (cost=928.57..3122.55 rows=35745 width=28)
                Hash Cond: ((atom_match.atom1_id)::integer = s1.id)
                ->  Hash Join  (cost=445.80..1880.19 rows=35745 width=18)
                      Hash Cond: ((atom_match.atom2_id)::integer = s2.id)
                      ->  Seq Scan on atom_match  (cost=0.00..674.81 rows=35745 width=8)
                            Filter: ((match_function_id)::integer = 2)
                      ->  Hash  (cost=260.91..260.91 rows=14791 width=18)
                            ->  Seq Scan on atoms_string s2  (cost=0.00..260.91 rows=14791 width=18)
                ->  Hash  (cost=297.89..297.89 rows=14791 width=18)
                      ->  Seq Scan on atoms_string s1  (cost=0.00..297.89 rows=14791 width=18)
                            Filter: ((atom_type_id)::integer = (-1))
Merge Join  (cost=89373.23..97526.15 rows=525975 width=32) (actual time=276.869..523.669 rows=34749 loops=1)
  Merge Cond: ((production.company.name)::text = (production.company.name)::text)
  ->  Sort  (cost=44764.87..44896.49 rows=52648 width=30) (actual time=120.036..144.925 rows=15507 loops=1)
        Sort Key: production.company.name
        Sort Method:  external merge  Disk: 704kB
        ->  Nested Loop  (cost=1183.27..39376.19 rows=52648 width=30) (actual time=1.898..72.693 rows=15507 loops=1)
              ->  HashAggregate  (cost=1.55..1.56 rows=1 width=8) (actual time=0.020..0.022 rows=1 loops=1)
                    ->  Seq Scan on run  (cost=0.00..1.55 rows=1 width=8) (actual time=0.010..0.012 rows=1 loops=1)
                          Filter: ((name)::text = 'test'::text)
              ->  Bitmap Heap Scan on company  (cost=1181.72..38592.03 rows=62608 width=30) (actual time=1.873..29.251 rows=15507 loops=1)
                    Recheck 

Re: [PERFORM] PostgreSQL OR performance

2008-11-06 Thread Віталій Тимчишин
2008/11/6 Helio Campos Mello de Andrade [EMAIL PROTECTED]

 For what I see in your OR-plan.txt you are doing too much sequential
 scanning. Create some indexes for those tables using the fields that you
 use and it may help you.

 OBS: If you already have lots of indexes in your tables it may be a good
 time for you to re-think your strategy because it's not working.
 Tips:
   1 - create indexes for the tables with the fields that you will use in
 the query, if it is your most important query. If you have other queries
 that are used, please post them here and we can help you to design a better plan.


As you can see from the other plans, it does have all the indexes needed to
perform its work fast (when given part by part). It simply does not wish to
use them. My question: is this a configuration problem, or is the postgresql
optimizer simply unable to do such a query rewrite?

Actually I did rewrite the query to work properly, as you can see from
union-plan.txt. My question is whether postgresql can do this automatically,
because such a rewrite is not always easy/possible (esp. for generated
queries)?
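
For reference, the manual rewrite follows this general pattern (table and
condition names below are placeholders, not the real query):

select something from table where condition1 and not condition2
union all
select something from table where condition2;

Each branch can then use its own index, at the price of keeping the two
branches non-overlapping so that no row is returned twice.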


Re: [PERFORM] PostgreSQL OR performance

2008-11-06 Thread Richard Huxton
Віталій Тимчишин wrote:
 As you can see from other plans, it do have all the indexes to perform it's
 work fast (when given part by part). It simply do not wish to use them. My
 question: Is this a configuration problem or postgresql optimizer simply
 can't do such a query rewrite?

I must admit, I haven't managed to figure out what your query is trying
to do, but then that's a common problem with autogenerated queries.

The main question that needs answering is why the planner thinks you're
going to get 1.3 billion rows in the or query:

Nested Loop  (cost=4588.13..960900482668.95 rows=1386158171 width=32)

You don't show explain analyse for this query, so there's no way of
knowing how many rows get returned but presumably you're expecting
around 88000. What does explain analyse return?

-- 
  Richard Huxton
  Archonet Ltd

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL OR performance

2008-11-06 Thread Helio Campos Mello de Andrade
As far as I know, if you created the indexes properly and postgres sees that
they will give some improvement, it will use them.
 - Look at the index creation page; we may be forgetting something.

http://www.postgresql.org/docs/8.3/static/indexes.html

I have to go to the hospital now. Tomorrow I will take a look at the manual
and try to understand everything necessary for postgresql to use an index.

Regards

On Thu, Nov 6, 2008 at 2:33 PM, Віталій Тимчишин [EMAIL PROTECTED] wrote:



 2008/11/6 Helio Campos Mello de Andrade [EMAIL PROTECTED]

 For what i see in four OR-plan.txt tou are doing too much sequencial
 scan . Create some indexes for those tables using the fields that you use
 an it may help you.

 OBS: If you already have lots of indexes in your tables it may be a good
 time for you re-think your strategy because it´s ot working.
 Tips:
   1 - create indexes for the tables with the fields that you will use in
 the query if it is your most important query. If you have others querys that
 are used please post those here and we can help you to desing a better plan.


 As you can see from other plans, it do have all the indexes to perform it's
 work fast (when given part by part). It simply do not wish to use them. My
 question: Is this a configuration problem or postgresql optimizer simply
 can't do such a query rewrite?

 Actually I did rewrite the query to work properly as you can see from
 union-plan.txt. My question is if postgresql can do this automatically
 because such a rewrite is not always easy/possible (esp. for generated
 queries)?




-- 
Helio Campos Mello de Andrade


Re: [PERFORM] PostgreSQL OR performance

2008-11-06 Thread Віталій Тимчишин
2008/11/6 Richard Huxton [EMAIL PROTECTED]

 Віталій Тимчишин wrote:
  As you can see from other plans, it do have all the indexes to perform
 it's
  work fast (when given part by part). It simply do not wish to use them.
 My
  question: Is this a configuration problem or postgresql optimizer simply
  can't do such a query rewrite?

 I must admit, I haven't managed to figure out what your query is trying
 to do, but then that's a common problem with autogenerated queries.


That's easy - I am looking for duplicates in a subset of companies. Two
companies are equal when their names are simply equal or there is an entry
in the match table for their names.




 The main question that needs answering is why the planner thinks you're
 going to get 1.3 billion rows in the or query:

 Nested Loop  (cost=4588.13..960900482668.95 rows=1386158171 width=32)

 You don't show explain analyse for this query, so there's no way of
 knowing how many rows get returned but presumably you're expecting
 around 88000. What does explain analyse return?


Yes, the query should output exactly the same result as the Union plan. I
will run the slow explain analyze now and will repost after it completes
(tomorrow?).
BTW: I'd say the planner should estimate the rows as the sum of the OR
estimations minus the intersection, but no more than the sum of the ORs (if
the intersection is 0). For the first condition it has rows=525975, for the
second it has rows=2403 (with other plans, of course), so it's strange that
it has such a high estimate. It's exactly 50% of the full cartesian product
of the merge, so it thinks that every second pair would succeed, which is
not true.


[PERFORM] PostgreSQL OR performance

2008-11-05 Thread Віталій Тимчишин
Hello.

For a long time now I have seen very poor OR performance in postgres.
If one has a query like "select something from table where condition1 or
condition2", it may take ages to execute, while
"select something from table where condition1" and "select something from
table where condition2" are executed very fast, and
"select something from table where condition1 and not condition2 union all
select something from table where condition2" gives the required results fast.

For example, in my current query the optimizer gives 88252 for condition1,
88258 for condition1 and not condition2, and 99814 for condition2.
For condition1 or condition2 it gives 961499627680. And it does perform
this way.

All is more or less good when the select part is easy and the query can be
easily rewritten. But for complex queries it looks ugly, and if the query is
autogenerated, moving the autogeneration mechanism from producing a simple,
clean where clause to producing unions is not an easy task.

So the question is: am I missing something? Can this be optimized?


Re: [PERFORM] PostgreSQL OR performance

2008-11-05 Thread Tom Lane
Віталій Тимчишин [EMAIL PROTECTED] writes:
 For a long time already I can see very poor OR performance in postgres.

If you would provide a concrete example rather than handwaving, we might
be able to offer some advice ...

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL OR performance

2008-11-05 Thread Jeff Davis
On Wed, 2008-11-05 at 13:12 +0200, Віталій Тимчишин wrote:
 For a long time already I can see very poor OR performance in
 postgres. 
 If one have query like select something from table where condition1
 or condition2 it may take ages to execute while
 select something from table where condition1 and select something
 from table where condition2 are executed very fast and
 select something from table where condition1 and not condition2 union
 all select something from table where condition2 gives required
 results fast
 

What version are you using?

Have you run VACUUM ANALYZE?

Next, do:

EXPLAIN ANALYZE select something from table where condition1 or
condition2;

for each of the queries, unless that query takes so long you don't want
to wait for the result. In that case, omit the ANALYZE and just do
EXPLAIN 

Then post those results to the list. These tell us what plans PostgreSQL
is choosing and what it estimates the costs to be. If it's the output of
EXPLAIN ANALYZE, it also runs the query and tells us what the costs
really are.

From that, we can see where the planner is going wrong, and what you
might do to change it.

Regards,
Jeff Davis


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL+Hibernate Performance

2008-08-21 Thread Kranti K K Parisa™
Thanks Mark,

We are using DBCP, and I found something about pgpool in some forum threads,
which raised some questions about it for me. But I am clear now.

On Wed, Aug 20, 2008 at 8:59 PM, Mark Lewis [EMAIL PROTECTED] wrote:

 Yes, we use connection pooling.  As I recall Hibernate ships with c3p0
 connection pooling built-in, which is what we use.  We were happy enough
 with c3p0 that we ended up moving our other non-hibernate apps over to
 it, away from DBCP.

 pgpool does connection pooling at a socket level instead of in a local
 library level, so really it's a very different thing.  If your app is
 the only thing talking to this database, and you don't have a
 multi-database configuration, then it will be easier for you to use a
 Java-based connection pooling library like c3p0 or DBCP than to use
 pgpool.

 -- Mark

 On Wed, 2008-08-20 at 20:32 +0530, Kranti K K Parisa™ wrote:
  Hi Mark,
 
  Thank you very much for the information. I will analyse the DB
  structure and create indexes on PG directly.
  Are you using any connection pooling like DBCP? or PG POOL?
 
  Regards, KP
 
 
  On Wed, Aug 20, 2008 at 8:05 PM, Mark Lewis [EMAIL PROTECTED]
  wrote:
 
  On Wed, 2008-08-20 at 17:55 +0530, Kranti K K Parisa™ wrote:
   Hi,
  
   Can anyone suggest the performance tips for PostgreSQL using
   Hibernate.
  
   One of the queries:
  
   - PostgreSQL has INDEX concept and Hibernate also has Column
  INDEXes.
   Which is better among them? or creating either of them is
  enough? or
   need to create both of them?
  
   and any more performace aspects ?
 
 
  Hibernate is a library for accessing a database such as
  PostgreSQL.  It
  does not offer any add-on capabilities to the storage layer
  itself.  So
  when you tell Hibernate that a column should be indexed, all
  that it
  does create the associated PostgreSQL index when you ask
  Hibernate to
  build the DB tables for you.  This is part of Hibernate's
  effort to
  protect you from the implementation details of the underlying
  database,
  in order to make supporting multiple databases with the same
  application
  code easier.
 
  So there is no performance difference between a PG index and a
  Hibernate
  column index, because they are the same thing.
 
  The most useful Hibernate performance-tuning advice isn't
  PG-specific at
  all, there are just things that you need to keep in mind when
  developing
  for any database to avoid pathologically bad performance;
  those tips are
  really beyond the scope of this mailing list, Google is your
  friend
  here.
 
  I've been the architect for an enterprise-class application
  for a few
  years now using PostgreSQL and Hibernate together in a
  performance-critical context, and honestly I can't think of
  one time
  that I've been bitten by a PG-specific performance issue (a
  lot of
  performance issues with Hibernate that affected all databases
  though;
  you need to know what you're doing to make Hibernate apps that
  run fast.
  If you do run into problems, you can figure out the actual SQL
  that
  Hibernate is issuing and do the normal PostgreSQL explain
  analyze on it;
  usually caused by a missing index.
 
  -- Mark
 
 
 
  --
 
  Best Regards
  Kranti Kiran Kumar Parisa
  M: +91 - 9391 - 438 - 738
  +91 - 9849 - 625 - 625
 
 




-- 

Best Regards
Kranti Kiran Kumar Parisa
M: +91 - 9391 - 438 - 738
+91 - 9849 - 625 - 625


Re: [PERFORM] PostgreSQL+Hibernate Performance

2008-08-21 Thread Kranti K K Parisa™
Thanks Matthew,

Does that mean I can just have index1, index3, and index4?

On Wed, Aug 20, 2008 at 8:54 PM, Matthew Wakeling [EMAIL PROTECTED]wrote:

 On Wed, 20 Aug 2008, Kranti K K Parisa™ wrote:

 creating multiple indexes on same column will effect performance?
  for example:

 index1 : column1, column2, column3
 index2: column1
 index3: column2,
 index4: column3
 index5: column1,column2


 The sole purpose of indexes is to affect performance.

 However, if you have index1, there is no point in having index2 or index5.

 Matthew

 --
 Isn't Microsoft Works something of a contradiction?
 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance




-- 

Best Regards
Kranti Kiran Kumar Parisa
M: +91 - 9391 - 438 - 738
+91 - 9849 - 625 - 625


Re: [PERFORM] PostgreSQL+Hibernate Performance

2008-08-21 Thread Mark Lewis
On Thu, 2008-08-21 at 12:33 +0530, Kranti K K Parisa™ wrote:

 On Wed, Aug 20, 2008 at 8:54 PM, Matthew Wakeling
 [EMAIL PROTECTED] wrote:
 On Wed, 20 Aug 2008, Kranti K K Parisa™ wrote:
 creating multiple indexes on same column will effect
 performance?
  for example:
 
 index1 : column1, column2, column3
 index2: column1
 index3: column2,
 index4: column3
 index5: column1,column2
 
 
 The sole purpose of indexes is to affect performance.
 
 However, if you have index1, there is no point in having
 index2 or index5.
 
 Matthew
 
 Thanks Matthew,
 
 does that mean i can just have index1, index3, index4?
 

(trying to get the thread back into newest-comments-last order)

Well, yes you can get away with just index1, index3 and index4, and it
may well be the optimal solution for you, but it's not entirely
clear-cut.

It's true that PG can use index1 to satisfy queries of the form SELECT
x FROM y WHERE column1=somevalue, or WHERE column1=a AND column2=b.  It will
not be as fast as a lookup in a dedicated single-column index, but depending
on the size of the tables/indexes and the selectivity of the leading
column(s) in the index, the difference in speed may be trivial.

On the other hand, if you have individual indexes on column1, column2
and column3 but no multi-column index, PG can combine the individual
indexes in memory with a bitmap.  This is not as fast as a normal lookup
in the multi-column index would be, but can still be a big win over not
having an index at all.

To make an educated decision you might want to read over some of the
online documentation about indexes, in particular these two sections:

http://www.postgresql.org/docs/8.3/interactive/indexes-multicolumn.html

and

http://www.postgresql.org/docs/8.3/interactive/indexes-bitmap-scans.html
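
As a minimal illustration of the two approaches (table and column names
below are made up):

CREATE TABLE y (column1 int, column2 int, column3 int);

-- a single multi-column index serves queries on its leading column(s),
-- e.g. WHERE column1 = 42, or WHERE column1 = 42 AND column2 = 7:
CREATE INDEX y_c1_c2_c3 ON y (column1, column2, column3);

-- separate single-column indexes can be combined by a bitmap scan,
-- e.g. for WHERE column2 = 7 AND column3 = 9:
CREATE INDEX y_c2 ON y (column2);
CREATE INDEX y_c3 ON y (column3);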

-- Mark

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL+Hibernate Performance

2008-08-21 Thread Kranti K K Parisa™
Hi Mark,

Thanks again for the info.
I shall create different sets of indexes and compare the query execution times.
One of the tables might get around 700,000 records over a period of 4-5
months, so what other kinds of measures do I need to focus on?
I thought of the following:
1) Indexes
2) Better hardware (RAM & HDD)

And how can I estimate the size of a row? Is it based on the data
types of the columns I have in the table?
Do you have any info to guide me on this?
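
For example, would something like this give a rough per-row estimate
(mytable is just a placeholder name, and I understand pg_column_size does
not count the per-row header overhead)?

select avg(pg_column_size(t.*)) as avg_row_bytes from mytable t;
select pg_size_pretty(pg_total_relation_size('mytable')); -- table plus indexes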

On Thu, Aug 21, 2008 at 7:32 PM, Mark Lewis [EMAIL PROTECTED] wrote:

 On Thu, 2008-08-21 at 12:33 +0530, Kranti K K Parisa™ wrote:

  On Wed, Aug 20, 2008 at 8:54 PM, Matthew Wakeling
  [EMAIL PROTECTED] wrote:
  On Wed, 20 Aug 2008, Kranti K K Parisa™ wrote:
  creating multiple indexes on same column will effect
  performance?
   for example:
 
  index1 : column1, column2, column3
  index2: column1
  index3: column2,
  index4: column3
  index5: column1,column2
 
 
  The sole purpose of indexes is to affect performance.
 
  However, if you have index1, there is no point in having
  index2 or index5.
 
  Matthew
 
  Thanks Matthew,
 
  does that mean i can just have index1, index3, index4?
 

 (trying to get the thread back into newest-comments-last order)

 Well, yes you can get away with just index1, index3 and index4, and it
 may well be the optimal solution for you, but it's not entirely
 clear-cut.

 It's true that PG can use index1 to satisfy queries of the form SELECT
 x FROM y WHERE column1=somevalue or column1=a AND column2=b.  It will
 not be as fast as an index lookup from a single index, but depending on
 the size of the tables/indexes and the selectivity of leading column(s)
 in the index, the difference in speed may be trivial.

 On the other hand, if you have individual indexes on column1, column2
 and column3 but no multi-column index, PG can combine the individual
 indexes in memory with a bitmap.  This is not as fast as a normal lookup
 in the multi-column index would be, but can still be a big win over not
 having an index at all.

 To make an educated decision you might want to read over some of the
 online documentation about indexes, in particular these two sections:

 http://www.postgresql.org/docs/8.3/interactive/indexes-multicolumn.html

 and

 http://www.postgresql.org/docs/8.3/interactive/indexes-bitmap-scans.html

 -- Mark




-- 

Best Regards
Kranti Kiran Kumar Parisa
M: +91 - 9391 - 438 - 738
+91 - 9849 - 625 - 625


[PERFORM] PostgreSQL+Hibernate Performance

2008-08-20 Thread Kranti K K Parisa™
Hi,

Can anyone suggest performance tips for PostgreSQL used with Hibernate?

One of the queries:

- PostgreSQL has an INDEX concept and Hibernate also has column INDEXes. Which
is better among them? Is creating either of them enough, or do we need to
create both of them?

And any other performance aspects?

Thanks in advance.

==
KP


Re: [PERFORM] PostgreSQL+Hibernate Performance

2008-08-20 Thread Mark Lewis
On Wed, 2008-08-20 at 17:55 +0530, Kranti K K Parisa™ wrote:
 Hi,
 
 Can anyone suggest the performance tips for PostgreSQL using
 Hibernate.
 
 One of the queries:
 
 - PostgreSQL has INDEX concept and Hibernate also has Column INDEXes.
 Which is better among them? or creating either of them is enough? or
 need to create both of them?
 
 and any more performace aspects ?

Hibernate is a library for accessing a database such as PostgreSQL.  It
does not offer any add-on capabilities to the storage layer itself.  So
when you tell Hibernate that a column should be indexed, all that it
does create the associated PostgreSQL index when you ask Hibernate to
build the DB tables for you.  This is part of Hibernate's effort to
protect you from the implementation details of the underlying database,
in order to make supporting multiple databases with the same application
code easier.

So there is no performance difference between a PG index and a Hibernate
column index, because they are the same thing.

The most useful Hibernate performance-tuning advice isn't PG-specific at
all, there are just things that you need to keep in mind when developing
for any database to avoid pathologically bad performance; those tips are
really beyond the scope of this mailing list, Google is your friend
here.

I've been the architect for an enterprise-class application for a few
years now using PostgreSQL and Hibernate together in a
performance-critical context, and honestly I can't think of one time
that I've been bitten by a PG-specific performance issue (a lot of
performance issues with Hibernate that affected all databases, though;
you need to know what you're doing to make Hibernate apps run fast).
If you do run into problems, you can figure out the actual SQL that
Hibernate is issuing and do the normal PostgreSQL explain analyze on it;
such problems are usually caused by a missing index.

-- Mark



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL+Hibernate Performance

2008-08-20 Thread Kranti K K Parisa™
Hi Mark,

Thank you very much for the information. I will analyse the DB structure and
create indexes on PG directly.
Are you using any connection pooling like DBCP? or PG POOL?

Regards, KP


On Wed, Aug 20, 2008 at 8:05 PM, Mark Lewis [EMAIL PROTECTED] wrote:

 On Wed, 2008-08-20 at 17:55 +0530, Kranti K K Parisa™ wrote:
  Hi,
 
  Can anyone suggest the performance tips for PostgreSQL using
  Hibernate.
 
  One of the queries:
 
  - PostgreSQL has INDEX concept and Hibernate also has Column INDEXes.
  Which is better among them? or creating either of them is enough? or
  need to create both of them?
 
  and any more performace aspects ?

 Hibernate is a library for accessing a database such as PostgreSQL.  It
 does not offer any add-on capabilities to the storage layer itself.  So
 when you tell Hibernate that a column should be indexed, all that it
 does create the associated PostgreSQL index when you ask Hibernate to
 build the DB tables for you.  This is part of Hibernate's effort to
 protect you from the implementation details of the underlying database,
 in order to make supporting multiple databases with the same application
 code easier.

 So there is no performance difference between a PG index and a Hibernate
 column index, because they are the same thing.

 The most useful Hibernate performance-tuning advice isn't PG-specific at
 all, there are just things that you need to keep in mind when developing
 for any database to avoid pathologically bad performance; those tips are
 really beyond the scope of this mailing list, Google is your friend
 here.

 I've been the architect for an enterprise-class application for a few
 years now using PostgreSQL and Hibernate together in a
 performance-critical context, and honestly I can't think of one time
 that I've been bitten by a PG-specific performance issue (a lot of
 performance issues with Hibernate that affected all databases though;
 you need to know what you're doing to make Hibernate apps that run fast.
 If you do run into problems, you can figure out the actual SQL that
 Hibernate is issuing and do the normal PostgreSQL explain analyze on it;
 usually caused by a missing index.

 -- Mark




-- 

Best Regards
Kranti Kiran Kumar Parisa
M: +91 - 9391 - 438 - 738
+91 - 9849 - 625 - 625


Re: [PERFORM] PostgreSQL+Hibernate Performance

2008-08-20 Thread Nikolas Everett
The only thing that's bitten me about hibernate + postgres is that when
inserting into partitioned tables, postgres does not reply with the number
of rows that hibernate expected.  My current (not great) solution is to
define a specific SQLInsert annotation and tell it not to do any checking,
like so:

@SQLInsert(sql=insert into bigmetric (account_id, a, b, timestamp, id)
values (?, ?, ?, ?, ?), check=ResultCheckStyle.NONE)

I just steal the SQL from hibernate's logs.
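
For context, a minimal sketch of one common setup that produces the symptom
(rule-based partitioning; the table, column and rule names are made up, not
our real schema):

CREATE TABLE bigmetric (account_id bigint, a int, b int,
                        "timestamp" timestamptz, id bigint);
CREATE TABLE bigmetric_y2008m08 () INHERITS (bigmetric);

CREATE RULE bigmetric_ins_y2008m08 AS
ON INSERT TO bigmetric
WHERE NEW."timestamp" >= '2008-08-01' AND NEW."timestamp" < '2008-09-01'
DO INSTEAD
    INSERT INTO bigmetric_y2008m08 VALUES (NEW.*);

With only conditional rules like this, an INSERT against the parent reports
0 rows affected even though the row lands in the partition, which is what
trips Hibernate's default row-count check.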



On Wed, Aug 20, 2008 at 10:40 AM, Mark Lewis [EMAIL PROTECTED] wrote:

 On Wed, 2008-08-20 at 17:55 +0530, Kranti K K Parisa™ wrote:
  Hi,
 
  Can anyone suggest the performance tips for PostgreSQL using
  Hibernate.
 
  One of the queries:
 
  - PostgreSQL has INDEX concept and Hibernate also has Column INDEXes.
  Which is better among them? or creating either of them is enough? or
  need to create both of them?
 
  and any more performace aspects ?

 Hibernate is a library for accessing a database such as PostgreSQL.  It
 does not offer any add-on capabilities to the storage layer itself.  So
 when you tell Hibernate that a column should be indexed, all that it
 does create the associated PostgreSQL index when you ask Hibernate to
 build the DB tables for you.  This is part of Hibernate's effort to
 protect you from the implementation details of the underlying database,
 in order to make supporting multiple databases with the same application
 code easier.

 So there is no performance difference between a PG index and a Hibernate
 column index, because they are the same thing.

 The most useful Hibernate performance-tuning advice isn't PG-specific at
 all, there are just things that you need to keep in mind when developing
 for any database to avoid pathologically bad performance; those tips are
 really beyond the scope of this mailing list, Google is your friend
 here.

 I've been the architect for an enterprise-class application for a few
 years now using PostgreSQL and Hibernate together in a
 performance-critical context, and honestly I can't think of one time
 that I've been bitten by a PG-specific performance issue (a lot of
 performance issues with Hibernate that affected all databases though;
 you need to know what you're doing to make Hibernate apps that run fast.
 If you do run into problems, you can figure out the actual SQL that
 Hibernate is issuing and do the normal PostgreSQL explain analyze on it;
 usually caused by a missing index.

 -- Mark



 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] PostgreSQL+Hibernate Performance

2008-08-20 Thread Kranti K K Parisa™
Will creating multiple indexes on the same columns affect performance?
For example:

index1 : column1, column2, column3
index2: column1
index3: column2,
index4: column3
index5: column1,column2

This means I am trying to fire SQL queries with these columns in the where
conditions, and the possibilities are like the above.

If we create such indexes, will it affect performance?
And what is the best way to go in this case?


On Wed, Aug 20, 2008 at 8:10 PM, Mark Lewis [EMAIL PROTECTED] wrote:

 On Wed, 2008-08-20 at 17:55 +0530, Kranti K K Parisa™ wrote:
  Hi,
 
  Can anyone suggest the performance tips for PostgreSQL using
  Hibernate.
 
  One of the queries:
 
  - PostgreSQL has INDEX concept and Hibernate also has Column INDEXes.
  Which is better among them? or creating either of them is enough? or
  need to create both of them?
 
  and any more performace aspects ?

 Hibernate is a library for accessing a database such as PostgreSQL.  It
 does not offer any add-on capabilities to the storage layer itself.  So
 when you tell Hibernate that a column should be indexed, all that it
 does create the associated PostgreSQL index when you ask Hibernate to
 build the DB tables for you.  This is part of Hibernate's effort to
 protect you from the implementation details of the underlying database,
 in order to make supporting multiple databases with the same application
 code easier.

 So there is no performance difference between a PG index and a Hibernate
 column index, because they are the same thing.

 The most useful Hibernate performance-tuning advice isn't PG-specific at
 all, there are just things that you need to keep in mind when developing
 for any database to avoid pathologically bad performance; those tips are
 really beyond the scope of this mailing list, Google is your friend
 here.

 I've been the architect for an enterprise-class application for a few
 years now using PostgreSQL and Hibernate together in a
 performance-critical context, and honestly I can't think of one time
 that I've been bitten by a PG-specific performance issue (a lot of
 performance issues with Hibernate that affected all databases though;
 you need to know what you're doing to make Hibernate apps that run fast.
 If you do run into problems, you can figure out the actual SQL that
 Hibernate is issuing and do the normal PostgreSQL explain analyze on it;
 usually caused by a missing index.

 -- Mark



 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance




-- 

Best Regards
Kranti Kiran Kumar Parisa
M: +91 - 9391 - 438 - 738
+91 - 9849 - 625 - 625


Re: [PERFORM] PostgreSQL+Hibernate Performance

2008-08-20 Thread Matthew Wakeling

On Wed, 20 Aug 2008, Kranti K K Parisa™ wrote:

creating multiple indexes on same column will effect performance?
 for example:

index1 : column1, column2, column3
index2: column1
index3: column2,
index4: column3
index5: column1,column2


The sole purpose of indexes is to affect performance.

However, if you have index1, there is no point in having index2 or index5.

Matthew

--
Isn't Microsoft Works something of a contradiction?
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL+Hibernate Performance

2008-08-20 Thread Mark Lewis
The tradeoffs for multiple indexes are more or less as follows:

1. Having the right indexes makes queries faster, often dramatically so.

2. But more indexes makes inserts/updates slower, although generally not
dramatically slower.

3. Each index requires disk space.  With several indexes, you can easily
have more of your disk taken up by indexes than with actual data.

I would be careful to only create the indexes you need, but it's
probably worse to have too few indexes than too many.  Depends on your
app though.
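
If disk space is a concern, it is easy to check how much of a table's
footprint is indexes versus heap (mytable below is a placeholder; the
functions exist in 8.1 and later):

SELECT pg_size_pretty(pg_relation_size('mytable')) AS heap_size,
       pg_size_pretty(pg_total_relation_size('mytable')
                      - pg_relation_size('mytable')) AS indexes_and_toast;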

-- Mark

On Wed, 2008-08-20 at 20:40 +0530, Kranti K K Parisa™ wrote:
 creating multiple indexes on same column will effect performance?
  for example:
 
 index1 : column1, column2, column3
 index2: column1
 index3: column2,
 index4: column3
 index5: column1,column2
 
 which means, i am trying fire the SQL queries keeping columns in the
 where conditions. and the possibilities are like the above.
 
 if we create such indexes will it effect on performance?
 and what is the best go in this case?
 
 
 On Wed, Aug 20, 2008 at 8:10 PM, Mark Lewis [EMAIL PROTECTED]
 wrote:
 On Wed, 2008-08-20 at 17:55 +0530, Kranti K K Parisa™ wrote:
 
 
  Hi,
 
  Can anyone suggest the performance tips for PostgreSQL using
  Hibernate.
 
  One of the queries:
 
  - PostgreSQL has INDEX concept and Hibernate also has Column
 INDEXes.
  Which is better among them? or creating either of them is
 enough? or
  need to create both of them?
 
  and any more performace aspects ?
 
 
 
 Hibernate is a library for accessing a database such as
 PostgreSQL.  It
 does not offer any add-on capabilities to the storage layer
 itself.  So
 when you tell Hibernate that a column should be indexed, all
 that it
 does create the associated PostgreSQL index when you ask
 Hibernate to
 build the DB tables for you.  This is part of Hibernate's
 effort to
 protect you from the implementation details of the underlying
 database,
 in order to make supporting multiple databases with the same
 application
 code easier.
 
 So there is no performance difference between a PG index and a
 Hibernate
 column index, because they are the same thing.
 
 The most useful Hibernate performance-tuning advice isn't
 PG-specific at
 all, there are just things that you need to keep in mind when
 developing
 for any database to avoid pathologically bad performance;
 those tips are
 really beyond the scope of this mailing list, Google is your
 friend
 here.
 
 I've been the architect for an enterprise-class application
 for a few
 years now using PostgreSQL and Hibernate together in a
 performance-critical context, and honestly I can't think of
 one time
 that I've been bitten by a PG-specific performance issue (a
 lot of
 performance issues with Hibernate that affected all databases
 though;
 you need to know what you're doing to make Hibernate apps that
 run fast.
 If you do run into problems, you can figure out the actual SQL
 that
 Hibernate is issuing and do the normal PostgreSQL explain
 analyze on it;
 usually caused by a missing index.
 
 -- Mark
 
 
 
 
 --
 Sent via pgsql-performance mailing list
 (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance
 
 
 
 -- 
 
 Best Regards
 Kranti Kiran Kumar Parisa
 M: +91 - 9391 - 438 - 738
 +91 - 9849 - 625 - 625
 
 

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL+Hibernate Performance

2008-08-20 Thread Mark Lewis
Yes, we use connection pooling.  As I recall Hibernate ships with c3p0
connection pooling built-in, which is what we use.  We were happy enough
with c3p0 that we ended up moving our other non-hibernate apps over to
it, away from DBCP.

pgpool does connection pooling at a socket level instead of in a local
library level, so really it's a very different thing.  If your app is
the only thing talking to this database, and you don't have a
multi-database configuration, then it will be easier for you to use a
Java-based connection pooling library like c3p0 or DBCP than to use
pgpool.

-- Mark

On Wed, 2008-08-20 at 20:32 +0530, Kranti K K Parisa™ wrote:
 Hi Mark,
 
 Thank you very much for the information. I will analyse the DB
 structure and create indexes on PG directly.
 Are you using any connection pooling like DBCP? or PG POOL?
 
 Regards, KP
 
 
 On Wed, Aug 20, 2008 at 8:05 PM, Mark Lewis [EMAIL PROTECTED]
 wrote:
 
 On Wed, 2008-08-20 at 17:55 +0530, Kranti K K Parisa™ wrote:
  Hi,
 
  Can anyone suggest the performance tips for PostgreSQL using
  Hibernate.
 
  One of the queries:
 
  - PostgreSQL has INDEX concept and Hibernate also has Column
 INDEXes.
  Which is better among them? or creating either of them is
 enough? or
  need to create both of them?
 
  and any more performace aspects ?
 
 
 Hibernate is a library for accessing a database such as
 PostgreSQL.  It
 does not offer any add-on capabilities to the storage layer
 itself.  So
 when you tell Hibernate that a column should be indexed, all
 that it
 does create the associated PostgreSQL index when you ask
 Hibernate to
 build the DB tables for you.  This is part of Hibernate's
 effort to
 protect you from the implementation details of the underlying
 database,
 in order to make supporting multiple databases with the same
 application
 code easier.
 
 So there is no performance difference between a PG index and a
 Hibernate
 column index, because they are the same thing.
 
 The most useful Hibernate performance-tuning advice isn't
 PG-specific at
 all, there are just things that you need to keep in mind when
 developing
 for any database to avoid pathologically bad performance;
 those tips are
 really beyond the scope of this mailing list, Google is your
 friend
 here.
 
 I've been the architect for an enterprise-class application
 for a few
 years now using PostgreSQL and Hibernate together in a
 performance-critical context, and honestly I can't think of
 one time
 that I've been bitten by a PG-specific performance issue (a
 lot of
 performance issues with Hibernate that affected all databases
 though;
 you need to know what you're doing to make Hibernate apps that
 run fast.
 If you do run into problems, you can figure out the actual SQL
 that
 Hibernate is issuing and do the normal PostgreSQL explain
 analyze on it;
 usually caused by a missing index.
 
 -- Mark
 
 
 
 -- 
 
 Best Regards
 Kranti Kiran Kumar Parisa
 M: +91 - 9391 - 438 - 738
 +91 - 9849 - 625 - 625
 
 

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgresql 8.1.4 - performance issues for select on view using max

2006-10-19 Thread Ioana Danes
Hello,

Actually what I expected from the planner for this
query (select max(transid) from view) was something
like this:

select max(transid) from (select max(transid) from
archive.transaction union all select max(transid) from
public.transaction)

and to apply the max function to each query of the
union. This is what happens when you use a where
condition; it uses the indexes on each subquery of
the view...
e.g.: select transid from view where transid = 12;

This way it would be fast enough.

Also for order by and limit I was expecting the same
thing.


Thank you for your time,
Ioana Danes

 constraint exclusion and inheritance won't help him.
 
 The problem is that he has two indexes, and he needs
 to find the max
 between both of them. PostgreSQL isn't smart enough
 to recognize that it
 can use two indexes, find the max in each one, and
 find the max of those
 two values.
 
 Regards,
   Jeff Davis
 
 



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Postgresql 8.1.4 - performance issues for select on

2006-10-19 Thread Ioana Danes
Hi,
I tried, and this does not work either.

Thank you,
Ioana
--- Jim C. Nasby [EMAIL PROTECTED] wrote:

 On Wed, Oct 18, 2006 at 03:32:15PM -0700, Jeff Davis
 wrote:
  On Wed, 2006-10-18 at 17:10 -0500, Jim C. Nasby
 wrote:
   Sorry, don't have the earlier part of this
 thread, but what about...
   
   SELECT greatest(max(a), max(b)) ...
   
   ?
  
  To fill you in, we're trying to get the max of a
 union (a view across
  two physical tables).
 
 UNION or UNION ALL? You definitely don't want to do
 a plain UNION if you
 can possibly avoid it.
 
  It can be done if you're creative with the query;
 I suggested a query
  that selected the max of the max()es of the
 individual tables. Your
  query could work too. However, the trick would be
 getting postgresql to
  recognize that it can transform SELECT max(x)
 FROM foo into that,
  where foo is a view of a union.
  
  If PostgreSQL could sort the result of a union by
 merging the results of
  two index scans, I think the problem would be
 solved. Is there something
  preventing this, or is it just something that
 needs to be added to the
  planner?
 
 Hrm... it'd be worth trying the old ORDER BY ...
 LIMIT 1 trick just to
 see if that worked in this case, but I don't have
 much hope for that.
 -- 
 Jim Nasby   
 [EMAIL PROTECTED]
 EnterpriseDB  http://enterprisedb.com 
 512.569.9461 (cell)
 



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Postgresql 8.1.4 - performance issues for select on

2006-10-19 Thread Ioana Danes
Hello,

It looks like some of you missed my first email, but
my problem is not to find a replacement for this
select: select max(transid) from someunionview
There are plenty of solutions for doing this...

My point is to split a table in two and to make this
transparent for the developers as a first step. In a
second step they will improve some of the queries,
but that is another story.

So I would like to know if there is any plan to
improve this type of query for views in the near
future, or maybe it is already improved in the 8.2
version?
I have the same problem and question for: 
select transid from someunionview order by transid
desc limit 1;
 
Thank you for your time,
Ioana Danes

--- Tom Lane [EMAIL PROTECTED] wrote:

 Jeff Davis [EMAIL PROTECTED] writes:
  If PostgreSQL could sort the result of a union by
 merging the results of
  two index scans, I think the problem would be
 solved. Is there something
  preventing this, or is it just something that
 needs to be added to the
  planner?
 
 It's something on the wish-list.  Personally I'd be
 inclined to try to
 rewrite the query as a plain MAX() across rewritten
 per-table indexed
 queries, rather than worry about mergesort or
 anything like that.
 There wasn't any very good way to incorporate that
 idea when planagg.c
 was first written, but now that the planner has an
 explicit notion of
 append relations it might be relatively
 straightforward.
 
   regards, tom lane
 
 ---(end of
 broadcast)---
 TIP 6: explain analyze is your friend
 



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[PERFORM] Postgresql 8.1.4 - performance issues for select on view using max

2006-10-18 Thread Ioana Danes
Hi everyone,

I am doing a test for a scenario where I have 2
schemas: one (public) for the operational data and
another one (archive) for old, archived data. So
basically I want to split the data from some huge
tables in two: all data before 2006 in archive and all
data after and including 2006 in public.

Let's say I have a table named public.AllTransactions
with data before and including 2006.
I want to move all the data < 2006 into a new table
named archive.transaction (in the archive schema).
I also want to move all data >= 2006 into a new table
named public.transaction (in the public schema).

In order to make this transparent for the developers I
want to drop the original table public.AllTransactions
 and to create a view with the same name that is a
union between the two new tables:

create view public.AllTransactions as
select * from public.transaction
union all 
select * from archive.transaction

On this view I will create rules for insert, update,
delete...
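
For example, the insert rule could be as simple as the sketch below,
assuming new rows always belong in the current table and the column lists
of the view and the table match:

create rule alltransactions_insert as
on insert to public.AllTransactions
do instead
insert into public.transaction values (NEW.*);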

Testing some selects I know we have in the application,
I got into a scenario where my plan does not work
without a code change. This scenario is:

select max(transid) from alltransaction;

because the planner does not use the existing indexes
on the 2 new tables: public.transaction and
archive.transaction.

Here are the results of the explain analyze:

1. Select only from one table is OK:
-

# explain select max(transid) from public.transaction;

                                       QUERY PLAN
------------------------------------------------------------------------------------------
 Result  (cost=0.04..0.05 rows=1 width=0)
   InitPlan
     ->  Limit  (cost=0.00..0.04 rows=1 width=8)
           ->  Index Scan Backward using pk_transaction on transaction  (cost=0.00..357870.46 rows=9698002 width=8)
                 Filter: (transid IS NOT NULL)
(5 rows)


2. Select from the view is doing a sequential scan:
---
# explain analyze select max(transid) from alltransaction;

                                       QUERY PLAN
------------------------------------------------------------------------------------------
 Aggregate  (cost=200579993.70..200579993.71 rows=1 width=8) (actual time=115778.101..115778.103 rows=1 loops=1)
   ->  Append  (cost=1.00..200447315.74 rows=10614237 width=143) (actual time=0.082..95146.144 rows=10622206 loops=1)
         ->  Seq Scan on transaction  (cost=1.00..100312397.02 rows=9698002 width=143) (actual time=0.078..56002.778 rows=9706475 loops=1)
         ->  Seq Scan on transaction  (cost=1.00..100028776.35 rows=916235 width=143) (actual time=8.822..2799.496 rows=915731 loops=1)
 Total runtime: 115778.200 ms
(5 rows)

Is this a bug, or is this how the planner is supposed to
work?

I have the same problem with the following select:
select transid from alltransaction order by transid
desc limit 1;
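
A manual workaround is to push the aggregate into each branch by hand, for
example:

select greatest(
  (select max(transid) from public.transaction),
  (select max(transid) from archive.transaction));

select max(transid)
from (select max(transid) as transid from public.transaction
      union all
      select max(transid) as transid from archive.transaction) t;

but that is exactly the kind of rewrite I would like to avoid in
autogenerated queries.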

Thank you for your time,
Ioana



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Postgresql 8.1.4 - performance issues for select on view using max

2006-10-18 Thread Dimitri Fontaine
Hi,

On Wednesday 18 October 2006 at 21:51, Ioana Danes wrote:
 I am doing a test for a scenario where I have 2
 schemas  one (public) for the operational data and
 another one (archive) for old, archived data. So
 basically I want to split the data from some huge
 tables in two. All data before 2006 in archive and all
 data after and including  2006 in public.
[...]
 I got into a scenario where my plan does not work
 without doing code change.

This sounds a lot like DDL partitioning; you may want to add some CHECK
constraints to your schema and set constraint_exclusion to on in your
postgresql.conf.

Please read the following documentation material:
  http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html
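
A sketch of what that could look like here, assuming the partitioning column
is a date named transdate (the real column name will differ):

alter table archive.transaction
    add constraint transaction_before_2006 check (transdate < date '2006-01-01');
alter table public.transaction
    add constraint transaction_from_2006 check (transdate >= date '2006-01-01');

set constraint_exclusion = on;  -- or set it in postgresql.conf

Whether the planner can then skip a branch depends on the query actually
filtering on that column.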

Regards,
-- 
Dimitri Fontaine
http://www.dalibo.com/




  1   2   >