Re: [PERFORM] PostgreSQL+Hibernate Performance

2008-08-21 Thread Kranti K K Parisa™
Thanks Mark,

We are using DBCP, and I found something about pgpool in some forum threads,
which raised some questions for me. But I am clear now.

On Wed, Aug 20, 2008 at 8:59 PM, Mark Lewis [EMAIL PROTECTED] wrote:

 Yes, we use connection pooling.  As I recall Hibernate ships with c3p0
 connection pooling built-in, which is what we use.  We were happy enough
 with c3p0 that we ended up moving our other non-hibernate apps over to
 it, away from DBCP.

 pgpool does connection pooling at the socket level instead of at the local
 library level, so really it's a very different thing.  If your app is
 the only thing talking to this database, and you don't have a
 multi-database configuration, then it will be easier for you to use a
 Java-based connection pooling library like c3p0 or DBCP than to use
 pgpool.

 -- Mark

 On Wed, 2008-08-20 at 20:32 +0530, Kranti K K Parisa™ wrote:
  Hi Mark,
 
  Thank you very much for the information. I will analyse the DB
  structure and create indexes on PG directly.
  Are you using any connection pooling like DBCP? or PG POOL?
 
  Regards, KP
 
 
  On Wed, Aug 20, 2008 at 8:05 PM, Mark Lewis [EMAIL PROTECTED]
  wrote:
 
  On Wed, 2008-08-20 at 17:55 +0530, Kranti K K Parisa™ wrote:
   Hi,
  
   Can anyone suggest the performance tips for PostgreSQL using
   Hibernate.
  
   One of the queries:
  
   - PostgreSQL has the INDEX concept and Hibernate also has column INDEXes.
   Which is better among them? Is creating either of them enough, or do we
   need to create both of them?
  
   and any more performance aspects?
 
 
  Hibernate is a library for accessing a database such as
  PostgreSQL.  It
  does not offer any add-on capabilities to the storage layer
  itself.  So
  when you tell Hibernate that a column should be indexed, all
  that it
   does is create the associated PostgreSQL index when you ask
  Hibernate to
  build the DB tables for you.  This is part of Hibernate's
  effort to
  protect you from the implementation details of the underlying
  database,
  in order to make supporting multiple databases with the same
  application
  code easier.
 
  So there is no performance difference between a PG index and a
  Hibernate
  column index, because they are the same thing.
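  
   As an illustrative sketch (hypothetical table and column names), a
   Hibernate column index just comes out of hbm2ddl as ordinary DDL:
  
   CREATE TABLE customer (id bigint PRIMARY KEY, email varchar(255));
   -- the column-level index definition ends up as:
   CREATE INDEX idx_customer_email ON customer (email);
  
   Creating the same index by hand in psql gives you exactly the same thing.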
 
  The most useful Hibernate performance-tuning advice isn't
  PG-specific at
   all; there are just things that you need to keep in mind when
  developing
  for any database to avoid pathologically bad performance;
  those tips are
  really beyond the scope of this mailing list, Google is your
  friend
  here.
 
  I've been the architect for an enterprise-class application
  for a few
  years now using PostgreSQL and Hibernate together in a
  performance-critical context, and honestly I can't think of
  one time
   that I've been bitten by a PG-specific performance issue (there have been a
   lot of performance issues with Hibernate that affected all databases,
   though; you need to know what you're doing to make Hibernate apps that
   run fast).  If you do run into problems, you can figure out the actual SQL
   that Hibernate is issuing and run the normal PostgreSQL EXPLAIN ANALYZE
   on it; the cause is usually a missing index.
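  
   A minimal sketch of that workflow (table, column and threshold below
   are hypothetical):
  
   -- in postgresql.conf: log_min_duration_statement = 200
   -- (logs every statement slower than 200 ms, so Hibernate's SQL shows up)
  
   -- then take a logged statement and look at its plan:
   EXPLAIN ANALYZE
   SELECT o.id, o.total
   FROM orders o
   WHERE o.customer_id = 42;
  
   -- a sequential scan here usually points at the missing index, e.g.:
   CREATE INDEX orders_customer_idx ON orders (customer_id);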
 
  -- Mark
 
 
 
  --
 
  Best Regards
  Kranti Kiran Kumar Parisa
  M: +91 - 9391 - 438 - 738
  +91 - 9849 - 625 - 625
 
 




-- 

Best Regards
Kranti Kiran Kumar Parisa
M: +91 - 9391 - 438 - 738
+91 - 9849 - 625 - 625


Re: [PERFORM] PostgreSQL+Hibernate Performance

2008-08-21 Thread Kranti K K Parisa™
Thanks Matthew,

does that mean I can just have index1, index3, index4?

On Wed, Aug 20, 2008 at 8:54 PM, Matthew Wakeling [EMAIL PROTECTED]wrote:

 On Wed, 20 Aug 2008, Kranti K K Parisa™ wrote:

 will creating multiple indexes on the same column affect performance?
  for example:

 index1 : column1, column2, column3
 index2: column1
 index3: column2,
 index4: column3
 index5: column1,column2


 The sole purpose of indexes is to affect performance.

 However, if you have index1, there is no point in having index2 or index5.
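
 To put that concretely (hypothetical table, same index set as in the
 question above): the single-column and two-column variants are just
 prefixes of index1.

 CREATE TABLE t (column1 int, column2 int, column3 int);
 CREATE INDEX index1 ON t (column1, column2, column3);
 -- CREATE INDEX index2 ON t (column1);          -- redundant: prefix of index1
 -- CREATE INDEX index5 ON t (column1, column2); -- redundant: prefix of index1
 CREATE INDEX index3 ON t (column2);
 CREATE INDEX index4 ON t (column3);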

 Matthew

 --
 Isn't Microsoft Works something of a contradiction?
 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance




-- 

Best Regards
Kranti Kiran Kumar Parisa
M: +91 - 9391 - 438 - 738
+91 - 9849 - 625 - 625


Re: [PERFORM] Slow query with a lot of data

2008-08-21 Thread Moritz Onken


Am 20.08.2008 um 20:06 schrieb Scott Carey:


Ok, so the problem boils down to the sort at the end.

The query up through the merge join on domain is as fast as it's
going to get.  The sort at the end, however, ideally should not happen.
There are not that many rows returned, and it should
hash_aggregate if it thinks there is enough space to do so.


The query planner is going to choose the sort and agg over the hash-
agg if it estimates the total number of resulting rows to be large  
enough so that the hash won't fit in work_mem.   However, there  
seems to be another factor here based on this:



GroupAggregate  (cost=11745105.66..12277396.81 rows=28704 width=12)
  ->  Sort  (cost=11745105.66..11878034.93 rows=53171707 width=12)
        Sort Key: a.user, b.category
        ->  Merge Join  (cost=149241.25..1287278.89 rows=53171707 width=12)
              Merge Cond: (b.domain = a.domain)


The planner actually thinks there will only be 28704 rows returned  
of width 12.  But it chooses to sort 53 million rows before  
aggregating.  That's either a bug or there's something else wrong
here.   That is the wrong way to aggregate those results no matter  
how much work_mem you have unless I'm completely missing something...


You can try rearranging the query just to see if you can work around  
this.  What happens if you compare the explain on:


select
 a.user, b.category, sum(1.0/b.cat_count)::float
 from result a, domain_categories b
 where a.domain = b.domain
 and b.depth < 4
 and a.results > 100
 and a.user < 3
 group by a.user, b.category




HashAggregate  (cost=1685527.69..1686101.77 rows=28704 width=12)
  ->  Merge Join  (cost=148702.25..1286739.89 rows=53171707 width=12)
        Merge Cond: (b.domain = a.domain)
        ->  Index Scan using domain_categories_domain on domain_categories b  (cost=0.00..421716.32 rows=5112568 width=12)
              Filter: (depth < 4)
        ->  Sort  (cost=148415.16..148513.60 rows=39376 width=8)
              Sort Key: a.domain
              ->  Bitmap Heap Scan on result a  (cost=1249.93..145409.79 rows=39376 width=8)
                    Recheck Cond: (user < 3)
                    Filter: (results > 100)
                    ->  Bitmap Index Scan on result_user_idx  (cost=0.00..1240.08 rows=66881 width=0)
                          Index Cond: (user < 3)




to

select
 c.user, c.category, sum(1.0/c.cat_count)::float
 from (select a.user, b.category, b.cat_count
   from result a, domain_categories b
 where a.domain = b.domain
   and b.depth < 4
   and a.results > 100
   and a.user < 3 ) c
  group by c.user, c.category




HashAggregate  (cost=1685527.69..1686101.77 rows=28704 width=12)
  ->  Merge Join  (cost=148702.25..1286739.89 rows=53171707 width=12)
        Merge Cond: (b.domain = a.domain)
        ->  Index Scan using domain_categories_domain on domain_categories b  (cost=0.00..421716.32 rows=5112568 width=12)
              Filter: (depth < 4)
        ->  Sort  (cost=148415.16..148513.60 rows=39376 width=8)
              Sort Key: a.domain
              ->  Bitmap Heap Scan on result a  (cost=1249.93..145409.79 rows=39376 width=8)
                    Recheck Cond: (user < 3)
                    Filter: (results > 100)
                    ->  Bitmap Index Scan on result_user_idx  (cost=0.00..1240.08 rows=66881 width=0)
                          Index Cond: (user < 3)



It shouldn't make a difference, but I've seen things like this help  
before so it's worth a try.  Make sure work_mem is reasonably sized
for this test.


It's exactly the same. work_mem was set to 3000MB.




Another thing that won't be that fast, but may avoid the sort, is to  
select the subselection above into a temporary table, analyze it,  
and then do the outer select.  Make sure your settings for temporary  
space (temp_buffers in 8.3) are large enough for the intermediate  
results (700MB should do it).  That won't be that fast, but it will  
most likely be faster than sorting 50 million + rows.  There are  
lots of problems with this approach but it may be worth the  
experiment.
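
A sketch of that approach, reusing the query from this thread (the temp
table name is made up; temp_buffers must be set before the first temp
table is touched in the session):

set temp_buffers = '700MB';

create temp table sub as
  select a.user, b.category, b.cat_count
  from result a, domain_categories b
  where a.domain = b.domain
    and b.depth < 4
    and a.results > 100
    and a.user < 3;

analyze sub;

select sub.user, sub.category, sum(1.0/sub.cat_count)::float
  from sub
  group by sub.user, sub.category;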




I'll try this.

Thanks so far!

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow query with a lot of data

2008-08-21 Thread Moritz Onken


Am 20.08.2008 um 20:28 schrieb Tom Lane:


Scott Carey [EMAIL PROTECTED] writes:
The planner actually thinks there will only be 28704 rows returned  
of width
12.  But it chooses to sort 53 million rows before aggregating.   
Thats
either a bug or there's something else wrong here.   That is the  
wrong way
to aggregate those results no matter how much work_mem you have  
unless I'm

completely missing something...


That does look weird.  What are the datatypes of the columns being
grouped by?  Maybe they're not hashable?

Another forcing function that prevents use of HashAgg is DISTINCT
aggregates, but you don't seem to have any in this query...

regards, tom lane


The datatypes are both integers. There is no DISTINCT in this query.
Thanks anyway!



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow query with a lot of data

2008-08-21 Thread Moritz Onken


Am 21.08.2008 um 09:04 schrieb Moritz Onken:



Am 20.08.2008 um 20:28 schrieb Tom Lane:


Scott Carey [EMAIL PROTECTED] writes:
The planner actually thinks there will only be 28704 rows returned  
of width
12.  But it chooses to sort 53 million rows before aggregating.   
Thats
either a bug or there's something else wrong here.   That is the  
wrong way
to aggregate those results no matter how much work_mem you have  
unless I'm

completely missing something...


That does look weird.  What are the datatypes of the columns being
grouped by?  Maybe they're not hashable?

Another forcing function that prevents use of HashAgg is DISTINCT
aggregates, but you don't seem to have any in this query...

regards, tom lane


The datatypes are both integers. There is no DISTINCT in this query.
Thanks anyway!



insert into setup1 (select
  a.user, b.category, sum(1.0/b.cat_count)::float
  from result a, domain_categories b
  where a.domain = b.domain
  and b.depth < 4
  and a.results > 100
  group by a.user, b.category);

This query inserted a total of 16,000,000 rows and, with work_mem set
to 3000MB, took about 24 hours.

Any more ideas to speed this up?



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How to setup disk spindles for best performance

2008-08-21 Thread Christiaan Willemsen

Thanks Joshua,

So what about putting the indexes on a separate array? Since we do a lot 
of inserts, indexes are going to be worked on a lot of the time.


Regards,

Christiaan

Joshua D. Drake wrote:

Christiaan Willemsen wrote:
So, what you are basically saying is that a single mirror is in 
general more than enough to handle the transaction log.


http://www.commandprompt.com/blogs/joshua_drake/2008/04/is_that_performance_i_smell_ext2_vs_ext3_on_50_spindles_testing_for_postgresql/ 


http://wiki.postgresql.org/wiki/HP_ProLiant_DL380_G5_Tuning_Guide

And to answer your question, yes. Transaction logs are written 
sequentially. You do not need a journaled file system and raid 1 is 
plenty for most if not all work loads.


Sincerely,

Joshua D. Drake



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres not using array

2008-08-21 Thread André Volpato

André Volpato escreveu:


David Wilson escreveu:

On Wed, Aug 20, 2008 at 2:30 PM, André Volpato
[EMAIL PROTECTED] wrote:

 

The CPU has been at 100% for a few hours now. Can anyone tell me why?



Sounds like you've just got a CPU bound query. The data may even all
be in cache.

Some information on database size, along with EXPLAIN results for your
queries, would help here.
  


The query itself runs smoothly, with almost no delay.



You were right about the cache.
After some experiments, I noticed that the arrays are being used, but 
only for a short time...
So, what is slowing things down is the CPU (Intel(R) Pentium(R) Dual CPU 
E2160 @ 1.80GHz)


In practice, I have noticed that a dual 1.8 is worse than a single 3.0. We 
have another server which
is a Pentium D 3.0 GHz, that runs faster.

Explain output:
HashAggregate  (cost=19826.23..19826.96 rows=73 width=160) (actual time=11826.754..11826.754 rows=0 loops=1)
  ->  Subquery Scan b2  (cost=19167.71..19817.21 rows=722 width=160) (actual time=11826.752..11826.752 rows=0 loops=1)
        Filter: (bds_internacoes(200805, 200806, (b2.cod)::text, 'qtdI'::text, 'P'::bpchar) = 1::numeric)
        ->  Limit  (cost=19167.71..19248.89 rows=2165 width=48) (actual time=415.157..621.043 rows=28923 loops=1)
              ->  HashAggregate  (cost=19167.71..19248.89 rows=2165 width=48) (actual time=415.155..593.309 rows=28923 loops=1)
                    ->  Bitmap Heap Scan on bds_beneficiario b  (cost=832.53..18031.61 rows=56805 width=48) (actual time=68.259..160.912 rows=56646 loops=1)
                          Recheck Cond: ((benef_referencia >= 200805) AND (benef_referencia <= 200806))
                          ->  Bitmap Index Scan on ibds_beneficiario2  (cost=0.00..818.33 rows=56805 width=0) (actual time=63.293..63.293 rows=56646 loops=1)
                                Index Cond: ((benef_referencia >= 200805) AND (benef_referencia <= 200806))

Total runtime: 11827.374 ms

Postgres reads the array in less than 1 sec, and for the other 10s it sits at 
100% CPU usage,
which is, in this case, one of the two cores at 1.8GHz.

I am a bit confused about what CPU is best for Postgres. Our app is 
mostly reads, with
a few connections and heavy queries.
Is a multi-core worth it?

--

[]´s, ACV



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL+Hibernate Performance

2008-08-21 Thread Mark Lewis
On Thu, 2008-08-21 at 12:33 +0530, Kranti K K Parisa™ wrote:

 On Wed, Aug 20, 2008 at 8:54 PM, Matthew Wakeling
 [EMAIL PROTECTED] wrote:
 On Wed, 20 Aug 2008, Kranti K K Parisa™ wrote:
 creating multiple indexes on same column will effect
 performance?
  for example:
 
 index1 : column1, column2, column3
 index2: column1
 index3: column2,
 index4: column3
 index5: column1,column2
 
 
 The sole purpose of indexes is to affect performance.
 
 However, if you have index1, there is no point in having
 index2 or index5.
 
 Matthew
 
 Thanks Matthew,
 
 does that mean i can just have index1, index3, index4?
 

(trying to get the thread back into newest-comments-last order)

Well, yes you can get away with just index1, index3 and index4, and it
may well be the optimal solution for you, but it's not entirely
clear-cut.

It's true that PG can use index1 to satisfy queries of the form SELECT
x FROM y WHERE column1=somevalue or column1=a AND column2=b.  It will
not be as fast as an index lookup from a single-column index, but depending on
the size of the tables/indexes and the selectivity of leading column(s)
in the index, the difference in speed may be trivial.

On the other hand, if you have individual indexes on column1, column2
and column3 but no multi-column index, PG can combine the individual
indexes in memory with a bitmap.  This is not as fast as a normal lookup
in the multi-column index would be, but can still be a big win over not
having an index at all.
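
A rough sketch of the two situations (hypothetical table; the plans you
actually get will depend on your data and statistics):

CREATE TABLE y (column1 int, column2 int, column3 int, x text);
CREATE INDEX y_c1_c2_c3 ON y (column1, column2, column3);
CREATE INDEX y_c2 ON y (column2);
CREATE INDEX y_c3 ON y (column3);

-- can use the leading column(s) of the multi-column index:
EXPLAIN SELECT x FROM y WHERE column1 = 1 AND column2 = 2;

-- here column2/column3 are not leading columns of the multi-column index,
-- so PG can instead combine y_c2 and y_c3 in memory with a BitmapAnd:
EXPLAIN SELECT x FROM y WHERE column2 = 2 AND column3 = 3;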

To make an educated decision you might want to read over some of the
online documentation about indexes, in particular these two sections:

http://www.postgresql.org/docs/8.3/interactive/indexes-multicolumn.html

and

http://www.postgresql.org/docs/8.3/interactive/indexes-bitmap-scans.html

-- Mark

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres not using array

2008-08-21 Thread Mark Mielke

André Volpato wrote:
In practice, I have noticed that dual 1.8 is worse than single 3.0. We 
have another server wich

is a Pentium D 3.0 GHz, that runs faster.
...
Postgres read the array in less than 1 sec, and the other 10s he takes 
100% of CPU usage,

wich is, in this case, one of the two cores at 1.8GHz.

I am a bit confused about what CPU is best for Postgres. Our apps is 
mostly read, with

a few connections and heavy queryes.
Does it worth a multi-core ?


How are you doing your benchmarking? If you have two or more queries 
running at the same time, I would expect the 1.8 GHz x 2 to be 
significant and possibly outperform the 3.0 GHz x 1. If you usually 
only have one query running at a time, I expect the 3.0 GHz x 1 
to always win. PostgreSQL isn't good at splitting the load from a single 
client across multiple CPU cores.


Cheers,
mark

--
Mark Mielke [EMAIL PROTECTED]


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres not using array

2008-08-21 Thread Tom Lane
=?ISO-8859-1?Q?Andr=E9_Volpato?= [EMAIL PROTECTED] writes:
 Explain output:
  HashAggregate  (cost=19826.23..19826.96 rows=73 width=160) (actual 
 time=11826.754..11826.754 rows=0 loops=1)
-  Subquery Scan b2  (cost=19167.71..19817.21 rows=722 width=160) 
 (actual time=11826.752..11826.752 rows=0 loops=1)
  Filter: (bds_internacoes(200805, 200806, (b2.cod)::text, 
 'qtdI'::text, 'P'::bpchar) = 1::numeric)
  -  Limit  (cost=19167.71..19248.89 rows=2165 width=48) (actual 
 time=415.157..621.043 rows=28923 loops=1)

So I guess the question is what is the bds_internacoes function, and
why is it so slow?

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow query with a lot of data

2008-08-21 Thread Scott Carey
It looks to me like the work_mem did have an effect.

Your earlier queries had a sort followed by group aggregate at the top, and
now it's a hash-aggregate.  So the query plan DID change.  That is likely
where the first 10x performance gain came from.

The top of the plan was:

GroupAggregate  (cost=11745105.66..12277396.81 rows=28704 width=12)
  ->  Sort  (cost=11745105.66..11878034.93 rows=53171707 width=12)
        ->  Merge Join  (cost=149241.25..1287278.89 rows=53171707 width=12)
              Merge Cond: (b.domain = a.domain)

and now it is:

HashAggregate  (cost=1685527.69..1686101.77 rows=28704 width=12)
  ->  Merge Join  (cost=148702.25..1286739.89 rows=53171707 width=12)
        Merge Cond: (b.domain = a.domain)

The HashAggregate replaced the Sort followed by GroupAggregate at about 1/10
the cost.

It probably only took the first couple hundred MB of work_mem to do this, or
less given that you were at the default originally.
Note how the estimated cost on the latter is 1.6 million, and it is 11
million in the first one.

You won't get a large table aggregate significantly faster than this --
you're asking it to scan through 53 million records and aggregate.  An
explain analyze will be somewhat instructive to help identify whether it is
more I/O or CPU bound overall, as we can compare the estimated cost with the
actual times, but this probably won't get very far.
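
For example (session-level settings only; the value is just for this test):

show work_mem;
set work_mem = '500MB';   -- per sort/hash allowance for this session
explain analyze
  select a.user, b.category, sum(1.0/b.cat_count)::float
  from result a, domain_categories b
  where a.domain = b.domain
    and b.depth < 4
    and a.results > 100
    and a.user < 3
  group by a.user, b.category;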

After that, inserting 16M rows requires rather different tuning and
bottleneck identification.

On Thu, Aug 21, 2008 at 12:03 AM, Moritz Onken [EMAIL PROTECTED]wrote:


 Am 20.08.2008 um 20:06 schrieb Scott Carey:

  Ok, so the problem boils down to the sort at the end.

 The query up through the merge join on domain is as fast as its going to
 get.  The sort at the end however, should not happen ideally.  There are not
 that many rows returned, and it should hash_aggregate if it thinks there is
 enough space to do so.

 The query planner is going to choose the sort and agg over the hash-agg if
 it estimates the total number of resulting rows to be large enough so that
 the hash won't fit in work_mem.   However, there seems to be another factor
 here based on this:


 GroupAggregate  (cost=11745105.66..12277396.81 rows=28704 width=12)
   ->  Sort  (cost=11745105.66..11878034.93 rows=53171707 width=12)
         Sort Key: a.user, b.category
         ->  Merge Join  (cost=149241.25..1287278.89 rows=53171707 width=12)
               Merge Cond: (b.domain = a.domain)


 The planner actually thinks there will only be 28704 rows returned of
 width 12.  But it chooses to sort 53 million rows before aggregating.  Thats
 either a bug or there's something else wrong here.   That is the wrong way
 to aggregate those results no matter how much work_mem you have unless I'm
 completely missing something...

 You can try rearranging the query just to see if you can work around this.
  What happens if you compare the explain on:

 select
  a.user, b.category, sum(1.0/b.cat_count)::float
  from result a, domain_categories b
  where a.domain = b.domain
  and b.depth < 4
  and a.results > 100
  and a.user < 3
  group by a.user, b.category



 HashAggregate  (cost=1685527.69..1686101.77 rows=28704 width=12)
   ->  Merge Join  (cost=148702.25..1286739.89 rows=53171707 width=12)
         Merge Cond: (b.domain = a.domain)
         ->  Index Scan using domain_categories_domain on domain_categories b  (cost=0.00..421716.32 rows=5112568 width=12)
               Filter: (depth < 4)
         ->  Sort  (cost=148415.16..148513.60 rows=39376 width=8)
               Sort Key: a.domain
               ->  Bitmap Heap Scan on result a  (cost=1249.93..145409.79 rows=39376 width=8)
                     Recheck Cond: (user < 3)
                     Filter: (results > 100)
                     ->  Bitmap Index Scan on result_user_idx  (cost=0.00..1240.08 rows=66881 width=0)
                           Index Cond: (user < 3)



  to

 select
  c.user, c.category, sum(1.0/c.cat_count)::float
  from (select a.user, b.category, b.cat_count
   from result a, domain_categories b
 where a.domain = b.domain
   and b.depth < 4
   and a.results > 100
   and a.user < 3 ) c
  group by c.user, c.category



 HashAggregate  (cost=1685527.69..1686101.77 rows=28704 width=12)
   ->  Merge Join  (cost=148702.25..1286739.89 rows=53171707 width=12)
         Merge Cond: (b.domain = a.domain)
         ->  Index Scan using domain_categories_domain on domain_categories b  (cost=0.00..421716.32 rows=5112568 width=12)
               Filter: (depth < 4)
         ->  Sort  (cost=148415.16..148513.60 rows=39376 width=8)
               Sort Key: a.domain
               ->  Bitmap Heap Scan on result a  (cost=1249.93..145409.79 rows=39376 width=8)
                     Recheck Cond: (user < 3)
                     Filter: (results > 100)
                     ->  Bitmap Index Scan on result_user_idx  (cost=0.00..1240.08 rows=66881 width=0)
                           Index Cond: (user < 3)



 

Re: [PERFORM] Postgres not using array

2008-08-21 Thread André Volpato

Tom Lane escreveu:

=?ISO-8859-1?Q?Andr=E9_Volpato?= [EMAIL PROTECTED] writes:
  

Explain output:
 HashAggregate  (cost=19826.23..19826.96 rows=73 width=160) (actual 
time=11826.754..11826.754 rows=0 loops=1)
   -  Subquery Scan b2  (cost=19167.71..19817.21 rows=722 width=160) 
(actual time=11826.752..11826.752 rows=0 loops=1)
 Filter: (bds_internacoes(200805, 200806, (b2.cod)::text, 
'qtdI'::text, 'P'::bpchar) = 1::numeric)
 -  Limit  (cost=19167.71..19248.89 rows=2165 width=48) (actual 
time=415.157..621.043 rows=28923 loops=1)



So I guess the question is what is the bds_internacoes function, and
why is it so slow?


This function is quite fast:
Aggregate  (cost=5.17..5.18 rows=1 width=12) (actual time=0.286..0.287 rows=1 loops=1)
  ->  Index Scan using iinternacoes4 on internacoes  (cost=0.01..5.16 rows=1 width=12) (actual time=0.273..0.273 rows=0 loops=1)
        Index Cond: ((((ano * 100) + mes) >= 200801) AND (((ano * 100) + mes) <= 200806) AND ((cod_benef)::text = '0005375200'::text))
        Filter: (tipo_internacao = 'P'::bpchar)
Total runtime: 0.343 ms


The problem is that it's fired against 29K rows, which brings the
total runtime to about 10s (roughly 29,000 calls at ~0.34 ms each).

We are guessing that a dual-core 3.0GHz will beat a quad-core 2.2,
at least in this environment with fewer than 4 concurrent queries.

--

[]´s, ACV



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How to setup disk spindles for best performance

2008-08-21 Thread Scott Carey
Indexes will be random write workload, but these won't be synchronous writes
and will be buffered by the raid controller's cache.  Assuming you're using
a hardware raid controller that is, and one that doesn't have major
performance problems on your platform.  Which brings those questions up ---
what is your RAID card and OS?

For reads, if your shared_buffers is large enough, your heavily used indexes
won't likely go to disk much at all.

A good raid controller will typically help distribute the workload
effectively on a large array.

You probably want a simple 2 disk mirror or 4 disks in raid 10 for your OS +
xlog, and the rest for data + indexes -- with hot spares IF your card
supports them.

The biggest risk to splitting up data and indexes is that you don't know how
much I/O each needs relative to each other, and if this isn't a relatively
constant ratio, you will have one subset busy while the other subset is idle.
Unless you have extensively profiled your disk activity into index and data
subsets and know roughly what the optimal ratio is, it's probably going to
cause more problems than it fixes.
Furthermore, if this ratio changes at all, it's a maintenance nightmare.  How
much each would need in a perfect world is application dependent, so there
can be no general recommendation other than:  don't do it.

On Thu, Aug 21, 2008 at 1:34 AM, Christiaan Willemsen 
[EMAIL PROTECTED] wrote:

 Thanks Joshua,

 So what about putting the indexes on a separate array? Since we do a lot of
 inserts indexes are going to be worked on a lot of the time.

 Regards,

 Christiaan


 Joshua D. Drake wrote:

 Christiaan Willemsen wrote:

 So, what you are basically saying, is that a single mirror is in general
 more than enough to facilitate the transaction log.



 http://www.commandprompt.com/blogs/joshua_drake/2008/04/is_that_performance_i_smell_ext2_vs_ext3_on_50_spindles_testing_for_postgresql/
 http://wiki.postgresql.org/wiki/HP_ProLiant_DL380_G5_Tuning_Guide

 And to answer your question, yes. Transaction logs are written
 sequentially. You do not need a journaled file system and raid 1 is plenty
 for most if not all work loads.

 Sincerely,

 Joshua D. Drake


 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] How to setup disk spindles for best performance

2008-08-21 Thread Christiaan Willemsen

Hi Scott,

Great info! Our RAID card is at the moment an ICP Vortex (Adaptec) 
ICP5165BR, and I'll be using it with Ubuntu server 8.04. I tried 
OpenSolaris, but it yielded even more terrible performance, especially 
using ZFS.. I guess that was just a mismatch. Anyway, I'm going to 
return the controller, because it does not scale very well with more 
than 4 disks in RAID 10. Bandwidth is limited to 350MB/sec, and IOPS 
scale badly with extra disks...


So I guess I'll be waiting for another controller first. The idea of 
xlog + OS on a 4-disk RAID 10 and the rest for the data sounds good :) I 
hope it will turn out that way too.. First, another controller..


Regards,

Christiaan

Scott Carey wrote:
Indexes will be random write workload, but these won't by synchronous 
writes and will be buffered by the raid controller's cache.  Assuming 
you're using a hardware raid controller that is, and one that doesn't 
have major performance problems on your platform.  Which brings those 
questions up --- what is your RAID card and OS?


For reads, if your shared_buffers is large enough, your heavily used 
indexes won't likely go to disk much at all.


A good raid controller will typically help distribute the workload 
effectively on a large array.


You probably want a simple 2 disk mirror or 4 disks in raid 10 for 
your OS + xlog, and the rest for data + indexes -- with hot spares IF 
your card supports them.


The biggest risk to splitting up data and indexes is that you don't 
know how much I/O each needs relative to each other, and if this isn't 
a relatively constant ratio you will have one subset busy while the 
other subset is idle.
Unless you have extensively profiled your disk activity into index and 
data subsets and know roughly what the optimal ratio is, its probably 
going to cause more problems than it fixes. 
Furthermore, if this ratio changes at all, its a maintenance 
nightmare.  How much each would need in a perfect world is application 
dependant, so there can be no general recommendation other than:  
don't do it.


On Thu, Aug 21, 2008 at 1:34 AM, Christiaan Willemsen 
[EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote:


Thanks Joshua,

So what about putting the indexes on a separate array? Since we do
a lot of inserts indexes are going to be worked on a lot of the time.

Regards,

Christiaan


Joshua D. Drake wrote:

Christiaan Willemsen wrote:

So, what you are basically saying, is that a single mirror
is in general more than enough to facilitate the
transaction log.



http://www.commandprompt.com/blogs/joshua_drake/2008/04/is_that_performance_i_smell_ext2_vs_ext3_on_50_spindles_testing_for_postgresql/

http://wiki.postgresql.org/wiki/HP_ProLiant_DL380_G5_Tuning_Guide

And to answer your question, yes. Transaction logs are written
sequentially. You do not need a journaled file system and raid
1 is plenty for most if not all work loads.

Sincerely,

Joshua D. Drake


-- 
Sent via pgsql-performance mailing list

(pgsql-performance@postgresql.org
mailto:pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




Re: [PERFORM] Slow query with a lot of data

2008-08-21 Thread Moritz Onken


Am 21.08.2008 um 16:39 schrieb Scott Carey:


It looks to me like the work_mem did have an effect.

Your earlier queries had a sort followed by group aggregate at the  
top, and now its a hash-aggregate.  So the query plan DID change.   
That is likely where the first 10x performance gain came from.


But it didn't change when I added the sub select.
Thank you guys very much. The speed is now ok and I hope I can finish
this work soon.


But there is another problem. If I run this query without the  
limitation of the user id, postgres consumes about 150GB of disk space  
and dies with


ERROR:  could not write block 25305351 of temporary file: No space  
left on device


After that, the available disk space is back to normal.

Is this normal? The resulting table (setup1) is not bigger than 1.5 GB.

moritz

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow query with a lot of data

2008-08-21 Thread Merlin Moncure
On Thu, Aug 21, 2008 at 11:07 AM, Moritz Onken [EMAIL PROTECTED] wrote:

 Am 21.08.2008 um 16:39 schrieb Scott Carey:

 It looks to me like the work_mem did have an effect.

 Your earlier queries had a sort followed by group aggregate at the top,
 and now its a hash-aggregate.  So the query plan DID change.  That is likely
 where the first 10x performance gain came from.

 But it didn't change as I added the sub select.
 Thank you guys very much. The speed is now ok and I hope I can finish tihs
 work soon.

 But there is another problem. If I run this query without the limitation of
 the user id, postgres consumes about 150GB of disk space and dies with

 ERROR:  could not write block 25305351 of temporary file: No space left on
 device

 After that the avaiable disk space is back to normal.

 Is this normal? The resulting table (setup1) is not bigger than 1.5 GB.

Maybe the result is too big.  If you explain the query, you should get
an estimate of the rows returned.  If this is the case, you need to
rethink your query or use something like a cursor to browse the result.
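
A minimal sketch of the cursor approach, reusing the query from this thread:

begin;
declare res cursor for
  select a.user, b.category, sum(1.0/b.cat_count)::float
  from result a, domain_categories b
  where a.domain = b.domain
    and b.depth < 4
    and a.results > 100
  group by a.user, b.category;
fetch forward 10000 from res;   -- repeat until no more rows come back
close res;
commit;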

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] The state of PG replication in 2008/Q2?

2008-08-21 Thread Dan Harris
My company finally has the means to install a new database server for 
replication.  I have Googled and found a lot of sparse information out 
there regarding replication systems for PostgreSQL and a lot of it looks 
very out-of-date.  Can I please get some ideas from those of you that 
are currently using fail-over replication systems?  What advantage does 
your solution have?  What are the gotchas I need to worry about?


My desire would be to have a parallel server that could act as a hot 
standby system with automatic fail over in a multi-master role.  If our 
primary server goes down for whatever reason, the secondary would take 
over and handle the load seamlessly.  I think this is really the holy 
grail scenario and I understand how difficult it is to achieve.  
Especially since we make frequent use of sequences in our databases.  If 
MM is too difficult, I'm willing to accept a hot-standby read-only 
system that will handle queries until we can fix whatever ails the master. 

We are primarily an OLAP environment but there is a constant stream of 
inserts into the databases.  There are 47 different databases hosted on 
the primary server and this number will continue to scale up to whatever 
the server seems to support.  The reason I mention this number is that 
it seems that those systems that make heavy use of schema changes 
require a lot of fiddling.  For a single database, this doesn't seem 
too problematic, but any manual work involved and administrative 
overhead will scale at the same rate as the database count grows and I 
certainly want to minimize as much fiddling as possible.


We are using 8.3 and the total combined size for the PG data directory 
is 226G.  Hopefully I didn't neglect to include more relevant information.


As always, thank you for your insight.

-Dan



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres not using array

2008-08-21 Thread Tom Lane
=?ISO-8859-1?Q?Andr=E9_Volpato?= [EMAIL PROTECTED] writes:
 Tom Lane escreveu:
 So I guess the question is what is the bds_internacoes function, and
 why is it so slow?  

 This function is quite fast:

Well, fast is relative.  It's not fast enough, or you wouldn't have
been complaining.

 We are guessing that a dual core 3.0GHz will beat up a quad core 2.2,
 at least in this environmnent with less than 4 concurrent queryes.

The most you could hope for from that is less than a 50% speedup.  I'd
suggest investing some tuning effort first.  Some rethinking of your
schema, for example, might buy you orders of magnitude ... with no new
hardware investment.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How to setup disk spindles for best performance

2008-08-21 Thread Ron Mayer

Scott Carey wrote:
For reads, if your shared_buffers is large enough, your heavily used 
indexes won't likely go to disk much at all.


ISTM this would happen regardless of your shared_buffers setting.
If you have enough memory the OS should cache the frequently used
pages regardless of shared_buffers; and if you don't have enough
memory it won't.


... splitting up data and indexes ...


FWIW, I've had a system where moving pgsql_tmp to different disks
helped more than moving indexes.
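
(In 8.3 that can also be done without symlinks via temp_tablespaces; the
path and tablespace name below are made up.)

-- the directory must exist and be owned by the postgres user:
CREATE TABLESPACE fasttemp LOCATION '/mnt/scratch/pg_temp';
-- point temporary files at it, per session or in postgresql.conf:
SET temp_tablespaces = 'fasttemp';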

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres not using array

2008-08-21 Thread André Volpato

Tom Lane escreveu:

We are guessing that a dual core 3.0GHz will beat up a quad core 2.2,
at least in this environmnent with less than 4 concurrent queryes.


The most you could hope for from that is less than a 50% speedup.  I'd
suggest investing some tuning effort first.  Some rethinking of your
schema, for example, might buy you orders of magnitude ... with no new
hardware investment.


I think we almost reached the tuning limit, without changing the schema.

You are right, the whole design must be rethought.
But this question about single vs multi cores has bitten me.

We will rethink the investment in new hardware too. The databases that are
used less often will be moved to a single-core server.

--

[]´s, ACV



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The state of PG replication in 2008/Q2?

2008-08-21 Thread Mathias Stjernström

Hi Dan!

It's true, many of the replication options that exist for PostgreSQL
have not seen any updates in a while.


If you are only looking for redundancy and not a performance gain you
should look at PostgreSQL PITR (http://www.postgresql.org/docs/8.1/static/backup-online.html
)


For Master-Slave replication I think that Slony http://www.slony.info/
is most up to date. But it does not support DDL changes.


You may wish to look at pgpool http://pgpool.projects.postgresql.org/ ;
it supports synchronous replication (which is good for data integrity,
but can be bad for performance).


These are some of the open source options. I do not have any
experience with the commercial ones.


Best regards,
Mathias

http://www.pastbedti.me/


On 21 aug 2008, at 19.53, Dan Harris wrote:

My company finally has the means to install a new database server  
for replication.  I have Googled and found a lot of sparse  
information out there regarding replication systems for PostgreSQL  
and a lot of it looks very out-of-date.  Can I please get some ideas  
from those of you that are currently using fail-over replication  
systems?  What advantage does your solution have?  What are the  
gotchas I need to worry about?


My desire would be to have a parallel server that could act as a hot  
standby system with automatic fail over in a multi-master role.  If  
our primary server goes down for whatever reason, the secondary  
would take over and handle the load seamlessly.  I think this is  
really the holy grail scenario and I understand how difficult it  
is to achieve.  Especially since we make frequent use of sequences  
in our databases.  If MM is too difficult, I'm willing to accept a  
hot-standby read-only system that will handle queries until we can  
fix whatever ails the master.
We are primary an OLAP environment but there is a constant stream of  
inserts into the databases.  There are 47 different databases hosted  
on the primary server and this number will continue to scale up to  
whatever the server seems to support.  The reason I mention this  
number is that it seems that those systems that make heavy use of  
schema changes require a lot of fiddling.  For a single database,  
this doesn't seem too problematic, but any manual work involved and  
administrative overhead will scale at the same rate as the database  
count grows and I certainly want to minimize as much fiddling as  
possible.


We are using 8.3 and the total combined size for the PG data  
directory is 226G.  Hopefully I didn't neglect to include more  
relevant information.


As always, thank you for your insight.

-Dan



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org 
)

To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance






Re: [PERFORM] The state of PG replication in 2008/Q2?

2008-08-21 Thread Andrew Sullivan
On Thu, Aug 21, 2008 at 10:53:05PM +0200, Mathias Stjernström wrote:

 For Master-Slave replication i think that Slony http://www.slony.info/ is 
 most up to date. But it does not support DDL changes.

This isn't quite true.  It supports DDL; it just doesn't support it in
the normal way, and is broken by applications doing DDL as part of the
regular operation.  

A

-- 
Andrew Sullivan
[EMAIL PROTECTED]
+1 503 667 4564 x104
http://www.commandprompt.com/

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The state of PG replication in 2008/Q2?

2008-08-21 Thread Alan Hodgson
On Thursday 21 August 2008, Dan Harris [EMAIL PROTECTED] wrote:
 Especially since we make frequent use of sequences in our databases.  If
 MM is too difficult, I'm willing to accept a hot-standby read-only
 system that will handle queries until we can fix whatever ails the
 master.

A heartbeat+DRBD solution might make more sense than database-level 
replication to achieve this. 

-- 
Alan

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The state of PG replication in 2008/Q2?

2008-08-21 Thread Mathias Stjernström
Yes, that's true. It does support DDL changes, but not in an automatic
way. You have to execute all DDL changes with a separate script.


What's the status of http://www.commandprompt.com/products/mammothreplicator/ 
 ?


Best regards,
Mathias

http://www.pastbedti.me/


On 21 aug 2008, at 23.04, Andrew Sullivan wrote:


On Thu, Aug 21, 2008 at 10:53:05PM +0200, Mathias Stjernström wrote:

For Master-Slave replication i think that Slony http://www.slony.info/ 
 is

most up to date. But it does not support DDL changes.


This isn't quite true.  It supports DDL; it just doesn't support it in
the normal way, and is broken by applications doing DDL as part of the
regular operation.

A

--
Andrew Sullivan
[EMAIL PROTECTED]
+1 503 667 4564 x104
http://www.commandprompt.com/

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org 
)

To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance






Re: [PERFORM] The state of PG replication in 2008/Q2?

2008-08-21 Thread Joshua Drake
On Thu, 21 Aug 2008 23:21:26 +0200
Mathias Stjernström [EMAIL PROTECTED] wrote:

 Yes thats true. It does support DDL changes but not in a automatic  
 way. You have to execute all DDL changes with a separate script.
 
 What's the status of
 http://www.commandprompt.com/products/mammothreplicator/ ?
 

It is about to go open source but it doesn't replicate DDL either.

Joshua D. Drake
-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The state of PG replication in 2008/Q2?

2008-08-21 Thread salman



Mathias Stjernström wrote:
Yes thats true. It does support DDL changes but not in a automatic way. 
You have to execute all DDL changes with a separate script.




That's true, but it's quite simple to do with the provided perl 
script(s) - slonik_execute_script. I've had to make use of it a few 
times and have had no problems.


-salman

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The state of PG replication in 2008/Q2?

2008-08-21 Thread Alan Hodgson
On Thursday 21 August 2008, salman [EMAIL PROTECTED] wrote:
 Mathias Stjernström wrote:
  Yes thats true. It does support DDL changes but not in a automatic way.
  You have to execute all DDL changes with a separate script.

 That's true, but it's quite simple to do with the provided perl
 script(s) - slonik_execute_script. I've had to make use of it a few
 times and have had no problems.

I do it almost every day, and it is not all that simple if your 
configuration is complex. The original poster would require at least 47 
different Slony clusters, for starters. The complications from adding and 
dropping tables and sequences across 47 databases, and trying to keep Slony 
up to date throughout, stagger the imagination, honestly.

-- 
Alan

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The state of PG replication in 2008/Q2?

2008-08-21 Thread Alvaro Herrera
Joshua Drake wrote:
 On Thu, 21 Aug 2008 23:21:26 +0200
 Mathias Stjernström [EMAIL PROTECTED] wrote:
 
  Yes thats true. It does support DDL changes but not in a automatic  
  way. You have to execute all DDL changes with a separate script.
  
  What's the status of
  http://www.commandprompt.com/products/mammothreplicator/ ?
 
 It is about to go open source but it doesn't replicate DDL either.

It doesn't replicate multiple databases either.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The state of PG replication in 2008/Q2?

2008-08-21 Thread david

On Thu, 21 Aug 2008, Mathias Stjernström wrote:


Hi Dan!

Its true, many of the replication options that exists for PostgreSQL have not 
seen any updates in a while.


If you only looking for redundancy and not a performance gain you should look 
at PostgreSQL PITR 
(http://www.postgresql.org/docs/8.1/static/backup-online.html)


For Master-Slave replication i think that Slony http://www.slony.info/ is 
most up to date. But it does not support DDL changes.


You may wich to look at pgpool http://pgpool.projects.postgresql.org/ it 
supports Synchronous replication (wich is good for data integrity, but can be 
bad for performance).


These are some of the open source options. I do not have any experience with 
the commercial onces.


A couple of months ago there was a lot of news about a WAL-based 
replication engine, one that was closed source but possibly getting 
opened shortly, and also about the decision by the core devs to add one into 
the base distro.


What's been happening on this front?

From my understanding, the first versions of this would not support queries 
of the replica, but would provide the consistency needed for reliable 
failover.


David Lang

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The state of PG replication in 2008/Q2?

2008-08-21 Thread Joshua Drake
On Thu, 21 Aug 2008 17:54:11 -0400
Alvaro Herrera [EMAIL PROTECTED] wrote:

 Joshua Drake wrote:
  On Thu, 21 Aug 2008 23:21:26 +0200
  Mathias Stjernström [EMAIL PROTECTED] wrote:
  
   Yes thats true. It does support DDL changes but not in a
   automatic way. You have to execute all DDL changes with a
   separate script.
   
   What's the status of
   http://www.commandprompt.com/products/mammothreplicator/ ?
  
  It is about to go open source but it doesn't replicate DDL either.
 
 It doesn't replicate multiple databases either.
 

True

Joshua D. Drake

-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Why do my hash joins turn to nested loops?

2008-08-21 Thread pgsql-performance
I only have a few days of experience with postgres and it is working 
great, but as I started building up test queries I ran into a problem I 
don't understand.

One query works fast, returning results in under a second.  If I insert 
one more join into the query, however, it switches to nested loops and 
takes minutes.  It seems to do this at random: using integer compares in the 
query is fast, for instance, but using LIKE operators causes it to 
use loops again.

I have about 200,000 items per table, so nested loops cause lots of pain.
The database has objects with lots (100 to 1000) 'specs' for each object 
in another table, so I have to pull them out to do queries and sorting on 
them.

Here are the two queries.  They are the same except the first has 
two 'find' joins and the other has three.

I assume that I am hitting some limit somewhere that is causing postgres 
to change its idea of how to run the query.  Can I force it to use hash 
joins?

Thanks!

So this is fast... 

EXPLAIN ANALYZE SELECT * FROM logical 
LEFT JOIN model ON logical.uid = model.logical_uid 
LEFT JOIN company ON model.company_uid = company.uid 
LEFT JOIN type ON logical.type::INT = type.uid 
JOIN specs spec_find1 ON spec_find1.spec_uid='8' AND spec_find1.text LIKE '%' 
AND spec_find1.logical_uid=model.logical_uid 
JOIN specs spec_find2 ON spec_find2.spec_uid='8' AND spec_find2.text LIKE '%' 
AND spec_find2.logical_uid=model.logical_uid 
LEFT JOIN specs specs_sort ON specs_sort.spec_uid='4' AND 
specs_sort.logical_uid=logical.uid 
ORDER BY specs_sort.num;

   
QUERY PLAN  
  
-
 Sort  (cost=5398.43..5398.44 rows=6 width=247) (actual time=331.546..333.303 
rows=3555 loops=1)
   Sort Key: specs_sort.num
   Sort Method:  quicksort  Memory: 981kB
   -  Hash Left Join  (cost=1087.68..5398.35 rows=6 width=247) (actual 
time=37.309..315.451 rows=3555 loops=1)
 Hash Cond: (model.company_uid = company.uid)
 -  Hash Left Join  (cost=1086.28..5396.86 rows=6 width=217) (actual 
time=37.235..308.787 rows=3555 loops=1)
   Hash Cond: (logical.uid = specs_sort.logical_uid)
   -  Hash Left Join  (cost=694.84..5004.96 rows=6 width=180) 
(actual time=22.433..284.832 rows=3555 loops=1)
 Hash Cond: ((logical.type)::integer = type.uid)
 -  Nested Loop  (cost=693.55..5003.62 rows=6 width=168) 
(actual time=22.361..273.502 rows=3555 loops=1)
   -  Hash Join  (cost=693.55..4953.84 rows=6 
width=110) (actual time=22.330..237.717 rows=3555 loops=1)
 Hash Cond: (model.logical_uid = 
spec_find1.logical_uid)
 -  Seq Scan on model  (cost=0.00..3337.82 
rows=184182 width=36) (actual time=0.017..99.289 rows=184182 loops=1)
 -  Hash  (cost=691.60..691.60 rows=156 
width=74) (actual time=21.795..21.795 rows=2196 loops=1)
   -  Hash Join  (cost=339.84..691.60 
rows=156 width=74) (actual time=8.558..19.060 rows=2196 loops=1)
 Hash Cond: (spec_find1.logical_uid 
= spec_find2.logical_uid)
 -  Seq Scan on specs spec_find1  
(cost=0.00..326.89 rows=1036 width=37) (actual time=0.023..6.765 rows=2196 
loops=1)
   Filter: (((text)::text ~~ 
'%'::text) AND (spec_uid = 8))
 -  Hash  (cost=326.89..326.89 
rows=1036 width=37) (actual time=8.508..8.508 rows=2196 loops=1)
   -  Seq Scan on specs 
spec_find2  (cost=0.00..326.89 rows=1036 width=37) (actual time=0.010..6.667 
rows=2196 loops=1)
 Filter: (((text)::text 
~~ '%'::text) AND (spec_uid = 8))
   -  Index Scan using logical_pkey on logical  
(cost=0.00..8.28 rows=1 width=58) (actual time=0.006..0.007 rows=1 loops=3555)
 Index Cond: (logical.uid = model.logical_uid)
 -  Hash  (cost=1.13..1.13 rows=13 width=12) (actual 
time=0.024..0.024 rows=13 loops=1)
   -  Seq Scan on type  (cost=0.00..1.13 rows=13 
width=12) (actual time=0.004..0.011 rows=13 loops=1)
   -  Hash  (cost=287.57..287.57 rows=8309 width=37) (actual 
time=14.773..14.773 rows=8172 loops=1)
 -  Seq Scan on specs specs_sort  (cost=0.00..287.57 
rows=8309 width=37) (actual time=0.012..8.206 rows=8172 loops=1)
   Filter: (spec_uid = 4)
 -  Hash  (cost=1.18..1.18 rows=18 

Re: [PERFORM] Why do my hash joins turn to nested loops?

2008-08-21 Thread Tom Lane
[EMAIL PROTECTED] writes:
 One query works fast, returning results in under a second.  If I insert 
 one more join into the table however, it switches to nested-loops and 
 takes minutes.

I think you need to raise from_collapse_limit and/or
join_collapse_limit.
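
For example (the values are arbitrary; raise them at least past the number
of relations in the query, and set them in the session before running it):

SET from_collapse_limit = 16;   -- default is 8
SET join_collapse_limit = 16;   -- default is 8
-- for diagnosis only, nested loops can also be discouraged:
SET enable_nestloop = off;
-- then re-run EXPLAIN ANALYZE on the original query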

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL+Hibernate Performance

2008-08-21 Thread Kranti K K Parisa™
Hi Mark,

Thanks again for the info.
I shall create different sets of indexes and compare the query execution times.
One such table might get around 700,000 records over a period of 4-5
months. So what other kinds of measures do I need to focus on?
I thought of the following:
1) Indexes
2) Better hardware (RAM and HDD)

And how can I estimate the size of a row?  Is it based on the data
types of the columns I have in the table?
Do you have any info to guide me on this?
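
One way to check this directly once some data is loaded (the table name
below is hypothetical):

select pg_column_size(t.*) as row_bytes from mytable t limit 10;  -- approximate per-row size
select pg_size_pretty(pg_relation_size('mytable'));        -- heap only
select pg_size_pretty(pg_total_relation_size('mytable'));  -- heap + indexes + toast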

On Thu, Aug 21, 2008 at 7:32 PM, Mark Lewis [EMAIL PROTECTED] wrote:

 On Thu, 2008-08-21 at 12:33 +0530, Kranti K K Parisa™ wrote:

  On Wed, Aug 20, 2008 at 8:54 PM, Matthew Wakeling
  [EMAIL PROTECTED] wrote:
  On Wed, 20 Aug 2008, Kranti K K Parisa™ wrote:
  creating multiple indexes on same column will effect
  performance?
   for example:
 
  index1 : column1, column2, column3
  index2: column1
  index3: column2,
  index4: column3
  index5: column1,column2
 
 
  The sole purpose of indexes is to affect performance.
 
  However, if you have index1, there is no point in having
  index2 or index5.
 
  Matthew
 
  Thanks Matthew,
 
  does that mean i can just have index1, index3, index4?
 

 (trying to get the thread back into newest-comments-last order)

 Well, yes you can get away with just index1, index3 and index4, and it
 may well be the optimal solution for you, but it's not entirely
 clear-cut.

 It's true that PG can use index1 to satisfy queries of the form SELECT
 x FROM y WHERE column1=somevalue or column1=a AND column2=b.  It will
 not be as fast as an index lookup from a single index, but depending on
 the size of the tables/indexes and the selectivity of leading column(s)
 in the index, the difference in speed may be trivial.

 On the other hand, if you have individual indexes on column1, column2
 and column3 but no multi-column index, PG can combine the individual
 indexes in memory with a bitmap.  This is not as fast as a normal lookup
 in the multi-column index would be, but can still be a big win over not
 having an index at all.

 To make an educated decision you might want to read over some of the
 online documentation about indexes, in particular these two sections:

 http://www.postgresql.org/docs/8.3/interactive/indexes-multicolumn.html

 and

 http://www.postgresql.org/docs/8.3/interactive/indexes-bitmap-scans.html

 -- Mark




-- 

Best Regards
Kranti Kiran Kumar Parisa
M: +91 - 9391 - 438 - 738
+91 - 9849 - 625 - 625