Re: [PERFORM] Optimize SQL

2006-09-18 Thread Mikael Carneholm
That query is generated by Hibernate, right?


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Pallav
Kalva
Sent: den 15 september 2006 17:10
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Optimize SQL

Hi,

   Is there any way we can optimize this SQL? It is doing full table
scans on the listing and address tables. Postgres version 8.0.2

Thanks!
Pallav.


explain analyze
select listing0_.listingid as col_0_0_,
       getmaxdate(listing0_.lastupdate, max(addressval2_.createdate)) as col_1_0_
from listing.listing listing0_
left outer join listing.address listingadd1_
       on listing0_.fkbestaddressid = listingadd1_.addressid
left outer join listing.addressvaluation addressval2_
       on listingadd1_.addressid = addressval2_.fkaddressid
where (listing0_.lastupdate > '2006-09-15 08:31:26.927'
       and listing0_.lastupdate <= current_timestamp)
   or (addressval2_.createdate > '2006-09-15 08:31:26.927'
       and addressval2_.createdate <= current_timestamp)
group by listing0_.listingid, listing0_.lastupdate
order by getmaxdate(listing0_.lastupdate, max(addressval2_.createdate)) asc
limit 10;


Limit  (cost=2399501.49..2399501.51 rows=10 width=20) (actual
time=414298.076..414298.174 rows=10 loops=1)
   ->  Sort  (cost=2399501.49..2410707.32 rows=4482333 width=20) (actual
time=414298.068..414298.098 rows=10 loops=1)
 Sort Key: getmaxdate(listing0_.lastupdate,
max(addressval2_.createdate))
 ->  GroupAggregate  (cost=1784490.47..1851725.47 rows=4482333
width=20) (actual time=414212.926..414284.927 rows=2559 loops=1)
   ->  Sort  (cost=1784490.47..1795696.31 rows=4482333
width=20) (actual time=414174.678..414183.536 rows=2563 loops=1)
 Sort Key: listing0_.listingid, listing0_.lastupdate
 ->  Merge Right Join  (cost=1113947.32..1236714.45
rows=4482333 width=20) (actual time=273257.256..414163.920 rows=2563
loops=1)
   Merge Cond: ("outer".fkaddressid =
"inner".addressid)
   Filter: ((("inner".lastupdate > '2006-09-15
08:31:26.927'::timestamp without time zone) AND ("inner".lastupdate <=
('now'::text)::timestamp(6) with time zone)) OR (("outer".createdate >
'2006-09-15 08:31:26.927'::timestamp without time zone) AND
("outer".createdate <= ('now'::text)::timestamp(6) with time zone)))
   ->  Index Scan using
idx_addressvaluation_fkaddressid on addressvaluation addressval2_
(cost=0.00..79769.55 rows=947056 width=12) (actual
time=0.120..108240.633 rows=960834 loops=1)
   ->  Sort  (cost=1113947.32..1125153.15
rows=4482333 width=16) (actual time=256884.646..275823.217 rows=5669719
loops=1)
 Sort Key: listingadd1_.addressid
 ->  Hash Left Join
(cost=228115.38..570557.39 rows=4482333 width=16) (actual
time=93874.356..205054.946 rows=4490963 loops=1)
   Hash Cond:
("outer".fkbestaddressid = "inner".addressid)
   ->  Seq Scan on listing listing0_
(cost=0.00..112111.33 rows=4482333 width=16) (actual
time=0.026..25398.685 rows=4490963 loops=1)
   ->  Hash
(cost=18.70..18.70 rows=6990270 width=4) (actual
time=93873.659..93873.659 rows=0 loops=1)
 ->  Seq Scan on address
listingadd1_  (cost=0.00..18.70 rows=6990270 width=4) (actual
time=13.256..69441.056 rows=6990606 loops=1)
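
One way to narrow down where the time goes (a diagnostic sketch only, reusing the tables and dates from the query above; not a rewrite) is to EXPLAIN ANALYZE each half of the OR on its own and see whether either side is able to use an index at all:

explain analyze
select l.listingid
from listing.listing l
where l.lastupdate > '2006-09-15 08:31:26.927'
  and l.lastupdate <= current_timestamp;

explain analyze
select av.fkaddressid
from listing.addressvaluation av
where av.createdate > '2006-09-15 08:31:26.927'
  and av.createdate <= current_timestamp;

If both halves are fast on their own, it is the OR spanning both sides of the outer joins that prevents the planner from applying either condition before the joins.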




Re: [PERFORM] Poor performance on seq scan

2006-09-18 Thread Markus Schaber
Hi, Piotr,

Piotr Kołaczkowski wrote:

> Why match rows from the heap if ALL required data are in the index itself?
> Why look at the heap at all?

Because the index does not contain any transaction information, it has
to look at the heap to find out which of the rows are current.

This is one of the more debated points of PostgreSQL's MVCC
implementation.
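
A minimal way to see this for yourself (hypothetical table; 8.1 behaviour) is that even a query touching only the indexed column is planned against the table itself, never the index alone:

create table demo (id serial primary key, name text);
create index demo_name on demo (name);
explain select name from demo where name = 'Mug';
-- the plan shows an Index Scan or Seq Scan on demo; there is no
-- "index-only" scan, because tuple visibility lives only in the heap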


Markus


-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org




Re: [PERFORM] Poor performance on seq scan

2006-09-18 Thread Guido Neitzer

Because there is no MVCC information in the index.

cug

2006/9/12, Piotr Kołaczkowski <[EMAIL PROTECTED]>:

On Tuesday 12 September 2006 12:47, Heikki Linnakangas wrote:
> Laszlo Nagy wrote:
> > I made another test. I create a file with the identifiers and names of
> > the products:
> >
> > psql#\o products.txt
> > psql#select id,name from product;
> >
> > Then I can search using grep:
> >
> > grep "Mug" products.txt | cut -f1 -d\|
> >
> > There is a huge difference. This command runs within 0.5 seconds. That
> > is, at least 76 times faster than the seq scan. It is the same if I
> > vacuum, backup and restore the database. I thought that the table is
> > stored in one file, and the seq scan will be actually faster than
> > grepping the file. Can you please tell me what am I doing wrong? I'm
> > not sure if I can increase the performance of a seq scan by adjusting
> > the values in postgresql.conf. I do not like the idea of exporting the
> > product table periodically into a txt file, and search with grep. :-)
>
> Is there any other columns besides id and name in the table? How big is
> products.txt compared to the heap file?
>
> > Another question: I have a btree index on product(name). It contains
> > all product names and the identifiers of the products. Wouldn't it be
> > easier to seq scan the index instead of seq scan the table? The index
> > is only 66MB, the table is 1123MB.
>
> Probably, but PostgreSQL doesn't know how to do that. Even if it did, it
> depends on how many matches there is. If you scan the index and then
> fetch the matching rows from the heap, you're doing random I/O to the
> heap. That becomes slower than scanning the heap sequentially if you're
> going to get more than a few hits.

Why match rows from the heap if ALL required data are in the index itself?
Why look at the heap at all?

This is the same performance problem in PostgreSQL I noticed when doing
some "SELECT count(*)" queries. Look at this:

explain analyze select count(*) from transakcja where data > '2005-09-09' and
miesiac >= (9 + 2005 * 12) and kwota < 50;

QUERY PLAN
--
 Aggregate  (cost=601557.86..601557.87 rows=1 width=0) (actual
time=26733.479..26733.484 rows=1 loops=1)
   ->  Bitmap Heap Scan on transakcja  (cost=154878.00..596928.23 rows=1851852
width=0) (actual time=9974.208..18796.060 rows=1654218 loops=1)
 Recheck Cond: ((miesiac >= 24069) AND (kwota < 50::double precision))
 Filter: (data > '2005-09-09 00:00:00'::timestamp without time zone)
 ->  Bitmap Index Scan on idx_transakcja_miesiac_kwota
(cost=0.00..154878.00 rows=556 width=0) (actual time=9919.967..9919.967
rows=1690402 loops=1)
   Index Cond: ((miesiac >= 24069) AND (kwota < 50::double
precision))
 Total runtime: 26733.980 ms
(7 rows)

The actual time retrieving tuples from the index is less than 10 seconds, but
the system executes a needless heap scan that takes an additional 16 seconds.
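
(For repeated aggregates like this, one common workaround is a small summary table refreshed periodically; a rough sketch only, with the summary table name invented and the date dimension dropped for brevity:)

create table transakcja_summary as
  select miesiac, count(*) as cnt_kwota_lt_50
  from transakcja
  where kwota < 50
  group by miesiac;
-- refresh from cron, then answer the question from the summary:
select sum(cnt_kwota_lt_50) from transakcja_summary where miesiac >= 24069;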

Best regards,
Peter








--
PostgreSQL Bootcamp, Big Nerd Ranch Europe, Nov 2006
http://www.bignerdranch.com/news/2006-08-21.shtml



Re: [PERFORM] High CPU Load

2006-09-18 Thread Jérôme BENOIS
Hi Guillaume,

Now I have disabled Hyper-Threading in the BIOS, and the "context switch
storms" have disappeared (as seen with sar -t).

I decreased the work_mem parameter to 32768. My CPU load is better, but it
is still too high, for example:

top - 16:27:05 up  9:13,  3 users,  load average: 45.37, 43.43, 41.43
Tasks: 390 total,  26 running, 363 sleeping,   0 stopped,   1 zombie
Cpu(s): 89.5% us,  9.8% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.2% hi,
0.4% si
Mem:   2076404k total,  2039552k used,36852k free,40412k buffers
Swap:  1954312k total,  468k used,  1953844k free,  1232000k cached

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
30907 postgres  16   0  537m  51m 532m R 20.4  2.5   1:44.73 postmaster
25631 postgres  16   0  538m 165m 532m R 17.4  8.2   8:43.76 postmaster
29357 postgres  16   0  537m 311m 532m R 17.4 15.3   0:26.47 postmaster
32294 postgres  16   0  535m  86m 532m R 14.9  4.3   0:04.97 postmaster
31406 postgres  16   0  536m 180m 532m R 14.4  8.9   0:22.04 postmaster
31991 postgres  16   0  535m  73m 532m R 14.4  3.6   0:08.21 postmaster
30782 postgres  16   0  536m 205m 532m R 14.0 10.1   0:19.63 postmaster

Tomorrow morning I plan to add 2 GB of RAM in order to test the difference
with my current config.

Do you have any other ideas?

Best Regards,
-- 
Jérôme,

python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in '[EMAIL PROTECTED]'.split('@')])"

Le vendredi 15 septembre 2006 à 00:24 +0200, Guillaume Smet a écrit :
> On 9/14/06, Jérôme BENOIS <[EMAIL PROTECTED]> wrote:
> > Yes i have a lot of users ;-)
> 
> So your work_mem is probably far too high (that's what I told you in
> my first message) and you probably swap when you have too many users.
> Remember that work_mem can be used several times per query (and it's
> especially the case when you have a lot of sorts).
> When your load is high, check your swap activity and your io/wait. top
> gives you these information. If you swap, lower your work_mem to 32 MB
> for example then see if it's enough for your queries to run fast (you
> can check if there are files created in the $PGDATA/base/<database
> oid>/pgsql_tmp) and if it doesn't swap. Retry with a
> lower/higher value to find the one that fits best to your queries and
> load.
> 
> > I agree but by moment DB Server is so slow.
> 
> Yep, that's the information that was missing :).
> 
> > what's means "HT" please ?
> 
> Hyper threading. It's usually not recommended to enable it on
> PostgreSQL servers. On most servers, you can disable it directly in
> the BIOS.
> 
> --
> Guillaume
> 





[PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-18 Thread Bucky Jordan
>Yes.  What's pretty large?  We've had to redefine large recently, now we're
>talking about systems with between 100TB and 1,000TB.
>
>- Luke

Well, I said large, not gargantuan :) - The largest would probably be around
a few TB, but the problem I have to deal with at the moment is large
numbers (potentially > 1 billion) of small records (hopefully I can get
it down to a few int4's and an int2 or so) in a single table. Currently
we're testing for and targeting the 500M-record range, but the
design needs to scale to 2-3 times that at least.
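
(As a very rough back-of-the-envelope, assuming roughly a 27-byte tuple header plus a 4-byte line pointer per row in 8.1, and the data padded to about 16 bytes, that row shape costs on the order of 50 bytes per row:)

  500,000,000 rows x ~50 bytes  ~  25 GB of heap, before indexes

So the raw table size stays modest; index sizes and vacuum/reindex times look like the bigger planning items.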
 
I read one of your presentations on very large databases in PG, and saw
mention of some tables over a billion rows, so that was encouraging. The
new table partitioning in 8.x will be very useful. What's the largest DB
you've seen to date on PG (in terms of total disk storage, and records
in largest table(s) )? 

My question is at what point do I have to get fancy with those big
tables? From your presentation, it looks like PG can handle 1.2 billion
records or so as long as you write intelligent queries. (And normal PG
should be able to handle that, correct?)

Also, does anyone know if/when any of the MPP stuff will be ported to
Postgres, or is the plan to keep that separate?

Thanks,

Bucky



Re: [PERFORM] High CPU Load

2006-09-18 Thread Jérôme BENOIS
Hi Markus,

Le vendredi 15 septembre 2006 à 11:43 +0200, Markus Schaber a écrit :
> Hi, Jérôme,
> 
> Jérôme BENOIS wrote:
> 
> > max_connections = 512
> 
> Do you really have that much concurrent connections? Then you should
> think about getting a larger machine, probably.
> 
> You will definitely want to play with commit_delay and commit_siblings
> settings in that case, especially if you have write access.
> 
> > work_mem = 65536
> > effective_cache_size = 131072
> 
> hmm, 131072*8*1024 + 512*65536*1024 = 35433480192 - thats 33 Gig of
> Memory you assume here, not counting OS usage, and the fact that certain
> queries can use up a multiple of work_mem.

Now I have 335 concurrent connections; I decreased the work_mem parameter to
32768 and disabled Hyper-Threading in the BIOS. But my CPU load is still
very high.

Tomorrow morning I plan to add 2 GB of RAM ... But I don't understand why
my database server worked well with the previous version of Postgres and
the same queries ...

> Even on amachine that big, I'd be inclined to dedicate more memory to
> caching, and less to the backends, unless specific needs dictate it. You
> could try to use sqlrelay or pgpool to cut down the number of backends
> you need.
I already use a database pool in my application, and when I decrease the
number of connections my application gets slower ;-(
> 
> > My Server is Dual Xeon 3.06GHz
> 
> For xeons, there were rumours about "context switch storms" which kill
> performance.
I disabled Hyper Threading.
> > with 2 Go RAM and good SCSI disks.
> 
> For 2 Gigs of ram, you should cut down the number of concurrent backends.
> 
> Does your machine go into swap?
No, 0 swap used, and I cannot find any pgsql_tmp files in $PGDATA/base/...
> 
> Markus
-- 
Jérôme,

python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in '[EMAIL PROTECTED]'.split('@')])"




Re: [PERFORM] High CPU Load

2006-09-18 Thread Guillaume Smet

On 9/18/06, Jérôme BENOIS <[EMAIL PROTECTED]> wrote:

Tomorrow morning i plan to add 2Go RAM in order to test difference with
my actual config.


I don't think more RAM will change anything if you don't swap at all.
You can try to set shared_buffers lower (try 32768 and 16384) but I
don't think it will change anything in 8.1.

The only thing left IMHO is that 8.1 is choosing a bad plan which
consumes a lot of CPU for at least one query.

When you analyze your logs, did you see a particularly slow query? Can
you compare query log analysis from your old server and your new one?
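
(If slow-query logging is not already enabled, a minimal postgresql.conf sketch for 8.1; the threshold is only illustrative:)

log_min_duration_statement = 1000   # log every statement slower than 1 second
stats_command_string = on           # show running queries in pg_stat_activity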

--
Guillaume



Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-18 Thread Merlin Moncure

On 9/18/06, Bucky Jordan <[EMAIL PROTECTED]> wrote:

My question is at what point do I have to get fancy with those big
tables? From your presentation, it looks like PG can handle 1.2 billion
records or so as long as you write intelligent queries. (And normal PG
should be able to handle that, correct?)


I would rephrase that: large databases are less forgiving of
unintelligent queries, particularly of the kind your average stupid
database-abstracting middleware produces :-).  seek times on a 1gb database
are going to be near zero all the time; not so on a 1tb+ database.

good normalization skills are really important for large databases,
along with materialization strategies for 'denormalized sets'.

regarding the number of rows, there is no limit to how much pg can
handle per se, just some practical limitations, especially vacuum and
reindex times.  these are important because they are required to keep
a handle on mvcc bloat, and it's very nice to be able to vacuum bits of
your database at a time.

just another fyi: if you have a really big database, you can forget
about doing pg_dump for backups (unless you really don't care about
being a day or more behind)...you simply have to use some type of
replication/failover strategy.  i would start with pitr.
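
(a minimal pitr sketch for 8.1 -- the archive path is only illustrative:)

# in postgresql.conf, ship each completed WAL segment somewhere safe:
archive_command = 'cp %p /backup/wal/%f'

-- then take a base backup while the server keeps running:
select pg_start_backup('weekly');
-- ... copy the $PGDATA directory with tar or rsync ...
select pg_stop_backup();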

merlin



Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-18 Thread Alan Hodgson
On Monday 18 September 2006 13:56, "Merlin Moncure" <[EMAIL PROTECTED]> 
wrote:
> just another fyi, if you have a really big database, you can forget
> about doing pg_dump for backups (unless you really don't care about
> being x day or days behind)...you simply have to due some type of
> replication/failover strategy.  i would start with pitr.

And, of course, the biggest problem of all: upgrades.

-- 
Eat right. Exercise regularly. Die anyway.




Re: [PERFORM] Vacuums on large busy databases

2006-09-18 Thread Jim C. Nasby
On Thu, Sep 14, 2006 at 11:23:01AM -0400, Francisco Reyes wrote:
> My setup:
> Freebsd 6.1
> Postgresql 8.1.4
> Memory: 8GB
> SATA Disks 
> 
> Raid 1 10 spindles (2 as hot spares)
> 500GB disks (16MB buffer), 7200 rpm
> Raid 10
> 
> Raid 2 4 spindles
> 150GB 10K rpm disks
> Raid 10
> 
> shared_buffers = 1
> temp_buffers = 1500
> work_mem = 32768# 32MB
> maintenance_work_mem = 524288   # 512MB
> 
> checkpoint_segments = 64
> Just increased to 64 today.. after reading this may help. Was 5 before.
> 
> pg_xlog on second raid (which sees very little activity)

BTW, on some good raid controllers (with battery backup and
write-caching), putting pg_xlog on a separate partition doesn't really
help, so you might want to try combining everything.

Even if you stay with 2 partitions, I'd cut pg_xlog back to just a
simple mirror.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-18 Thread Bucky Jordan
> good normalization skills are really important for large databases,
> along with materialization strategies for 'denormalized sets'.

Good points- thanks. I'm especially curious what others have done for
the materialization. The matview project on gborg appears dead, and I've
only found a smattering of references on google. My guess is, you roll
your own for optimal performance... 
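
(For what it's worth, a hand-rolled "materialized view" is usually just a summary table plus a scheduled refresh; a minimal sketch, with every name invented:)

create table scan_daily_summary as
  select scan_date, count(*) as records
  from scan_results
  group by scan_date;

-- periodic refresh, e.g. from cron:
begin;
truncate scan_daily_summary;
insert into scan_daily_summary
  select scan_date, count(*) from scan_results group by scan_date;
commit;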

> regarding the number of rows, there is no limit to how much pg can
> handle per se, just some practical limitations, especially vacuum and
> reindex times.  these are important because they are required to keep
> a handle on mvcc bloat and its very nice to be able to vaccum bits of
> your database at a time.

I was hoping for some actual numbers on "practical". Hardware isn't too
much of an issue (within reason- we're not talking an amazon or google
here... the SunFire X4500 looks interesting... )- if a customer wants to
store that much data, and pay for it, we'll figure out how to do it. I'd
just rather not have to re-design the database. Say the requirement is
to keep 12 months of data accessible, each "scan" produces 100M records,
and I run one per month. What happens if the customer wants to run it
once a week? I was more trying to figure out at what point (ballpark)
I'm going to have to look into archive tables and things of that nature
(or at Bizgres/MPP). It's easier for us to add more/bigger hardware, but
not so easy to redesign/add history tables...

> 
> just another fyi, if you have a really big database, you can forget
> about doing pg_dump for backups (unless you really don't care about
> being x day or days behind)...you simply have to due some type of
> replication/failover strategy.  i would start with pitr.
> 
> merlin
I was originally thinking replication, but I did notice some nice pitr
features in 8.x - I'll have to look into that some more.

Thanks for the pointers though... 

- Bucky



Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-18 Thread Alex Turner
Do the basic math: If you have a table with 100 million records, each of
which is 200 bytes long, that gives you roughly 20 gig of data (assuming it
was all written neatly and hasn't been updated much).  If you have to do a
full table scan, then it will take roughly 400 seconds with a single 10k RPM
SCSI drive with an average read speed of 50MB/sec.  If you are going to read
indexes, figure out how big your index is and how many blocks will be
returned, figure out how many blocks this will require transferring from the
main table, make an estimate of the seeks, add in the transfer total, and
you have a time to get your data.  A big array with a good controller can
pass 1000MB/sec transfer on the right bus if you buy the right technologies.
But be warned: if you buy the wrong ones, your big array can end up being
slower than a single drive for sequential transfer.  At 1000MB/sec your scan
would take 20 seconds.

Be warned, the tech specs page:
http://www.sun.com/servers/x64/x4500/specs.xml#anchor3
doesn't mention RAID 10 as a possibility, and this is probably what most
would recommend for fast data access if you are doing both read and write
operations.  If you are doing mostly reads, then RAID 5 is passable, but its
redundancy with large numbers of drives is not so great.

Alex.

On 9/18/06, Bucky Jordan <[EMAIL PROTECTED]> wrote:
> [quoted message above trimmed]


Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-18 Thread Michael Stone

On Mon, Sep 18, 2006 at 07:14:56PM -0400, Alex Turner wrote:

If you have a table with 100million records, each of which is 200bytes long,
that gives you roughtly 20 gig of data (assuming it was all written neatly
and hasn't been updated much).   


If you're in that range it doesn't even count as big or challenging--you 
can keep it memory resident for not all that much money. 


Mike Stone



[PERFORM] LIKE query problem

2006-09-18 Thread Marc McIntyre
I'm having a problem with a simple query that finds children of a node
using a materialized path to the node. The query:


select n1.id
from nodes n1, nodes n2
where n1.path like n2.path || '%'
and n2.id = 14;

   QUERY 
PLAN
---
Nested Loop  (cost=0.00..120256.56 rows=17517 width=4) (actual 
time=0.901..953.485 rows=7 loops=1)
  Join Filter: (("inner".path)::text ~~ (("outer".path)::text || 
'%'::text))
  ->  Index Scan using nodes_id on nodes n2  (cost=0.00..35.08 rows=11 
width=34) (actual time=0.050..0.059 rows=1 loops=1)

Index Cond: (id = 14)
  ->  Seq Scan on nodes n1  (cost=0.00..6151.89 rows=318489 width=38) 
(actual time=0.010..479.479 rows=318489 loops=1)

Total runtime: 953.551 ms
(6 rows)

I've tried re-writing the query, which results in a different plan:

select id
from nodes
where path like (
   select path
   from nodes
   where id = 14
   limit 1
) || '%';

  QUERY 
PLAN   

Seq Scan on nodes  (cost=3.19..7747.52 rows=1592 width=4) (actual 
time=0.230..226.311 rows=7 loops=1)

  Filter: ((path)::text ~~ (($0)::text || '%'::text))
  InitPlan
->  Limit  (cost=0.00..3.19 rows=1 width=34) (actual 
time=0.018..0.019 rows=1 loops=1)
  ->  Index Scan using nodes_id on nodes  (cost=0.00..35.08 
rows=11 width=34) (actual time=0.016..0.016 rows=1 loops=1)

Index Cond: (id = 14)
Total runtime: 226.381 ms
(7 rows)

While the plan looks a little better, the estimated rows are woefully 
inaccurate for some reason, resulting in a seq scan on nodes.
If I perform the nested select in the second query separately, then use 
the result in the outer select, it's extremely fast:


test=# select path from nodes where id = 14;
 path  


/3/13/
(1 row)

Time: 0.555 ms

test=# select id from nodes where path like '/3/13/%';
id
-
 14
 169012
 15
 16
 17
 169219
 169220
(7 rows)

Time: 1.062 ms

I've vacuum full analyzed. PG version is 8.1.4

The nodes table is as follows:

test=# \d nodes
  Table "public.nodes"
Column |  Type   | Modifiers
+-+---
id | integer | not null
path   | character varying(2000) | not null
depth  | integer | not null
Indexes:
   "nodes_pkey" PRIMARY KEY, btree (id, path)
   "nodes_id" btree (id)
   "nodes_path" btree (path)

test# select count(*) from nodes;
count  


318489

Is there a way to perform this efficiently in one query ?



Re: [PERFORM] Vacuums on large busy databases

2006-09-18 Thread Francisco Reyes

Jim C. Nasby writes:


BTW, on some good raid controllers (with battery backup and
write-caching), putting pg_xlog on a seperate partition doesn't really
help, so you might want to try combining everything.


I'm planning to put a busy database on the second RAID, or perhaps some index
files. So far the second RAID is highly underutilized.


Even if you stay with 2 partitions, I'd cut pg_xlog back to just a
simple mirror.


I am considering moving pg_xlog back to the main RAID, primarily because
we have two hot spares on top of RAID 10, so it is safer.




Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-18 Thread Luke Lonergan
Bucky,

On 9/18/06 7:37 AM, "Bucky Jordan" <[EMAIL PROTECTED]> wrote:

> My question is at what point do I have to get fancy with those big
> tables? From your presentation, it looks like PG can handle 1.2 billion
> records or so as long as you write intelligent queries. (And normal PG
> should be able to handle that, correct?)

PG has limitations that will confront you at sizes beyond about a couple
hundred GB of table size, as will Oracle and others.

You should be careful to implement very good disk hardware and leverage
Postgres 8.1 partitioning and indexes intelligently as you go beyond 100GB
per instance.  Also be sure to set the random_page_cost parameter in
postgresql.conf to 100 or even higher when you use indexes, as the actual
seek rate for random access ranges between 50 and 300 for modern disk
hardware.  If this parameter is left at the default of 4, indexes will often
be used inappropriately.
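
A sketch of what 8.1-style partitioning looks like (table and column names are only illustrative):

create table scans (
    scan_id   bigint,
    scan_date date,
    value     int4
);

create table scans_2006_09 (
    check (scan_date >= date '2006-09-01' and scan_date < date '2006-10-01')
) inherits (scans);
-- one child table per month, each with its own indexes; with
-- constraint_exclusion = on in postgresql.conf, queries that constrain
-- scan_date only touch the matching children.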
   
> Also, does anyone know if/when any of the MPP stuff will be ported to
> Postgres, or is the plan to keep that separate?

The plan is to keep that separate for now, though we're contributing
technology like partitioning, faster sorting, bitmap index, adaptive nested
loop, and hybrid hash aggregation to make big databases work better in
Postgres. 

- Luke





Re: [PERFORM] Large tables (was: RAID 0 not as fast as

2006-09-18 Thread Luke Lonergan
Alex,

On 9/18/06 4:14 PM, "Alex Turner" <[EMAIL PROTECTED]> wrote:

> Be warned, the tech specs page:
> http://www.sun.com/servers/x64/x4500/specs.xml#anchor3
> doesn't mention RAID 10 as a possible, and this is probably what most would
> recommend for fast data access if you are doing both read and write
> operations.  If you are doing mostly Read, then RAID 5 is passable, but it's
> redundancy with large numbers of drives is not so great.

RAID10 works great on the X4500 - we get 1.6GB/s+ per X4500 using RAID10 in
ZFS.  We worked with the Sun Solaris kernel team to make that happen and the
patches are part of Solaris 10 Update 3 due out in November.

- Luke





Re: [PERFORM] LIKE query problem

2006-09-18 Thread Tom Lane
Marc McIntyre <[EMAIL PROTECTED]> writes:
> ... Is there a way to perform this efficiently in one query ?

No, because you're hoping for an indexscan optimization of a LIKE
query, and that can only happen if the pattern is a plan-time constant.

regards, tom lane
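
(For reference, one workaround is to keep the pattern constant at plan time by building the query inside a plpgsql function; a rough, untested sketch -- note also that the btree index on path only helps LIKE prefix matches under the C locale or with a varchar_pattern_ops index:)

create or replace function children_of(parent integer)
returns setof integer as $$
declare
    p text;
    r record;
begin
    select path into p from nodes where id = parent limit 1;
    for r in execute 'select id from nodes where path like '
                     || quote_literal(p || '%')
    loop
        return next r.id;
    end loop;
    return;
end;
$$ language plpgsql;

-- select * from children_of(14);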



Re: [PERFORM] LIKE query problem

2006-09-18 Thread Marc McIntyre

Thanks Tom,

Is that documented somewhere? I can't seem to see any mention of it in 
the docs.


Tom Lane wrote:

Marc McIntyre <[EMAIL PROTECTED]> writes:
  

... Is there a way to perform this efficiently in one query ?



No, because you're hoping for an indexscan optimization of a LIKE
query, and that can only happen if the pattern is a plan-time constant.

regards, tom lane






Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-18 Thread Alex Turner
Sweet - that's good - RAID 10 support seems like an odd thing to leave out.

Alex

On 9/18/06, Luke Lonergan <[EMAIL PROTECTED]> wrote:
> [quoted message above trimmed]


Re: [PERFORM] Large tables (was: RAID 0 not as fast as

2006-09-18 Thread Luke Lonergan
Yep, Solaris ZFS kicks butt.  It does RAID10/5/6, etc and implements most of
the high end features available on high end SANs...

- Luke


On 9/18/06 8:40 PM, "Alex Turner" <[EMAIL PROTECTED]> wrote:

> Sweet - thats good - RAID 10 support seems like an odd thing to leave out.
> 
> Alex
> 
> On 9/18/06, Luke Lonergan <[EMAIL PROTECTED]> wrote:
>> Alex,
>> 
>> On 9/18/06 4:14 PM, "Alex Turner" < [EMAIL PROTECTED]> wrote:
>> 
>>> Be warned, the tech specs page:
>>> http://www.sun.com/servers/x64/x4500/specs.xml#anchor3
>>> 
>>> doesn't mention RAID 10 as a possible, and this is probably what most would
>>> recommend for fast data access if you are doing both read and write
>>> operations.  If you are doing mostly Read, then RAID 5 is passable, but it's
>>> redundancy with large numbers of drives is not so great.
>> 
>> RAID10 works great on the X4500 - we get 1.6GB/s+ per X4500 using RAID10 in
>> ZFS.  We worked with the Sun Solaris kernel team to make that happen and the
>> patches are part of Solaris 10 Update 3 due out in November.
>> 
>> - Luke
>> 
>> 
> 
> 






Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-18 Thread mark
On Mon, Sep 18, 2006 at 06:10:13PM -0700, Luke Lonergan wrote:
> Also be sure to set the random_page_cost parameter in
> postgresql.conf to 100 or even higher when you use indexes, as the actual
> seek rate for random access ranges between 50 and 300 for modern disk
> hardware.  If this parameter is left at the default of 4, indexes will often
> be used inappropriately.

Does a tool exist yet to time this for a particular configuration?

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/




Re: [PERFORM] Large tables (was: RAID 0 not as fast as

2006-09-18 Thread Luke Lonergan
Mark,

On 9/18/06 8:45 PM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:

> Does a tool exist yet to time this for a particular configuration?

We're considering building this into ANALYZE on a per-table basis.  The
basic approach times sequential access as a page rate, then random seeks as
a page rate, and takes the ratio of the two.

Since PG's heap scan is single-threaded, the seek rate is equivalent to a
single disk (even though RAID arrays may have many spindles); typical
random seek rates are around 100-200 seeks per second from within the
backend.  That means that as sequential scan performance increases, such as
happens when using large RAID arrays, the random_page_cost will range from
50 to 300, growing linearly with the size of the RAID array.
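
(A rough worked example with purely illustrative numbers: an array that scans sequentially at 200 MB/s reads 200 MB / 8 kB ~ 25,600 pages per second, while a single-threaded backend seeking randomly manages ~150 pages per second, giving a ratio -- and hence a random_page_cost -- of roughly 25,600 / 150 ~ 170.)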

- Luke


