[PERFORM] Nested Loop

2007-03-26 Thread Gauri Kanekar

Hi List,

How can I speed up nested loop queries, and with which parameters?
--
Regards
Gauri


Re: [PERFORM] Sunfire X4500 recommendations

2007-03-26 Thread Dimitri
On Friday 23 March 2007 14:32, Matt Smiley wrote:
> Thanks Dimitri!  That was very educational material!  I'm going to think
> out loud here, so please correct me if you see any errors.

Your mail is so long that I wasn't able to answer all the questions the same day :))

>
> The section on tuning for OLTP transactions was interesting, although my
> OLAP workload will be predominantly bulk I/O over large datasets of
> mostly-sequential blocks.

I assume mostly READ operations, right?

>
> The NFS+ZFS section talked about the zil_disable control for making zfs
> ignore commits/fsyncs.  Given that Postgres' executor does single-threaded
> synchronous I/O like the tar example, it seems like it might benefit
> significantly from setting zil_disable=1, at least in the case of
> frequently flushed/committed writes.  However, zil_disable=1 sounds unsafe
> for the datafiles' filesystem, and would probably only be acceptable for
> the xlogs if they're stored on a separate filesystem and you're willing to
> lose recently committed transactions.  This sounds pretty similar to just
> setting fsync=off in postgresql.conf, which is easier to change later, so
> I'll skip the zil_disable control.

Yes, you don't need it for PostgreSQL; it may be useful for other database 
vendors, but not here.
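For reference, the two knobs being compared above would be set roughly like this (a sketch only: the /etc/system line is the Solaris-era way to set zil_disable, and both settings can lose recently committed transactions, so neither is safe for the datafiles' filesystem):

```
# /etc/system (Solaris) -- disable the ZFS intent log globally (sketch)
set zfs:zil_disable = 1

# postgresql.conf -- the roughly comparable Postgres-side shortcut
fsync = off
```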

>
> The RAID-Z section was a little surprising.  It made RAID-Z sound just like
> RAID 50, in that you can customize the trade-off between iops versus usable
> diskspace and fault-tolerance by adjusting the number/size of
> parity-protected disk groups.  The only difference I noticed was that
> RAID-Z will apparently set the stripe size across vdevs (RAID-5s) to be as
> close as possible to the filesystem's block size, to maximize the number of
> disks involved in concurrently fetching each block.  Does that sound about
> right?

Well, look at RAID-Z as just a wide RAID solution. The more disks you have in 
your system, the higher the probability that you lose 2 disks at the same 
time, and in that case a wide RAID-10 will simply lose you the whole data set 
(if you happen to lose both disks of a mirror pair). So RAID-Z brings you more 
safety, as you may use wider parity, but the price for it is I/O 
performance...

>
> So now I'm wondering what RAID-Z offers that RAID-50 doesn't.  I came up
> with 2 things: an alleged affinity for full-stripe writes and (under
> RAID-Z2) the added fault-tolerance of RAID-6's 2nd parity bit (allowing 2
> disks to fail per zpool).  It wasn't mentioned in this blog, but I've heard
> that under certain circumstances, RAID-Z will magically decide to mirror a
> block instead of calculating parity on it.  I'm not sure how this would
> happen, and I don't know the circumstances that would trigger this
> behavior, but I think the goal (if it really happens) is to avoid the
> performance penalty of having to read the rest of the stripe required to
> calculate parity.  As far as I know, this is only an issue affecting small
> writes (e.g. single-row updates in an OLTP workload), but not large writes
> (compared to the RAID's stripe size).  Anyway, when I saw the filesystem's
> intent log mentioned, I thought maybe the small writes are converted to
> full-stripe writes by deferring their commit until a full stripe's worth of
> data had been accumulated.  Does that sound plausible?

The problem here is that within the same workload you're able to do fewer I/O 
operations with RAID-Z than with RAID-10. So whether your I/O block size is 
bigger or smaller, you'll still get lower throughput, no? :)

>
> Are there any other noteworthy perks to RAID-Z, rather than RAID-50?  If
> not, I'm inclined to go with your suggestion, Dimitri, and use zfs like
> RAID-10 to stripe a zpool over a bunch of RAID-1 vdevs.  Even though many
> of our queries do mostly sequential I/O, getting higher seeks/second is
> more important to us than the sacrificed diskspace.

There is still one point to check: if you do mostly READs on your database, 
RAID-Z will probably not be *too* bad and will give you more usable space. 
However, if you need to update or load your data frequently, RAID-10 will be 
better...

>
> For the record, those blogs also included a link to a very helpful ZFS Best
> Practices Guide:
> http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

Oh yes, it's a constantly growing wiki, a good starting point for any Solaris 
question as well as for performance points :)

>
> To sum up, so far the short list of tuning suggestions for ZFS includes:
>  - Use a separate zpool and filesystem for xlogs if your apps write often.
>  - Consider setting zil_disable=1 on the xlogs' dedicated filesystem.  ZIL
>    is the intent log, and it sounds like disabling it may be like disabling
>    journaling.  Previous message threads in the Postgres archives debate
>    whether this is safe for the xlogs, but it didn't seem like a conclusive
>    answer was reached.
>  - Make the filesystem block size (zfs record size) match the Postgres
>    block size.
>  - Manually adjust vdev_cache.
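As a sketch of the record-size and separate-filesystem items above (the pool and filesystem names are made up for illustration; 8K matches the default Postgres block size):

```
# match the zfs record size to the 8K Postgres block size
zfs create tank/pgdata
zfs set recordsize=8k tank/pgdata

# keep the xlogs on their own filesystem, per the first suggestion
zfs create tank/pgxlog
```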

Re: [PERFORM] Nested Loop

2007-03-26 Thread Michael Fuhr
On Mon, Mar 26, 2007 at 05:34:39PM +0530, Gauri Kanekar wrote:
> How can I speed up nested loop queries, and with which parameters?

Please post a query you're trying to tune and the EXPLAIN ANALYZE
output, as well as any changes you've already made in postgresql.conf
or configuration variables you've set in a particular session.
Without more information we can't give much advice other than to
make sure you're vacuuming and analyzing the tables often enough
to keep them from becoming bloated with dead rows and to keep the
statistics current, and to review a configuration checklist such
as this one:

http://www.powerpostgresql.com/PerfList
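As a concrete sketch of what to run and post (the table and query here are placeholders):

```sql
-- reclaim dead rows and refresh planner statistics first
VACUUM ANALYZE your_table;

-- then run the problem query under EXPLAIN ANALYZE and post the full output
EXPLAIN ANALYZE
SELECT ...;   -- your actual query here
```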

-- 
Michael Fuhr

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Nested Loop

2007-03-26 Thread Gauri Kanekar

Sorry,

these are the config settings:
max_connections = 100   # (change requires restart)
shared_buffers = 300MB
work_mem = 256MB
max_fsm_pages = 40
max_fsm_relations = 500
wal_buffers = 512
checkpoint_segments = 20
checkpoint_timeout = 900
enable_bitmapscan = on
enable_seqscan = off
enable_tidscan = on
random_page_cost = 2
cpu_index_tuple_cost = 0.001
effective_cache_size = 800MB
join_collapse_limit = 1 # JOINs
datestyle = 'iso, mdy'
lc_messages = 'en_US.UTF-8' # locale for system error message
lc_monetary = 'en_US.UTF-8' # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'  # locale for number formatting
lc_time = 'en_US.UTF-8' # locale for time formatting

All others are at their default values.



QUERY PLAN
---
HashAggregate  (cost=1116330.73..1116432.34 rows=6774 width=128) (actual time=438565.297..440455.386 rows=646881 loops=1)
  ->  Hash Join  (cost=10802.93..1116093.64 rows=6774 width=128) (actual time=1904.797..377717.036 rows=10438694 loops=1)
        Hash Cond: (rm.ck = rc.k)
        ->  Hash Join  (cost=10651.73..1115840.83 rows=6774 width=105) (actual time=1890.765..347169.113 rows=10438694 loops=1)
              Hash Cond: (rm.chk = rc.ky)
              ->  Hash Join  (cost=9835.35..1114905.90 rows=6774 width=83) (actual time=1873.463..317623.437 rows=10438694 loops=1)
                    Hash Cond: (rm.ckey = rc.k)
                    ->  Hash Join  (cost=615.77..1105533.91 rows=6774 width=85) (actual time=1842.309..288198.666 rows=10438694 loops=1)
                          Hash Cond: (rm.sk = rs.k)
                          ->  Hash Join  (cost=77.32..1104885.39 rows=6774 width=58) (actual time=1831.908..259147.154 rows=10438694 loops=1)
                                Hash Cond: (rm.advk = ra.k)
                                ->  Nested Loop  (cost=0.00..1104714.83 rows=6801 width=44) (actual time=1820.153..229779.814 rows=10945938 loops=1)
                                      Join Filter: (rm.nk = rn.k)
                                      ->  Index Scan using r_idx on rn  (cost=0.00..4.27 rows=1 width=4) (actual time=0.093..0.095 rows=1 loops=1)
                                            Index Cond: (id = 607)
                                      ->  Nested Loop  (cost=0.00..1104370.50 rows=27205 width=48) (actual time=7.920..202878.054 rows=10945998 loops=1)
                                            ->  Index Scan using rpts_ldt_idx on rd  (cost=0.00..4.27 rows=1 width=12) (actual time=0.097..0.352 rows=30 loops=1)
                                                  Index Cond: ((sdt >= '2006-12-01 00:00:00'::timestamp without time zone) AND (sd <= '2006-12-30 00:00:00'::timestamp without time zone))
                                            ->  Index Scan using rmidx on rm  (cost=0.00..1100192.24 rows=333919 width=44) (actual time=3.109..5835.861 rows=364867 loops=30)
                                                  Index Cond: (rmdkey = rd.k)
                                ->  Hash  (cost=68.15..68.15 rows=734 width=22) (actual time=11.692..11.692 rows=734 loops=1)
                                      ->  Index Scan using radvki on radvt  (cost=0.00..68.15 rows=734 width=22) (actual time=9.112..10.517 rows=734 loops=1)
                                            Filter: ((name)::text <> 'SYSTEM'::text)
                          ->  Hash  (cost=500.35..500.35 rows=3048 width=35) (actual time=10.377..10.377 rows=3048 loops=1)
                                ->  Index Scan using rskidx on rs  (cost=0.00..500.35 rows=3048 width=35) (actual time=0.082..5.589 rows=3048 loops=1)
                    ->  Hash  (cost=9118.63..9118.63 rows=8076 width=6) (actual time=31.124..31.124 rows=8076 loops=1)
                          ->  Index Scan using rcridx on rcr  (cost=0.00..9118.63 rows=8076 width=6) (actual time=2.036..19.218 rows=8076 loops=1)
              ->  Hash  (cost=769.94..769.94 rows=3715 width=30) (actual time=17.275..17.275 rows=3715 loops=1)
                    ->  Index Scan using ridx on rcl  (cost=0.00..769.94 rows=3715 width=30) (actual time=4.238..11.432 rows=3715 loops=1)
        ->  Hash  (cost=120.38..120.38 rows=2466 width=31) (actual time=14.010..14.010 rows=2466 loops=1)
              ->  Index Scan using rckdx on rcpn  (cost=0.00..120.38 rows=2466 width=31) (actual time=4.564..9.926 rows=2466 loops=1)
Total runtime: 441153.878 ms
(32 rows)


We are using version 8.2.


On 3/26/07, Michael Fuhr <[EMAIL PROTECTED]> wrote:


On Mon, Mar 26, 2007 at 05:34:39PM +0530, Gauri Kanekar wrote:
> How can I speed up nested loop queries, and with which parameters?

Please post a query you're trying to tune and the EXPLAIN ANALYZE
output, as well as any changes yo

Re: [PERFORM] Nested Loop

2007-03-26 Thread Dave Dutcher
-Original Message-
>From: [EMAIL PROTECTED] On Behalf Of Gauri Kanekar
>Subject: Re: [PERFORM] Nested Loop
>
>join_collapse_limit = 1 # JOINs 

Is there a reason you have this set to 1?  Postgres can't consider multiple
join orders when you do that.  I would try setting that back to the default
and seeing if this query is any faster.
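That test could look like this in a psql session (a sketch; 8 is the default value of join_collapse_limit in 8.2, and the query is a placeholder):

```sql
SET join_collapse_limit = 8;   -- restore the default for this session only
EXPLAIN ANALYZE SELECT ...;    -- re-run the slow query and compare plans
RESET join_collapse_limit;     -- drop the session override afterwards
```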

Other than that it looked like the problems with the query might be bad
estimates of rows.  One is that postgres expects there to be 1 matching row
from rd when there are actually 30.  You might try increasing the statistics
targets on rd.sd and rd.sdt, reanalyzing, and seeing if that helps.  Also
postgres expects the join of rd and rm to return about 27205 rows when it
actually returns 10 million.  I'm not sure what you can do about that.
Maybe if Postgres gets a better estimate for rd it would then estimate the
join better.
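A sketch of the statistics-target suggestion above (100 is an arbitrary higher value; the default per-column target in 8.2 is 10):

```sql
ALTER TABLE rd ALTER COLUMN sd SET STATISTICS 100;
ALTER TABLE rd ALTER COLUMN sdt SET STATISTICS 100;

-- recollect statistics so the planner sees the new targets
ANALYZE rd;
```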

Dave






Re: [PERFORM] Nested Loop

2007-03-26 Thread Ragnar
On mán, 2007-03-26 at 20:33 +0530, Gauri Kanekar wrote:

you did not show your query, nor did you answer whether you had vacuumed
and analyzed.

> enable_seqscan = off

why this? this is unlikely to help


> 
> QUERY PLAN
> ...
>  ->  Nested Loop  (cost=0.00..1104714.83 rows=6801 width=44)
> (actual time=1820.153..229779.814 rows=10945938 loops=1)

the estimates are way off here. you sure you have analyzed?

gnari

> 




Re: [PERFORM] [GENERAL] ERROR: out of shared memory

2007-03-26 Thread Tom Lane
"Sorin N. Ciolofan" <[EMAIL PROTECTED]> writes:
> I have to manage an application written in Java which calls another module
> written in Java which uses the Postgres DBMS in a Linux environment.  I'm new
> to Postgres.  The problem is that for large amounts of data the application
> throws an:
>   org.postgresql.util.PSQLException: ERROR: out of shared memory

AFAIK the only very likely way to cause that is to touch enough
different tables in one transaction that you run out of lock entries.
While you could postpone the problem by increasing the
max_locks_per_transaction setting, I suspect there may be some basic
application misdesign involved here.  How many tables have you got?
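Two quick checks along those lines (standard system catalogs; only a sketch):

```sql
-- roughly how many ordinary tables does the database hold?
SELECT count(*) FROM pg_class WHERE relkind = 'r';

-- watch lock-table consumption while the failing transaction runs
SELECT count(*) FROM pg_locks;
```

If the table count is legitimately large, raising max_locks_per_transaction (default 64) in postgresql.conf postpones the error, as noted above; a restart is required for it to take effect.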

regards, tom lane
