Re: [PERFORM] Slow query - possible bug?

2006-04-18 Thread Gavin Hamill

Tom Lane wrote:


Gavin Hamill [EMAIL PROTECTED] writes:
 


If I replace the
(allocation0_.Date between '2006-06-09 00:00:00.00' and 
'2006-06-09 00:00:00.00')

with
allocation0_.Date = '2006-04-09 00:00:00.00'
then the query comes back in a few milliseconds (as I'd expect :)
   



Could we see EXPLAIN ANALYZE for
* both forms of the date condition, with the roomid condition;
* both forms of the date condition, WITHOUT the roomid condition;
* just the roomid condition

I'm thinking the planner is misestimating something, but it's hard
to tell what without breaking it down.
 



Of course. In each case, I have changed the date by two weeks to try and 
minimise the effect of any query caching.


The base query is explain analyse select allocation0_.ID as y1_, 
allocation0_.RoomID as y2_, allocation0_.StatusID as y4_, 
allocation0_.Price as y3_, allocation0_.Number as y5_, 
allocation0_.Date as y6_ from Allocation allocation0_ where


now both forms of the Date condition

a)

(allocation0_.Date between '2006-04-25 00:00:00.00' and
'2006-04-25 00:00:00.00') and (allocation0_.RoomID in (211800));

   QUERY PLAN
---
Index Scan using ix_date on Allocation allocation0_  (cost=0.00..4.77
rows=1 width=34) (actual time=3253.340..48040.396 rows=1 loops=1)
  Index Cond: ((Date >= '2006-04-25'::date) AND (Date <=
'2006-04-25'::date))
  Filter: (RoomID = 211800)
Total runtime: 48040.451 ms (ouch!)


b)

(allocation0_.Date = '2006-05-10
00:00:00.00'::date) and (allocation0_.RoomID in (211800));

  QUERY PLAN

Index Scan using ix_dateroom on Allocation allocation0_  
(cost=0.00..5.01 rows=1 width=34) (actual time=0.033..0.035 rows=1 loops=1)

  Index Cond: ((RoomID = 211800) AND (Date = '2006-05-10'::date))
Total runtime: 0.075 ms (whoosh!)

And now without the RoomID condition:

a)
(allocation0_.Date between '2006-06-10 00:00:00.00' and 
'2006-06-10 00:00:00.00');

QUERY PLAN
---
Index Scan using ix_date on Allocation allocation0_  (cost=0.00..4.77 
rows=1 width=34) (actual time=0.035..6706.467 rows=34220 loops=1)
  Index Cond: ((Date >= '2006-06-10'::date) AND (Date <=
'2006-06-10'::date))

Total runtime: 6728.743 ms

b)
(allocation0_.Date= '2006-05-25 00:00:00.00'::date);
 QUERY PLAN
--
Bitmap Heap Scan on Allocation allocation0_  (cost=87.46..25017.67 
rows=13845 width=34) (actual time=207.674..9702.656 rows=34241 loops=1)

  Recheck Cond: (Date = '2006-05-25'::date)
  ->  Bitmap Index Scan on ix_date  (cost=0.00..87.46 rows=13845
width=0) (actual time=185.086..185.086 rows=42705 loops=1)

Index Cond: (Date = '2006-05-25'::date)
Total runtime: 9725.470 ms


Wow, I'm not really sure what that tells me...

Cheers,
Gavin.


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Migration study, step 2: rewriting queries

2006-04-18 Thread Mikael Carneholm
This should be fixed by the changes I made recently in
choose_bitmap_and
--- it wasn't being aggressive about pruning overlapping AND conditions
when a sub-OR was involved.  It's possible the new coding is *too*
aggressive, and will reject indexes that it'd be profitable to include;
but at least it won't make this particular mistake.

Ok, cool. I don't have time to test this right now as the project has to
move on (and I guess testing the fix would require a dump + build CVS
version + restore), but as a temporary workaround I simply dropped the
xda_dat index (all queries on that table include the
person_information__id column anyway).

- Mikael



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Inserts optimization?

2006-04-18 Thread Markus Schaber
Hi, Francisco,

Francisco Reyes wrote:

 I only wonder what is safer.. using a second or two in commit_delay or
 using fsync = off.. Anyone cares to comment?

It might be that you misunderstood commit_delay. It will not only delay
the disk write, but also block your connection until the write actually
is performed.

It will raise the throughput in multi-client scenarios, but it will also
raise the latency, and it will bring absolutely no speedup in
single-client scenarios.

It does not decrease safety (in contrast to fsync=off): data will be
consistent, and any application that has successfully finished a commit
can be sure its data is on the platters.[1]
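For illustration, the two settings involved can be tried per session; a
minimal sketch, with placeholder values rather than recommendations:

SET commit_delay = 100;   -- microseconds to wait before flushing a commit
SET commit_siblings = 5;  -- only wait if at least 5 other transactions are active
SHOW commit_delay;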

HTH,
Markus

[1] As long as the platters don't lie, but that's another subject.

-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] merge>hash>loop

2006-04-18 Thread Markus Schaber
Hi, Tom,

Tom Lane wrote:

 Well, the other thing that's going on here is that we know we are
 overestimating the cost of nestloop-with-inner-indexscan plans.
 The current estimation for that is basically outer scan cost plus N
 times inner scan cost where N is the estimated number of outer tuples;
 in other words the repeated indexscan probes are each assumed to happen
 from a cold start.  In reality, caching of the upper levels of the index
 means that the later index probes are much cheaper than this model
 thinks.  We've known about this for some time but no one's yet proposed
 a more reasonable cost model.

My spontaneous guess would be to use log(N)*inner instead of N*inner. I
don't have any backing for that; it's just what my intuition tells me
as a first shot.

 In my mind this is tied into another issue, which is that the planner
 always costs on the basis of each query starting from zero.  In a real
 environment it's much cheaper to use heavily-used indexes than this cost
 model suggests, because they'll already be swapped in due to use by
 previous queries.  But we haven't got any infrastructure to keep track
 of what's been heavily used, let alone a cost model that could make use
 of the info.

An easy first approach would be to add a user-tunable cache probability
value to each index (and possibly table) between 0 and 1. Then simply
multiply random_page_cost by (1 - that value) for each scan.
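As a worked example (numbers purely illustrative): with random_page_cost =
4.0 and a cache probability of 0.9 for a heavily-used index, that scan
would be costed at 4.0 * (1 - 0.9) = 0.4 per page, close to a cached read.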

Later, this value could be automatically tuned by stats analysis or
other means.

 I think part of the reason that people commonly reduce random_page_cost
 to values much lower than physical reality would suggest is that it
 provides a crude way of partially compensating for this basic problem.

I totally agree with this; it's just what we did here from time to time. :-)

Hmm, how does effective_cache_size correspond with it? Shouldn't a high
effective_cache_size have a similar effect?

Thanks,
Markus
-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Inserts optimization?

2006-04-18 Thread Magnus Hagander
  For now, I only could get good performance with bacula and
  postgresql when disabling fsync...

 Isn't that less safe?

Most definitely.

FWIW, I'm getting pretty good speeds with Bacula and PostgreSQL on a
reasonably small db (file table about 40 million rows, filename about
5.2 million and path 1.5 million). 

Config changes are increasing shared mem and work mems, fsm pages,
wal_sync_method=fdatasync, wal_buffers=16, checkpoint_segments=8,
default_with_oids=off (before creating the bacula tables, so they don't
use oids).

Used to run with full_page_writes=off, but not anymore since it's not
safe.


 Also planning to check commit_delay and see if that helps.
 I will try to avoid 2 or more machines backing up at the same 
 time.. plus in a couple of weeks I should have a better 
 machine for the DB anyways..

Bacula already serializes access to the database (they have to support
mysql/myisam), so this shouldn't help. Actually, it might well hurt by
introducing extra delays.

 I only wonder what is safer.. using a second or two in 
 commit_delay or using 
 fsync = off.. Anyone cares to comment?

Absolutely a commit_delay.


//Magnus

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[PERFORM] SELECT FOR UPDATE performance is bad

2006-04-18 Thread Mario Splivalo
For the purpose of the application I need to establish some form of
serialization, therefore I use FOR UPDATE. The query, inside the
function, is like this:

pulitzer2=# explain analyze select id FROM messages JOIN
ticketing_codes_played ON id = message_id WHERE service_id = 1102 AND
receiving_time BETWEEN '2006-03-01' AND '2006-06-30' FOR UPDATE;

QUERY PLAN 
-
 Hash Join  (cost=32131.04..34281.86 rows=627 width=16) (actual
time=742.806..1491.864 rows=58005 loops=1)
   Hash Cond: ("outer".message_id = "inner".id)
   ->  Seq Scan on ticketing_codes_played  (cost=0.00..857.17 rows=57217
width=10) (actual time=0.024..209.331 rows=58005 loops=1)
   ->  Hash  (cost=32090.60..32090.60 rows=16177 width=10) (actual
time=742.601..742.601 rows=65596 loops=1)
 ->  Bitmap Heap Scan on messages  (cost=4153.51..32090.60
rows=16177 width=10) (actual time=160.555..489.459 rows=65596 loops=1)
   Recheck Cond: ((service_id = 1102) AND (receiving_time >=
'2006-03-01 00:00:00+01'::timestamp with time zone) AND (receiving_time
<= '2006-06-30 00:00:00+02'::timestamp with time zone))
   ->  BitmapAnd  (cost=4153.51..4153.51 rows=16177 width=0)
(actual time=156.900..156.900 rows=0 loops=1)
 ->  Bitmap Index Scan on idx_service_id
(cost=0.00..469.31 rows=68945 width=0) (actual time=16.661..16.661
rows=66492 loops=1)
   Index Cond: (service_id = 1102)
 ->  Bitmap Index Scan on
idx_messages_receiving_time  (cost=0.00..3683.95 rows=346659 width=0)
(actual time=137.526..137.526 rows=360754 loops=1)
   Index Cond: ((receiving_time >= '2006-03-01
00:00:00+01'::timestamp with time zone) AND (receiving_time <=
'2006-06-30 00:00:00+02'::timestamp with time zone))
 Total runtime: 6401.954 ms
(12 rows)



Now, this query takes between 8 and 30 seconds, which is a lot, since
during the day we have almost 20 requests per minute. I notice that
during the execution of the above-mentioned query I/O goes berserk:
iostat tells me that load is around 60%. I tried playing with the WAL
configuration parameters, and even put the log on separate disk spindles;
it did nothing.

Shall I reconsider the need for the exact lock I developed, or is there
something more I could do to speed things up?

Mario
-- 
Mario Splivalo
Mob-Art
[EMAIL PROTECTED]

I can do it quick, I can do it cheap, I can do it well. Pick any two.



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] SELECT FOR UPDATE performance is bad

2006-04-18 Thread Tom Lane
Mario Splivalo [EMAIL PROTECTED] writes:
 For the purpose of the application I need to establish some form of
 serialization, therefore I use FOR UPDATE. The query, inside the
 function, is like this:

 pulitzer2=# explain analyze select id FROM messages JOIN
 ticketing_codes_played ON id = message_id WHERE service_id = 1102 AND
 receiving_time BETWEEN '2006-03-01' AND '2006-06-30' FOR UPDATE;

  Hash Join  (cost=32131.04..34281.86 rows=627 width=16) (actual
 time=742.806..1491.864 rows=58005 loops=1)
  ^

 Now, this query takes between 8 and 30 seconds, which is a lot, since
 during the day we have almost 20 requests per minute.

Acquiring a row lock separately for each of 58000 rows is not going to
be a cheap operation.  Especially not if anyone else is locking any of
the same rows and thereby blocking you.  If there is concurrent locking,
you're also running a big risk of deadlock because two processes might
try to lock the same rows in different orders.

Are you really intending to update all 58000 rows?  If not, what is
the serialization requirement exactly (ie, what are you trying to
accomplish)?  Seems like something about this app needs to be
redesigned.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Migration study, step 2: rewriting queries

2006-04-18 Thread Tom Lane
Mikael Carneholm [EMAIL PROTECTED] writes:
 Ok, cool. I don't have time to test this right now as the project has to
 move on (and I guess testing the fix would require a dump+build CVS
 version+restore), but as a temporary workaround I simply dropped the
 xda_dat index (all queries on that table include the
 person_information__id column anyway).

The patch is in the 8.1 branch so you don't need dump/restore anyway...

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[PERFORM] Problem with LIKE-Performance

2006-04-18 Thread Tarabas (Manuel Rorarius)
Hi!

  I am having trouble with like statements on one of my tables.

  I already tried a vacuum and analyze but with no success.

  The database is PostgreSQL Database Server 8.1.3 on i686-pc-mingw32

I get the following explain and I am troubled by the very high
startup_cost ... does anyone have any idea why that value is so
high?

{SEQSCAN
   :startup_cost 100000000.00 
   :total_cost 100021432.33 
   :plan_rows 1 
   :plan_width 1311 
   :targetlist (
  {TARGETENTRY 
  :expr 
 {VAR 
 :varno 1 
 :varattno 1 
 :vartype 23 
 :vartypmod -1 
 :varlevelsup 0 
 :varnoold 1 
 :varoattno 1
 }
  :resno 1 
  :resname image_id 
  :ressortgroupref 0 
  :resorigtbl 29524 
  :resorigcol 1 
  :resjunk false
  }
  {TARGETENTRY 
  :expr 
 {VAR 
 :varno 1 
 :varattno 2 
 :vartype 23 
 :vartypmod -1 
 :varlevelsup 0 
 :varnoold 1 
 :varoattno 2
 }
  :resno 2 
  :resname customer_id 
  :ressortgroupref 0 
  :resorigtbl 29524 
  :resorigcol 2 
  :resjunk false
  }
  {TARGETENTRY 
  :expr 
 {VAR 
 :varno 1 
 :varattno 3 
 :vartype 23 
 :vartypmod -1 
 :varlevelsup 0 
 :varnoold 1 
 :varoattno 3
 }
  :resno 3 
  :resname theme_id 
  :ressortgroupref 0 
  :resorigtbl 29524 
  :resorigcol 3 
  :resjunk false
  }
  {TARGETENTRY 
  :expr 
 {VAR 
 :varno 1 
 :varattno 4 
 :vartype 23 
 :vartypmod -1 
 :varlevelsup 0 
 :varnoold 1 
 :varoattno 4
 }
  :resno 4 
  :resname gallery_id 
  :ressortgroupref 0 
  :resorigtbl 29524 
  :resorigcol 4 
  :resjunk false
  }
  {TARGETENTRY 
  :expr 
 {VAR 
 :varno 1 
 :varattno 5 
 :vartype 23 
 :vartypmod -1 
 :varlevelsup 0 
 :varnoold 1 
 :varoattno 5
 }
  :resno 5 
  :resname event_id 
  :ressortgroupref 0 
  :resorigtbl 29524 
  :resorigcol 5 
  :resjunk false
  }
  {TARGETENTRY 
  :expr 
 {VAR 
 :varno 1 
 :varattno 6 
 :vartype 23 
 :vartypmod -1 
 :varlevelsup 0 
 :varnoold 1 
 :varoattno 6
 }
  :resno 6 
  :resname width 
  :ressortgroupref 0 
  :resorigtbl 29524 
  :resorigcol 6 
  :resjunk false
  }
  {TARGETENTRY 
  :expr 
 {VAR 
 :varno 1 
 :varattno 7 
 :vartype 23 
 :vartypmod -1 
 :varlevelsup 0 
 :varnoold 1 
 :varoattno 7
 }
  :resno 7 
  :resname height 
  :ressortgroupref 0 
  :resorigtbl 29524 
  :resorigcol 7 
  :resjunk false
  }
  {TARGETENTRY 
  :expr 
 {VAR 
 :varno 1 
 :varattno 8 
 :vartype 23 
 :vartypmod -1 
 :varlevelsup 0 
 :varnoold 1 
 :varoattno 8
 }
  :resno 8 
  :resname filesize 
  :ressortgroupref 0 
  :resorigtbl 29524 
  :resorigcol 8 
  :resjunk false
  }
  {TARGETENTRY 
  :expr 
 {VAR 
 :varno 1 
 :varattno 9 
 :vartype 1114 
 :vartypmod -1 
 :varlevelsup 0 
 :varnoold 1 
 :varoattno 9
 }
  :resno 9 
  :resname uploadtime 
  :ressortgroupref 0 
  :resorigtbl 29524 
  :resorigcol 9 
  :resjunk false
  }
  {TARGETENTRY 
  :expr 
 {VAR 
 :varno 1 
 :varattno 10 
 :vartype 1043 
 :vartypmod 259 
 :varlevelsup 0 
 :varnoold 1 
 :varoattno 10
 }
  :resno 10 
  :resname filename 
  :ressortgroupref 0 
  :resorigtbl 29524 
  :resorigcol 10 
  :resjunk false
  }
  {TARGETENTRY 
  :expr 
 {VAR 
 :varno 1 
 :varattno 11 
 :vartype 1043 
 :vartypmod 259 
 :varlevelsup 0 
 :varnoold 1 
 :varoattno 11
 }
  :resno 11 
  :resname originalfilename 
  :ressortgroupref 0 
  :resorigtbl 29524 
  :resorigcol 11 
  :resjunk false
  }
  {TARGETENTRY 
  :expr 
 {VAR 
 :varno 1 
 :varattno 12 
 :vartype 1043 
 :vartypmod 259 
 :varlevelsup 0 
 :varnoold 1 
 :varoattno 12
 }
  :resno 12 
  :resname thumbname 
  :ressortgroupref 0 
  :resorigtbl 29524 
  :resorigcol 12 
  :resjunk false
  }
  {TARGETENTRY 
  :expr 
 {VAR 
 :varno 1 
 :varattno 13 
 :vartype 1043 
 :vartypmod 259 
 :varlevelsup 0 
 :varnoold 1 
 :varoattno 13
 }
 

Re: [PERFORM] Problem with LIKE-Performance

2006-04-18 Thread Dave Dutcher
 -Original Message-
 From: [EMAIL PROTECTED]
[mailto:pgsql-performance-
 [EMAIL PROTECTED] On Behalf Of Tarabas (Manuel Rorarius)
 Subject: [PERFORM] Problem with LIKE-Performance
 
 Hi!
 
   I am having trouble with like statements on one of my tables.


It looks like you are getting a sequential scan instead of an index
scan.  What is your locale setting?  As far as I know Postgres doesn't
support using indexes with LIKE unless you are using the C locale.  

Also, in the future you only need to post EXPLAIN ANALYZE not EXPLAIN
ANALYZE VERBOSE.

Dave



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Problem with LIKE-Performance

2006-04-18 Thread Tarabas (Manuel Rorarius)
Hi Dave,

DD It looks like you are getting a sequential scan instead of an index
DD scan.  What is your locale setting?  As far as I know Postgres doesn't
DD support using indexes with LIKE unless you are using the C locale.

Actually no, I am using de_DE as locale because I need the German
order-by support. But even for a seq-scan it seems pretty slow - that's
just a feeling, though. The table currently has ~172,000 rows and is
supposed to grow to about 1 million or more.

Is there any way to speed up the LIKEs with a locale other than C,
or to get an ORDER BY in a different locale while using the
default C locale?

DD Also, in the future you only need to post EXPLAIN ANALYZE not EXPLAIN
DD ANALYZE VERBOSE.

ok, I will keep that in mind :-) didn't know how verbose you would need
it *smile*

Best regards
Manuel


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Problem with LIKE-Performance

2006-04-18 Thread Tom Lane
Tarabas (Manuel Rorarius) [EMAIL PROTECTED] writes:
 I get the following explain and I am troubled by the very high
 startup_cost ... does anyone have any idea why that value is so
 high?

 {SEQSCAN
:startup_cost 100000000.00 

You have enable_seqscan = off, no?

Please refrain from posting EXPLAIN VERBOSE unless it's specifically
requested ...

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] SELECT FOR UPDATE performance is bad

2006-04-18 Thread Tom Lane
Mario Splivalo [EMAIL PROTECTED] writes:
>> If there is concurrent locking,
>> you're also running a big risk of deadlock because two processes might
>> try to lock the same rows in different orders.

> I think there is no risk of a deadlock, since that particular function
> is called from the middleware (functions are used as interface to the
> database), and the lock order is always the same.

No, you don't even know what the order is, let alone that it's always
the same.

> Now, I just need to have serialization, I need to have clients 'line up'
> in order to perform something in the database. Actually, users are
> sending codes from the newspaper, beer-cans, Cola-cans, and stuff, and
> the database needs to check whether the code has already been played.
> Since the system is designed so that it could run multiple code-games
> (and then similar codes could exist for the coke-game and beer-game),
> I'm using the messages table to see which code-game (i.e. service) that
> particular code belongs to.

I'd suggest using a table that has exactly one row per code-game, and
doing a SELECT FOR UPDATE on that row to establish the lock you need.
This need not have anything to do with the tables/rows you are actually
intending to update --- although obviously such a convention is pretty
fragile if you have updates coming from a variety of code.  I think it's
reasonably safe when you're funneling all the operations through a bit
of middleware.
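A minimal sketch of that convention (table name and game id are
illustrative, not from the thread):

CREATE TABLE code_game_lock (service_id int PRIMARY KEY);

-- at the top of the middleware function, serialize per code-game:
SELECT service_id FROM code_game_lock WHERE service_id = 1102 FOR UPDATE;
-- ... check/record the played code here; the row lock releases at COMMIT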

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Problem with LIKE-Performance

2006-04-18 Thread Tarabas (Manuel Rorarius)
Hi Tom,

TL Tarabas (Manuel Rorarius) [EMAIL PROTECTED] writes:
 I get the following explain and I am troubled by the very high
 startup_cost ... does anyone have any idea why that value is so
 high?

 {SEQSCAN
:startup_cost 100000000.00 

TL You have enable_seqscan = off, no?

You were right, I was testing this and had it removed, but somehow I
must have hit the wrong button in pgadmin and it was not successfully
removed from the database.

After removing the enable_seqscan = off and making sure it was gone,
it is a lot faster again.

Now it takes about 469.841 ms for the select.

TL Please refrain from posting EXPLAIN VERBOSE unless it's specifically
TL requested ...

mea culpa, i will not do that again :-)

Best regards
Manuel


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Problem with LIKE-Performance

2006-04-18 Thread Hakan Kocaman
Hi,

I remember something about needing a special index with locales other than C.

You need a different operator class for this index, something like:
CREATE INDEX idx_image_title
  ON image
  USING btree
  (title varchar_pattern_ops);

You can find the details here:
http://www.postgresql.org/docs/8.1/interactive/indexes-opclass.html

Best regards

Hakan Kocaman
Software-Development

digame.de GmbH
Richard-Byrd-Str. 4-8
50829 Köln

Tel.: +49 (0) 221 59 68 88 31
Fax: +49 (0) 221 59 68 88 98
Email: [EMAIL PROTECTED]



 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 Tarabas (Manuel Rorarius)
 Sent: Tuesday, April 18, 2006 4:35 PM
 To: pgsql-performance@postgresql.org
 Subject: [PERFORM] Problem with LIKE-Performance
 
 
 [quoted message body and EXPLAIN VERBOSE output omitted -- identical to
 Tarabas (Manuel Rorarius)'s original post earlier in this digest]

Re: [PERFORM] Problem with LIKE-Performance

2006-04-18 Thread REISS Thomas DSIC DESP



Dave Dutcher wrote:

It looks like you are getting a sequential scan instead of an index
scan.  What is your locale setting?  As far as I know Postgres doesn't
support using indexes with LIKE unless you are using the C locale.
  

It does if you create your index this way :

CREATE INDEX idx_image_title
 ON image
 USING btree
 (title varchar_pattern_ops);

Please see http://www.postgresql.org/docs/8.1/interactive/indexes-opclass.html


Thomas


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [bulk] RE: [PERFORM] Problem with LIKE-Performance

2006-04-18 Thread Tarabas (Manuel Rorarius)
Hi Hakan,

HK I remember something about needing a special index with locales other than C.
HK You need a different operator class for this index, something like:
HK CREATE INDEX idx_image_title
HK   ON image
HK   USING btree
HK   (title varchar_pattern_ops);

I also forgot that, thanks a lot for the hint. That sped up my
searches a lot!

Best regards
Manuel


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Problem with LIKE-Performance

2006-04-18 Thread Guido Neitzer

On 18.04.2006, at 17:16 Uhr, Tarabas (Manuel Rorarius) wrote:


Is there any way to speed the like's up with a different locale than C
or to get an order by in a different Locale although using the
default C locale?


Sure. Just create the index with

create index tabname_column_index on tabname (column varchar_pattern_ops);

Then you can use something like

select * from table where column like 'Something%';

Remember that an index can't be used for queries with '%pattern%'.

cug



Re: [bulk] Re: [PERFORM] Problem with LIKE-Performance

2006-04-18 Thread Tarabas (Manuel Rorarius)
Hi Tom,

TL As already noted, it might be worth your while to add an index using the
TL pattern-ops opclass to help with queries like this.

I have done that now and it works very fine as supposed.

The problem with the high startup_costs disappeared somehow after
removing the enable_seqscan = off setting and restarting pgAdmin.

The first time I ran the statement it showed 13 sec execution time.

Seq Scan on image image0_  (cost=0.00..21414.21 rows=11 width=1311)
(actual time=10504.138..12857.127 rows=119 loops=1)
  Filter: ((title)::text ~~ '%Davorka%'::text)
Total runtime: 12857.372 ms

The second time I ran the statement it dropped to ~500 msec, which is
pretty ok. :-)

Seq Scan on image image0_  (cost=0.00..21414.21 rows=11 width=1311)
(actual time=270.289..552.144 rows=119 loops=1)
  Filter: ((title)::text ~~ '%Davorka%'::text)
Total runtime: 552.708 ms

Best regards
Manuel Rorarius


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Problem with LIKE-Performance

2006-04-18 Thread Tom Lane
Tarabas (Manuel Rorarius) [EMAIL PROTECTED] writes:
 After removing the enable_seqscan = off and making sure it was gone,
 it is a lot faster again.
 Now it takes about 469.841 ms for the select.

Um, no, enable_seqscan would certainly not have had any effect on the
*actual* runtime of this query.  All that enable_seqscan = off really
does is to add a large constant to the estimated cost of any seqscan,
so as to prevent the planner from selecting it unless there is no other
alternative plan available.  But that has nothing to do with how long
the seqscan will really run.

If you are seeing a speedup in repeated executions of the same seqscan
plan, it's probably just a caching effect.

As already noted, it might be worth your while to add an index using the
pattern-ops opclass to help with queries like this.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] creating of temporary table takes very long

2006-04-18 Thread Tom Lane
Sriram Dandapani [EMAIL PROTECTED] writes:
 Got an explain analyze output..Here it is
 Seq Scan on c_chkpfw_hr_tr a  (cost=0.00..225975659.89 rows=11000
 width=136) (actual time=2.345..648070.474 rows=22001 loops=1)
   Filter: (subplan)
   SubPlan
 ->  Bitmap Heap Scan on chkpfw_tr_hr_dimension b
 (cost=1474.64..10271.13 rows=1 width=0) (actual time=29.439..29.439
 rows=1 loops=22001)
   Recheck Cond: (($0 = firstoccurrence) AND ($1 = sentryid_id)
 AND ($2 = node_id))
   Filter: (($3 = customerid_id) AND (COALESCE($4, 0) =
 COALESCE(interface_id, 0)) AND (COALESCE($5, 0) = COALESCE(source_id,
 0)) AND (COALESCE($6, 0) = COALESCE(destination_id, 0)) AND
 (COALESCE($7, 0) = COALESCE(sourceport_id, 0)) AND (COALESCE($8 (..)
   ->  Bitmap Index Scan on chkpfw_tr_hr_idx1
 (cost=0.00..1474.64 rows=38663 width=0) (actual time=12.144..12.144
 rows=33026 loops=22001)
 Index Cond: (($0 = firstoccurrence) AND ($1 =
 sentryid_id) AND ($2 = node_id))
 Total runtime: 648097.800 ms

That's probably about as good a query plan as you can hope for given
the way the query is written.  Those COALESCE comparisons are all
unindexable (unless you make functional indexes on the COALESCE
expressions).  You might get somewhere by converting the EXISTS
to an IN, though.
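For illustration, a functional index on one of those COALESCE expressions
might look like this; a hedged sketch, with a hypothetical index name and
a column taken from the plan above:

CREATE INDEX chkpfw_coalesce_iface_idx
    ON chkpfw_tr_hr_dimension ((COALESCE(interface_id, 0)));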

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] creating of temporary table takes very long

2006-04-18 Thread Sriram Dandapani
Thx Tom

I guess I have to abandon the bulk update. The columns in the where
clause comprise 80% of the table columns, so indexing all of them may
not help. The target table will have on average 60-180 million rows.

I will attempt the IN instead of EXISTS and let you know the result.

-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, April 18, 2006 9:10 AM
To: Sriram Dandapani
Cc: Pgsql-Performance (E-mail)
Subject: Re: [PERFORM] creating of temporary table takes very long 

 [quoted message omitted -- identical to Tom Lane's reply directly above]

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [bulk] Re: [PERFORM] Problem with LIKE-Performance

2006-04-18 Thread Richard Huxton

Tarabas (Manuel Rorarius) wrote:

Hi Tom,

TL As already noted, it might be worth your while to add an index using the
TL pattern-ops opclass to help with queries like this.

I have done that now and it works very fine as supposed.

The problem with the high startup_costs disappeared somehow after
removing the enable_seqscan = off setting and restarting pgAdmin.


I'm not sure restarting pgAdmin would have had any effect.


first Time I ran the statement it showed 13 sec execution time.

Seq Scan on image image0_  (cost=0.00..21414.21 rows=11 width=1311)
(actual time=10504.138..12857.127 rows=119 loops=1)
  Filter: ((title)::text ~~ '%Davorka%'::text)
Total runtime: 12857.372 ms

second time I ran the statement it dropped to ~500 msec , which is
pretty ok. :-)


This will be because all the data is cached in the server's memory.


Seq Scan on image image0_  (cost=0.00..21414.21 rows=11 width=1311)
(actual time=270.289..552.144 rows=119 loops=1)
  Filter: ((title)::text ~~ '%Davorka%'::text)
Total runtime: 552.708 ms


As you can see, the plan is still scanning all the rows. In any case, 
you've changed the query - this has % at the beginning and end, which no 
index will help you with.


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [bulk] Re: [bulk] Re: [PERFORM] Problem with LIKE-Performance

2006-04-18 Thread Tarabas (Manuel Rorarius)
Hi Richard,

RH As you can see, the plan is still scanning all the rows. In any case, 
RH you've changed the query - this has % at the beginning and end, which no
RH index will help you with.

I realize that; the index definitely helped a lot with the query where
the % is just at the end. The time went down to 0.203 ms after I
changed the index to varchar_pattern_ops.

Index Scan using idx_image_title on image  (cost=0.00..6.01 rows=1 width=1311)
(actual time=0.027..0.108 rows=33 loops=1)
  Index Cond: (((title)::text ~>=~ 'Davorka'::character varying) AND
((title)::text ~<~ 'Davorkb'::character varying))
  Filter: ((title)::text ~~ 'Davorka%'::text)
Total runtime: 0.203 ms

Although 13 sec for the first select seems a bit odd, I think once the
database cache for the table kicks in, it should be fine at ~500 ms.

Best regards
Manuel


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] SELECT FOR UPDATE performance is bad

2006-04-18 Thread PFC


Suppose you have a table codes :
(
    game_id INT,
    code    TEXT,
    used    BOOL NOT NULL DEFAULT 'f',
    prize   ...
    ...
    PRIMARY KEY (game_id, code)
)

Just UPDATE codes SET used='t' WHERE used='f' AND game_id=... AND 
code=...

	Then check the rowcount : if one row was updated, the code was not used  
yet. If no row was updated, the code either did not exist, or was already  
used.


Another option : create a table used_codes like this :

(
    game_id INT,
    code    TEXT,
    ...
    PRIMARY KEY (game_id, code)
)

	Then, when trying to use a code, INSERT into this table. If you get a  
constraint violation on the uniqueness of the primary key, your code has  
already been used.


	Both solutions have a big advantage : they don't require messing with  
locks and are extremely simple. The one with UPDATE is IMHO better,  
because it doesn't abort the current transaction (although you could use a  
savepoint in the INSERT case to intercept the error).
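A minimal sketch of the savepoint variant just mentioned (table and
values are illustrative):

BEGIN;
SAVEPOINT try_code;
INSERT INTO used_codes (game_id, code) VALUES (1, 'ABC123');
-- on a unique_violation error, the code was already used:
--   ROLLBACK TO SAVEPOINT try_code;  -- keeps the transaction alive
COMMIT;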

On Tue, 18 Apr 2006 17:33:06 +0200, Tom Lane [EMAIL PROTECTED] wrote:

 [quoted message omitted -- identical to Tom Lane's reply earlier in this
 thread]




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE

2006-04-18 Thread Rodrigo Sakai
  Thanks for all responses! I agree with most of you: RI is best
maintained by the database! Performance must be improved in other ways
(indexes, hardware, etc.)!
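As a minimal illustration of database-maintained RI (table names are
hypothetical):

CREATE TABLE customer (
    customer_id int PRIMARY KEY
);
CREATE TABLE orders (
    order_id    int PRIMARY KEY,
    customer_id int NOT NULL REFERENCES customer (customer_id)
);
-- the database now rejects orders that point at nonexistent customers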


- Original Message - 
From: Jim C. Nasby [EMAIL PROTECTED]
To: Craig A. James [EMAIL PROTECTED]
Cc: PFC [EMAIL PROTECTED]; Michael Glaesemann [EMAIL PROTECTED];
Rodrigo Sakai [EMAIL PROTECTED];
pgsql-performance@postgresql.org
Sent: Wednesday, April 12, 2006 5:59 PM
Subject: Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE


 On Wed, Apr 12, 2006 at 10:36:28AM -0700, Craig A. James wrote:
  Jim C. Nasby wrote:
  1. You have only one application that modifies the data.  (Otherwise,
  you have to duplicate the rules across many applications, leading to a
  code-maintenance nightmare).
  
  You forgot something:
  
  1a: You know that there will never, ever, ever, ever, be any other
  application that wants to talk to the database.
  
  I know tons of people that get burned because they go with something
  that's good enough for now, and then regret that decision for years
  to come.
 
  No, I don't agree with this.  Too many people waste time designing for
  what if... scenarios that never happen.  You don't want to be dumb and
  design something that locks out a foreseeable and likely future need,
  but referential integrity doesn't meet this criterion.  There's nothing
  to keep you from changing from app-managed to database-managed
  referential integrity if your needs change.

 In this case your argument makes no sense, because you will spend far
 more time re-creating RI capability inside an application than if you
 just use what the database offers natively.

 It's certainly true that you don't want to over-engineer for no reason,
 but many times choices are made to save a very small amount of time or
 hassle up-front, and those choices become extremely painful later.
 -- 
 Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
 Pervasive Software  http://pervasive.comwork: 512-231-6117
 vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

 ---(end of broadcast)---
 TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Slow query - possible bug?

2006-04-18 Thread Tom Lane
Gavin Hamill [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 I'm thinking the planner is misestimating something, but it's hard
 to tell what without breaking it down.

 (allocation0_.Date between '2006-06-10 00:00:00.00' and
 '2006-06-10 00:00:00.00');
  QUERY PLAN
 ---
  Index Scan using ix_date on Allocation allocation0_  (cost=0.00..4.77
 rows=1 width=34) (actual time=0.035..6706.467 rows=34220 loops=1)
    Index Cond: ((Date >= '2006-06-10'::date) AND (Date <=
 '2006-06-10'::date))
  Total runtime: 6728.743 ms

Bingo, there's our misestimation: estimated 1 row, actual 34220 :-(

That's why it's choosing the wrong index: it thinks the condition on
RoomID isn't going to reduce the number of rows fetched any further,
and so the smaller index ought to be marginally cheaper to use.
In reality, it works way better when using the two-column index.

I think this is the same problem recently discussed about how the
degenerate case for a range comparison is making an unreasonably small
estimate, where it probably ought to fall back to some equality estimate
instead.  With the simple-equality form of the date condition, it does
get a reasonable estimate, and so it picks the right index.

There should be a fix for this by the time PG 8.2 comes out, but in the
meantime you might find that it helps to write the range check in a way
that doesn't have identical bounds, eg
date >= '2006-06-10'::date AND date < '2006-06-11'::date
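Applied to the query from this thread, the rewritten check would look
roughly like this (a sketch using the table and columns from the earlier
posts):

select allocation0_.ID, allocation0_.RoomID, allocation0_.Price,
       allocation0_.StatusID, allocation0_.Number, allocation0_.Date
  from Allocation allocation0_
 where allocation0_.Date >= '2006-06-10'::date
   and allocation0_.Date <  '2006-06-11'::date
   and allocation0_.RoomID in (211800);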

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Slow query - possible bug?

2006-04-18 Thread Gavin Hamill
On Tue, 18 Apr 2006 13:31:48 -0400
Tom Lane [EMAIL PROTECTED] wrote:

 There should be a fix for this by the time PG 8.2 comes out, but in
 the meantime you might find that it helps to write the range check in
 a way that doesn't have identical bounds, eg
   date >= '2006-06-10'::date AND date < '2006-06-11'::date

OK coolies - we've already had a code release for this (and other
stuff) planned for tomorrow morning: checking on the client side
if a single date has been chosen, then doing an equality test on that...
otherwise leaving the between in place - seems to work like a charm, and
hopefully it'll mean we don't have a loadavg of 15 on our main pg
server tomorrow (!) :))

Basically, as long as I know it's a pg issue rather than something daft
I've done (or not done) then I'm happy enough. 

Cheers,
Gavin.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Slow query - possible bug?

2006-04-18 Thread Tom Lane
Gavin Hamill [EMAIL PROTECTED] writes:
 On Tue, 18 Apr 2006 13:31:48 -0400
 Tom Lane [EMAIL PROTECTED] wrote:
 There should be a fix for this by the time PG 8.2 comes out, but in
 the meantime you might find that it helps to write the range check in
 a way that doesn't have identical bounds, eg
 date = '2006-06-10'::date AND date  '2006-06-11'::date

 OK coolies - we've already had a code release for this (and other
 stuff) planned for tomorrow morning checking on the client side
 if a single date has been chosen, then do an equality test on that...

Fair enough, no reason to replace one workaround with another.  But 
would you try it on your test case, just to verify the diagnosis?

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] index is not used if I include a function that returns current time in my query

2006-04-18 Thread Jim C. Nasby
Interesting... what does EXPLAIN ANALYZE show if you SET
enable_seqscan=off; ?

You should also consider upgrading to 8.1...
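An aside on the quoted report below, not from the thread itself: the
plpgsql wrapper f() is VOLATILE by default, so the planner cannot fold
its result to an index-scannable constant. If, and only if, the function
truly returns a fixed value, declaring it IMMUTABLE is a common
workaround; a hedged sketch:

create or replace function f() returns timestamp without time zone
as '
begin
  return ''2006-01-01 00:00:00'';
end;
' language plpgsql immutable;

-- the planner may now evaluate f() at plan time and consider the index:
explain select * from dmo where ora_rif > f();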

On Thu, Apr 13, 2006 at 12:25:02PM +0200, Cris Carampa wrote:
 Hello, postgresql 7.4.8 on SuSE Linux here.
 
 I have a table called DMO with a column called ORA_RIF defined as 
 timestamp without time zone ;
 
 I created an index on this table based on this column only.
 
 If I run a query against a text literal the index is used:
 
 explain select * from dmo where ora_rif > '2006-01-01';
   QUERY PLAN
 -
  Index Scan using dmo_ndx02 on dmo  (cost=0.00..1183.23 rows=736 width=156)
   Index Cond: (ora_rif > '2006-01-01 00:00:00'::timestamp without time
 zone)
 
 If I try to use a function that returns the current time instead, a 
 sequential scan is always performed:
 
 explain select * from dmo where ora_rif > localtimestamp;
   QUERY PLAN
 --
  Seq Scan on dmo  (cost=0.00..1008253.22 rows=2703928 width=156)
   Filter: (ora_rif > ('now'::text)::timestamp(6) without time zone)
 
 explain select * from dmo where ora_rif > localtimestamp::timestamp
without time zone;
   QUERY PLAN
 --
  Seq Scan on dmo  (cost=0.00..1008253.22 rows=2703928 width=156)
   Filter: (ora_rif > ('now'::text)::timestamp(6) without time zone)
 
 ... etc. ...
 
 (tried with all datetime functions with and without cast)
 
 I even tried to write a function that explicitly returns a timestamp 
 without time zone value:
 
 create or replace function f () returns timestamp without time zone
 as '
 declare
   x timestamp without time zone ;
 begin
   x := ''2006-01-01 00:00:00'';
   return x ;
 end ;
 ' language plpgsql ;
 
 But the result is the same:
 
 explain select * from dmo where ora_rif > f();
  QUERY PLAN
 -
  Seq Scan on dmo  (cost=0.00..987973.76 rows=2703928 width=156)
   Filter: (ora_rif > f())
 
 Any suggestion?
 
 Kind regards,
 
 -- 
 Cris Carampa (spamto:[EMAIL PROTECTED])
 
 I could ask what your dog's name is;
 mine has been called Libero for a while now
 
 
 
 ---(end of broadcast)---
 TIP 2: Don't 'kill -9' the postmaster
 

-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Blocks read for index scans

2006-04-18 Thread Jim C. Nasby
On Fri, Apr 14, 2006 at 11:12:55AM -0400, Tom Lane wrote:
 Jim C. Nasby [EMAIL PROTECTED] writes:
  In my case it would be helpful to break the heap access numbers out
  between seqscans and index scans, since each of those represents very
  different access patterns. Would adding that be a mess?
 
 Yes; it'd require more counters-per-table than we now keep, thus
 nontrivial bloat in the stats collector's tables.  Not to mention

ISTM it would only require two additional columns, which doesn't seem
unreasonable, especially considering the value of the information
collected.

 incompatible changes in the pgstats views and the underlying functions
 (which some apps probably use directly).

There's certainly ways around that issue, especially since this would
only be adding new information (though we would probably want to
consider the old info as deprecated and eventually remove it).
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Blocks read for index scans

2006-04-18 Thread Jim C. Nasby
On Fri, Apr 14, 2006 at 08:05:39AM +0200, Terje Elde wrote:
 Jim Nasby wrote:
 While working on determining a good stripe size for a database, I 
 realized it would be handy to know what the average request size is. 
 Getting this info is a simple matter of joining pg_stat_all_tables and 
 pg_statio_all_tables and doing some math, but there's one issue I've 
 found; it appears that there's no information on how many heap blocks 
 were read in by an index scan. Is there any way to get that info?
<snip>
 Knowing what the average stripe size is can be a good place to start, 
 but the real question is;  which stripe size will allow the majority of 
 your transactions to be possible to satisfy without having to go to two 
 spindles?

And of course right now there's not a very good way to know that...
granted, I can look at the average request size on the machine, but that
will include any seqscans that are happening, and for stripe sizing I
think it's better to leave that out of the picture unless your workload
is heavily based on seqscans.

 That said, it's the transactions against disk that typically matter.  On 
 FreeBSD, you can get an impression of this using 'systat -vmstat', and 
 watch the KB/t column for your drives.

On a related note, you know of any way to determine the breakdown
between read activity and write activity on FreeBSD? vmstat, systat,
iostat all only return aggregate info. :(
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Slow query - possible bug?

2006-04-18 Thread Gavin Hamill
On Tue, 18 Apr 2006 15:51:44 -0400
Tom Lane [EMAIL PROTECTED] wrote:

 Fair enough, no reason to replace one workaround with another.  But 
 would you try it on your test case, just to verify the diagnosis?

Yup I can confirm it from testing earlier today - as soon as
the two dates are non-equal, an index scan is correctly selected and
returns results in just a few milliseconds:

laterooms=# explain analyse select allocation0_.ID as y1_,
allocation0_.RoomID as y2_, allocation0_.StatusID as y4_,
allocation0_.Price as y3_, allocation0_.Number as y5_,
allocation0_.Date as y6_ from Allocation allocation0_ where
(allocation0_.Date between '2006-04-25 00:00:00.00' and
'2006-04-26 00:00:00.00')and(allocation0_.RoomID in(211800));
QUERY PLAN
---
Index Scan using ix_dateroom on Allocation allocation0_
(cost=0.00..14.02 rows=4 width=34) (actual time=16.799..21.804 rows=2
loops=1)
  Index Cond: ((RoomID = 211800) AND (Date >= '2006-04-25'::date) AND
(Date <= '2006-04-26'::date))
Total runtime: 21.910 ms

which I ran first, versus the identical-date equivalent which turned
in a whopping...

 Index Scan using ix_date on Allocation allocation0_
(cost=0.00..4.77 rows=1 width=34) (actual time=6874.272..69541.064
rows=1 loops=1)
  Index Cond: ((Date >= '2006-04-25'::date) AND (Date <= '2006-04-25'::date))
  Filter: (RoomID = 211800)
Total runtime: 69541.113 ms
(4 rows)

Cheers,
Gavin.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Inserts optimization?

2006-04-18 Thread Jim C. Nasby
On Fri, Apr 14, 2006 at 03:15:33PM -0500, Scott Marlowe wrote:
 On Fri, 2006-04-14 at 15:09, Francisco Reyes wrote:
  Michael Stone writes:
  
   I still don't follow that. Why would the RAID level matter? IOW, are you 
   actually wanting 2 spares, or are you just stick with that because you 
   need a factor of two disks for your mirrors?
  
  RAID 10 needs pairs.. so we can either have no spares or 2 spares.
 
 Spares are placed in service one at a time.  You don't need 2 spares for
 RAID 10, trust me.

Sadly, 3ware doesn't produce any controllers with the ability to do an
odd number of channels, so you end up burning through 2 slots to get a
hot spare (unless you spend substantially more money and go with the
next model up).
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Inserts optimization?

2006-04-18 Thread Jim C. Nasby
On Tue, Apr 18, 2006 at 01:56:44PM +0200, Magnus Hagander wrote:
 Bacula already serializes access to the database (they have to support
 mysql/myisam), so this shouldn't help. Actually, it might well hurt by
 introducing extra delays.

You have any contact with the developers? Maybe they're a possibility
for our summer of code...
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] pg_toast size

2006-04-18 Thread Jim C. Nasby
On Fri, Apr 14, 2006 at 03:13:43PM +0200, Julien Drouard wrote:
 Hi everyone,
 
 I've seen that my pg_toast tables are becoming bigger and bigger. After
 googling, I would like to modify my max_fsm_pages parameter to prevent
 that kind of problem. So I'm wondering: is changing this parameter
 enough, and after that how can I reduce the size of these tables? By
 doing a full vacuum?

A full vacuum would do it. CLUSTERing the table might rewrite the toast
tables as well.
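A sketch of the commands involved (table and index names are
placeholders):

VACUUM FULL mytable;              -- compacts the table and its toast table
CLUSTER mytable_pkey ON mytable;  -- alternative: rewrite in index order
VACUUM VERBOSE;                   -- final summary reports the FSM pages needed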

As for toast, if you do a vacuum verbose over the entire cluster, it
will tell you at the end how much space you need in the FSM. See also
http://www.pervasivepostgres.com/instantkb13/article.aspx?id=10087
and http://www.pervasivepostgres.com/instantkb13/article.aspx?id=10116
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] merge>hash>loop

2006-04-18 Thread Jim C. Nasby
On Tue, Apr 18, 2006 at 12:51:59PM +0200, Markus Schaber wrote:
  In my mind this is tied into another issue, which is that the planner
  always costs on the basis of each query starting from zero.  In a real
  environment it's much cheaper to use heavily-used indexes than this cost
  model suggests, because they'll already be swapped in due to use by
  previous queries.  But we haven't got any infrastructure to keep track
  of what's been heavily used, let alone a cost model that could make use
  of the info.
 
 An easy first approach would be to add a user tunable cache probability
 value to each index (and possibly table) between 0 and 1. Then simply
 multiply random_page_cost with (1-that value) for each scan.
 
 Later, this value could be automatically tuned by stats analysis or
 other means.

Actually, if you run with stats_block_level turned on you have a
first-order approximation of what is and isn't cached. Perhaps the
planner could make use of this information if it's available.

  I think part of the reason that people commonly reduce random_page_cost
  to values much lower than physical reality would suggest is that it
  provides a crude way of partially compensating for this basic problem.
 
 I totall agree with this, it's just what we did here from time to time. :-)
 
 Hmm, how does effective_cach_size correspond with it? Shouldn't a high
 effective_cache_size have a similar effect?

Generally, effective_cache_size is used to determine the likelihood that
something will be in cache. random_page_cost tells us how expensive it
will be to get that information if it isn't in cache.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] merge>hash>loop

2006-04-18 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes:
 Actually, if you run with stats_block_level turned on you have a
 first-order approximation of what is and isn't cached.

Only if those stats decayed (pretty fast) with time; which they don't.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] merge>hash>loop

2006-04-18 Thread Tom Lane
Markus Schaber [EMAIL PROTECTED] writes:
 Hmm, how does effective_cache_size correspond with it? Shouldn't a high
 effective_cache_size have a similar effect?

It seems reasonable to suppose that effective_cache_size ought to be
used as a number indicating how much stuff would hang around from
query to query.  Right now it's not used that way...

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] creating of temporary table takes very long

2006-04-18 Thread Jim C. Nasby
You might try rewriting the coalesces into a row comparison...

WHERE row($4, $5, ...) IS NOT DISTINCT FROM row(interface_id, source_id, ...)

See
http://www.postgresql.org/docs/8.1/interactive/functions-comparisons.html#AEN13408

Note that the docs only show IS DISTINCT FROM, so you might have to do

WHERE NOT row(...) IS DISTINCT FROM row(...)
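
For example, a sketch of the rewrite using two of the columns from the
plan below (the literal values are hypothetical):

    SELECT count(*)
      FROM chkpfw_tr_hr_dimension b
     WHERE NOT row(b.interface_id, b.source_id)
               IS DISTINCT FROM row(42, NULL);

Note this treats NULL as equal to NULL, which is close to (but not
identical to) what the COALESCE(x, 0) comparisons simulate: COALESCE
also makes NULL match 0.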

On Tue, Apr 18, 2006 at 09:13:04AM -0700, Sriram Dandapani wrote:
 Thx Tom
 
 I guess I have to abandon the bulk update. The columns in the where
 clause comprise 80% of the table columns. So indexing all may not help.
 The target table will have on average 60-180 million rows.
 
 I will attempt the IN instead of EXISTS and let you know the result
 
 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, April 18, 2006 9:10 AM
 To: Sriram Dandapani
 Cc: Pgsql-Performance (E-mail)
 Subject: Re: [PERFORM] creating of temporary table takes very long 
 
 Sriram Dandapani [EMAIL PROTECTED] writes:
  Got an explain analyze output. Here it is:
  Seq Scan on c_chkpfw_hr_tr a  (cost=0.00..225975659.89 rows=11000
  width=136) (actual time=2.345..648070.474 rows=22001 loops=1)
Filter: (subplan)
SubPlan
  ->  Bitmap Heap Scan on chkpfw_tr_hr_dimension b
  (cost=1474.64..10271.13 rows=1 width=0) (actual time=29.439..29.439
  rows=1 loops=22001)
Recheck Cond: (($0 = firstoccurrence) AND ($1 =
 sentryid_id)
  AND ($2 = node_id))
Filter: (($3 = customerid_id) AND (COALESCE($4, 0) =
  COALESCE(interface_id, 0)) AND (COALESCE($5, 0) = COALESCE(source_id,
  0)) AND (COALESCE($6, 0) = COALESCE(destination_id, 0)) AND
  (COALESCE($7, 0) = COALESCE(sourceport_id, 0)) AND (COALESCE($8 (..)
    ->  Bitmap Index Scan on chkpfw_tr_hr_idx1
  (cost=0.00..1474.64 rows=38663 width=0) (actual time=12.144..12.144
  rows=33026 loops=22001)
  Index Cond: (($0 = firstoccurrence) AND ($1 =
  sentryid_id) AND ($2 = node_id))
  Total runtime: 648097.800 ms
 
 That's probably about as good a query plan as you can hope for given
 the way the query is written.  Those COALESCE comparisons are all
 unindexable (unless you make functional indexes on the COALESCE
 expressions).  You might get somewhere by converting the EXISTS
 to an IN, though.
 
   regards, tom lane
 
 ---(end of broadcast)---
 TIP 2: Don't 'kill -9' the postmaster
 

-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] Multicolumn order by

2006-04-18 Thread Theo Kramer
Hi

Apologies if this has already been raised...

PostgreSQL 8.1.3 and prior versions. Vacuum done.

Assuming a single table with columns named c1 to cn and a requirement to
select from a particular position in multiple column order. 

With the column values in my simple example below denoted by 'cnv', a
typical query would look as follows:

select * from mytable where
  (c1 = 'c1v' and c2 = 'c2v' and c3 >= 'c3v') or
  (c1 = 'c1v' and c2 > 'c2v') or
  (c1 > 'c1v')
  order by c1, c2, c3;

In real life, with the table containing many rows (9 million) and
a single multicolumn index on the required columns, I get the
following

explain analyse
 SELECT
 tran_subledger,
 tran_subaccount,
 tran_mtch,
 tran_self,
 tran_Rflg FROM tran
WHERE ((tran_subledger = 2 AND tran_subaccount = 'ARM '
AND tran_mtch = 0 AND tran_self >= 0 )
OR (tran_subledger = 2 AND tran_subaccount = 'ARM ' AND
tran_mtch > 0 )
OR (tran_subledger = 2 AND tran_subaccount > 'ARM ' )
OR (tran_subledger > 2 ))
ORDER BY tran_subledger,
 tran_subaccount,
 tran_mtch,
 tran_self
limit 10;

 Limit  (cost=0.00..25.21 rows=10 width=36) (actual
time=2390271.832..2390290.305 rows=10 loops=1)
   ->  Index Scan using tran_mtc_idx on tran  (cost=0.00..13777295.04
rows=5465198 width=36) (actual time=2390271.823..2390290.252 rows=10
loops=1)
         Filter: (((tran_subledger = 2) AND (tran_subaccount = 'ARM
'::bpchar) AND (tran_mtch = 0) AND (tran_self >= 0)) OR ((tran_subledger
= 2) AND (tran_subaccount = 'ARM '::bpchar) AND
(tran_mtch > 0)) OR ((tran_subledger = 2) AND (tran_subaccount >
'ARM '::bpchar)) OR (tran_subledger > 2))
 Total runtime: 2390290.417 ms

Any suggestions/comments/ideas appreciated.
-- 
Regards
Theo


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] creating of temporary table takes very long

2006-04-18 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes:
 You might try rewriting the coalesces into a row comparison...
 WHERE row($4, $5, ...) IS NOT DISTINCT FROM row(interface_id, source_id, ...)

That would be notationally nicer, but no help performance-wise; I'm
fairly sure that IS DISTINCT doesn't get optimized in any fashion
whatsoever :-(

What might be worth trying is functional indexes on the COALESCE(foo,0)
expressions.  Or if possible, consider revising your data schema to
avoid using NULLs in a way that requires assuming that NULL = NULL.
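
A sketch of the functional-index suggestion, using two of the columns
from the posted plan (the index names are made up):

    CREATE INDEX chkpfw_dim_iface_idx
        ON chkpfw_tr_hr_dimension ((COALESCE(interface_id, 0)));
    CREATE INDEX chkpfw_dim_src_idx
        ON chkpfw_tr_hr_dimension ((COALESCE(source_id, 0)));

With these in place, the COALESCE(interface_id, 0) = COALESCE($4, 0)
tests can become index conditions instead of filters.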

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Multicolumn order by

2006-04-18 Thread Tom Lane
Theo Kramer [EMAIL PROTECTED] writes:
 select * from mytable where
   (c1 = 'c1v' and c2 = 'c2v' and c3 >= 'c3v') or
   (c1 = 'c1v' and c2 > 'c2v') or
   (c1 > 'c1v')
   order by c1, c2, c3;

Yeah ... what you really want is the SQL-spec row comparison operator

select ... where (c1,c2,c3) >= ('c1v','c2v','c3v') order by c1,c2,c3;

This does not work properly in any current PG release :-( but it does
work and is optimized well in CVS HEAD.  See eg this thread
http://archives.postgresql.org/pgsql-hackers/2006-02/msg00209.php
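
Applied to the query upthread, the whole OR chain would collapse to
something like this sketch (untested; row comparison per the SQL spec):

    SELECT tran_subledger, tran_subaccount, tran_mtch, tran_self, tran_Rflg
      FROM tran
     WHERE (tran_subledger, tran_subaccount, tran_mtch, tran_self)
           >= (2, 'ARM ', 0, 0)
     ORDER BY tran_subledger, tran_subaccount, tran_mtch, tran_self
     LIMIT 10;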

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Multicolumn order by

2006-04-18 Thread Jim C. Nasby
Assuming stats are accurate, you're reading through 5.5M index rows in
order to run that limit query. You didn't say what the index was
actually on, but you might want to try giving each column its own
index. That might make a bitmap scan feasible.

I know this doesn't help right now, but 8.2 will also allow you to do
this using a row comparator. You might want to compile CVS HEAD and see
how that does with this query (specifically, whether using a row
comparator performs better than the query below).

On Wed, Apr 19, 2006 at 12:07:55AM +0200, Theo Kramer wrote:
 Hi
 
 Apologies if this has already been raised...
 
 PostgreSQL 8.1.3 and prior versions. Vacuum done.
 
 Assuming a single table with columns named c1 to cn and a requirement to
 select from a particular position in multiple column order. 
 
 With the column values in my simple example below denoted by 'cnv', a
 typical query would look as follows:
 
 select * from mytable where
   (c1 = 'c1v' and c2 = 'c2v' and c3 >= 'c3v') or
   (c1 = 'c1v' and c2 > 'c2v') or
   (c1 > 'c1v')
   order by c1, c2, c3;
 
 In real life, with the table containing many rows (9 million) and
 a single multicolumn index on the required columns, I get the
 following
 
 explain analyse
  SELECT
  tran_subledger,
  tran_subaccount,
  tran_mtch,
  tran_self,
  tran_Rflg FROM tran
 WHERE ((tran_subledger = 2 AND tran_subaccount = 'ARM '
 AND tran_mtch = 0 AND tran_self >= 0 )
 OR (tran_subledger = 2 AND tran_subaccount = 'ARM ' AND
 tran_mtch > 0 )
 OR (tran_subledger = 2 AND tran_subaccount > 'ARM ' )
 OR (tran_subledger > 2 ))
 ORDER BY tran_subledger,
  tran_subaccount,
  tran_mtch,
  tran_self
 limit 10;
   
   
  Limit  (cost=0.00..25.21 rows=10 width=36) (actual
 time=2390271.832..2390290.305 rows=10 loops=1)
    ->  Index Scan using tran_mtc_idx on tran  (cost=0.00..13777295.04
 rows=5465198 width=36) (actual time=2390271.823..2390290.252 rows=10
 loops=1)
          Filter: (((tran_subledger = 2) AND (tran_subaccount = 'ARM
 '::bpchar) AND (tran_mtch = 0) AND (tran_self >= 0)) OR ((tran_subledger
 = 2) AND (tran_subaccount = 'ARM '::bpchar) AND
 (tran_mtch > 0)) OR ((tran_subledger = 2) AND (tran_subaccount >
 'ARM '::bpchar)) OR (tran_subledger > 2))
  Total runtime: 2390290.417 ms
 
 Any suggestions/comments/ideas appreciated.
 -- 
 Regards
 Theo
 
 
 ---(end of broadcast)---
 TIP 2: Don't 'kill -9' the postmaster
 

-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] merge>hash>loop

2006-04-18 Thread Jim C. Nasby
On Tue, Apr 18, 2006 at 06:22:26PM -0400, Tom Lane wrote:
 Jim C. Nasby [EMAIL PROTECTED] writes:
  Actually, if you run with stats_block_level turned on you have a
  first-order approximation of what is and isn't cached.
 
 Only if those stats decayed (pretty fast) with time; which they don't.

Good point. :/ I'm guessing there's no easy way to see how many blocks
for a given relation are in shared memory, either...
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] merge>hash>loop

2006-04-18 Thread Jim C. Nasby
On Tue, Apr 18, 2006 at 06:26:48PM -0400, Tom Lane wrote:
 Markus Schaber [EMAIL PROTECTED] writes:
  Hmm, how does effective_cache_size correspond with it? Shouldn't a high
  effective_cache_size have a similar effect?
 
 It seems reasonable to suppose that effective_cache_size ought to be
 used as a number indicating how much stuff would hang around from
 query to query.  Right now it's not used that way...

Maybe it would be a reasonable first pass to have estimators calculate
the cost if a node found everything it wanted in cache and then do a
linear interpolation between that and the costs we currently come up
with? Something like pg_class.relpages / sum(pg_class.relpages) would
give an idea of how much of a relation is likely to be cached, which
could be used for the linear interpolation.
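
As a sketch, that heuristic could be computed straight from the
catalogs (plain tables only; 'page_share' is a made-up alias):

    SELECT relname,
           relpages / (SELECT sum(relpages)::float8 FROM pg_class) AS page_share
      FROM pg_class
     WHERE relkind = 'r'
     ORDER BY page_share DESC;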

Of course having *any* idea as to how much of a relation was actually in
shared_buffers (or better yet, the OS cache) would be a lot more
accurate, but this simple method might be a good enough first-pass.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] merge>hash>loop

2006-04-18 Thread Tom Lane
Markus Schaber [EMAIL PROTECTED] writes:
 An easy first approach would be to add a user tunable cache probability
 value to each index (and possibly table) between 0 and 1. Then simply
 multiply random_page_cost with (1-that value) for each scan.

That's not the way you'd need to use it.  But on reflection I do think
there's some merit in a cache probability parameter, ranging from zero
(giving current planner behavior) to one (causing the planner to assume
everything is already in cache from prior queries).  We'd have to look
at exactly how such an assumption should affect the cost equations ...

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PERFORM] Planner doesn't choose Index - (slow select)

2006-04-18 Thread patrick keshishian
Hi all,

I've been struggling with some performance issues with certain
SQL queries.  I was prepping a long-ish overview of my problem
to submit, but I think I'll start out with a simple case of the
problem first, hopefully answers I receive will help me solve
my initial issue.

Consider the following two queries which yield drastically different
run-time:

db=# select count(*) from pk_c2 b0 where b0.offer_id=7141;
 count
---
 1
(1 row)
Time: 5139.004 ms

db=# select count(*) from pk_c2 b0 where b0.pending=true and b0.offer_id=7141;
 count
---
 1
(1 row)
Time: 1.828 ms


That's 2811 times faster!

Just to give you an idea of the size of the pk_c2 table:

db=# select count(*) from pk_c2 ;
  count
-
 2158094
(1 row)
Time: 5275.782 ms

db=# select count(*) from pk_c2 where pending=true;
 count
---
51
(1 row)
Time: 5073.699 ms



db=# explain select count(*) from pk_c2 b0 where b0.offer_id=7141;
QUERY PLAN
---
 Aggregate  (cost=44992.78..44992.78 rows=1 width=0)
   ->  Seq Scan on pk_c2 b0  (cost=0.00..44962.50 rows=12109 width=0)
         Filter: (offer_id = 7141)
(3 rows)
Time: 1.350 ms

db=# explain select count(*) from pk_c2 b0 where b0.pending=true and
b0.offer_id=7141;
QUERY PLAN

 Aggregate  (cost=45973.10..45973.10 rows=1 width=0)
   ->  Index Scan using pk_boidx on pk_c2 b0  (cost=0.00..45973.09
rows=1 width=0)
         Index Cond: (offer_id = 7141)
         Filter: (pending = true)
(4 rows)
Time: 1.784 ms



The table has indexes for both 'offer_id' and '(pending=true)':

Indexes:
pk_boidx btree (offer_id)
pk_bpidx btree (((pending = true)))

So, why would the planner choose to use the index on the second query
and not on the first?


Note that I am able to fool the planner into using an Index scan
on offer_id by adding a silly new condition in the where clause of
the first form of the query:


db=# explain select count(*) from pk_c2 b0 where b0.offer_id=7141 and oid > 1;
QUERY PLAN
---
 Aggregate  (cost=45983.19..45983.19 rows=1 width=0)
   ->  Index Scan using pk_boidx on pk_c2 b0  (cost=0.00..45973.09
rows=4037 width=0)
         Index Cond: (offer_id = 7141)
         Filter: (oid > 1::oid)
(4 rows)
Time: 27.301 ms

db=# select count(*) from pk_c2 b0 where b0.offer_id=7141 and oid > 1;
 count
---
 1
(1 row)
Time: 1.900 ms

What gives?

This seems just too hokey for my taste.

--patrick



db=# select version();
 version
-
 PostgreSQL 7.4.12 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.6

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Planner doesn't choose Index - (slow select)

2006-04-18 Thread Tom Lane
patrick keshishian [EMAIL PROTECTED] writes:
 I've been struggling with some performance issues with certain
 SQL queries.  I was prepping a long-ish overview of my problem
 to submit, but I think I'll start out with a simple case of the
 problem first, hopefully answers I receive will help me solve
 my initial issue.

Have you ANALYZEd this table lately?

 db=# select count(*) from pk_c2 b0 where b0.offer_id=7141;
  count
 ---
  1
 (1 row)

The planner is evidently estimating that there are 12109 such rows,
not 1, which is the reason for its reluctance to use an indexscan.
Generally the only reason for it to be off that far on such a simple
statistical issue is if you haven't updated the stats in a long time.
(If you've got a really skewed data distribution for offer_id, you
might need to raise the statistics target for it.)
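
Concretely, that advice might look like this (the statistics target
value is just an example):

    ANALYZE pk_c2;                    -- refresh the stats first
    ALTER TABLE pk_c2 ALTER COLUMN offer_id SET STATISTICS 100;
    ANALYZE pk_c2;                    -- rebuild with a finer histogram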

 The table has indexes for both 'offer_id' and '(pending=true)':

 Indexes:
 pk_boidx btree (offer_id)
 pk_bpidx btree (((pending = true)))

The expression index on (pending = true) won't do you any good,
unless you spell your query in a weird way like
... WHERE (pending = true) = true
I'd suggest a plain index on pending instead.
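
That is, something along these lines (a sketch; the partial-index
variant is one further option, given how few rows have pending = true):

    DROP INDEX pk_bpidx;
    CREATE INDEX pk_bpidx ON pk_c2 (pending);
    -- or: CREATE INDEX pk_pending_idx ON pk_c2 (offer_id) WHERE pending;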

 db=# select version();
  PostgreSQL 7.4.12 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.6

You might want to think about an update, too.  7.4 is pretty long in the
tooth.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] SELECT FOR UPDATE performance is bad

2006-04-18 Thread Christopher Kings-Lynne

Suppose you have a table codes :
(
    game_id  INT,
    code     TEXT,
    used     BOOL NOT NULL DEFAULT 'f',
    prize    ...
...
PRIMARY KEY (game_id, code)
)

Just UPDATE codes SET used='t' WHERE used='f' AND game_id=... AND 
code=...


Then check the rowcount : if one row was updated, the code was not 
used yet. If no row was updated, the code either did not exist, or was 
already used.


You can use a stored procedure with exceptions, no?

Try this:

http://www.postgresql.org/docs/8.1/interactive/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE
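
For instance, a rough sketch of that idea applied to the codes table
above (the function name is made up; the linked page shows the fuller
exception-based upsert pattern):

    CREATE OR REPLACE FUNCTION claim_code(p_game_id INT, p_code TEXT)
    RETURNS VOID AS $$
    BEGIN
        UPDATE codes SET used = 't'
         WHERE game_id = p_game_id AND code = p_code AND used = 'f';
        IF NOT FOUND THEN
            RAISE EXCEPTION 'code % not available for game %',
                            p_code, p_game_id;
        END IF;
    END;
    $$ LANGUAGE plpgsql;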

Chris



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Blocks read for index scans

2006-04-18 Thread Terje Elde

Jim C. Nasby wrote:
That said, it's the transactions against disk that typically matter.  On 
FreeBSD, you can get an impression of this using 'systat -vmstat', and 
watch the KB/t column for your drives.



On a related note, you know of any way to determine the breakdown
between read activity and write activity on FreeBSD? vmstat, systat,
iostat all only return aggregate info. :(
  



Can't think of a right way to do this ATM, but for a lab-type setup to 
get an idea, you could set up a gmirror volume, then choose a balancing 
algorithm to only read from one of the disks.  The effect should be that 
writes go to both, while reads only go to one.  Activity on the 
write-only disk would give you an idea of the write activity, and 
(read/write disk - write-only disk) would give you an idea of the 
reads.  I have to admit though, it seems like quite a bit of hassle, and 
I'm not sure how good the numbers would be; given that at least some of 
the info (KB/transaction) comes as totals, it'd require a bit of math to 
get decent numbers.  But at least it's something.


Terje



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] merge>hash>loop

2006-04-18 Thread Mark Kirkwood

Jim C. Nasby wrote:

On Tue, Apr 18, 2006 at 06:22:26PM -0400, Tom Lane wrote:

Jim C. Nasby [EMAIL PROTECTED] writes:

Actually, if you run with stats_block_level turned on you have a
first-order approximation of what is and isn't cached.

Only if those stats decayed (pretty fast) with time; which they don't.


Good point. :/ I'm guessing there's no easy way to see how many blocks
for a given relation are in shared memory, either...


contrib/pg_buffercache will tell you this - what buffers from what 
relation are in shared_buffers (if you want to interrogate the os file 
buffer cache, that's a different story - though I've been toying with 
writing a utility for FreeBSD that would do this).
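
For example (a sketch against the 8.1 contrib view; run it in the
database of interest):

    SELECT c.relname, count(*) AS buffers
      FROM pg_buffercache b
      JOIN pg_class c ON b.relfilenode = c.relfilenode
     GROUP BY c.relname
     ORDER BY buffers DESC
     LIMIT 10;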


Cheers

Mark

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Blocks read for index scans

2006-04-18 Thread Jim C. Nasby
On Wed, Apr 19, 2006 at 04:35:11AM +0200, Terje Elde wrote:
 Jim C. Nasby wrote:
 That said, it's the transactions against disk that typically matter.  On 
 FreeBSD, you can get an impression of this using 'systat -vmstat', and 
 watch the KB/t column for your drives.
 
 
 On a related note, you know of any way to determine the breakdown
 between read activity and write activity on FreeBSD? vmstat, systat,
 iostat all only return aggregate info. :(
   
 
 
 Can't think of a right way to do this ATM, but for a lab-type setup to 
 get an idea, you could set up a gmirror volume, then choose a balancing 
 algorithm to only read from one of the disks.  The effect should be that 
 writes go to both, while reads only go to one.  Activity on the 
 write-only disk would give you an idea of the write activity, and 
 (read/write disk - write-only disk) would give you an idea of the 
 reads.  I have to admit though, it seems like quite a bit of hassle, and 
 I'm not sure how good the numbers would be; given that at least some of 
 the info (KB/transaction) comes as totals, it'd require a bit of math to 
 get decent numbers. But at least it's something.

Yeah... not gonna happen...

It's completely mind-boggling that FBSD doesn't track writes and reads
separately.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] merge>hash>loop

2006-04-18 Thread Jim C. Nasby
On Wed, Apr 19, 2006 at 04:47:40PM +1200, Mark Kirkwood wrote:
 Jim C. Nasby wrote:
 On Tue, Apr 18, 2006 at 06:22:26PM -0400, Tom Lane wrote:
 Jim C. Nasby [EMAIL PROTECTED] writes:
 Actually, if you run with stats_block_level turned on you have a
 first-order approximation of what is and isn't cached.
 Only if those stats decayed (pretty fast) with time; which they don't.
 
 Good point. :/ I'm guessing there's no easy way to see how many blocks
 for a given relation are in shared memory, either...
 
 contrib/pg_buffercache will tell you this - what buffers from what 
 relation are in shared_buffers (if you want to interrogate the os file 

So theoretically with that code we could make the cost estimator
functions more intelligent about actual query costs. Now, how you'd
actually see how those estimates improved...

 buffer cache, that's a different story - though I've been toying with 
 writing a utility for FreeBSD that would do this).

Well, the problem is that I doubt anything that OS-specific would be
accepted into core. What we really need is some method that's
OS-agnostic...
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] merge>hash>loop

2006-04-18 Thread Tom Lane
Mark Kirkwood [EMAIL PROTECTED] writes:
 Jim C. Nasby wrote:
 Good point. :/ I'm guessing there's no easy way to see how many blocks
 for a given relation are in shared memory, either...

 contrib/pg_buffercache will tell you this -

I think the key word in Jim's comment was "easy", ie, cheap.  Grovelling
through many thousands of buffers to count the matches to a given
relation doesn't sound appetizing, especially not if it gets done over
again several times during each query-planning cycle.  Trying to keep
centralized counts somewhere would be even worse (because of locking/
contention issues).

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Blocks read for index scans

2006-04-18 Thread Mark Kirkwood

Jim C. Nasby wrote:



Yeah... not gonna happen...

It's completely mind-boggling that FBSD doesn't track writes and reads
separately.


'iostat' does not tell you this, but 'gstat' does - it's the GEOM 
system monitor (a bit annoying that the standard tool is lacking in this 
regard...).


Cheers

Mark

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] merge>hash>loop

2006-04-18 Thread Mark Kirkwood

Tom Lane wrote:

Mark Kirkwood [EMAIL PROTECTED] writes:

Jim C. Nasby wrote:

Good point. :/ I'm guessing there's no easy way to see how many blocks
for a given relation are in shared memory, either...



contrib/pg_buffercache will tell you this -


I think the key word in Jim's comment was easy, ie, cheap.  Grovelling
through many thousands of buffers to count the matches to a given
relation doesn't sound appetizing, especially not if it gets done over
again several times during each query-planning cycle.  Trying to keep
centralized counts somewhere would be even worse (because of locking/
contention issues).



Yeah - not sensible for a transaction-oriented system - might be OK for 
DSS though.


Cheers

mark

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings