Re: [PERFORM] index usage on queries on inherited tables

2011-04-27 Thread Joseph Shraibman
On 04/27/2011 04:32 PM, Robert Haas wrote:
 In the first case, PostgreSQL evidently thinks that using the indexes
 will be slower than just ignoring them.  You could find out whether
 it's right by trying it with enable_seqscan=off.

My point is that this is a problem specific to inherited tables.  It
should be obvious to postgres that few rows are being returned, but in
the inherited-tables case it doesn't use indexes.  This was just an
example.  On a 52 GB table, a SELECT id FROM table ORDER BY id DESC
LIMIT 1 returns instantly, but as soon as you declare a child table it
tries to seq scan all the tables.
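A minimal sketch of the shape I mean (table names are made up; the behavior
is as described above):

  CREATE TABLE events (id bigint PRIMARY KEY, payload text);
  CREATE TABLE events_child () INHERITS (events);

  -- On the plain parent this comes back instantly via the index on id;
  -- once the child table exists, the planner seq scans every table in
  -- the inheritance tree instead:
  EXPLAIN SELECT id FROM events ORDER BY id DESC LIMIT 1;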



[PERFORM] planner and worst case scenario

2004-06-30 Thread Joseph Shraibman
Here is my query, which returns one row:
SELECT f1, f2,(SELECT dfield FROM d WHERE d.ukey = f1) FROM m WHERE 
status IN(2) AND jid IN(17674) ORDER BY pkey DESC LIMIT 25 OFFSET 0;

Here was the really bad plan chosen.  This didn't come back for a long 
while and had to be cancelled:

                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Limit  (cost=0.00..10493.05 rows=25 width=118)
   ->  Index Scan Backward using m_pkey on m  (cost=0.00..1883712.97 rows=4488 width=118)
         Filter: ((status = 2) AND (jid = 17674))
         SubPlan
           ->  Index Scan using d_pkey on d  (cost=0.00..3.83 rows=1 width=24)
                 Index Cond: (ukey = $0)
(6 rows)

After an ANALYZE the plan was much better:
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Limit  (cost=22060.13..22060.19 rows=25 width=119)
   ->  Sort  (cost=22060.13..22067.61 rows=2993 width=119)
         Sort Key: serial
         ->  Index Scan using m_jid_uid_key on m  (cost=0.00..21887.32 rows=2993 width=119)
               Index Cond: (jid = 17674)
               Filter: (status = 2)
               SubPlan
                 ->  Index Scan using d_pkey on d  (cost=0.00..3.83 rows=1 width=24)
                       Index Cond: (ukey = $0)
(9 rows)

The thing is, since there was only 1 row in the (very big) table with
that jid, the ANALYZE didn't include that row in the stats table, so I'm
figuring a small random change in the sampled statistics made it choose
the better plan.

Doing: ALTER TABLE m ALTER jid SET STATISTICS 1000;
produced a much more accurate row estimate:
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Limit  (cost=2909.65..2909.71 rows=25 width=115)
   ->  Sort  (cost=2909.65..2910.64 rows=395 width=115)
         Sort Key: serial
         ->  Index Scan using m_jid_uid_key on m  (cost=0.00..2892.61 rows=395 width=115)
               Index Cond: (jbid = 17674)
               Filter: (status = 2)
               SubPlan
                 ->  Index Scan using d_pkey on d  (cost=0.00..3.83 rows=1 width=24)
                       Index Cond: (userkey = $0)
(9 rows)

It seems the problem is that the pg planner goes for the plan with the
lowest projected cost, but ignores the worst-case scenario.

I think the odds of this problem happening again are lower since the SET
STATISTICS, but I don't know what triggered the really bad plan in the
first place.  Did pg think that, because so many rows would match, the
LIMIT would be filled quickly, so that a more accurate (and lower) row
estimate made it choose the better plan?
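For what it's worth, one way to check whether ANALYZE picked the value up at
all is to look at pg_stats (table and column names are taken from the plans
above; this is a diagnostic sketch, not a fix):

  SELECT n_distinct, most_common_vals, most_common_freqs
    FROM pg_stats
   WHERE tablename = 'm' AND attname = 'jid';

  -- If 17674 never appears in most_common_vals, the planner estimates its
  -- frequency from n_distinct rather than from a measured count, which is
  -- where the bad row guess comes from.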



Re: [PERFORM] PostgreSQL performance in simple queries

2004-05-20 Thread Joseph Shraibman
Tom Lane wrote:
: * JDBC
With JDBC out of the core, I'm not sure why we still have a JDBC section
in the core TODO.
Speaking of which, why is the JDBC site so hard to find?  For that matter,
the new foundry can only be found through the news article on the front
page.



[PERFORM] shared buffer size on linux

2004-05-19 Thread Joseph Shraibman
See http://kerneltrap.org/node/view/3148, about 40% down, under the
header "2.6 -aa patchset, object-based reverse mapping".  Does this mean
that the more shared memory you allocate, the bigger the potential for a
swap storm?



Re: [PERFORM] PostgreSQL performance in simple queries

2004-05-19 Thread Joseph Shraibman
Neil Conway wrote:
PostgreSQL (< 7.5) won't consider using an indexscan when the predicate 
involves an integer literal and the column datatype is int2 or int8.
Is this fixed for 7.5?  It isn't checked off on the TODO list at
http://developer.postgresql.org/todo.php
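(In the meantime the usual workaround is to quote or cast the literal so it
takes the column's type.  Table and column names below are made up for
illustration:)

  -- int8col is an int8 column with a btree index on it; a bare integer
  -- literal is typed int4, so the pre-fix planner won't match the index.
  SELECT * FROM t WHERE int8col = 42::int8;   -- explicit cast
  SELECT * FROM t WHERE int8col = '42';       -- quoted literal, resolved to int8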


Re: [PERFORM] Using LIKE expression problem..

2004-05-18 Thread Joseph Shraibman
Use the text_pattern_ops operator class when creating the index; see:
http://www.postgresql.org/docs/7.4/static/indexes-opclass.html
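For reference, a rough sketch covering both the locale check Michael asks
about below and the index itself (index and table names are made up):

  -- What locale was the cluster initialized with?
  SHOW lc_collate;

  -- In a non-C locale a plain btree index can't be used for left-anchored
  -- LIKE; an index built with text_pattern_ops can:
  CREATE INDEX person_lastname_like_idx ON person (lastname text_pattern_ops);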
Michael Ryan S. Puncia wrote:
Sorry .. I am a newbie and I don't know :( 
How can I know that I am in C locale ?
How can I change my database to use C locale?


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Christopher
Kings-Lynne
Sent: Wednesday, May 12, 2004 3:59 PM
To: Michael Ryan S. Puncia
Cc: [EMAIL PROTECTED]
Subject: Re: [PERFORM] Using LIKE expression problem..
Are you in a non-C locale?
Chris
Michael Ryan S. Puncia wrote:

Yes , I already do that but the same result .. LIKE uses seq scan
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Christopher
Kings-Lynne
Sent: Wednesday, May 12, 2004 2:48 PM
To: Michael Ryan S. Puncia
Cc: [EMAIL PROTECTED]
Subject: Re: [PERFORM] Using LIKE expression problem..

In the query plan ..it uses seq scan rather than index scan .. why ? I 
have index on lastname, firtname.

Have you run VACUUM ANALYZE; on the table recently?
Chris


Re: [PERFORM] cache table

2004-05-04 Thread Joseph Shraibman
scott.marlowe wrote:
I think you might be interested in materialized views.  You could create 
this as a materialized view which should be very fast to just select * 
from.
That seems to be the count table I envisioned.  It just hides the
details for me.  It still has the problems of an extra UPDATE every time
the data table is updated and of generating a lot of dead tuples.



[PERFORM] cache table

2004-05-03 Thread Joseph Shraibman
I have a big table with some int fields.  I frequently need to do 
queries like:

SELECT if2, count(*) FROM table WHERE if1 = 20 GROUP BY if2;
The problem is that this is slow and frequently requires a seqscan.  I'd
like to cache the results in a second table and update the counts with
triggers, but this would a) require another UPDATE for each
INSERT/UPDATE, which would slow down adding and updating of data, and b)
produce a large amount of dead rows for vacuum to clear out.

It would also be nice if this small table could be locked into the pg
cache somehow.  It doesn't need to be stored on disk, because the counts
can always be regenerated from scratch.

So what is the best solution to this problem?  I'm sure it must come up 
pretty often.
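For concreteness, the trigger-maintained count table I have in mind would
look roughly like this (table, column, and function names are invented, the
trigger only covers INSERT, it assumes plpgsql is installed, and concurrent
inserts of a brand-new (if1, if2) pair could still race):

  CREATE TABLE if_counts (
      if1 integer NOT NULL,
      if2 integer NOT NULL,
      n   bigint  NOT NULL DEFAULT 0,
      PRIMARY KEY (if1, if2)
  );

  CREATE FUNCTION bump_if_count() RETURNS trigger AS '
  BEGIN
      UPDATE if_counts SET n = n + 1
       WHERE if1 = NEW.if1 AND if2 = NEW.if2;
      IF NOT FOUND THEN
          INSERT INTO if_counts (if1, if2, n) VALUES (NEW.if1, NEW.if2, 1);
      END IF;
      RETURN NEW;
  END;
  ' LANGUAGE plpgsql;

  CREATE TRIGGER bigtable_count_trg AFTER INSERT ON bigtable
      FOR EACH ROW EXECUTE PROCEDURE bump_if_count();

The original GROUP BY then becomes a cheap lookup:

  SELECT if2, n FROM if_counts WHERE if1 = 20;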



Re: [PERFORM] Insert only tables and vacuum performance

2004-04-30 Thread Joseph Shraibman
Tom Lane wrote:
Joseph Shraibman [EMAIL PROTECTED] writes:
I have a table that is never updated, only INSERTED into. Is there a way 
I can prevent vacuum wasting time on this table

What makes you think vacuum is wasting much time on this table?  AFAICS
it will only update any unfixed hint bits ...
			regards, tom lane
INFO:  elog: found 0 removable, 12869411 nonremovable row versions in 
196195 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 5 unused item pointers.
0 pages are entirely empty.
CPU 31.61s/4.53u sec elapsed 1096.83 sec.

It took 1096.83 seconds, and what did it accomplish?  And what are hint 
bits?



[PERFORM] analyzer/planner and clustered rows

2004-04-29 Thread Joseph Shraibman
How does the analyzer/planner deal with rows that are clustered together?
Does it just assume that if a column was clustered on then the actual
data will be clustered?  What if the data in the table happens to be
close together because it was inserted together originally?
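(I know ANALYZE stores a per-column correlation statistic -- presumably that
is what the planner uses, regardless of whether the table was ever
CLUSTERed.  A quick way to look at it, with a placeholder table name:)

  -- correlation near 1.0 or -1.0 means the column's values lie in nearly
  -- sorted order on disk, which makes index scans look cheaper.
  SELECT attname, correlation
    FROM pg_stats
   WHERE tablename = 'mytable';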



Re: [PERFORM] Insert only tables and vacuum performance

2004-04-29 Thread Joseph Shraibman
Joseph Shraibman wrote:
I have a table that is never updated, only INSERTED into. Is there a way 
I can prevent vacuum wasting time on this table besides vacuuming each 
table in the db by itself and omitting this table?

How feasible would it be to have a marker somewhere in pg that is set
when rows in a table are updated or deleted and cleared when vacuum
runs, so that vacuum could skip tables where the marker isn't set?
Or, even better, an offset into the data table for the earliest deleted
row, so if you have a table where you update each row shortly after
insert and then never touch it, vacuum can skip most of the table
(inserts are done at the end of the table, right?)



Re: [PERFORM] Help with query plan inconsistencies

2004-03-23 Thread Joseph Shraibman
I'm going to ask because someone else surely will:

Do you regularly vacuum/analyze the database?

Woody Woodring wrote:
Hello,

I am using postgres 7.4.2 as a backend for geocode data for a mapping
application.  My question is why can't I get a consistent use of my indexes
during a query, I tend to get a lot of seq scan results.
I use a standard query:

SELECT lat, long, mac, status FROM (
    SELECT text(mac) as mac, lat, long,
           CASE status WHEN 0 THEN 0 WHEN 1 THEN 1 ELSE -1 END as status
    FROM cable_billing LEFT OUTER JOIN davic USING(mac)
    WHERE boxtype='d'
) AS FOO WHERE (long >= X1) AND (long <= X2) AND (lat >= Y1) AND (lat <= Y2)

Where X1,X2,Y1,Y2 are the coordinates for the rectangle of the map viewing
area.
QUERY PLAN #1 and #2 are from when I get a view from 10 miles out;
sometimes it uses the index (#1) and most of the time not (#2).  I do run
into plans that seq scan both sides of the join.
QUERY PLAN #3 is from when I view from 5 miles out, and I have a much
greater chance of getting index scans (about 90% of the time).
I have listed information about the database below.

Cable_billing: ~500,000 rows, updated once per day
Davic: ~500,000 rows, about 100 rows updated per minute
Any info or suggestions would be appreciated.

Woody

twc-ral-overview=# \d cable_billing;
                Table public.cable_billing
     Column      |          Type          | Modifiers
-----------------+------------------------+-----------
 cable_billingid | integer                | not null
 mac             | macaddr                | not null
 account         | integer                |
 number          | character varying(10)  |
 address         | character varying(200) |
 region          | character varying(30)  |
 division        | integer                |
 franchise       | integer                |
 node            | character varying(10)  |
 lat             | numeric                |
 long            | numeric                |
 trunk           | character varying(5)   |
 ps              | character varying(5)   |
 fd              | character varying(5)   |
 le              | character varying(5)   |
 update          | integer                |
 boxtype         | character(1)           |
Indexes: cable_billing_pkey primary key btree (mac),
         cable_billing_account_index btree (account),
         cable_billing_lat_long_idx btree (lat, long),
         cable_billing_node_index btree (node),
         cable_billing_region_index btree (region)

twc-ral-overview=# \d davic
              Table public.davic
 Column  |         Type          | Modifiers
---------+-----------------------+-----------
 davicid | integer               | not null
 mac     | macaddr               | not null
 source  | character varying(20) |
 status  | smallint              |
 updtime | integer               |
 type    | character varying(10) |
 avail1  | integer               |
Indexes: davic_pkey primary key btree (mac)



twc-ral-overview=# vacuum analyze;
VACUUM
twc-ral-overview=# explain analyze SELECT lat, long, mac, status FROM
(SELECT text(mac) as mac, lat, long, CASE status WHEN 0 THEN 0 WHEN 1 THEN 1
ELSE -1 END as status FROM cable_billing LEFT OUTER JOIN davic USING(mac)
WHERE boxtype='d') AS foo  WHERE (long >= -78.70723462816063) AND
(long <= -78.53096764204116) AND (lat >= 35.5741118787) AND
(lat <= 35.66366331376857);
QUERY PLAN #1
--------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..23433.18 rows=1871 width=34) (actual time=0.555..5095.434 rows=3224 loops=1)
   ->  Index Scan using cable_billing_lat_long_idx on cable_billing  (cost=0.00..12145.85 rows=1871 width=32) (actual time=0.431..249.931 rows=3224 loops=1)
         Index Cond: ((lat >= 35.5741118787) AND (lat <= 35.66366331376857) AND (long >= -78.70723462816063) AND (long <= -78.53096764204116))
         Filter: (boxtype = 'd'::bpchar)
   ->  Index Scan using davic_pkey on davic  (cost=0.00..6.01 rows=1 width=8) (actual time=1.476..1.480 rows=1 loops=3224)
         Index Cond: (outer.mac = davic.mac)
 Total runtime: 5100.028 ms
(7 rows)


twc-ral-overview=# vacuum analyze;
VACUUM
twc-ral-overview=# explain analyze SELECT lat, long, mac, status FROM
(SELECT text(mac) as mac, lat, long, CASE status WHEN 0 THEN 0 WHEN 1 THEN 1
ELSE -1 END as status FROM cable_billing LEFT OUTER JOIN davic USING(mac)
WHERE boxtype='d') AS foo  WHERE (long >= -78.87878592206046) AND
(long <= -78.70220280717479) AND (lat >= 35.71703190638861) AND
(lat <= 35.80658335998006);
QUERY PLAN #2
--------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..76468.90 rows=9223 width=34) (actual time=0.559..17387.427 rows=19997 loops=1)
   ->  Seq Scan on cable_billing  (cost=0.00..20837.76 rows=9223 width=32) (actual time=0.290..7117.799 

[PERFORM] two seperate queries run faster than queries ORed together

2004-03-22 Thread Joseph Shraibman
explain
SELECT COUNT(u.ukey) FROM u, d WHERE d.ukey = u.ukey AND u.pkey = 260 
AND (u.status = 3 ) AND NOT u.boolfield ;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=45707.84..45707.84 rows=1 width=4)
   ->  Nested Loop  (cost=0.00..45707.16 rows=273 width=4)
         ->  Seq Scan on usertable u  (cost=0.00..44774.97 rows=272 width=4)
               Filter: ((pkey = 260) AND (status = 3) AND (NOT boolfield))
         ->  Index Scan using d_pkey on d  (cost=0.00..3.41 rows=1 width=4)
               Index Cond: (d.ukey = outer.ukey)

explain
SELECT COUNT(u.ukey) FROM u, d WHERE d.ukey = u.ukey AND u.pkey = 260 
AND (d.status = 3 ) AND NOT u.boolfield ;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=28271.38..28271.38 rows=1 width=4)
   ->  Nested Loop  (cost=0.00..28271.38 rows=1 width=4)
         ->  Seq Scan on d  (cost=0.00..28265.47 rows=1 width=4)
               Filter: (status = 3)
         ->  Index Scan using u_pkey on u  (cost=0.00..5.89 rows=1 width=4)
               Index Cond: ((outer.ukey = u.ukey) AND (u.pkey = 260))
               Filter: (NOT boolfield)
explain
SELECT COUNT(u.ukey) FROM u, d WHERE d.ukey = u.ukey AND u.pkey = 260 
AND (u.status = 3 OR d.status = 3 ) AND NOT u.boolfield ;

                                   QUERY PLAN
-------------------------------------------------------------------------------
 Aggregate  (cost=128867.45..128867.45 rows=1 width=4)
   ->  Hash Join  (cost=32301.47..128866.77 rows=272 width=4)
         Hash Cond: (outer.ukey = inner.ukey)
         Join Filter: ((inner.status = 3) OR (outer.status = 3))
         ->  Seq Scan on u  (cost=0.00..41215.97 rows=407824 width=6)
               Filter: ((pkey = 260) AND (NOT boolfield))
         ->  Hash  (cost=25682.98..25682.98 rows=1032998 width=6)
               ->  Seq Scan on d  (cost=0.00..25682.98 rows=1032998 width=6)

... so what do I do?  It would be a real pain to rewrite this query to
run twice and add the results up, especially since I don't always know
beforehand which will be faster, given different values in the query.
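For the record, the manual rewrite would be something along these lines (a
sketch; UNION removes duplicates, so if the join can return the same ukey
more than once the count won't match the original query exactly):

  SELECT COUNT(*) FROM (
      SELECT u.ukey FROM u, d
       WHERE d.ukey = u.ukey AND u.pkey = 260
         AND u.status = 3 AND NOT u.boolfield
      UNION
      SELECT u.ukey FROM u, d
       WHERE d.ukey = u.ukey AND u.pkey = 260
         AND d.status = 3 AND NOT u.boolfield
  ) AS both_arms;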



Re: [PERFORM] two seperate queries run faster than queries ORed together

2004-03-22 Thread Joseph Shraibman
Richard Huxton wrote:
On Thursday 18 March 2004 21:21, Joseph Shraibman wrote:

explain
SELECT COUNT(u.ukey) FROM u, d WHERE d.ukey = u.ukey AND u.pkey = 260
AND (u.status = 3 OR d.status = 3 ) AND NOT u.boolfield ;
                                   QUERY PLAN
-------------------------------------------------------------------------------
 Aggregate  (cost=128867.45..128867.45 rows=1 width=4)
   ->  Hash Join  (cost=32301.47..128866.77 rows=272 width=4)
         Hash Cond: (outer.ukey = inner.ukey)
         Join Filter: ((inner.status = 3) OR (outer.status = 3))
         ->  Seq Scan on u  (cost=0.00..41215.97 rows=407824 width=6)
               Filter: ((pkey = 260) AND (NOT boolfield))


There's your problem. For some reason it thinks it's getting 407,824 rows back 
from that filtered seq-scan. I take it that pkey is a primary-key and is 
defined as being UNIQUE? If you actually did have several hundred thousand 
matches then a seq-scan might be sensible.

No, pkey is not the primary key in this case.  The number of entries in u
that have pkey 260 and not boolfield is 344706.  The number of those that
have status == 3 is 7.  The total number of entries in d that have
status == 3 is 4.

I'd start by analyze-ing the table in question,
That is done every night.

The problem is that it seems the planner doesn't think to do the
different parts of the OR separately and then combine the answers.



Re: [PERFORM] two seperate queries run faster than queries ORed together

2004-03-22 Thread Joseph Shraibman
Tom Lane wrote:
Joseph Shraibman [EMAIL PROTECTED] writes:

No, pkey is not the primary key in this case. The number of entries in u 
that have pkey 260 and not boolfield is 344706.


... and every one of those rows *must* be included in the join input,
*If* you use one big join in the first place.  If postgres ran the query
to first get the values with status == 3 from u, then ran the query to
get the entries from d, then combined them, the result would be the same
but returned faster.  Instead it is doing seq scans on both tables and
doing an expensive join that returns only a few rows.



Re: [PERFORM] two seperate queries run faster than queries ORed together

2004-03-22 Thread Joseph Shraibman
Stephan Szabo wrote:
On Mon, 22 Mar 2004, Joseph Shraibman wrote:


Tom Lane wrote:

Joseph Shraibman [EMAIL PROTECTED] writes:


No, pkey is not the primary key in this case. The number of entries in u
that have pkey 260 and not boolfield is 344706.


... and every one of those rows *must* be included in the join input,
*If* you use one big join in the first place.  If postgres ran the query
to first get the values with status == 3 from u, then ran the query to
get the entries from d, then combined them, the result would be the same
but the output faster.  Instead it is doing seq scans on both tables and


Well, you have to be careful on the combination to not give the wrong
answers if there's a row with u.status=3 that matches a row d.status=3.
Right, you would have to avoid duplicates.  The existing DISTINCT code
should be able to handle that.



Re: [PERFORM] Postgresql on SAN

2004-03-13 Thread Joseph Shraibman
Josh Berkus wrote:


See above.   Also keep in mind that PostgreSQL's use of I/O should improve 
100% in version 7.5.

Really?  What happened?
