Re: [PERFORM] Planner selects different execution plans depending on limit

2012-09-11 Thread Bill Martin
On 10/09/12 16:24, bill_mar...@freenet.de wrote:

Hi All

I've created the following index:

CREATE INDEX ft_simple_core_content_content_idx
  ON core_content
  USING gin
  (to_tsvector('simple'::regconfig, content));


If I'm searching for a word which is NOT in the column content, the query plan
and the execution time differ with the given limit.
If I choose 3927 or any higher number, the query execution takes only a few
milliseconds.

select * from core_content content where
to_tsvector('simple', content.content) @@ tsquery(plainto_tsquery('simple',
'asdasdadas')::varchar || ':*') = true
limit 3927

Limit  (cost=0.00..19302.23 rows=3926 width=621) (actual time=52147.149..52147.149 rows=0 loops=1)
  ->  Seq Scan on core_content content  (cost=0.00..98384.34 rows=20011 width=621) (actual time=52147.147..52147.147 rows=0 loops=1)
        Filter: (to_tsvector('simple'::regconfig, content) @@ '''asdasdadas'':*'::tsquery)
Total runtime: 52147.173 ms

Is there any possibility to improve the performance even if the limit is only
10? Is it possible to make the query optimizer use the fast bitmap heap scan
instead of the slow seq scan?

The big hammer is: set enable_seqscan = off, but if you tell us which PG version
you're on there may be something else to do. I suggest you start by bumping the
statistics target for the column to 10000 and running analyze to see what that
changes.

--
Jesper
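
(A minimal sketch of Jesper's two suggestions, with illustrative values. Since
the tsvector here exists only inside an expression index, the statistics target
is assumed to attach to the index's expression column, which by default is
named after the indexed function:)

-- raise the statistics target on the expression column, then re-gather stats;
-- 10000 is the maximum target, used here purely for illustration
ALTER TABLE ft_simple_core_content_content_idx
  ALTER COLUMN to_tsvector SET STATISTICS 10000;
ANALYZE core_content;

-- the big hammer, scoped to a single session
SET enable_seqscan = off;
-- ... run the problem query ...
RESET enable_seqscan;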

Hi,
my email client deleted a lot of the content of the original thread message.
Here is the full content:

Hi All

I've created the following table, which contains one million records.

CREATE TABLE core_content
(
  id bigint NOT NULL,
  content text NOT NULL,
  short_content text,
  CONSTRAINT core_content_pkey PRIMARY KEY (id)
);

CREATE INDEX ft_simple_core_content_content_idx
  ON core_content
  USING gin
  (to_tsvector('simple'::regconfig, content));


If I'm searching for a word which is not in the column content, the query plan
and the execution time differ with the given limit.
If I choose 3927 or any higher number, the query execution takes only a few
milliseconds.

select * from core_content content where
to_tsvector('simple', content.content) @@ tsquery(plainto_tsquery('simple',
'asdasdadas')::varchar || ':*') = true
limit 3927

Limit  (cost=10091.09..19305.68 rows=3927 width=621) (actual time=0.255..0.255 rows=0 loops=1)
  ->  Bitmap Heap Scan on core_content content  (cost=10091.09..57046.32 rows=20011 width=621) (actual time=0.254..0.254 rows=0 loops=1)
        Recheck Cond: (to_tsvector('simple'::regconfig, content) @@ '''asdasdadas'':*'::tsquery)
        ->  Bitmap Index Scan on ft_simple_core_content_content_idx  (cost=0.00..10086.09 rows=20011 width=0) (actual time=0.251..0.251 rows=0 loops=1)
              Index Cond: (to_tsvector('simple'::regconfig, content) @@ '''asdasdadas'':*'::tsquery)
Total runtime: 0.277 ms
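
(For reference, the prefix tsquery that the cast-and-concatenate construct
builds can be inspected on its own; a hypothetical check, matching the Filter
and Index Cond shown in the plans:)

select tsquery(plainto_tsquery('simple', 'asdasdadas')::varchar || ':*');
-- result: 'asdasdadas':*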

If I choose 3926 or any lower number (e.g. 10), the query execution takes more
than fifty seconds.

select * from core_content content where
to_tsvector('simple', content.content) @@ tsquery(plainto_tsquery('simple',
'asdasdadas')::varchar || ':*') = true
limit 3926

Limit  (cost=0.00..19302.23 rows=3926 width=621) (actual time=52147.149..52147.149 rows=0 loops=1)
  ->  Seq Scan on core_content content  (cost=0.00..98384.34 rows=20011 width=621) (actual time=52147.147..52147.147 rows=0 loops=1)
        Filter: (to_tsvector('simple'::regconfig, content) @@ '''asdasdadas'':*'::tsquery)
Total runtime: 52147.173 ms

Is there any possibility to tune the performance even if the limit is only
10? Is it possible to make the query optimizer use the fast bitmap heap scan
instead of the slow seq scan?

I use PostgreSQL 9.1.5 on an Intel i5-2400 @ 3.1 GHz with 16 GB RAM, Windows 7 64-bit.

Regards,
Bill Martin



[PERFORM] add column with default value is very slow

2012-09-11 Thread AI Rumman
I have a table as follows:
\d entity
          Table "public.entity"
    Column    |            Type             |     Modifiers
--------------+-----------------------------+--------------------
 crmid        | integer                     | not null
 smcreatorid  | integer                     | not null default 0
 smownerid    | integer                     | not null default 0
 modifiedby   | integer                     | not null default 0
 setype       | character varying(30)       | not null
 description  | text                        |
 createdtime  | timestamp without time zone | not null
 modifiedtime | timestamp without time zone | not null
 viewedtime   | timestamp without time zone |
 status       | character varying(50)       |
 version      | integer                     | not null default 0
 presence     | integer                     | default 1
 deleted      | integer                     | not null default 0
Indexes:
    "entity_pkey" PRIMARY KEY, btree (crmid)
    "entity_createdtime_idx" btree (createdtime)
    "entity_modifiedby_idx" btree (modifiedby)
    "entity_modifiedtime_idx" btree (modifiedtime)
    "entity_setype_idx" btree (setype) WHERE deleted = 0
    "entity_smcreatorid_idx" btree (smcreatorid)
    "entity_smownerid_idx" btree (smownerid)
    "ftx_en_entity_description" gin (to_tsvector('vcrm_en'::regconfig, for_fts(description)))
    "entity_deleted_idx" btree (deleted)
Referenced by:
    TABLE "service" CONSTRAINT "fk_1_service" FOREIGN KEY (serviceid) REFERENCES entity(crmid) ON DELETE CASCADE
    TABLE "servicecontracts" CONSTRAINT "fk_1_servicecontracts" FOREIGN KEY (servicecontractsid) REFERENCES entity(crmid) ON DELETE CASCADE
    TABLE "vantage_cc2entity" CONSTRAINT "fk_vantage_cc2entity_entity" FOREIGN KEY (crm_id) REFERENCES entity(crmid) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "vantage_emails_optout_history" CONSTRAINT "fk_vantage_emails_optout_history_crmid" FOREIGN KEY (crmid) REFERENCES entity(crmid) ON DELETE CASCADE
    TABLE "vantage_emails_optout_history" CONSTRAINT "fk_vantage_emails_optout_history_emailid" FOREIGN KEY (emailid) REFERENCES entity(crmid) ON DELETE CASCADE

I executed the query:
ALTER TABLE entity ADD COLUMN owner_type char(1) NOT NULL default 'U';

The db is stuck. The entity table has 2064740 records.

Watching locks:
select pg_stat_activity.datname, pg_class.relname, pg_locks.mode, pg_locks.granted,
       pg_stat_activity.usename, substr(pg_stat_activity.current_query, 1, 10),
       pg_stat_activity.query_start,
       age(now(), pg_stat_activity.query_start) as age, pg_stat_activity.procpid
from pg_stat_activity, pg_locks
  left outer join pg_class on (pg_locks.relation = pg_class.oid)
where pg_locks.pid = pg_stat_activity.procpid
order by query_start;


 datname |          relname           |        mode         | granted | usename |   substr   |          query_start          |       age       | procpid
---------+----------------------------+---------------------+---------+---------+------------+-------------------------------+-----------------+---------
 db_test | entity_modifiedtime_idx    | AccessExclusiveLock | t       | user    | ALTER TABL | 2012-09-11 12:26:20.269965+06 | 00:45:46.101971 |   13574
 db_test |                            | ExclusiveLock       | t       | user    | ALTER TABL | 2012-09-11 12:26:20.269965+06 | 00:45:46.101971 |   13574
 db_test | entity_modifiedby_idx      | AccessExclusiveLock | t       | user    | ALTER TABL | 2012-09-11 12:26:20.269965+06 | 00:45:46.101971 |   13574
 db_test | entity_createdtime_idx     | AccessExclusiveLock | t       | user    | ALTER TABL | 2012-09-11 12:26:20.269965+06 | 00:45:46.101971 |   13574
 db_test | entity                     | ShareLock           | t       | user    | ALTER TABL | 2012-09-11 12:26:20.269965+06 | 00:45:46.101971 |   13574
 db_test | entity                     | AccessExclusiveLock | t       | user    | ALTER TABL | 2012-09-11 12:26:20.269965+06 | 00:45:46.101971 |   13574
 db_test |                            | AccessExclusiveLock | t       | user    | ALTER TABL | 2012-09-11 12:26:20.269965+06 | 00:45:46.101971 |   13574
 db_test |                            | AccessExclusiveLock | t       | user    | ALTER TABL | 2012-09-11 12:26:20.269965+06 | 00:45:46.101971 |   13574
 db_test |                            | ExclusiveLock       | t       | user    | ALTER TABL | 2012-09-11 12:26:20.269965+06 | 00:45:46.101971 |   13574
 db_test | entity_pkey                | AccessExclusiveLock | t       | user    | ALTER TABL | 2012-09-11 12:26:20.269965+06 | 00:45:46.101971 |   13574
 db_test |                            | ShareLock           | t       | user    | ALTER TABL | 2012-09-11 12:26:20.269965+06 | 00:45:46.101971 |   13574
 db_test | ftx_en_entity_description  | AccessExclusiveLock | t       | user    | ALTER TABL | 2012-09-11 12:26:20.269965+06 | 00:45:46.101971 |   13574
 db_test |                            | AccessShareLock     |

Re: [PERFORM] add column with default value is very slow

2012-09-11 Thread Albe Laurenz
AI Rumman wrote:
 I executed the query:
 ALTER TABLE entity ADD COLUMN owner_type char(1) NOT NULL default 'U';

 The db is stuck. The entity table has 2064740 records.

 Watching locks:
[all locks are granted]

 Any idea why the db is stuck?

To add the column, PostgreSQL has to modify all rows in the table.

But then 2064740 records is not very much, so it shouldn't take forever.

Do you see processor or I/O activity?

Yours,
Laurenz Albe




Re: [PERFORM] add column with default value is very slow

2012-09-11 Thread hubert depesz lubaczewski
On Tue, Sep 11, 2012 at 07:20:28PM +0600, AI Rumman wrote:
 I have a table as follows:
 I executed the query:
 ALTER TABLE entity ADD COLUMN owner_type char(1) NOT NULL default 'U';

 The db is stuck. The entity table has 2064740 records.

such alter table has to rewrite the whole table, so it will take a while.

 Watching locks:

the output of this was perfectly unreadable, because your email client
wrapped lines at some random places.

In the future - please put such dumps on some paste site, or just attach them
to the mail, instead of copy/pasting them into the body of the message.

Best regards,

depesz

-- 
The best thing about modern society is how easy it is to avoid contact with it.
 http://depesz.com/




Re: [PERFORM] add column with default value is very slow

2012-09-11 Thread AI Rumman
I added the excel file for locks data.
I was surprised to see that while I was updating a single column value for
all records in a table, all indexes were locked by the server.

On Tue, Sep 11, 2012 at 7:44 PM, hubert depesz lubaczewski 
dep...@depesz.com wrote:

 On Tue, Sep 11, 2012 at 07:20:28PM +0600, AI Rumman wrote:
  I have a table as follows:
  I executed the query:
  ALTER TABLE entity ADD COLUMN owner_type char(1) NOT NULL default 'U';

  The db is stuck. The entity table has 2064740 records.

 such alter table has to rewrite the whole table, so it will take a while.

  Watching locks:

 the output of this was perfectly unreadable, because your email client
 wrapped lines at some random places.

 In the future - please put such dumps on some paste site, or just attach them
 to the mail, instead of copy/pasting them into the body of the message.

 Best regards,

 depesz

 --
 The best thing about modern society is how easy it is to avoid contact
 with it.

 http://depesz.com/



pglocks.xls
Description: MS-Excel spreadsheet



Re: [PERFORM] add column with default value is very slow

2012-09-11 Thread AI Rumman
Table size is 1186 MB.
I split the command into three steps as you said, but the result was the same
during the update operation.
One more thing: I have just restored the db from a dump and analyzed it, and
I am using PostgreSQL 9.1 with 3 GB RAM on a dual-core machine.


On Tue, Sep 11, 2012 at 7:59 PM, hubert depesz lubaczewski 
dep...@depesz.com wrote:

 On Tue, Sep 11, 2012 at 07:55:24PM +0600, AI Rumman wrote:
  I added the excel file for locks data.

 well, it worked, but why didn't you just make it a text file, in notepad or
 something like this?

  I was surprised to see that while I was updating a single column value for
  all records in a table, all indexes were locked by the server.

 your alter table is not blocked (at least looking at the pg_locks data you
 showed).

 this means - it just takes a long time.

 Please do:
 select pg_total_relation_size('entity');
 to see how much data it has to rewrite.

 for the future - just don't do alter table with default and not null in one
 step. doing it via: add column; set default; batch-backfill data; set not
 null - will take longer but will be done with much shorter locks.

 Best regards,

 depesz

 --
 The best thing about modern society is how easy it is to avoid contact
 with it.

 http://depesz.com/



Re: [PERFORM] add column with default value is very slow

2012-09-11 Thread hubert depesz lubaczewski
On Tue, Sep 11, 2012 at 07:55:24PM +0600, AI Rumman wrote:
 I added the excel file for locks data.

well, it worked, but why didn't you just make it a text file, in notepad or
something like this?

 I was surprised to see that while I was updating a single column value for
 all records in a table, all indexes were locked by the server.

your alter table is not blocked (at least looking at the pg_locks data you
showed).

this means - it just takes a long time.

Please do:
select pg_total_relation_size('entity');
to see how much data it has to rewrite.

for the future - just don't do alter table with default and not null in one
step. doing it via: add column; set default; batch-backfill data; set not
null - will take longer but will be done with much shorter locks.
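
(A sketch of that four-step recipe against this thread's table; the batch
size, the "owner_type IS NULL" batching predicate, and committing between
batches are assumptions, not depesz's literal commands:)

ALTER TABLE entity ADD COLUMN owner_type char(1);            -- step 1: no rewrite, fast
ALTER TABLE entity ALTER COLUMN owner_type SET DEFAULT 'U';  -- step 2: new rows only
-- step 3: backfill in batches, committing after each one;
-- rerun until it reports UPDATE 0
UPDATE entity SET owner_type = 'U'
 WHERE crmid IN (SELECT crmid FROM entity
                  WHERE owner_type IS NULL LIMIT 10000);
ALTER TABLE entity ALTER COLUMN owner_type SET NOT NULL;     -- step 4: one validation scan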

Best regards,

depesz

-- 
The best thing about modern society is how easy it is to avoid contact with it.
 http://depesz.com/




Re: [PERFORM] add column with default value is very slow

2012-09-11 Thread hubert depesz lubaczewski
On Tue, Sep 11, 2012 at 08:04:06PM +0600, AI Rumman wrote:
 Table size is 1186 MB.

if it takes long, it just means that your IO is slow.

 I split the command into three steps as you said, but the result was the
 same during the update operation.

three? I was showing four steps, and one of them usually consists of
hundreds, if not thousands, of queries.

 One more thing: I have just restored the db from a dump and analyzed it,
 and I am using PostgreSQL 9.1 with 3 GB RAM on a dual-core machine.

so it looks like your IO channel is slow.

Best regards,

depesz

-- 
The best thing about modern society is how easy it is to avoid contact with it.
 http://depesz.com/




Re: [PERFORM] add column with default value is very slow

2012-09-11 Thread Andrew Dunstan


On 09/11/2012 09:55 AM, AI Rumman wrote:

I added the excel file for locks data.
I was surprised to see that while I was updating a single column value
for all records in a table, all indexes were locked by the server.



Any ALTER TABLE command locks the whole table in ACCESS EXCLUSIVE mode, 
indexes included. See the description of ACCESS EXCLUSIVE lock at 
http://www.postgresql.org/docs/current/static/explicit-locking.html
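
A hypothetical two-session illustration (not from the thread) of why
everything queues up behind it:

-- session 1: takes ACCESS EXCLUSIVE on entity and holds it until commit
BEGIN;
ALTER TABLE entity ADD COLUMN owner_type char(1) NOT NULL DEFAULT 'U';

-- session 2: even a plain read now waits, because ACCESS EXCLUSIVE
-- conflicts with every other mode, including the ACCESS SHARE a SELECT takes
SELECT count(*) FROM entity;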


cheers

andrew






Re: [PERFORM] Planner selects different execution plans depending on limit

2012-09-11 Thread Tom Lane
Bill Martin bill.mar...@communote.com writes:
 I've created the following table, which contains one million records.
 ...

 Limit  (cost=10091.09..19305.68 rows=3927 width=621) (actual time=0.255..0.255 rows=0 loops=1)
   ->  Bitmap Heap Scan on core_content content  (cost=10091.09..57046.32 rows=20011 width=621) (actual time=0.254..0.254 rows=0 loops=1)
         Recheck Cond: (to_tsvector('simple'::regconfig, content) @@ '''asdasdadas'':*'::tsquery)
         ->  Bitmap Index Scan on ft_simple_core_content_content_idx  (cost=0.00..10086.09 rows=20011 width=0) (actual time=0.251..0.251 rows=0 loops=1)
               Index Cond: (to_tsvector('simple'::regconfig, content) @@ '''asdasdadas'':*'::tsquery)
 Total runtime: 0.277 ms

 Is there any possibility to tune the performance even if the limit is only
 10?

The problem is the way-off rowcount estimate (20011 rows when it's
really none); with a smaller estimate there, the planner wouldn't decide
to switch to a seqscan.

Did you take the advice to increase the column's statistics target?
Because 20011 looks suspiciously close to the default estimate that
tsquery_opr_selec will fall back on if it hasn't got enough stats
to come up with a trustworthy estimate for a *-pattern query.

(I think there are probably some bugs in tsquery_opr_selec's estimate
for this, as I just posted about on pgsql-hackers.  But this number
looks like you're not even getting to the estimation code, for lack
of enough statistics entries.)

The other thing that seems kind of weird here is that the cost estimate
for the bitmap index scan seems out of line even given the
20011-entries-to-fetch estimate.  I'd have expected a cost estimate of a
few hundred for that, not 10000.  Perhaps this index is really bloated,
and it's time to REINDEX it?
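
(One way to check that, as a sketch: compare the index size before and after
a rebuild. Note REINDEX blocks writes to the table while it runs, so this is
for a quiet moment.)

SELECT pg_size_pretty(pg_relation_size('ft_simple_core_content_content_idx'));
REINDEX INDEX ft_simple_core_content_content_idx;
SELECT pg_size_pretty(pg_relation_size('ft_simple_core_content_content_idx'));
-- a large drop in size after the rebuild would confirm the bloat theory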

regards, tom lane

