Re: [PERFORM] issue in postgresql 9.1.3 in using arrow key in Solaris platform

2014-12-04 Thread Josh Berkus
On 11/26/2014 02:16 AM, M Tarkeshwar Rao wrote:
> Hi all,
> 
>  
> 
> We are facing following issue in postgresql 9.1.3 in using arrow key in
> Solaris platform.
> 
> *Can you please help us to resolve it or any new release has fix for
> this or any workaround for this?*

Mr. Rao:

1) Please do not cross-post to multiple mailing lists.  In the future,
this may cause you to be banned from the PostgreSQL mailing lists.

2) PostgreSQL 9.1.3 is 11 patch releases behind and contains multiple
published security holes.

3) Sounds like there's a bug in the readline or libedit libraries for
your platform.  How did you build PostgreSQL?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




[PERFORM] Query doesn't use index on hstore column

2014-12-04 Thread Michael Barker
Hi,

Apologies if this is the wrong list for this type of query (first time
posting).

I'm currently experimenting with hstore on Postgres 9.4rc1.  I've created a
table with an hstore column, with an index on that column (tried both gin
and btree indexes), and the explain plan shows that the index is never used
for the lookup; it falls back to a sequential scan every time (the table has
1 000 000 rows).  The query plans and execution times for the btree index,
the gin index, and no index are the same.  Is there something I'm doing wrong
or missing in order to get indexes to work on hstore columns?

Details:

0) Postgres version:

barkerm=# select version();
                                                    version
----------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.4rc1 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7), 64-bit
(1 row)

1)  Created table with hstore column and btree index.

barkerm=# \d audit
                                     Table "public.audit"
    Column     |            Type             |                     Modifiers
---------------+-----------------------------+----------------------------------------------------
 id            | integer                     | not null default nextval('audit_id_seq'::regclass)
 principal_id  | integer                     |
 created_at    | timestamp without time zone |
 root          | character varying(255)      |
 template_code | character(3)                |
 attributes    | hstore                      |
 args          | character varying(255)[]    |
Indexes:
    "audit_pkey" PRIMARY KEY, btree (id)
    "audit_attributes_idx" btree (attributes)

2) Insert 1 000 000 rows

barkerm=# select count(*) from audit;
  count
---------
 1000000
(1 row)
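
The exact load statement wasn't captured above; something along these lines,
with random ids, produces rows of the shape shown in step 4 (illustrative
only, not the statement actually used):

INSERT INTO audit (attributes)
SELECT hstore(ARRAY['accountId',    (random() * 2000000000)::bigint::text,
                    'instrumentId', (random() * 2000000000)::bigint::text])
FROM generate_series(1, 1000000);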

3) Run analyse.

4) Pick a row somewhere in the middle:

barkerm=# select id, attributes from audit where id = 500000;
   id   |                        attributes
--------+----------------------------------------------------------
 500000 | "accountId"=>"1879355460", "instrumentId"=>"1625557725"
(1 row)

5) Explain query using the attributes column in the where clause (uses Seq
Scan).

barkerm=# explain analyse select * from audit where attributes->'accountId' = '1879355460';
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Seq Scan on audit  (cost=0.00..35409.00 rows=5000 width=133) (actual time=114.314..218.821 rows=1 loops=1)
   Filter: ((attributes -> 'accountId'::text) = '1879355460'::text)
   Rows Removed by Filter: 999999
 Planning time: 0.074 ms
 Execution time: 218.843 ms
(5 rows)

6) Rebuild the data using a gin index.

barkerm=# \d audit
                                     Table "public.audit"
    Column     |            Type             |                     Modifiers
---------------+-----------------------------+----------------------------------------------------
 id            | integer                     | not null default nextval('audit_id_seq'::regclass)
 principal_id  | integer                     |
 created_at    | timestamp without time zone |
 root          | character varying(255)      |
 template_code | character(3)                |
 attributes    | hstore                      |
 args          | character varying(255)[]    |
Indexes:
    "audit_pkey" PRIMARY KEY, btree (id)
    "audit_attributes_idx" gin (attributes)

7) Again explain the selection of a single row using a constraint that
references the hstore column.  Seq Scan is still used.

barkerm=# explain analyse select * from audit where attributes->'accountId' = '1238334838';
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Seq Scan on audit  (cost=0.00..35409.00 rows=5000 width=133) (actual time=122.173..226.363 rows=1 loops=1)
   Filter: ((attributes -> 'accountId'::text) = '1238334838'::text)
   Rows Removed by Filter: 999999
 Planning time: 0.164 ms
 Execution time: 226.392 ms
(5 rows)

8) Drop the index and query again as a baseline.

barkerm=# explain analyse select * from audit where attributes->'accountId' = '1238334838';
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Seq Scan on audit  (cost=0.00..35409.00 rows=5000 width=133) (actual time=109.115..212.666 rows=1 loops=1)
   Filter: ((attributes -> 'accountId'::text) = '1238334838'::text)
   Rows Removed by Filter: 999999
 Planning time: 0.113 ms
 Execution time: 212.701 ms
(5 rows)

Regards,
Michael Barker.


Re: [PERFORM] Query doesn't use index on hstore column

2014-12-04 Thread k...@rice.edu
On Fri, Dec 05, 2014 at 09:42:20AM +1300, Michael Barker wrote:
> 1)  Created table with hstore column and btree index.
> 
> barkerm=# \d audit
>                                      Table "public.audit"
>     Column     |            Type             |                     Modifiers
> ---------------+-----------------------------+----------------------------------------------------
>  id            | integer                     | not null default nextval('audit_id_seq'::regclass)
>  principal_id  | integer                     |
>  created_at    | timestamp without time zone |
>  root          | character varying(255)      |
>  template_code | character(3)                |
>  attributes    | hstore                      |
>  args          | character varying(255)[]    |
> Indexes:
>     "audit_pkey" PRIMARY KEY, btree (id)
>     "audit_attributes_idx" btree (attributes)
> 
> ...
> 5) Explain query using the attributes column in the where clause (uses Seq
> Scan).
> 
> barkerm=# explain analyse select * from audit where attributes->'accountId' = '1879355460';
>                                                  QUERY PLAN
> -------------------------------------------------------------------------------------------------------------
>  Seq Scan on audit  (cost=0.00..35409.00 rows=5000 width=133) (actual time=114.314..218.821 rows=1 loops=1)
>    Filter: ((attributes -> 'accountId'::text) = '1879355460'::text)
>    Rows Removed by Filter: 999999
>  Planning time: 0.074 ms
>  Execution time: 218.843 ms
> (5 rows)
> 
Hi Michael,

I think your index definitions need to be on the particular attribute
extracted from attributes, not on attributes itself. That works, but it does
not apply to the query you show above. I think that the binary JSON type
(jsonb) in 9.4 will do what you want. I have not worked with it myself, just
looked at the docs.
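
From the docs, a rough sketch of the jsonb route would look something like
this (untested, and the table/index names here are made up):

CREATE TABLE audit_jsonb (id serial PRIMARY KEY, attributes jsonb);
CREATE INDEX audit_jsonb_attributes_idx ON audit_jsonb USING gin (attributes);
-- the jsonb containment operator @> can use the gin index
SELECT * FROM audit_jsonb WHERE attributes @> '{"accountId": "1879355460"}';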

Regards,
Ken




Re: [PERFORM] Query doesn't use index on hstore column

2014-12-04 Thread Tom Lane
Michael Barker  writes:
> I'm currently experimenting with hstore on Postgres 9.4rc1.  I've created a
> table with an hstore column, with an index on that column (tried both gin
> and btree indexes), and the explain plan shows that the index is never used
> for the lookup; it falls back to a sequential scan every time (the table has
> 1 000 000 rows).  The query plans and execution times for the btree index,
> the gin index, and no index are the same.  Is there something I'm doing
> wrong or missing in order to get indexes to work on hstore columns?

Well, first off, a btree index is fairly useless for this query,
because btree has no concept that the hstore has any sub-structure.
A GIN index or GIST index could work though.  Secondly, you have to
remember that indexable WHERE conditions in Postgres are *always* of
the form "WHERE indexed_column indexable_operator some_comparison_value".
So the trick is to recast the condition you have into something that
looks like that.  Instead of

WHERE attributes->'accountId' = '1879355460'

you could do

WHERE attributes @> 'accountId=>1879355460'

(@> being the hstore containment operator, ie "does attributes contain
a pair that looks like this?") or equivalently but possibly easier to
generate,

WHERE attributes @> hstore('accountId', '1879355460')
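
For example, with the gin index from your step 6 in place, a minimal sketch
would be (the exact plan will of course depend on your data and statistics):

EXPLAIN ANALYSE
SELECT * FROM audit WHERE attributes @> hstore('accountId', '1879355460');
-- with one matching row in a million you would normally expect a Bitmap
-- Index Scan on audit_attributes_idx here rather than a Seq Scan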

Another possibility if you're only concerned about indexing searches
for one or a few specific keys is to use expression indexes:

CREATE INDEX ON audit ((attributes->'accountId'));

whereupon your original query works, since the left-hand side of
the '=' operator is now the indexed expression.  (Here, since you
are testing plain equality on the indexed value, a btree works fine.)
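
With that index in place, a quick sketch (plan shape again depends on your
statistics):

-- after creating the expression index above and re-running ANALYSE
EXPLAIN ANALYSE
SELECT * FROM audit WHERE attributes->'accountId' = '1879355460';
-- this should now be able to use the expression index (btree) instead of
-- falling back to a Seq Scan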

You might care to read
http://www.postgresql.org/docs/9.4/static/indexes.html
to get a better handle on what Postgres indexes can and can't do.

regards, tom lane




Re: [PERFORM] Yet another abort-early plan disaster on 9.3

2014-12-04 Thread Simon Riggs
On 30 September 2014 at 05:53, Simon Riggs  wrote:
> On 29 September 2014 16:00, Merlin Moncure  wrote:
>> On Fri, Sep 26, 2014 at 3:06 AM, Simon Riggs  wrote:
>>> The problem, as I see it, is different. We assume that if there are
>>> 100 distinct values and you use LIMIT 1 that you would only need to
>>> scan 1% of rows. We assume that the data is arranged in the table in a
>>> very homogenous layout. When data is not, and it seldom is, we get
>>> problems.
>>
>> Hm, good point -- 'data proximity'.  At least in theory, can't this be
>> measured and quantified?  For example, given a number of distinct
>> values, you could estimate the % of pages read (or maybe non
>> sequential seeks relative to the number of pages) you'd need to read
>> all instances of a particular value in the average (or perhaps the
>> worst) case.   One way of trying to calculate that would be to look at
>> proximity of values in sampled pages (and maybe a penalty assigned for
>> high update activity relative to table size).  Data proximity would
>> then become a cost coefficient to the benefits of LIMIT.
>
> The necessary first step to this is to realise that we can't simply
> apply the LIMIT as a reduction in query cost, in all cases.
>
> The way I'm seeing it, you can't assume the LIMIT will apply to any
> IndexScan that doesn't have an index condition. If it has just a
> filter, or nothing at all, just an ordering then it could easily scan
> the whole index if the stats are wrong.
>
> So plans like this could be wrong, by assuming the scan will end
> earlier because of the LIMIT than it actually will.
>
> Limit
>   IndexScan (no index cond)
>
> Limit
>   NestJoin
> IndexScan (no index cond)
> SomeScan
>
> Limit
>   NestJoin
> NestJoin
>   IndexScan (no index cond)
>   SomeScan
>SomeScan
>
> and deeper...
>
> I'm looking for a way to identify and exclude such plans, assuming
> that this captures at least some of the problem plans.

After looking at this for some time I now have a patch that solves this.

It relies on the observation that index scans with no bounded quals
don't play nicely with LIMIT. The solution relies upon the point that
LIMIT does not reduce the startup cost of plans, only the total cost.
So we can solve the problem by keeping the total cost estimate the same
and just moving some of it into the startup cost, so that LIMIT does not
reduce costs as much as before.
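
To make that concrete with invented numbers (this is just a sketch of the
cost arithmetic, not output from the patch): suppose an IndexScan with no
index condition has startup cost 0, total cost 100000, and is expected to
return 1000 rows. LIMIT 1 scales only the run cost, so today the limited
plan is costed at about

    0 + (1/1000) * (100000 - 0) = 100

which looks very attractive even when the early-abort assumption is wrong.
If, say, half of the estimate is instead accounted as startup cost, the
same plan under LIMIT 1 comes out at roughly

    50000 + (1/1000) * (100000 - 50000) = 50050

while its total (un-LIMITed) cost is unchanged, so it no longer wins purely
on the strength of an optimistic LIMIT.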

It's a simple patch, but it solves the test cases I know about and
does almost nothing to planning time.

I tried much less subtle approaches involving direct prevention of
LIMIT pushdown but the code was much too complex for my liking.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


avoid_limit_pushdown.v3.patch
Description: Binary data
