Re: [GENERAL] Gin indexes on intarray is fast when value in array does not exists, and slow, when value exists

2016-11-13 Thread Jeff Janes
On Sat, Nov 12, 2016 at 5:33 PM, otar shavadze  wrote:

> Tried
>
>   OPERATOR(pg_catalog.@>)
>
>
>  as Tom mentioned, but still, don't get fast performance when value does
> not existed in any array.
>

Did you build the correct index?


>
> Also "played" with many   different ways, gin, gist indexes (gin with and
> without *gin__int_ops*)  but, always, there was some situation, where
> search in array was slow.
>

Yes.  There will always be some situation when the array search is slow.
Is that situation one that a specific person cares about?  Hard to tell,
since you have not given us any additional useful information.

I don't know exactly, may be I am wrong, but what I understood after
> several day "trying", is that, I never will use arrays, with tables more
> than 500 000-1000 000 rows, because then  searching in this array is
> somehow problematic.
>
> I rebuild my structure and added another table (instead of using array)
> and then used join's instead of searching in array.
>
> That's works perfectly,   joining works fast as hell, even for several
> millions rows in each table.
>

"Properly" normalizing your data is a wonderful thing, no doubt about it,
if you are prepared to deal with the consequences of doing so.  But not
everyone has that luxury.  Which is why there is more than one way of doing
things.

Cheers,

Jeff


Re: [GENERAL] Gin indexes on intarray is fast when value in array does not exists, and slow, when value exists

2016-11-12 Thread otar shavadze
Tried

  OPERATOR(pg_catalog.@>)


 as Tom mentioned, but still, don't get fast performance when value does
not existed in any array.

Also "played" with many   different ways, gin, gist indexes (gin with and
without *gin__int_ops*)  but, always, there was some situation, where
search in array was slow.

I don't know exactly, may be I am wrong, but what I understood after
several day "trying", is that, I never will use arrays, with tables more
than 500 000-1000 000 rows, because then  searching in this array is
somehow problematic.

I rebuild my structure and added another table (instead of using array) and
then used join's instead of searching in array.

That's works perfectly,   joining works fast as hell, even for several
millions rows in each table.

On Fri, Nov 11, 2016 at 12:58 AM, Tom Lane  wrote:

> Jeff Janes  writes:
> > On Thu, Nov 10, 2016 at 7:11 AM, Tom Lane  wrote:
> >> If you are using that contrib module, and it's capturing this operator
> >> reference, that would probably explain the bad estimate.  You could
> >> drop the extension if you're not depending on its other features, or you
> >> could explicitly qualify the operator name ("operator(pg_catalog.@>)"),
> >> or you could upgrade to 9.6 (don't forget to do ALTER EXTENSION ...
> UPDATE
> >> afterwards).
>
> > Isn't the operator determined at index build time?  If he doesn't want to
> > update to 9.6, I think he would need to rebuild the index, removing
> > the "gin__int_ops" specification.
>
> The operator in the query isn't.  But yes, if he's using an index that's
> built on the extension's opclass, he'd need to rebuild that too in order
> to still use the index with the core @> operator.
>
> regards, tom lane
>


Re: [GENERAL] Gin indexes on intarray is fast when value in array does not exists, and slow, when value exists

2016-11-10 Thread Tom Lane
Jeff Janes  writes:
> On Thu, Nov 10, 2016 at 7:11 AM, Tom Lane  wrote:
>> If you are using that contrib module, and it's capturing this operator
>> reference, that would probably explain the bad estimate.  You could
>> drop the extension if you're not depending on its other features, or you
>> could explicitly qualify the operator name ("operator(pg_catalog.@>)"),
>> or you could upgrade to 9.6 (don't forget to do ALTER EXTENSION ... UPDATE
>> afterwards).

> Isn't the operator determined at index build time?  If he doesn't want to
> update to 9.6, I think he would need to rebuild the index, removing
> the "gin__int_ops" specification.

The operator in the query isn't.  But yes, if he's using an index that's
built on the extension's opclass, he'd need to rebuild that too in order
to still use the index with the core @> operator.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Gin indexes on intarray is fast when value in array does not exists, and slow, when value exists

2016-11-10 Thread Jeff Janes
On Thu, Nov 10, 2016 at 7:11 AM, Tom Lane  wrote:

> otar shavadze  writes:
> >> Hmmm ... actually, I wonder if maybe '@>' here is the contrib/intarray
> >> operator not the core operator?  The intarray operator didn't get
> plugged
> >> into any real estimation logic until 9.6.
>
> > So, you mean that better would be go to version 9.6 ?
>
> If you are using that contrib module, and it's capturing this operator
> reference, that would probably explain the bad estimate.  You could
> drop the extension if you're not depending on its other features, or you
> could explicitly qualify the operator name ("operator(pg_catalog.@>)"),
> or you could upgrade to 9.6 (don't forget to do ALTER EXTENSION ... UPDATE
> afterwards).
>

Isn't the operator determined at index build time?  If he doesn't want to
update to 9.6, I think he would need to rebuild the index, removing
the "gin__int_ops" specification.

Cheers,

Jeff


Re: [GENERAL] Gin indexes on intarray is fast when value in array does not exists, and slow, when value exists

2016-11-10 Thread Tom Lane
otar shavadze  writes:
>> Hmmm ... actually, I wonder if maybe '@>' here is the contrib/intarray
>> operator not the core operator?  The intarray operator didn't get plugged
>> into any real estimation logic until 9.6.

> So, you mean that better would be go to version 9.6 ?

If you are using that contrib module, and it's capturing this operator
reference, that would probably explain the bad estimate.  You could
drop the extension if you're not depending on its other features, or you
could explicitly qualify the operator name ("operator(pg_catalog.@>)"),
or you could upgrade to 9.6 (don't forget to do ALTER EXTENSION ... UPDATE
afterwards).

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Gin indexes on intarray is fast when value in array does not exists, and slow, when value exists

2016-11-10 Thread otar shavadze
@Jeff


> most_common_elems.  Is it empty, or is it not empty?  If not empty, does
> it contain the specific values you used in your queries?


No, most_common_elems is not empty. it contain the specific values I used
in queries.



@Tom

>
> Hmmm ... actually, I wonder if maybe '@>' here is the contrib/intarray
> operator not the core operator?  The intarray operator didn't get plugged
> into any real estimation logic until 9.6.



So, you mean that better would be go to version 9.6 ?

On Wed, Nov 9, 2016 at 8:35 AM, Tom Lane  wrote:

> I wrote:
> > Seems like your problem here is that the planner has no idea about the
> > selectivity of this condition --- if it did, I think it would have
> > made the right choice, because it would have made a much higher estimate
> > for the cost of the indexscan.
>
> > AFAICT, Postgres 9.5 does make a reasonably correct guess when given
> > up-to-date stats.  I speculate that you need to ANALYZE this table.
>
> Hmmm ... actually, I wonder if maybe '@>' here is the contrib/intarray
> operator not the core operator?  The intarray operator didn't get plugged
> into any real estimation logic until 9.6.
>
> regards, tom lane
>


Re: [GENERAL] Gin indexes on intarray is fast when value in array does not exists, and slow, when value exists

2016-11-08 Thread Tom Lane
I wrote:
> Seems like your problem here is that the planner has no idea about the
> selectivity of this condition --- if it did, I think it would have
> made the right choice, because it would have made a much higher estimate
> for the cost of the indexscan.

> AFAICT, Postgres 9.5 does make a reasonably correct guess when given
> up-to-date stats.  I speculate that you need to ANALYZE this table.

Hmmm ... actually, I wonder if maybe '@>' here is the contrib/intarray
operator not the core operator?  The intarray operator didn't get plugged
into any real estimation logic until 9.6.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Gin indexes on intarray is fast when value in array does not exists, and slow, when value exists

2016-11-08 Thread Jeff Janes
On Tue, Nov 8, 2016 at 12:27 PM, otar shavadze  wrote:

>
> p.s. In "pg_stats" really many values (long lists in "most_common_vals",
> "most_common_freqs") and in another columns
> Which one columns should I show you? All?
>

most_common_elems.  Is it empty, or is it not empty?  If not empty, does it
contain the specific values you used in your queries?

Cheers,

Jeff


Re: [GENERAL] Gin indexes on intarray is fast when value in array does not exists, and slow, when value exists

2016-11-08 Thread Tom Lane
otar shavadze  writes:
> "  ->  Bitmap Index Scan on idx  (cost=0.00..23.80 rows=533
> width=0) (actual time=35.054..35.054 rows=90052 loops=1)"
> "Index Cond: (my_array @> '{8}'::integer[])"

Seems like your problem here is that the planner has no idea about the
selectivity of this condition --- if it did, I think it would have
made the right choice, because it would have made a much higher estimate
for the cost of the indexscan.

AFAICT, Postgres 9.5 does make a reasonably correct guess when given
up-to-date stats.  I speculate that you need to ANALYZE this table.
If there are a lot of distinct possible values in the arrays, increasing
the statistics target for the column might be needed.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Gin indexes on intarray is fast when value in array does not exists, and slow, when value exists

2016-11-08 Thread otar shavadze
I increased rows limit from 50 to 500, because now, difference visible much
better, so query is:

explain analyze *SELECT * FROM table_name WHERE my_array @>
'{x}'::integer[] ORDER BY id desc LIMIT 500*


with GIN index:

"Limit  (cost=107.83..109.08 rows=500 width=905) (actual
time=978.256..978.293 rows=500 loops=1)"
"  ->  Sort  (cost=107.83..109.16 rows=533 width=905) (actual
time=978.254..978.272 rows=500 loops=1)"
"Sort Key: id DESC"
"Sort Method: top-N heapsort  Memory: 589kB"
"->  Bitmap Heap Scan on table_name  (cost=23.93..83.69 rows=533
width=905) (actual time=50.612..917.422 rows=90049 loops=1)"
"  Recheck Cond: (my_array @> '{8}'::integer[])"
"  Heap Blocks: exact=46525"
"  ->  Bitmap Index Scan on idx  (cost=0.00..23.80 rows=533
width=0) (actual time=35.054..35.054 rows=90052 loops=1)"
"Index Cond: (my_array @> '{8}'::integer[])"
"Planning time: 0.202 ms"
"Execution time: 978.718 ms"


Without index:

"Limit  (cost=7723.12..7724.37 rows=500 width=122) (actual
time=184.041..184.102 rows=500 loops=1)"
"  ->  Sort  (cost=7723.12..7724.45 rows=534 width=122) (actual
time=184.039..184.052 rows=500 loops=1)"
"Sort Key: id DESC"
"Sort Method: top-N heapsort  Memory: 157kB"
"->  Seq Scan on table_name (cost=0.00..7698.93 rows=534 width=122)
(actual time=0.020..176.079 rows=84006 loops=1)"
"  Filter: (my_array @> '{14}'::integer[])"
"  Rows Removed by Filter: 450230"
"Planning time: 0.165 ms"
"Execution time: 184.155 ms"


Postgres version: 9.5; OS: Windows 7; RAM: 8GB

In picture is some config current values.


p.s. In "pg_stats" really many values (long lists in "most_common_vals",
"most_common_freqs") and in another columns
Which one columns should I show you? All?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Gin indexes on intarray is fast when value in array does not exists, and slow, when value exists

2016-11-08 Thread Jeff Janes
On Tue, Nov 8, 2016 at 9:43 AM, otar shavadze  wrote:

> I have table with 500 000 rows, I have int[] column "my_array" in this
> table, this array column contains minimum 1 and maximum 5 different values.
>
> I have GIN index on my_array column:
>
> * "CREATE INDEX idx ON table_name USING GIN(my_array gin__int_ops)"*
>
> Then I use this query: "*SELECT * FROM  table_name WHERE my_array @>
> '{3}'::integer[]  ORDER BY id LIMIT 50"*
>
> Execution time of this query is approximately 500-1000 ms. Then if I drop
> gin index "*idx*", query works extremely fast, less than 20 ms.
>

Please post the information requested here:
https://wiki.postgresql.org/wiki/Slow_Query_Questions

Also, can you show,

select * from pg_stats where tablename ='table_name' and attname='my_array'
\x\g\x

Cheers,

Jeff


Re: [GENERAL] Gin indexes on intarray is fast when value in array does not exists, and slow, when value exists

2016-11-08 Thread Oleg Bartunov
On Tue, Nov 8, 2016 at 8:43 PM, otar shavadze  wrote:

> I have table with 500 000 rows, I have int[] column "my_array" in this
> table, this array column contains minimum 1 and maximum 5 different values.
>

you didn't show us postgres version.


>
> I have GIN index on my_array column:
>
> * "CREATE INDEX idx ON table_name USING GIN(my_array gin__int_ops)"*
>
> Then I use this query: "*SELECT * FROM  table_name WHERE my_array @>
> '{3}'::integer[]  ORDER BY id LIMIT 50"*
>
> Execution time of this query is approximately 500-1000 ms. Then if I drop
> gin index "*idx*", query works extremely fast, less than 20 ms.
>

explain analyze would help us to see the problem.



>
> But, if I search value, which does not exists at all, for example no one
> array not contains number "77" and I search: * "WHERE my_array @>
> '{77}'::integer[]" *, then using gin index is much better and fast, (less
> than 20 ms), but without index, query takes 500-1000 ms.
>
>
> So, what to do? For values which does not in any one rows, using index is
> much better, but for values,  which are at least in several rows, using
>  index, slows down performance.
>
> Can somehow make, that searching was always fast (when value exist in
> array and when not)
>
>
>


[GENERAL] Gin indexes on intarray is fast when value in array does not exists, and slow, when value exists

2016-11-08 Thread otar shavadze
I have table with 500 000 rows, I have int[] column "my_array" in this
table, this array column contains minimum 1 and maximum 5 different values.

I have GIN index on my_array column:

* "CREATE INDEX idx ON table_name USING GIN(my_array gin__int_ops)"*

Then I use this query: "*SELECT * FROM  table_name WHERE my_array @>
'{3}'::integer[]  ORDER BY id LIMIT 50"*

Execution time of this query is approximately 500-1000 ms. Then if I drop
gin index "*idx*", query works extremely fast, less than 20 ms.

But, if I search value, which does not exists at all, for example no one
array not contains number "77" and I search: * "WHERE my_array @>
'{77}'::integer[]" *, then using gin index is much better and fast, (less
than 20 ms), but without index, query takes 500-1000 ms.


So, what to do? For values which does not in any one rows, using index is
much better, but for values,  which are at least in several rows, using
 index, slows down performance.

Can somehow make, that searching was always fast (when value exist in array
and when not)


Re: [GENERAL] GIN Indexes: Extensibility

2016-07-28 Thread Anton Ananich
Paul,

This is a really valuable idea. It will work in some situations for me. But in 
other situations I do not know if table will have a key of type int[] or 
string[] or even mixed. That’s why I’d wish to use JSON arrays and customize 
sort ordering.

Anyway I appreciate you shared this approach!
Regards,
Anthony Ananich
http://ananich.pro

On Jul 27, 2016, at 18:00, Paul Jungwirth  wrote:

> On 07/27/2016 07:44 AM, Vick Khera wrote:
>> On Wed, Jul 27, 2016 at 3:28 AM, Anton Ananich  
>> wrote:
>>> In my situation this order is invalid. Obviously, year 2016 should go after
>>> 2014, like that:
>> 
>> I think you expect JSONB to sort differently than it does. I cannot
>> imagine what a "natural" ordering of arbitrary JSON objects is.
> 
> FWIW, Postgres arrays do sort in the way he's expecting:
> 
> paul=# create table t (id integer, v integer[]);
> CREATE TABLE
> paul=# insert into t values (1, array[2014]), (2, array[2014, 1]), (3, 
> array[2016]);
> INSERT 0 3
> paul=# select * from t order by v;
> id |v
> +--
>  1 | {2014}
>  2 | {2014,1}
>  3 | {2016}
> (3 rows)
> 
> So maybe convert to an array before sorting?
> 
> Paul
> 
> 
> 
> -- 
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] GIN Indexes: Extensibility

2016-07-28 Thread Anton Ananich
Vick, you are right. That’s why I’d wish to add some custom code to MY 
PostgreSQL instance and set such a sort order, which is optimal for my 
application.

On Jul 27, 2016, at 17:44, Vick Khera  wrote:

> On Wed, Jul 27, 2016 at 3:28 AM, Anton Ananich  
> wrote:
>> In my situation this order is invalid. Obviously, year 2016 should go after
>> 2014, like that:
> 
> I think you expect JSONB to sort differently than it does. I cannot
> imagine what a "natural" ordering of arbitrary JSON objects is.
> 
> 
> -- 
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] GIN Indexes: Extensibility

2016-07-27 Thread Paul Jungwirth

On 07/27/2016 07:44 AM, Vick Khera wrote:

On Wed, Jul 27, 2016 at 3:28 AM, Anton Ananich  wrote:

In my situation this order is invalid. Obviously, year 2016 should go after
2014, like that:


I think you expect JSONB to sort differently than it does. I cannot
imagine what a "natural" ordering of arbitrary JSON objects is.


FWIW, Postgres arrays do sort in the way he's expecting:

paul=# create table t (id integer, v integer[]);
CREATE TABLE
paul=# insert into t values (1, array[2014]), (2, array[2014, 1]), (3, 
array[2016]);

INSERT 0 3
paul=# select * from t order by v;
 id |v
+--
  1 | {2014}
  2 | {2014,1}
  3 | {2016}
(3 rows)

So maybe convert to an array before sorting?

Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] GIN Indexes: Extensibility

2016-07-27 Thread Vick Khera
On Wed, Jul 27, 2016 at 3:28 AM, Anton Ananich  wrote:
> In my situation this order is invalid. Obviously, year 2016 should go after
> 2014, like that:

I think you expect JSONB to sort differently than it does. I cannot
imagine what a "natural" ordering of arbitrary JSON objects is.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] GIN Indexes: Extensibility

2016-07-27 Thread Anton Ananich
Dear All,

Here is what I have:

user=# create table FOO (key jsonb);
CREATE TABLE
user=# insert into FOO(key) values ('[2014]'), ('[2015]'), ('[2016]'), ('[2014, 
2]'), ('[2014, 2, 3]'), ('[2014, 3]'), ('[2014,2,4]'), ('[2014, 2,4]'), 
('[2014,3,13]'), ('[2014, 2, 15]');
INSERT 0 10
user=# SELECT key FROM FOO order by key;
  key  
---
 [2014]
 [2015] <==
 [2016] <==
 [2014, 2]
 [2014, 3] <==
 [2014, 2, 3]
 [2014, 2, 4]
 [2014, 2, 4]
 [2014, 2, 15]
 [2014, 3, 13]
(10 rows)


In my situation this order is invalid. Obviously, year 2016 should go after 
2014, like that:

  key  
---
 [2014]
 [2014, 2]
 [2014, 2, 3]
 [2014, 2, 4]
 [2014, 2, 4]
 [2014, 2, 15]
 [2014, 3] <==
 [2014, 3, 13]
 [2015] <==
 [2016] <==

This is a simplified example, my real application is much more complicated and 
sorted arrays could have tens of values, could even be arrays of arrays.For 
this reason I need to customize sort function.

I found that there is a way to customize function `int compare(Datum a, Datum 
b)` (proof link: 
https://www.postgresql.org/docs/9.5/static/gin-extensibility.html), but I found 
no example which shows how to use it.

I’d appreciate any information which could help me to achieve the described 
result, even if it is just a link to an existing example.

Regards,
Anthony Ananich




Re: [GENERAL] GiN indexes

2010-08-31 Thread Oleg Bartunov

On Tue, 31 Aug 2010, Stavroula Gasparotto wrote:


Currently, only the B-tree, GiST and GIN index types support
multicolumn indexes.



What does this mean exactly if I'm trying to create a multicolumn GIN
index? Does this mean the index can contain one or more tsvector type
fields only, or can I combine tsvector type field with other type fields
in the index, such as a timestamp column?


It does both.

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general