Re: [GENERAL] Gin indexes on intarray is fast when value in array does not exists, and slow, when value exists
On Sat, Nov 12, 2016 at 5:33 PM, otar shavadzewrote: > Tried > > OPERATOR(pg_catalog.@>) > > > as Tom mentioned, but still, don't get fast performance when value does > not existed in any array. > Did you build the correct index? > > Also "played" with many different ways, gin, gist indexes (gin with and > without *gin__int_ops*) but, always, there was some situation, where > search in array was slow. > Yes. There will always be some situation when the array search is slow. Is that situation one that a specific person cares about? Hard to tell, since you have not given us any additional useful information. I don't know exactly, may be I am wrong, but what I understood after > several day "trying", is that, I never will use arrays, with tables more > than 500 000-1000 000 rows, because then searching in this array is > somehow problematic. > > I rebuild my structure and added another table (instead of using array) > and then used join's instead of searching in array. > > That's works perfectly, joining works fast as hell, even for several > millions rows in each table. > "Properly" normalizing your data is a wonderful thing, no doubt about it, if you are prepared to deal with the consequences of doing so. But not everyone has that luxury. Which is why there is more than one way of doing things. Cheers, Jeff
Re: [GENERAL] Gin indexes on intarray is fast when value in array does not exists, and slow, when value exists
Tried OPERATOR(pg_catalog.@>) as Tom mentioned, but still, don't get fast performance when value does not existed in any array. Also "played" with many different ways, gin, gist indexes (gin with and without *gin__int_ops*) but, always, there was some situation, where search in array was slow. I don't know exactly, may be I am wrong, but what I understood after several day "trying", is that, I never will use arrays, with tables more than 500 000-1000 000 rows, because then searching in this array is somehow problematic. I rebuild my structure and added another table (instead of using array) and then used join's instead of searching in array. That's works perfectly, joining works fast as hell, even for several millions rows in each table. On Fri, Nov 11, 2016 at 12:58 AM, Tom Lanewrote: > Jeff Janes writes: > > On Thu, Nov 10, 2016 at 7:11 AM, Tom Lane wrote: > >> If you are using that contrib module, and it's capturing this operator > >> reference, that would probably explain the bad estimate. You could > >> drop the extension if you're not depending on its other features, or you > >> could explicitly qualify the operator name ("operator(pg_catalog.@>)"), > >> or you could upgrade to 9.6 (don't forget to do ALTER EXTENSION ... > UPDATE > >> afterwards). > > > Isn't the operator determined at index build time? If he doesn't want to > > update to 9.6, I think he would need to rebuild the index, removing > > the "gin__int_ops" specification. > > The operator in the query isn't. But yes, if he's using an index that's > built on the extension's opclass, he'd need to rebuild that too in order > to still use the index with the core @> operator. > > regards, tom lane >
Re: [GENERAL] Gin indexes on intarray is fast when value in array does not exists, and slow, when value exists
Jeff Janeswrites: > On Thu, Nov 10, 2016 at 7:11 AM, Tom Lane wrote: >> If you are using that contrib module, and it's capturing this operator >> reference, that would probably explain the bad estimate. You could >> drop the extension if you're not depending on its other features, or you >> could explicitly qualify the operator name ("operator(pg_catalog.@>)"), >> or you could upgrade to 9.6 (don't forget to do ALTER EXTENSION ... UPDATE >> afterwards). > Isn't the operator determined at index build time? If he doesn't want to > update to 9.6, I think he would need to rebuild the index, removing > the "gin__int_ops" specification. The operator in the query isn't. But yes, if he's using an index that's built on the extension's opclass, he'd need to rebuild that too in order to still use the index with the core @> operator. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Gin indexes on intarray is fast when value in array does not exists, and slow, when value exists
On Thu, Nov 10, 2016 at 7:11 AM, Tom Lanewrote: > otar shavadze writes: > >> Hmmm ... actually, I wonder if maybe '@>' here is the contrib/intarray > >> operator not the core operator? The intarray operator didn't get > plugged > >> into any real estimation logic until 9.6. > > > So, you mean that better would be go to version 9.6 ? > > If you are using that contrib module, and it's capturing this operator > reference, that would probably explain the bad estimate. You could > drop the extension if you're not depending on its other features, or you > could explicitly qualify the operator name ("operator(pg_catalog.@>)"), > or you could upgrade to 9.6 (don't forget to do ALTER EXTENSION ... UPDATE > afterwards). > Isn't the operator determined at index build time? If he doesn't want to update to 9.6, I think he would need to rebuild the index, removing the "gin__int_ops" specification. Cheers, Jeff
Re: [GENERAL] Gin indexes on intarray is fast when value in array does not exists, and slow, when value exists
otar shavadzewrites: >> Hmmm ... actually, I wonder if maybe '@>' here is the contrib/intarray >> operator not the core operator? The intarray operator didn't get plugged >> into any real estimation logic until 9.6. > So, you mean that better would be go to version 9.6 ? If you are using that contrib module, and it's capturing this operator reference, that would probably explain the bad estimate. You could drop the extension if you're not depending on its other features, or you could explicitly qualify the operator name ("operator(pg_catalog.@>)"), or you could upgrade to 9.6 (don't forget to do ALTER EXTENSION ... UPDATE afterwards). regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Gin indexes on intarray is fast when value in array does not exists, and slow, when value exists
@Jeff > most_common_elems. Is it empty, or is it not empty? If not empty, does > it contain the specific values you used in your queries? No, most_common_elems is not empty. it contain the specific values I used in queries. @Tom > > Hmmm ... actually, I wonder if maybe '@>' here is the contrib/intarray > operator not the core operator? The intarray operator didn't get plugged > into any real estimation logic until 9.6. So, you mean that better would be go to version 9.6 ? On Wed, Nov 9, 2016 at 8:35 AM, Tom Lanewrote: > I wrote: > > Seems like your problem here is that the planner has no idea about the > > selectivity of this condition --- if it did, I think it would have > > made the right choice, because it would have made a much higher estimate > > for the cost of the indexscan. > > > AFAICT, Postgres 9.5 does make a reasonably correct guess when given > > up-to-date stats. I speculate that you need to ANALYZE this table. > > Hmmm ... actually, I wonder if maybe '@>' here is the contrib/intarray > operator not the core operator? The intarray operator didn't get plugged > into any real estimation logic until 9.6. > > regards, tom lane >
Re: [GENERAL] Gin indexes on intarray is fast when value in array does not exists, and slow, when value exists
I wrote: > Seems like your problem here is that the planner has no idea about the > selectivity of this condition --- if it did, I think it would have > made the right choice, because it would have made a much higher estimate > for the cost of the indexscan. > AFAICT, Postgres 9.5 does make a reasonably correct guess when given > up-to-date stats. I speculate that you need to ANALYZE this table. Hmmm ... actually, I wonder if maybe '@>' here is the contrib/intarray operator not the core operator? The intarray operator didn't get plugged into any real estimation logic until 9.6. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Gin indexes on intarray is fast when value in array does not exists, and slow, when value exists
On Tue, Nov 8, 2016 at 12:27 PM, otar shavadzewrote: > > p.s. In "pg_stats" really many values (long lists in "most_common_vals", > "most_common_freqs") and in another columns > Which one columns should I show you? All? > most_common_elems. Is it empty, or is it not empty? If not empty, does it contain the specific values you used in your queries? Cheers, Jeff
Re: [GENERAL] Gin indexes on intarray is fast when value in array does not exists, and slow, when value exists
otar shavadzewrites: > " -> Bitmap Index Scan on idx (cost=0.00..23.80 rows=533 > width=0) (actual time=35.054..35.054 rows=90052 loops=1)" > "Index Cond: (my_array @> '{8}'::integer[])" Seems like your problem here is that the planner has no idea about the selectivity of this condition --- if it did, I think it would have made the right choice, because it would have made a much higher estimate for the cost of the indexscan. AFAICT, Postgres 9.5 does make a reasonably correct guess when given up-to-date stats. I speculate that you need to ANALYZE this table. If there are a lot of distinct possible values in the arrays, increasing the statistics target for the column might be needed. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Gin indexes on intarray is fast when value in array does not exists, and slow, when value exists
I increased rows limit from 50 to 500, because now, difference visible much better, so query is: explain analyze *SELECT * FROM table_name WHERE my_array @> '{x}'::integer[] ORDER BY id desc LIMIT 500* with GIN index: "Limit (cost=107.83..109.08 rows=500 width=905) (actual time=978.256..978.293 rows=500 loops=1)" " -> Sort (cost=107.83..109.16 rows=533 width=905) (actual time=978.254..978.272 rows=500 loops=1)" "Sort Key: id DESC" "Sort Method: top-N heapsort Memory: 589kB" "-> Bitmap Heap Scan on table_name (cost=23.93..83.69 rows=533 width=905) (actual time=50.612..917.422 rows=90049 loops=1)" " Recheck Cond: (my_array @> '{8}'::integer[])" " Heap Blocks: exact=46525" " -> Bitmap Index Scan on idx (cost=0.00..23.80 rows=533 width=0) (actual time=35.054..35.054 rows=90052 loops=1)" "Index Cond: (my_array @> '{8}'::integer[])" "Planning time: 0.202 ms" "Execution time: 978.718 ms" Without index: "Limit (cost=7723.12..7724.37 rows=500 width=122) (actual time=184.041..184.102 rows=500 loops=1)" " -> Sort (cost=7723.12..7724.45 rows=534 width=122) (actual time=184.039..184.052 rows=500 loops=1)" "Sort Key: id DESC" "Sort Method: top-N heapsort Memory: 157kB" "-> Seq Scan on table_name (cost=0.00..7698.93 rows=534 width=122) (actual time=0.020..176.079 rows=84006 loops=1)" " Filter: (my_array @> '{14}'::integer[])" " Rows Removed by Filter: 450230" "Planning time: 0.165 ms" "Execution time: 184.155 ms" Postgres version: 9.5; OS: Windows 7; RAM: 8GB In picture is some config current values. p.s. In "pg_stats" really many values (long lists in "most_common_vals", "most_common_freqs") and in another columns Which one columns should I show you? All? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Gin indexes on intarray is fast when value in array does not exists, and slow, when value exists
On Tue, Nov 8, 2016 at 9:43 AM, otar shavadzewrote: > I have table with 500 000 rows, I have int[] column "my_array" in this > table, this array column contains minimum 1 and maximum 5 different values. > > I have GIN index on my_array column: > > * "CREATE INDEX idx ON table_name USING GIN(my_array gin__int_ops)"* > > Then I use this query: "*SELECT * FROM table_name WHERE my_array @> > '{3}'::integer[] ORDER BY id LIMIT 50"* > > Execution time of this query is approximately 500-1000 ms. Then if I drop > gin index "*idx*", query works extremely fast, less than 20 ms. > Please post the information requested here: https://wiki.postgresql.org/wiki/Slow_Query_Questions Also, can you show, select * from pg_stats where tablename ='table_name' and attname='my_array' \x\g\x Cheers, Jeff
Re: [GENERAL] Gin indexes on intarray is fast when value in array does not exists, and slow, when value exists
On Tue, Nov 8, 2016 at 8:43 PM, otar shavadzewrote: > I have table with 500 000 rows, I have int[] column "my_array" in this > table, this array column contains minimum 1 and maximum 5 different values. > you didn't show us postgres version. > > I have GIN index on my_array column: > > * "CREATE INDEX idx ON table_name USING GIN(my_array gin__int_ops)"* > > Then I use this query: "*SELECT * FROM table_name WHERE my_array @> > '{3}'::integer[] ORDER BY id LIMIT 50"* > > Execution time of this query is approximately 500-1000 ms. Then if I drop > gin index "*idx*", query works extremely fast, less than 20 ms. > explain analyze would help us to see the problem. > > But, if I search value, which does not exists at all, for example no one > array not contains number "77" and I search: * "WHERE my_array @> > '{77}'::integer[]" *, then using gin index is much better and fast, (less > than 20 ms), but without index, query takes 500-1000 ms. > > > So, what to do? For values which does not in any one rows, using index is > much better, but for values, which are at least in several rows, using > index, slows down performance. > > Can somehow make, that searching was always fast (when value exist in > array and when not) > > >
[GENERAL] Gin indexes on intarray is fast when value in array does not exists, and slow, when value exists
I have table with 500 000 rows, I have int[] column "my_array" in this table, this array column contains minimum 1 and maximum 5 different values. I have GIN index on my_array column: * "CREATE INDEX idx ON table_name USING GIN(my_array gin__int_ops)"* Then I use this query: "*SELECT * FROM table_name WHERE my_array @> '{3}'::integer[] ORDER BY id LIMIT 50"* Execution time of this query is approximately 500-1000 ms. Then if I drop gin index "*idx*", query works extremely fast, less than 20 ms. But, if I search value, which does not exists at all, for example no one array not contains number "77" and I search: * "WHERE my_array @> '{77}'::integer[]" *, then using gin index is much better and fast, (less than 20 ms), but without index, query takes 500-1000 ms. So, what to do? For values which does not in any one rows, using index is much better, but for values, which are at least in several rows, using index, slows down performance. Can somehow make, that searching was always fast (when value exist in array and when not)
Re: [GENERAL] GIN Indexes: Extensibility
Paul, This is a really valuable idea. It will work in some situations for me. But in other situations I do not know if table will have a key of type int[] or string[] or even mixed. That’s why I’d wish to use JSON arrays and customize sort ordering. Anyway I appreciate you shared this approach! Regards, Anthony Ananich http://ananich.pro On Jul 27, 2016, at 18:00, Paul Jungwirthwrote: > On 07/27/2016 07:44 AM, Vick Khera wrote: >> On Wed, Jul 27, 2016 at 3:28 AM, Anton Ananich >> wrote: >>> In my situation this order is invalid. Obviously, year 2016 should go after >>> 2014, like that: >> >> I think you expect JSONB to sort differently than it does. I cannot >> imagine what a "natural" ordering of arbitrary JSON objects is. > > FWIW, Postgres arrays do sort in the way he's expecting: > > paul=# create table t (id integer, v integer[]); > CREATE TABLE > paul=# insert into t values (1, array[2014]), (2, array[2014, 1]), (3, > array[2016]); > INSERT 0 3 > paul=# select * from t order by v; > id |v > +-- > 1 | {2014} > 2 | {2014,1} > 3 | {2016} > (3 rows) > > So maybe convert to an array before sorting? > > Paul > > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] GIN Indexes: Extensibility
Vick, you are right. That’s why I’d wish to add some custom code to MY PostgreSQL instance and set such a sort order, which is optimal for my application. On Jul 27, 2016, at 17:44, Vick Kherawrote: > On Wed, Jul 27, 2016 at 3:28 AM, Anton Ananich > wrote: >> In my situation this order is invalid. Obviously, year 2016 should go after >> 2014, like that: > > I think you expect JSONB to sort differently than it does. I cannot > imagine what a "natural" ordering of arbitrary JSON objects is. > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] GIN Indexes: Extensibility
On 07/27/2016 07:44 AM, Vick Khera wrote: On Wed, Jul 27, 2016 at 3:28 AM, Anton Ananichwrote: In my situation this order is invalid. Obviously, year 2016 should go after 2014, like that: I think you expect JSONB to sort differently than it does. I cannot imagine what a "natural" ordering of arbitrary JSON objects is. FWIW, Postgres arrays do sort in the way he's expecting: paul=# create table t (id integer, v integer[]); CREATE TABLE paul=# insert into t values (1, array[2014]), (2, array[2014, 1]), (3, array[2016]); INSERT 0 3 paul=# select * from t order by v; id |v +-- 1 | {2014} 2 | {2014,1} 3 | {2016} (3 rows) So maybe convert to an array before sorting? Paul -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] GIN Indexes: Extensibility
On Wed, Jul 27, 2016 at 3:28 AM, Anton Ananichwrote: > In my situation this order is invalid. Obviously, year 2016 should go after > 2014, like that: I think you expect JSONB to sort differently than it does. I cannot imagine what a "natural" ordering of arbitrary JSON objects is. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] GIN Indexes: Extensibility
Dear All, Here is what I have: user=# create table FOO (key jsonb); CREATE TABLE user=# insert into FOO(key) values ('[2014]'), ('[2015]'), ('[2016]'), ('[2014, 2]'), ('[2014, 2, 3]'), ('[2014, 3]'), ('[2014,2,4]'), ('[2014, 2,4]'), ('[2014,3,13]'), ('[2014, 2, 15]'); INSERT 0 10 user=# SELECT key FROM FOO order by key; key --- [2014] [2015] <== [2016] <== [2014, 2] [2014, 3] <== [2014, 2, 3] [2014, 2, 4] [2014, 2, 4] [2014, 2, 15] [2014, 3, 13] (10 rows) In my situation this order is invalid. Obviously, year 2016 should go after 2014, like that: key --- [2014] [2014, 2] [2014, 2, 3] [2014, 2, 4] [2014, 2, 4] [2014, 2, 15] [2014, 3] <== [2014, 3, 13] [2015] <== [2016] <== This is a simplified example, my real application is much more complicated and sorted arrays could have tens of values, could even be arrays of arrays.For this reason I need to customize sort function. I found that there is a way to customize function `int compare(Datum a, Datum b)` (proof link: https://www.postgresql.org/docs/9.5/static/gin-extensibility.html), but I found no example which shows how to use it. I’d appreciate any information which could help me to achieve the described result, even if it is just a link to an existing example. Regards, Anthony Ananich
Re: [GENERAL] GiN indexes
On Tue, 31 Aug 2010, Stavroula Gasparotto wrote: Currently, only the B-tree, GiST and GIN index types support multicolumn indexes. What does this mean exactly if I'm trying to create a multicolumn GIN index? Does this mean the index can contain one or more tsvector type fields only, or can I combine tsvector type field with other type fields in the index, such as a timestamp column? It does both. Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general