Re: [HACKERS] Hstore: Query speedups with Gin index

2013-09-09 Thread Blake Smith
Thanks for getting back to me about this change Oleg. I took your advice
and reworked the patch by adding a new hstore gin opclass
(gin_hstore_combined_ops) and leaving the functionality of the default
hstore gin opclass the same. This should prevent the on-disk compatibility
issues from the first patch, and allow users to select the different
indexing method when they build the index. The hstore regression suite is
passing for me locally with the --enable-cassert configure flag. Please let
me know what you think and if there is any other work that would need to be
done (style cleanups, updating documentation, etc) to get this merged.

Thanks!

Blake






On Fri, Sep 6, 2013 at 1:47 PM, Oleg Bartunov obartu...@gmail.com wrote:

 Blake,

 I think it's better to implement this patch as a separate opclass, so
 users will have option to choose indexing.

 Oleg


 On Tue, Sep 3, 2013 at 6:24 PM, Blake Smith blakesmi...@gmail.com wrote:

 Thanks for the feedback everyone. I've attached the patch that we are now
 running in production to service our hstore include queries. We rebuilt the
 index to account for the on-disk incompatibility. I've submitted the patch
 to commitfest here:
 https://commitfest.postgresql.org/action/patch_view?id=1203

 Michael: I don't have a formal benchmark, but several of our worst
 queries went from 10-20 seconds per query down to 50-400 ms. These are
 numbers we've seen when testing real production queries against our
 production dataset with real world access patterns.
 Oleg: Thanks for your thoughts on this change. As for the spgist / gin
 work you're doing, is there anything you need help with or are you still in
 the research phase? I'd love to help get something more robust merged into
 mainline if you think there's collaborative work to be done (even if it's
 only user testing).

 Thanks,

 Blake




 On Wed, Aug 28, 2013 at 12:40 PM, Andres Freund 
 and...@2ndquadrant.comwrote:

 On 2013-08-28 13:31:22 -0400, Bruce Momjian wrote:
  On Sun, Aug 25, 2013 at 10:11:50PM -0400, Tom Lane wrote:
   Michael Paquier michael.paqu...@gmail.com writes:
On Thu, Aug 22, 2013 at 11:55 PM, Blake Smith 
 blakesmi...@gmail.com wrote:
The combined entry is used to support contains (@) queries,
 and the key
only item is used to support key contains (?) queries. This
 change seems
to help especially with hstore keys that have high cardinalities.
 Downsides
of this change is that it requires an index rebuild, and the
 index will be
larger in size.
  
Index rebuild would be a problem only for minor releases,
  
   That's completely false; people have expected major releases to be
   on-disk-compatible for several years now.  While there probably will
 be
   future releases in which we are willing to break storage
 compatibility,
   a contrib module doesn't get to dictate that.
  
   What might be a practical solution, especially if this isn't always a
   win (which seems likely given the index-bloat risk), is to make
 hstore
   offer two different GIN index opclasses, one that works the
 traditional
   way and one that works this way.
  
   Another thing that needs to be taken into account here is Oleg and
   Teodor's in-progress work on extending hstore:
   https://www.pgcon.org/2013/schedule/events/518.en.html
   I'm not sure if this patch would conflict with that at all, but it
   needs to be considered.
 
  We can disallow in-place upgrades for clusters that use certain contrib
  modules --- we have done that in the past.

 But that really cannot be acceptable for hstore. The probably most
 widely used extension there is.

 Greetings,

 Andres Freund

 --
  Andres Freund http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training  Services




 --
 Blake Smith
 http://blakesmith.me
 @blakesmith


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers





-- 
Blake Smith
http://blakesmith.me
@blakesmith


0001-Add-gin_hstore_combined_ops-hstore-indexing-opclass.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hstore: Query speedups with Gin index

2013-09-05 Thread Blake Smith
Hi Peter,

Thanks for checking the tests. I wasn't able to duplicate your test
results. Did you run the hstore regression tests with the revised patch I
attached in the thread?  Attached is the output I got with the latest patch
applied.

Thanks!

Blake



On Tue, Sep 3, 2013 at 8:04 PM, Peter Eisentraut pete...@gmx.net wrote:

 On Tue, 2013-09-03 at 09:24 -0500, Blake Smith wrote:
  Thanks for the feedback everyone. I've attached the patch that we are
  now running in production to service our hstore include queries. We
  rebuilt the index to account for the on-disk incompatibility. I've
  submitted the patch to commitfest
  here: https://commitfest.postgresql.org/action/patch_view?id=1203

 I'm getting the attached failure from the hstore regression test.






-- 
Blake Smith
http://blakesmith.me
@blakesmith


regressions.diff
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hstore: Query speedups with Gin index

2013-09-03 Thread Blake Smith
Thanks for the feedback everyone. I've attached the patch that we are now
running in production to service our hstore include queries. We rebuilt the
index to account for the on-disk incompatibility. I've submitted the patch
to commitfest here:
https://commitfest.postgresql.org/action/patch_view?id=1203

Michael: I don't have a formal benchmark, but several of our worst queries
went from 10-20 seconds per query down to 50-400 ms. These are numbers
we've seen when testing real production queries against our production
dataset with real world access patterns.
Oleg: Thanks for your thoughts on this change. As for the spgist / gin work
you're doing, is there anything you need help with or are you still in the
research phase? I'd love to help get something more robust merged into
mainline if you think there's collaborative work to be done (even if it's
only user testing).

Thanks,

Blake




On Wed, Aug 28, 2013 at 12:40 PM, Andres Freund and...@2ndquadrant.comwrote:

 On 2013-08-28 13:31:22 -0400, Bruce Momjian wrote:
  On Sun, Aug 25, 2013 at 10:11:50PM -0400, Tom Lane wrote:
   Michael Paquier michael.paqu...@gmail.com writes:
On Thu, Aug 22, 2013 at 11:55 PM, Blake Smith blakesmi...@gmail.com
 wrote:
The combined entry is used to support contains (@) queries, and
 the key
only item is used to support key contains (?) queries. This
 change seems
to help especially with hstore keys that have high cardinalities.
 Downsides
of this change is that it requires an index rebuild, and the index
 will be
larger in size.
  
Index rebuild would be a problem only for minor releases,
  
   That's completely false; people have expected major releases to be
   on-disk-compatible for several years now.  While there probably will be
   future releases in which we are willing to break storage compatibility,
   a contrib module doesn't get to dictate that.
  
   What might be a practical solution, especially if this isn't always a
   win (which seems likely given the index-bloat risk), is to make hstore
   offer two different GIN index opclasses, one that works the traditional
   way and one that works this way.
  
   Another thing that needs to be taken into account here is Oleg and
   Teodor's in-progress work on extending hstore:
   https://www.pgcon.org/2013/schedule/events/518.en.html
   I'm not sure if this patch would conflict with that at all, but it
   needs to be considered.
 
  We can disallow in-place upgrades for clusters that use certain contrib
  modules --- we have done that in the past.

 But that really cannot be acceptable for hstore. The probably most
 widely used extension there is.

 Greetings,

 Andres Freund

 --
  Andres Freund http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training  Services




-- 
Blake Smith
http://blakesmith.me
@blakesmith


hstore_gin_speedup.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Hstore: Query speedups with Gin index

2013-08-22 Thread Blake Smith
Hey everyone,

I'm looking for feedback on a contrib/hstore patch.

We've been experiencing slow @ queries involving an hstore column that's
covered by a Gin index. At the current postgresql git HEAD, the hstore -
gin interface produces the following text items to be indexed:

hstore: 'a'='1234', 'b'='test'
Produces indexed text items: Ka, V1234, Kb, Vtest

For the size of our production table (10s of millions of rows), I observed
significant query speedups by changing the index strategy to the following:

hstore: 'a'='1234', 'b'='test'
Produces indexed text items: Ka, KaV1234, Kb, KbVtest

The combined entry is used to support contains (@) queries, and the key
only item is used to support key contains (?) queries. This change seems
to help especially with hstore keys that have high cardinalities. Downsides
of this change is that it requires an index rebuild, and the index will be
larger in size.

Patch attached. Any thoughts on this change?

Thanks,

Blake


hstore_gin_speedup.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers