Re: Questions about btree_gin vs btree_gist for low cardinality columns

2019-06-04 Thread Jeremy Finzel
On Sun, Jun 2, 2019 at 6:07 PM Tom Lane wrote: > Anyway, the larger point here is that right now btree_gin is just a quick > hack, and it seems like it might be worth putting some more effort into > it, because the addition of duplicate-compression changes the calculus > for whether it's useful.

Re: Questions about btree_gin vs btree_gist for low cardinality columns

2019-06-03 Thread Jeff Janes
On Sun, Jun 2, 2019 at 7:07 PM Tom Lane wrote: > Jeff Janes writes: > > On Fri, May 24, 2019 at 11:26 AM Jeremy Finzel > wrote: > >> I have been hoping for clearer direction from the community about > >> specifically btree_gin indexes for low cardinality columns (as well as > low > >>

Re: Questions about btree_gin vs btree_gist for low cardinality columns

2019-06-03 Thread Peter Geoghegan
On Sun, Jun 2, 2019 at 4:07 PM Tom Lane wrote: > Anyway, I said to Jeremy in the hallway that it might not be that > hard to bolt IOS support onto GIN for cases where the opclass is > a non-subdividing one, but after looking at the code I'm less sure > about that. GIN hasn't even got an

RE: Questions about btree_gin vs btree_gist for low cardinality columns

2019-06-03 Thread Steven Winfield
> In the case of a single column with a small set of distinct values over a > large set of rows, how would a Bloom filter be preferable to, say, a GIN > index on an integer value?  I don't think it would - it's probably better suited to the multi-column case you described previously. > I have

Re: Questions about btree_gin vs btree_gist for low cardinality columns

2019-06-03 Thread Morris de Oryx
I didn't notice Bloom filters in the conversation so far, and have been waiting for *years* for a good excuse to use a Bloom filter. I ran into them years back in Splunk, which is a distributed log store. There's an obvious benefit to a probabalistic tool like a Bloom filter there since remote

Re: Questions about btree_gin vs btree_gist for low cardinality columns

2019-06-02 Thread Morris de Oryx
Thanks to Tom Lane and Jeff Janes for chiming in with the level of detail they're able to provide. As an outsider-who-now-loves-Postgres, I don't know the history or deep details of all of the various index types. (Obviously.) As a long-time database programmer, I can say that low-cardinality

Re: Questions about btree_gin vs btree_gist for low cardinality columns

2019-06-02 Thread Tom Lane
Jeff Janes writes: > On Fri, May 24, 2019 at 11:26 AM Jeremy Finzel wrote: >> I have been hoping for clearer direction from the community about >> specifically btree_gin indexes for low cardinality columns (as well as low >> cardinality multi-column indexes). In general there is very little >>

Re: Questions about btree_gin vs btree_gist for low cardinality columns

2019-06-02 Thread Jeff Janes
On Fri, May 24, 2019 at 11:26 AM Jeremy Finzel wrote: > I have been hoping for clearer direction from the community about > specifically btree_gin indexes for low cardinality columns (as well as low > cardinality multi-column indexes). In general there is very little > discussion about this

Re: Questions about btree_gin vs btree_gist for low cardinality columns

2019-06-02 Thread Morris de Oryx
Peter, Thanks a lot for the remedial help on EXPLAIN and timing results.

Re: Questions about btree_gin vs btree_gist for low cardinality columns

2019-06-02 Thread Peter J. Holzer
On 2019-06-02 09:10:25 +1000, Morris de Oryx wrote: > Peter, thanks a lot for picking up on what I started, improving it, and > reporting back. I thought I was providing timing estimates from the EXPLAIN > cost dumps. Seems not. Well, there's another thing that I've learned. The cost is how long

Re: Questions about btree_gin vs btree_gist for low cardinality columns

2019-06-01 Thread Morris de Oryx
>From what Peter showed, the answer to (part of) the original questions seems to be that *yes*, a B-tree GIN can be quite appealing. The test times aren't too worrisome, and the index size is about 1/12th of a B-tree. I added on the sizes, and divided each index size by a full B-tree: Method

Re: Questions about btree_gin vs btree_gist for low cardinality columns

2019-06-01 Thread Morris de Oryx
Peter, thanks a lot for picking up on what I started, improving it, and reporting back. I *thought *I was providing timing estimates from the EXPLAIN cost dumps. Seems not. Well, there's another thing that I've learned. Your explanation of why the hash index bloats out makes complete sense, I

Re: Questions about btree_gin vs btree_gist for low cardinality columns

2019-06-01 Thread Peter J. Holzer
On 2019-06-01 17:44:00 +1000, Morris de Oryx wrote: > Since I've been wondering about this subject, I figured I'd take a bit of time > and try to do some tests. I'm not new to databases or coding, but have been > using Postgres for less than two years. I haven't tried to generate large > blocks of

Re: Questions about btree_gin vs btree_gist for low cardinality columns

2019-06-01 Thread Gavin Flower
On 01/06/2019 14:52, Morris de Oryx wrote: [...] For an example, imagine an address table with 100M US street addresses with two character state abbreviations. So, say there are around 60 values in there (the USPS is the mail system for a variety of US territories, possessions and friends in

Re: Questions about btree_gin vs btree_gist for low cardinality columns

2019-06-01 Thread Morris de Oryx
Since I've been wondering about this subject, I figured I'd take a bit of time and try to do some tests. I'm not new to databases or coding, but have been using Postgres for less than two years. I haven't tried to generate large blocks of test data directly in Postgres before, so I'm *sure* that

Re: Questions about btree_gin vs btree_gist for low cardinality columns

2019-05-31 Thread Morris de Oryx
Jeremy's question is *great*, and really well presented. I can't answer his questions, but I am keenly interested in this subject as well. The links he provides lead to some really interesting and well-though-out pieces, well worth reading. I'm going to try restating things in my own way in hopes

Re: Questions about btree_gin vs btree_gist for low cardinality columns

2019-05-31 Thread Thomas Kellerer
Will Hartung schrieb am 31.05.2019 um 00:11: > If you have 10M rows with a “STATUS” column of 1 or 2, and an index > on that column, then you have a 2 node index with a bazillion row > pointers. Some systems (I can’t speak to PG in this regard) > degenerate in this kind of use case since the index

Re: Questions about btree_gin vs btree_gist for low cardinality columns

2019-05-31 Thread Peter J. Holzer
On 2019-05-30 21:00:57 +0200, Peter J. Holzer wrote: > Firstly, the GIN index doesn't generally index single characters. It > uses some rule to split the field into tokens. For example, for a text > field, it might split the field into words (possibly with some > normalization like case-folding

Re: Questions about btree_gin vs btree_gist for low cardinality columns

2019-05-30 Thread Peter J. Holzer
On 2019-05-30 10:11:36 -0700, Will Hartung wrote: > Well part of the problem is that “B-tree” is a pretty generic term. Pretty > much > every index is either some form of hash, or some form of tree. We use the term > B-tree, even though pretty much no major index is actually a binary tree. There

Re: Questions about btree_gin vs btree_gist for low cardinality columns

2019-05-30 Thread Will Hartung
> On May 29, 2019, at 10:59 AM, Jeremy Finzel wrote: > > > > On Fri, May 24, 2019 at 10:25 AM Jeremy Finzel > wrote: > I have been hoping for clearer direction from the community about > specifically btree_gin indexes for low cardinality columns (as well as low >

Re: Questions about btree_gin vs btree_gist for low cardinality columns

2019-05-29 Thread Jeremy Finzel
On Fri, May 24, 2019 at 10:25 AM Jeremy Finzel wrote: > I have been hoping for clearer direction from the community about > specifically btree_gin indexes for low cardinality columns (as well as low > cardinality multi-column indexes). In general there is very little > discussion about this

Questions about btree_gin vs btree_gist for low cardinality columns

2019-05-24 Thread Jeremy Finzel
I have been hoping for clearer direction from the community about specifically btree_gin indexes for low cardinality columns (as well as low cardinality multi-column indexes). In general there is very little discussion about this both online and in the docs. Rather, the emphasis for GIN indexes