Re: [HACKERS] On-disk bitmap index patch

2006-08-02 Thread Ron Mayer
Tom Lane wrote: > Both of these pages say up front that they are considering read-only > data. Can I assume read-mostly partitions could use the read-I/O efficient indexes on update-intensive partitions of the same table could use b-tree indexes? All of my larger (90GB+) tables can have partiti

Re: [HACKERS] On-disk bitmap index patch

2006-07-29 Thread Luke Lonergan
Bruce, On 7/29/06 6:31 AM, "Bruce Momjian" <[EMAIL PROTECTED]> wrote: > Right. People need a patch to test on their workloads, and analysis can > be done after feature freeze. Fair enough. - Luke ---(end of broadcast)--- TIP 5: don't forget to

Re: [HACKERS] On-disk bitmap index patch

2006-07-29 Thread Bruce Momjian
Luke Lonergan wrote: > Mark, > > > -Original Message- > > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] > > Sent: Friday, July 28, 2006 9:26 PM > > > > But irrefutable? Irrefutable is not true. :-) > > How about unrefuted. The evidence has not been refuted, and not > directly disc

Re: [HACKERS] On-disk bitmap index patch

2006-07-28 Thread Luke Lonergan
Mark, > -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] > Sent: Friday, July 28, 2006 9:26 PM > > But irrefutable? Irrefutable is not true. :-) How about unrefuted. The evidence has not been refuted, and not directly discussed or discounted. BTREE can not be op

Re: [HACKERS] On-disk bitmap index patch

2006-07-28 Thread mark
On Fri, Jul 28, 2006 at 02:43:23PM -0700, Luke Lonergan wrote: > On 7/28/06 1:25 PM, "Bruce Momjian" <[EMAIL PROTECTED]> wrote: > > What we don't want to happen is for us to release bitmapped indexes, and > > find out later that btree is better in all cases. Then we have to tell > > people not to

Re: [HACKERS] On-disk bitmap index patch

2006-07-28 Thread Hannu Krosing
Ühel kenal päeval, R, 2006-07-28 kell 16:25, kirjutas Bruce Momjian: > What we don't want to happen is for us to release bitmapped indexes, and > find out later that btree is better in all cases. Actually I'd love it if adding bitmap indexes to core pg would magically make btree several times fa

Re: [HACKERS] On-disk bitmap index patch

2006-07-28 Thread Hannu Krosing
Ühel kenal päeval, R, 2006-07-28 kell 16:18, kirjutas Tom Lane: > "Dann Corbit" <[EMAIL PROTECTED]> writes: > > Others have looked into the usefulness of bitmap indexes. Here is what > > they found: > > http://www.oracle.com/technology/pub/articles/sharma_indexes.html > > I like this guy's style

Re: [HACKERS] On-disk bitmap index patch

2006-07-28 Thread Andrew Dunstan
Tom Lane wrote: Both of these pages say up front that they are considering read-only data. So one of the questions that has to be answered (and the submitters have been entirely mum about) is exactly how bad is the update performance? If it's really awful that's going to constrain the use case

Re: [HACKERS] On-disk bitmap index patch

2006-07-28 Thread Luke Lonergan
Bruce, On 7/28/06 1:25 PM, "Bruce Momjian" <[EMAIL PROTECTED]> wrote: > What we don't want to happen is for us to release bitmapped indexes, and > find out later that btree is better in all cases. Then we have to tell > people not to use bitmapped indexes until we fix it in the next major > rele

Re: [HACKERS] On-disk bitmap index patch

2006-07-28 Thread Bruce Momjian
What we don't want to happen is for us to release bitmapped indexes, and find out later that btree is better in all cases. Then we have to tell people not to use bitmapped indexes until we fix it in the next major releasse. FYI, that is basically where we are right now with hash indexes. -

Re: [HACKERS] On-disk bitmap index patch

2006-07-28 Thread Tom Lane
"Dann Corbit" <[EMAIL PROTECTED]> writes: > Others have looked into the usefulness of bitmap indexes. Here is what > they found: > http://www.oracle.com/technology/pub/articles/sharma_indexes.html I like this guy's style of argument: he admits a bitmap index on a unique column will be much bigger

Re: [HACKERS] On-disk bitmap index patch

2006-07-28 Thread Dann Corbit
TECTED] > Subject: Re: [HACKERS] On-disk bitmap index patch > > Jim, > > On 7/28/06 10:17 AM, "Jim C. Nasby" <[EMAIL PROTECTED]> wrote: > > > If the usefulness of bitmap indexes is still in doubt, could someone at > > Greenplum provide data from actual

Re: [HACKERS] On-disk bitmap index patch

2006-07-28 Thread Bruce Momjian
Luke Lonergan wrote: > Jim, > > On 7/28/06 10:17 AM, "Jim C. Nasby" <[EMAIL PROTECTED]> wrote: > > > If the usefulness of bitmap indexes is still in doubt, could someone at > > Greenplum provide data from actual data warehouses from actual > > customers? > > First, is anyone in doubt? Sure. I

Re: [HACKERS] On-disk bitmap index patch

2006-07-28 Thread Luke Lonergan
Jim, On 7/28/06 10:17 AM, "Jim C. Nasby" <[EMAIL PROTECTED]> wrote: > If the usefulness of bitmap indexes is still in doubt, could someone at > Greenplum provide data from actual data warehouses from actual > customers? First, is anyone in doubt? - Luke ---(end of bro

Re: [HACKERS] On-disk bitmap index patch

2006-07-28 Thread Jim C. Nasby
On Thu, Jul 27, 2006 at 09:13:21AM -0700, Jie Zhang wrote: > On 7/26/06 11:50 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote: > > "Jie Zhang" <[EMAIL PROTECTED]> writes: > >> On 7/26/06 10:14 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote: > >>> ... A nonuniform distribution would probably mean that some > >>>

Re: [HACKERS] On-disk bitmap index patch

2006-07-27 Thread Jie Zhang
On 7/26/06 11:50 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote: > "Jie Zhang" <[EMAIL PROTECTED]> writes: >> On 7/26/06 10:14 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote: >>> ... A nonuniform distribution would probably mean that some >>> of the bitmaps compress better-than-expected and others worse. I

Re: [HACKERS] On-disk bitmap index patch

2006-07-26 Thread Tom Lane
"Jie Zhang" <[EMAIL PROTECTED]> writes: > On 7/26/06 10:14 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote: >> ... A nonuniform distribution would probably mean that some >> of the bitmaps compress better-than-expected and others worse. I have >> no idea how to model that and guess what the overall resul

Re: [HACKERS] On-disk bitmap index patch

2006-07-26 Thread Jie Zhang
On 7/26/06 10:14 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote: > Mark Kirkwood <[EMAIL PROTECTED]> writes: >> An obvious deduction is that the TPCH dataset is much more amenable to >> run compression than my synthetic Zipfian data was. The interesting >> question is how well "real" datasets are run

Re: [HACKERS] On-disk bitmap index patch

2006-07-26 Thread Tom Lane
Mark Kirkwood <[EMAIL PROTECTED]> writes: > An obvious deduction is that the TPCH dataset is much more amenable to > run compression than my synthetic Zipfian data was. The interesting > question is how well "real" datasets are run compressable, Yeah --- the back-of-the-envelope calculations I w

Re: [HACKERS] On-disk bitmap index patch

2006-07-26 Thread Jie Zhang
On 7/26/06 8:55 PM, "Luke Lonergan" <[EMAIL PROTECTED]> wrote: > Tom, > > On 7/26/06 7:26 AM, "Tom Lane" <[EMAIL PROTECTED]> wrote: > >> I wonder >> whether they oughtn't use 16-bit instead of 8-bit HRL_WORDs > > We tried variations from 8-bit to 32-bit and came to the conclusion that > 8-bit

Re: [HACKERS] On-disk bitmap index patch

2006-07-26 Thread Luke Lonergan
Tom, On 7/26/06 7:26 AM, "Tom Lane" <[EMAIL PROTECTED]> wrote: > I wonder > whether they oughtn't use 16-bit instead of 8-bit HRL_WORDs We tried variations from 8-bit to 32-bit and came to the conclusion that 8-bit was a better match for the more general case. For the moment I forget exactly wh

Re: [HACKERS] On-disk bitmap index patch

2006-07-26 Thread Mark Kirkwood
Tom Lane wrote: I'm surprised no one caught me making this bogus computation. I realized this morning it's wrong: if there are 1 distinct values then on the average the 1-bits would be about 1 bits apart, not 100. Right - I didn't think 1 was *that* bad, but was too sleepy to try

Re: [HACKERS] On-disk bitmap index patch

2006-07-26 Thread Tom Lane
I wrote: >> ... A realistic assumption for the numbers you mention is >> that the bitmaps have 1-bits about 100 bits apart, which means you >> could get maybe 3-to-1 compression using the runlength scheme that's >> in there ... leaving the bitmap a factor of 20 bigger than the btree. I'm surprise

Re: [HACKERS] On-disk bitmap index patch

2006-07-26 Thread Ayush Parashar
> I find those results moderately unconvincing, primarily because they > are based on choosing the least efficient possible data representation > (viz char(n)), and thus the btree indexes suffer severe and quite > artificial bloat. A database schema chosen with even minimal attention The specific

Re: [HACKERS] On-disk bitmap index patch

2006-07-25 Thread Mark Kirkwood
Tom Lane wrote: And your point is? Assuming a reasonable datatype like int4, a btree index on this table would occupy say 20GB (16 bytes per entry plus fillfactor overhead). The bitmap version would require 10,000 bitmaps of 1G bits apiece, or 1250GB (not even counting overhead). You need so

Re: [HACKERS] On-disk bitmap index patch

2006-07-25 Thread Tom Lane
Ron Mayer <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> If you have to hit one tuple out of four, it's pretty much guaranteed >> that you will need to fetch every heap page. > I think it's not uncommon to have data that is more clustered than expected. Sure, if the data is fairly static ...

Re: [HACKERS] On-disk bitmap index patch

2006-07-25 Thread Ron Mayer
Tom Lane wrote: [EMAIL PROTECTED] writes: Reading 1/4, for a larger table, has a good chance of being faster than reading 4/4 of the table. :-) Really? If you have to hit one tuple out of four, it's pretty much guaranteed that you will need to fetch every heap page. I think it's not uncom

Re: [HACKERS] On-disk bitmap index patch

2006-07-25 Thread Tom Lane
Josh Berkus writes: > One particular compelling situation for on-disk bitmaps is for terabyte > tables where a btree index would not fit into memory. Index > performance for an index which is 10x or more the size of RAM really > sucks ... I can come up with some test results if you doubt that

Re: [HACKERS] On-disk bitmap index patch

2006-07-25 Thread Luke Lonergan
Tom, On 7/25/06 1:31 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote: > Yeah, we finally gave up on rtree entirely. I don't want to see any > other index AMs languishing in the closet like that. If bitmap can > hold its own to the extent that people are interested in working on > it/improving it, then

Re: [HACKERS] On-disk bitmap index patch

2006-07-25 Thread Tom Lane
Hannu Krosing <[EMAIL PROTECTED]> writes: > Ühel kenal päeval, T, 2006-07-25 kell 13:06, kirjutas Tom Lane: >> The reason I have such high sales resistance is that we've carried the >> hash and rtree AMs for years, hoping that someone would do the work to >> make them actually worth using, with l

Re: [HACKERS] On-disk bitmap index patch

2006-07-25 Thread Josh Berkus
Tom, > (I'm also wondering whether this doesn't overlap the use-case for GIN.) It does not. GIN is strictly for multi-value fields. I can think of applications where either GIN or Bitmaps would be an option, but for the majority, they wouldn't. One particular compelling situation for on-

Re: [HACKERS] On-disk bitmap index patch

2006-07-25 Thread Hannu Krosing
Ühel kenal päeval, T, 2006-07-25 kell 13:06, kirjutas Tom Lane: > "Luke Lonergan" <[EMAIL PROTECTED]> writes: > > I think we do know, have you reviewed the results in the briefing? > > I find those results moderately unconvincing, primarily because they > are based on choosing the least efficient

Re: [HACKERS] On-disk bitmap index patch

2006-07-25 Thread Hannu Krosing
Ühel kenal päeval, T, 2006-07-25 kell 12:49, kirjutas Jim C. Nasby: > On Sun, Jul 23, 2006 at 05:35:37PM -0700, Luke Lonergan wrote: > > We presented them at the Postgres Anniversary summit talk (Bruce M. was > > there) and the reaction was let's get this into 8.2 because many people > > there (mor

Re: [HACKERS] On-disk bitmap index patch

2006-07-25 Thread Jim C. Nasby
On Sun, Jul 23, 2006 at 05:35:37PM -0700, Luke Lonergan wrote: > We presented them at the Postgres Anniversary summit talk (Bruce M. was > there) and the reaction was let's get this into 8.2 because many people > there (more than 5) really wanted to use it as a standard feature. > > A version of t

Re: [HACKERS] On-disk bitmap index patch

2006-07-25 Thread Tom Lane
"Luke Lonergan" <[EMAIL PROTECTED]> writes: > I think we do know, have you reviewed the results in the briefing? I find those results moderately unconvincing, primarily because they are based on choosing the least efficient possible data representation (viz char(n)), and thus the btree indexes suf

Re: [HACKERS] On-disk bitmap index patch

2006-07-24 Thread Luke Lonergan
Cc: Bruce Momjian; Jie Zhang; Hannu Krosing; Gavin Sherry; pgsql-hackers@postgresql.org; Luke Lonergan Subject:Re: [HACKERS] On-disk bitmap index patch On Tue, Jul 25, 2006 at 12:36:42AM -0400, Tom Lane wrote: > [EMAIL PROTECTED] writes: > > Reading 1/4, for a larger table, h

Re: [HACKERS] On-disk bitmap index patch

2006-07-24 Thread mark
On Tue, Jul 25, 2006 at 12:36:42AM -0400, Tom Lane wrote: > [EMAIL PROTECTED] writes: > > Reading 1/4, for a larger table, has a good chance of being faster than > > reading 4/4 of the table. :-) > Really? > > If you have to hit one tuple out of four, it's pretty much guaranteed > that you will ne

Re: [HACKERS] On-disk bitmap index patch

2006-07-24 Thread Tom Lane
[EMAIL PROTECTED] writes: > Reading 1/4, for a larger table, has a good chance of being faster than > reading 4/4 of the table. :-) Really? If you have to hit one tuple out of four, it's pretty much guaranteed that you will need to fetch every heap page. So using an index provides zero I/O savin

Re: [HACKERS] On-disk bitmap index patch

2006-07-24 Thread Mark Kirkwood
Jie Zhang wrote: On 7/24/06 6:59 AM, "Hannu Krosing" <[EMAIL PROTECTED]> wrote: And also for AND-s of several indexes, where indexes are BIG, your btree indexes may be almost as big as tables but the resulting set of pages is small. Yeah, Hannu points it out very well -- the bitmap index w

Re: [HACKERS] On-disk bitmap index patch

2006-07-24 Thread mark
On Mon, Jul 24, 2006 at 09:04:28PM -0400, Bruce Momjian wrote: > Jie Zhang wrote: > > > IIRC they quoted the cardinality of 1 as something that is still > > > faster than btree for several usecases. > > > > > > And also for AND-s of several indexes, where indexes are BIG, your btree > > > inde

Re: [HACKERS] On-disk bitmap index patch

2006-07-24 Thread Gavin Sherry
On Mon, 24 Jul 2006, Bruce Momjian wrote: > Jie Zhang wrote: > > > IIRC they quoted the cardinality of 1 as something that is still > > > faster than btree for several usecases. > > > > > > And also for AND-s of several indexes, where indexes are BIG, your btree > > > indexes may be almost as

Re: [HACKERS] On-disk bitmap index patch

2006-07-24 Thread Jie Zhang
On 7/24/06 6:04 PM, "Bruce Momjian" <[EMAIL PROTECTED]> wrote: > Jie Zhang wrote: >>> IIRC they quoted the cardinality of 1 as something that is still >>> faster than btree for several usecases. >>> >>> And also for AND-s of several indexes, where indexes are BIG, your btree >>> indexes may

Re: [HACKERS] On-disk bitmap index patch

2006-07-24 Thread Bruce Momjian
Jie Zhang wrote: > > IIRC they quoted the cardinality of 1 as something that is still > > faster than btree for several usecases. > > > > And also for AND-s of several indexes, where indexes are BIG, your btree > > indexes may be almost as big as tables but the resulting set of pages is > > sm

Re: [HACKERS] On-disk bitmap index patch

2006-07-24 Thread Jie Zhang
On 7/24/06 6:59 AM, "Hannu Krosing" <[EMAIL PROTECTED]> wrote: > Ühel kenal päeval, P, 2006-07-23 kell 20:25, kirjutas Tom Lane: >> Gavin Sherry <[EMAIL PROTECTED]> writes: >>> On Sun, 23 Jul 2006, Tom Lane wrote: However, the main problem I've got with this is that a new index AM is a

Re: [HACKERS] On-disk bitmap index patch

2006-07-24 Thread Hannu Krosing
Ühel kenal päeval, P, 2006-07-23 kell 20:25, kirjutas Tom Lane: > Gavin Sherry <[EMAIL PROTECTED]> writes: > > On Sun, 23 Jul 2006, Tom Lane wrote: > >> However, the main problem I've got with this is that a new index AM is a > >> pretty large burden, and no one's made the slightest effort to sell

Re: [HACKERS] On-disk bitmap index patch

2006-07-23 Thread Jie Zhang
Thanks Tom and Gavin for your comments! Yes, this patch is generated against 8.2 in a short time. As long as the code is working, I post the patch to get some comments and help. >> >> * The xlog routines need help; they seem to not be updated for recent >> changes in the API for xlog recovery c

Re: [HACKERS] On-disk bitmap index patch

2006-07-23 Thread Luke Lonergan
Tom, On 7/23/06 5:25 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote: > If the column is sufficiently low cardinality, you might as well just do > a seqscan --- you'll be hitting most of the heap's pages anyway. I'm > still waiting to be convinced that there's a sweet spot wide enough to > justify supp

Re: [HACKERS] On-disk bitmap index patch

2006-07-23 Thread Tom Lane
Gavin Sherry <[EMAIL PROTECTED]> writes: > On Sun, 23 Jul 2006, Tom Lane wrote: >> However, the main problem I've got with this is that a new index AM is a >> pretty large burden, and no one's made the slightest effort to sell >> pghackers on taking this on. > For low cardinality sets, bitmaps gre

Re: [HACKERS] On-disk bitmap index patch

2006-07-23 Thread Gavin Sherry
On Sun, 23 Jul 2006, Tom Lane wrote: > Gavin Sherry <[EMAIL PROTECTED]> writes: > > Is anyone else looking at this patch? > > It's on my list of things to look at, but so are a lot of other patches ;-) > > A couple of comments after a *very* fast scan through the patch: > > * The xlog routines nee

Re: [HACKERS] On-disk bitmap index patch

2006-07-23 Thread Tom Lane
Gavin Sherry <[EMAIL PROTECTED]> writes: > Is anyone else looking at this patch? It's on my list of things to look at, but so are a lot of other patches ;-) A couple of comments after a *very* fast scan through the patch: * The xlog routines need help; they seem to not be updated for recent chan

Re: [HACKERS] On-disk bitmap index patch

2006-07-23 Thread Gavin Sherry
On Mon, 17 Jul 2006, Jie Zhang wrote: > Hi, > > I have posted a patch to the CVS head for on-disk bitmap index to > pgsql-patches. If this can get in 8.2, that would be great. Any comments and > suggestions are welcome. > > I still need to add several items: > > (1) README file in src/backend/acce