> -----Original Message-----
> > Minor question on this patch. AFAICS there is another patch that seems
> > to be aiming at exactly the same use case. Jonah's Bloom filter patch.
> >
> > Shouldn't we have a dust off to see which one is best? Or at least a
> > discussion to test whether they overlap? Perhaps you already did that
> > and I missed it because I'm not very tuned in on this thread.
> >
> > --
> > Simon Riggs www.2ndQuadrant.com
> > PostgreSQL Training, Services and Support
>
> We haven't had that discussion AFAIK, and definitely should. First
> glance suggests they could coexist peacefully, with proper coaxing. If
> I understand things properly, Jonah's patch filters tuples early in
> the join process, and this patch tries to ensure that hash join
> batches are kept in RAM when they're most likely to be used. So
> they're orthogonal in purpose, and the patches actually apply *almost*
> cleanly together. Jonah, any comments? If I continue to have some time
> to devote, and get through all I think I can do to review this patch,
> I'll gladly look at Jonah's too, FWIW.
>
> - Josh

The skew patch and the Bloom filter patch are orthogonal, and both can be applied. The Bloom filter patch is a great idea, and the technique is used in many other database systems. You can use the TPC-H data set to demonstrate that the Bloom filter patch will significantly improve the performance of multi-batch joins (with or without data skew).

Any query that applies a selection to the build table before joining it with the probe table will benefit from a Bloom filter. For example:

  select *
  from customer, orders
  where customer.c_nationkey = 10
    and customer.c_custkey = orders.o_custkey

A Bloom filter built on customer would let us avoid probing with orders tuples that cannot possibly find a match because of the selection criteria. This is especially beneficial for multi-batch joins, where an orders tuple must be written to disk if its corresponding customer batch is not the in-memory batch.

I have no experience reviewing patches, but I would be happy to help contribute to or review the Bloom filter patch as best I can.

--
Dr. Ramon Lawrence
Assistant Professor, Department of Computer Science, University of British Columbia Okanagan
E-mail: [EMAIL PROTECTED]