Re: [HACKERS] join removal

2009-08-10 Thread Lawrence, Ramon
> I took at a first crack at coding up an implementation of > relation_is_distinct_for() tonight. I am not sure if this will help or not, but on the 8.4 code base we implemented two functions: - getCandidateKeys() - would recursively traverse a tree from a given node to the leaf nodes and determi

Re: [HACKERS] HashJoin w/option to unique-ify inner rel

2009-04-16 Thread Lawrence, Ramon
> Upon further review, it appears that a big part of this problem is > that cost_hashjoin() doesn't understand that it needs cost semi-joins > differently from inner or left joins. The bogus logic looks to be > right here: > startup_cost += hash_qual_cost.startup; > run_cost += hash_qual_c

Re: [HACKERS] a few crazy ideas about hash joins

2009-04-03 Thread Lawrence, Ramon
> > I would be especially interested in using a shared memory hash table > > that *all* backends can use - if the table is mostly read-only, as > > dimension tables often are in data warehouse applications. That would > > give zero startup cost and significantly reduced memory. > > I think that's

Re: [HACKERS] a few crazy ideas about hash joins

2009-04-03 Thread Lawrence, Ramon
> While investigating some performance problems recently I've had cause > to think about the way PostgreSQL uses hash joins. So here are a few > thoughts. Some of these have been brought up before. > > 1. When the hash is not expected to spill to disk, it preserves the > pathkeys of the outer si

Re: [HACKERS] Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets

2009-03-06 Thread Lawrence, Ramon
> > I think you missed the point of the performance questions.  It wasn't > > about avoiding extra simple if-tests in the per-tuple loops; a few of > > those are certainly not going to add measurable cost given how complex > > the code is already.  (I really don't think you should be duplicating >

Re: [HACKERS] Proposed Patch to Improve Performance ofMulti-BatchHash Join for Skewed Data Sets

2009-02-26 Thread Lawrence, Ramon
> That seems VERY useful - can you post the other ones (Z1, etc.) so I > can download them all? The Z1 data set is posted at: http://people.ok.ubc.ca/rlawrenc/tpch1g1z.zip I have not generated Z2, Z3, Z4 for 1G, but I can generate the Z2 and Z3 data sets, and in a hour or two they will be at: h

Re: [HACKERS] Proposed Patch to Improve Performance ofMulti-BatchHash Join for Skewed Data Sets

2009-02-26 Thread Lawrence, Ramon
> They're automatically generated by the dbgen utility, a link to which > was originally published somewhere in this thread. That tool creates a > few text files suitable (with some tweaking) for a COPY command. I've > got the original files... the .tbz I just made is 1.8 GB :) Anyone have > somepl

Re: [HACKERS] Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets

2009-02-26 Thread Lawrence, Ramon
> From: Tom Lane > Heikki's got a point here: the planner is aware that hashjoin doesn't > like skewed distributions, and it assigns extra cost accordingly if it > can determine that the join key is skewed. (See the "bucketsize" stuff > in cost_hashjoin.) If this patch is accepted we'll want to t

Re: [HACKERS] Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets

2009-02-24 Thread Lawrence, Ramon
> -Original Message- > From: Robert Haas > Sadly, there seem to be a number of cases in the Z7 database where the > optimization makes things significantly worse (specifically, queries > 2, 3, and 7, but especially query 3). Have you investigated what is > going on there? I had thought th

Re: [HACKERS] Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets

2009-02-19 Thread Lawrence, Ramon
From: pgsql-hackers-ow...@postgresql.org on behalf of Robert Haas I think what we need here is some very simple testing to demonstrate that this patch demonstrates a speed-up even when the inner side of the join is a joinrel rather than a baserel. Can you suggest

Re: [HACKERS] The testing of multi-batch hash joins with skewed data sets patch

2009-02-10 Thread Lawrence, Ramon
> -Original Message- > From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers- > ow...@postgresql.org] On Behalf Of Tom Lane > But really there are two different performance regimes here, one where > the hash data is large enough to spill to disk and one where it isn't. > Reducing w

Re: [HACKERS] The testing of multi-batch hash joins with skewed data sets patch

2009-02-10 Thread Lawrence, Ramon
> The idea I came up with for benchmarking was a little similar to what I > remember from the original tests. I have a sales orders table and a > products > table. My version of the sales orders table contains a customer column. > Data > for 10 customers is populated into the sales orders table, cu

[HACKERS] Help with Join Performance Testing

2009-01-21 Thread Lawrence, Ramon
A hash join modification patch is under review for 8.4 that needs performance testing. We would appreciate help with this testing. A testing version of the patch is attached in addition to testing instructions and where to retrieve a sample data set. The basic idea of the patch is that it

Re: [HACKERS] Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets

2009-01-14 Thread Lawrence, Ramon
> Here is a cleaned-up version. I fixed a number of whitespace issues, > improved a few comments, and rearranged one set of nested if-else > statements (hopefully without breaking anything in the process). > > Josh / eggyknap - > > Can you rerun your performance tests with this version of the p

Re: [HACKERS] Potential Join Performance Issue

2009-01-07 Thread Lawrence, Ramon
> Has this been completed? TODO item? > > > I'd be more inclined to deal with the issue by trying to establish a > > > "safety margin" in the estimate of whether the hash will go > > multi-batch. > > > IOW we should disuse_physical_tlist if the hash is estimated to be > > close to but still withi

Re: [HACKERS] Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets

2008-12-28 Thread Lawrence, Ramon
> I thought about this, but upon due reflection I think it's the wrong > approach. Raising work_mem is a pretty common tuning step - it's 4MB > even on my small OLTP systems, and in a data-warehousing environment > where this optimization will bring the most benefit, it could easily > be higher.

Re: [HACKERS] Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets

2008-12-27 Thread Lawrence, Ramon
> -Original Message- > From: Robert Haas [mailto:robertmh...@gmail.com] > I looked at this some more. I'm a little concerned about the way > we're maintaining the in-memory hash table. Since the highest legal > statistics target is now 10,000, it's possible that we could have two > orders

Re: [HACKERS] Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets

2008-12-23 Thread Lawrence, Ramon
> > > Because there is no nice way in PostgreSQL (that I know of) to derive > > > a histogram after a join (on an intermediate result) currently > > > usingMostCommonValues is only enabled on a join when the outer (probe) > > > side is a table scan (seq scan only actually). See > > > getMostCommon

Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets

2008-12-17 Thread Lawrence, Ramon
ql.org [mailto:pgsql-hackers- > ow...@postgresql.org] On Behalf Of Robert Haas > Sent: December 17, 2008 7:54 PM > To: Lawrence, Ramon > Cc: Tom Lane; pgsql-hackers@postgresql.org; Bryce Cutt > Subject: Re: [HACKERS] Proposed Patch to Improve Performance of Multi- > Batch Hash Join for

Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets

2008-11-24 Thread Lawrence, Ramon
> -Original Message- > From: Tom Lane [mailto:[EMAIL PROTECTED] > I'm a tad worried about what happens when the values that are frequently > occurring in the outer relation are also frequently occurring in the > inner (which hardly seems an improbable case). Don't you stand a severe > risk

Re: [HACKERS] WIP: Hash Join-Filter Pruning using Bloom Filters

2008-11-10 Thread Lawrence, Ramon
> -Original Message- > From: Jonah H. Harris [mailto:[EMAIL PROTECTED] > I have a new patch which does not create a bloom filter unless it sees > that the hash join is going to batch. I'll send it along later > tonight. > > Currently it's additional space not accounted for by work_mem. >

Re: [HACKERS] WIP: Hash Join-Filter Pruning using Bloom Filters

2008-11-10 Thread Lawrence, Ramon
> -Original Message- > On Sun, Nov 2, 2008 at 10:49 PM, Jonah H. Harris > <[EMAIL PROTECTED]> wrote: > It's effective as-is for a preliminary patch. The GUC code is the > least of my worries. > > > Can you provide some figures on the performance impact of the bloom > filter? I have teste

Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets

2008-11-06 Thread Lawrence, Ramon
> -Original Message- > > Minor question on this patch. AFAICS there is another patch that seems > > to be aiming at exactly the same use case. Jonah's Bloom filter patch. > > > > Shouldn't we have a dust off to see which one is best? Or at least a > > discussion to test whether they overlap

Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets

2008-11-02 Thread Lawrence, Ramon
> From: Tom Lane [mailto:[EMAIL PROTECTED] > What alternatives are there for people who do not run Windows? > > regards, tom lane The TPC-H generator is a standard code base provided at http://www.tpc.org/tpch/. We have been able to compile this code on Linux. However, we

Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets

2008-11-02 Thread Lawrence, Ramon
Okanagan E-mail: [EMAIL PROTECTED] > -Original Message- > From: Joshua Tolley [mailto:[EMAIL PROTECTED] > Sent: November 1, 2008 3:42 PM > To: Lawrence, Ramon > Cc: pgsql-hackers@postgresql.org; Bryce Cutt > Subject: Re: [HACKERS] Proposed Patch to Improve Performance of M

[HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets

2008-10-20 Thread Lawrence, Ramon
We propose a patch that improves hybrid hash join's performance for large multi-batch joins where the probe relation has skew. Project name: Histojoin Patch file: histojoin_v1.patch This patch implements the Histojoin join algorithm as an optional feature added to the standard Hybrid Hash

Re: [HACKERS] Potential Join Performance Issue

2008-09-12 Thread Lawrence, Ramon
> From: Tom Lane [mailto:[EMAIL PROTECTED] > I was intending to do it the other way, actually. An extra field in > HashPath hardly costs anything. The other reason for it is that there > are other possible uses for knowing whether a hash will be multi-batch. > (For example, if we were prepared to

[HACKERS] Potential Join Performance Issue

2008-09-09 Thread Lawrence, Ramon
PostgreSQL development community: Our research group has been using the PostgreSQL code base to test new join algorithms. During testing, we noticed that the planner is not pushing down projections to the outer relation in a hash join. Although this makes sense for in-memory (1 batch) joins,