Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
Because there is no nice way in PostgreSQL (that I know of) to derive a histogram after a join (on an intermediate result), usingMostCommonValues is currently only enabled on a join when the outer (probe) side is a table scan (seq scan only, actually). See getMostCommonValues (soon to be called ExecHashJoinGetMostCommonValues) for the logic that determines this.

Here is the result of EXPLAIN (on a 100MB version of the TPC-H database):

 Hash Left Join  (cost=16232.00..91035.00 rows=60 width=526)
   Hash Cond: (l.l_partkey = p.p_partkey)
   ->  Hash Left Join  (cost=15368.00..75171.00 rows=60 width=395)
         Hash Cond: (l.l_orderkey = o.o_orderkey)
         ->  Seq Scan on lineitem l  (cost=0.00..17867.00 rows=60 width=125)
         ->  Hash  (cost=8073.00..8073.00 rows=15 width=270)
               ->  Hash Left Join  (cost=700.50..8073.00 rows=15 width=270)
                     Hash Cond: (o.o_custkey = c.c_custkey)
                     ->  Seq Scan on orders o  (cost=0.00..4185.00 rows=15 width=109)
                     ->  Hash  (cost=513.00..513.00 rows=15000 width=161)
                           ->  Seq Scan on customer c  (cost=0.00..513.00 rows=15000 width=161)
   ->  Hash  (cost=614.00..614.00 rows=2 width=131)
         ->  Seq Scan on part p  (cost=0.00..614.00 rows=2 width=131)

If you take a look at the explain for that join, you will see that the first of the relations joined are orders and customer on custkey. There is almost no skew in the o_custkey attribute of orders, even in the Z2 dataset, so the difference between hash join with and without usingMostCommonValues enabled is quite small.

The second join joins the result of the first join with lineitem on orderkey. There is no skew at all in the l_orderkey attribute of lineitem, so usingMostCommonValues cannot help at all.

The third join joins the result of the second join with part on partkey. There is a lot of skew in the l_partkey attribute of lineitem, but because the probe side of the third join is an intermediate result from the second join and not a seq scan, the algorithm cannot figure out the MCVs of the probe side.

So on the query presented, almost no skew can be exploited on the first join, and no other joins can have their skew exploited at all because of the order PostgreSQL performs the joins in. Basically, yes, you would not see any real benefit from using the most common values on this query.

We experimented with sampling (mentioned in the paper) to make an educated guess at the MCVs of intermediate results, but found that because a random sample could not be obtained, the results were always very inaccurate. I basically just read a percentage of tuples from the probe relation before partitioning the build relation, derived the MCVs in a single pass, wrote the tuples back out to a temp file (because reading back from there is less expensive than resetting the probe side tree), then did the join as usual while remembering to read back from my temp file before reading the rest of the probe side tuples. Our tests indicate that sampling is not likely a good solution for deriving MCVs from intermediate results.

In the Java implementation of Histojoin we experimented with exploiting skew in multiple joins of a star join with some success (detailed in the paper). I am not sure how this would be accomplished nicely in PostgreSQL. If the cost operators knew how to order the joins to make the best use of skew in the relations, PostgreSQL could get the benefits of Histojoin more often, for example when doing a join with skew first would be faster than doing the smaller join first. This change could be a future addition to PostgreSQL if this patch is accepted.
It relies on getting the stats tuple for the join during the planning phase (in the cost function) and estimating the benefit that would have on the join cost.

- Bryce Cutt

On Mon, Dec 22, 2008 at 6:15 AM, Joshua Tolley eggyk...@gmail.com wrote:
> On Sun, Dec 21, 2008 at 10:25:59PM -0500, Robert Haas wrote:
> > [Some performance testing.]
>
> I (finally!) have a chance to post my performance testing results... my apologies for the really long delay. [Excuses omitted]
>
> Unfortunately I'm not seeing wonderful speedups with the particular queries I did in this case. I generated three 1GB datasets, with skews set at 1, 2, and 3. The test script I wrote turns on enable_usestatmcvs and runs EXPLAIN ANALYZE on the same query five times. Then it turns enable_usestatmcvs off, and runs the same query five more times. It does this with each of the three datasets in turn, and then starts over at the beginning until I tell it to quit. My results showed a statistically significant improvement in speed only on the skew == 3 dataset. I did the same tests twice, once with default_statistics_target set to 10, and once with it set to 100. I've attached boxplots of the total query times as reported by EXPLAIN ANALYZE (dst10 in the filename indicates default_statistics_target
Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
On Tue, Dec 23, 2008 at 2:21 AM, Bryce Cutt pandas...@gmail.com wrote:
> Because there is no nice way in PostgreSQL (that I know of) to derive a histogram after a join (on an intermediate result), usingMostCommonValues is currently only enabled on a join when the outer (probe) side is a table scan (seq scan only, actually). See getMostCommonValues (soon to be called ExecHashJoinGetMostCommonValues) for the logic that determines this.

It's starting to seem to me that the case where this patch provides a benefit is so narrow that I'm not sure it's worth the extra code. Admittedly, when it works, it is pretty dramatic, as in the numbers that I posted previously. I'm OK with the fact that it is restricted to hash joins on a single variable where the probe relation is a sequential scan, because that actually happens pretty frequently, at least in my queries. But if there's no way to consistently get any benefit out of this when joining more than two tables, then I'm not sure it's worth it.

Is it realistic to think that the MCVs of the base relation might still be applicable to the joinrel? It's certainly easy to think of counterexamples, but it might be a good approximation more often than not.

...Robert
Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
On Tue, Dec 23, 2008 at 09:22:27AM -0500, Robert Haas wrote:
> On Tue, Dec 23, 2008 at 2:21 AM, Bryce Cutt pandas...@gmail.com wrote:
> > Because there is no nice way in PostgreSQL (that I know of) to derive a histogram after a join (on an intermediate result), usingMostCommonValues is currently only enabled on a join when the outer (probe) side is a table scan (seq scan only, actually). See getMostCommonValues (soon to be called ExecHashJoinGetMostCommonValues) for the logic that determines this.

So my test case of "do a whole bunch of hash joins" in a test query isn't really valid. Makes sense. I did another, more haphazard test on a query with fewer joins, and saw noticeable speedups.

> It's starting to seem to me that the case where this patch provides a benefit is so narrow that I'm not sure it's worth the extra code.

Not that anyone asked, but I don't consider myself qualified to render judgement on that point. Code size is, I guess, a maintainability issue, and I'm not terribly experienced maintaining PostgreSQL. :)

> Is it realistic to think that the MCVs of the base relation might still be applicable to the joinrel? It's certainly easy to think of counterexamples, but it might be a good approximation more often than not.

It's equivalent to our assumption that distributions of values in columns in the same table are independent. Making that assumption in this case would probably result in occasional dramatic speed improvements similar to the ones we've seen in less complex joins, offset by just-as-occasional dramatic slowdowns of similar magnitude. In other words, it will increase the variance of our results.

- Josh
Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
> It's equivalent to our assumption that distributions of values in columns in the same table are independent. Making that assumption in this case would probably result in occasional dramatic speed improvements similar to the ones we've seen in less complex joins, offset by just-as-occasional dramatic slowdowns of similar magnitude. In other words, it will increase the variance of our results.

Under what circumstances do you think that it would produce a dramatic slowdown? I'm confused. I thought the penalty for picking a bad set of values for the in-memory hash table was pretty small.

...Robert
Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
On Tue, Dec 23, 2008 at 10:14:29AM -0500, Robert Haas wrote:
> > It's equivalent to our assumption that distributions of values in columns in the same table are independent. Making that assumption in this case would probably result in occasional dramatic speed improvements similar to the ones we've seen in less complex joins, offset by just-as-occasional dramatic slowdowns of similar magnitude. In other words, it will increase the variance of our results.
>
> Under what circumstances do you think that it would produce a dramatic slowdown? I'm confused. I thought the penalty for picking a bad set of values for the in-memory hash table was pretty small.
>
> ...Robert

I take that back. :) I agree with what others have already said, that it shouldn't cause dramatic slowdowns when we get it wrong.

- Josh
Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
On Sun, Dec 21, 2008 at 10:25:59PM -0500, Robert Haas wrote:
> [Some performance testing.]

I (finally!) have a chance to post my performance testing results... my apologies for the really long delay. [Excuses omitted]

Unfortunately I'm not seeing wonderful speedups with the particular queries I did in this case. I generated three 1GB datasets, with skews set at 1, 2, and 3. The test script I wrote turns on enable_usestatmcvs and runs EXPLAIN ANALYZE on the same query five times. Then it turns enable_usestatmcvs off, and runs the same query five more times. It does this with each of the three datasets in turn, and then starts over at the beginning until I tell it to quit. My results showed a statistically significant improvement in speed only on the skew == 3 dataset. I did the same tests twice, once with default_statistics_target set to 10, and once with it set to 100.

I've attached boxplots of the total query times as reported by EXPLAIN ANALYZE (dst10 in the filename indicates default_statistics_target was 10, and so on), my results parsed out of the EXPLAIN ANALYZE output (test.filtered.10 and test.filtered.100), the results of one-tailed Student's t-tests of the result set (ttests), and the R code to run the tests if anyone's really interested (t.test.R). The results data includes six columns: the skew value, whether enable_usestatmcvs was on or not (represented by a 1 or 0), total times for each of the three joins that made up the query, and total time for the query itself. The results above pay attention only to the total query time.

Finally, the query involved:

    SELECT * FROM lineitem l
        LEFT JOIN part p ON (p.p_partkey = l.l_partkey)
        LEFT JOIN orders o ON (o.o_orderkey = l.l_orderkey)
        LEFT JOIN customer c ON (c.c_custkey = o.o_custkey);

- Josh / eggyknap

[attachment: boxplot-dst10.png]
[attachment: boxplot-dst100.png]
[attachment: per-run timing data, columns SKEW, USESTAT, J1, J2, J3, TOT]
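Each round of that script thus amounts to the following sketch (GUC name and query as described in the message above):

    SET enable_usestatmcvs = on;
    EXPLAIN ANALYZE
    SELECT * FROM lineitem l
        LEFT JOIN part p ON (p.p_partkey = l.l_partkey)
        LEFT JOIN orders o ON (o.o_orderkey = l.l_orderkey)
        LEFT JOIN customer c ON (c.c_custkey = o.o_custkey);
    -- ...the EXPLAIN ANALYZE above is repeated five times...
    SET enable_usestatmcvs = off;
    -- ...and the same statement is run five more times with the feature off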
Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
> [Some performance testing.]

I ran this query 10x with this patch applied, and then 10x again with enable_hashjoin_usestatmcvs set to false to disable the optimization:

    select sum(1) from (select * from part, lineitem where p_partkey = l_partkey) x;

With the optimization enabled, the query took between 26.6 and 38.3 seconds, with an average of 31.6. With the optimization disabled, the query took between 48.3 and 69.0 seconds, with an average of 60.0 seconds.

It appears that the 100 entries in pg_statistic cover about 32% of l_partkey:

    tpch=# WITH x AS (
        SELECT stanumbers1, array_length(stanumbers1, 1) AS len
        FROM pg_statistic
        WHERE starelid = 'lineitem'::regclass
          AND staattnum = (SELECT attnum FROM pg_attribute
                           WHERE attrelid = 'lineitem'::regclass
                             AND attname = 'l_partkey')
    )
    SELECT sum(x.stanumbers1[y.g])
    FROM x, (select generate_series(1, x.len) g from x) y;
       sum
    --------
     0.3276
    (1 row)

(there's probably a better way to write that query...)

stadistinct for l_partkey is 23,050; the actual number of distinct values is 199,919. IOW, 0.05% of the distinct values account for 32.76% of the table. That's a lot of skew, but not unrealistic - I've seen tables where more than half of the rows were covered by a single value.

...Robert
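For comparison, the same coverage figure can be read off the pg_stats view; a shorter sketch, assuming the unnest() array function from the 8.4 development tree:

    -- sum the MCV frequency list for lineitem.l_partkey
    SELECT sum(f) AS mcv_coverage
    FROM (SELECT unnest(most_common_freqs) AS f
          FROM pg_stats
          WHERE tablename = 'lineitem'
            AND attname = 'l_partkey') s;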
Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
Robert,

I thoroughly appreciate the constructive criticism. The compile errors are due to my development process being convoluted. I will endeavor not to waste your time in the future with errors caused by my development process.

I have updated the code to follow the conventions and suggestions given. I am now working on adding the requested documentation. I will not submit the next patch until that is done. The functionality has not changed, so you can performance test with the patch you have.

As for that particularly ugly piece of code: I figured that out while digging through the selfuncs code. Basically I needed a way to get the stats tuple for the outer relation join column of the join, but to do that I needed to figure out how to get the actual relation id and attribute number that was being joined. I have not yet figured out a better way to do this, but I am sure there is someone on the mailing list with far more knowledge of this than I have. I could possibly be more rigorous in testing to make sure the things I am casting are exactly what I expect. My tests have always been consistent so far. I am essentially doing what is done in selfuncs. I believe I could use the examine_variable() function in selfuncs.c, except I would first need a PlannerInfo, and I don't think I can get that from inside the join initialization code.

- Bryce Cutt

On Mon, Dec 15, 2008 at 8:51 PM, Robert Haas robertmh...@gmail.com wrote:
> I have to admit that I haven't fully grokked what this patch is about just yet, so what follows is mostly a coding style review at this point. It would help a lot if you could add some comments to the new functions that are being added to explain the purpose of each at a very high level. There's clearly been a lot of thought put into some parts of this logic, so it would be worth explaining the reasoning behind that logic.
>
> This patch applies cleanly against CVS HEAD, but does not compile (please fix the warning, too).
>
>     nodeHash.c:88: warning: no previous prototype for 'freezeNextMCVPartiton'
>     nodeHash.c: In function 'freezeNextMCVPartiton':
>     nodeHash.c:148: error: 'struct HashJoinTableData' has no member named 'inTupIOs'
>
> I commented out the offending line. It errored out again here:
>
>     nodeHashjoin.c: In function 'getMostCommonValues':
>     nodeHashjoin.c:136: error: wrong type argument to unary plus
>
> After removing the stray + sign, it compiled, but failed the rangefuncs regression test. If you're going to keep the enable_hashjoin_usestatmcvs GUC around, you need to patch rangefuncs.out so that the regression tests pass. I think, however, that there was some discussion of removing that before the patch is committed; if so, please do that instead. Keeping the GUC would also require patching the documentation, which the current patch does not do.
>
> getMostCommonValues() isn't a good name for a non-static function because there's nothing to tip the reader off to the fact that it has something to do with hash joins; compare with the other function names defined in the same header file. On the flip side, that function has only one call site, so it should probably be made static and not declared in the header file at all. Some of the other new functions need similar treatment.
> I am also a little suspicious of this bit of code:
>
>     relid = getrelid(((SeqScan *) ((SeqScanState *) outerPlanState(hjstate))->ps.plan)->scanrelid,
>                      estate->es_range_table);
>     clause = (FuncExprState *) lfirst(list_head(hjstate->hashclauses));
>     argstate = (ExprState *) lfirst(list_head(clause->args));
>     variable = (Var *) argstate->expr;
>
> I'm not very familiar with the hash join code, but it seems like there are a lot of assumptions being made there about what things are pointing to what other things. Is this actually safe? And if it is, perhaps a comment explaining why?
>
> getMostCommonValues() also appears to be creating and maintaining a counter called collisionsWhileHashing, but nothing is ever done with the counter. On a similar note, the variables relattnum, atttype, and atttypmod don't appear to be necessary; 2 out of 3 of them are only used once, so maybe inlining the reference and dropping the variable would make more sense. Also, the if (HeapTupleIsValid(statsTuple)) block encompasses the whole rest of the function, maybe if (!HeapTupleIsValid(statsTuple)) return?
>
> I don't understand why hashtable->mostCommonTuplePartition[bucket].tuples and .frozen need to be initialized to 0. It looks to me like those are in a zero-filled array that was just allocated, so it shouldn't be necessary to re-zero them, unless I'm missing something.
>
> freezeNextMCVPartiton is mis-spelled consistently throughout (the last three letters should be "ion"). I also don't think it makes sense to enclose everything but the first two lines of that function in an else-block.
>
> There is some initialization code in ExecHashJoin() that looks like it
Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
Dr. Lawrence:

I'm still working on reviewing this patch. I've managed to load the sample TPCH data from tpch1g1z.zip after changing the line endings to UNIX-style and chopping off the trailing vertical bars. (If anyone is interested, I have the results of pg_dump | bzip2 -9 on the resulting database, which I would be happy to upload if someone has server space. It is about 250MB.)

But I'm not sure quite what to do in terms of generating queries. TPCHSkew contains QGEN.EXE, but that seems to require that you provide template queries as input, and I'm not sure where to get the templates. Any suggestions?

Thanks,

...Robert
Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
Robert,

You do not need to use qgen.exe to generate queries, as you are not running the TPC-H benchmark test. Attached is an example of the 22 sample TPC-H queries according to the benchmark. We have not tested using the TPC-H queries for this particular patch and only use the TPC-H database as a large, skewed data set.

The simpler queries we test involve joins of Part-Lineitem or Supplier-Lineitem, such as:

    select * from part, lineitem where p_partkey = l_partkey

or

    select count(*) from part, lineitem where p_partkey = l_partkey

The count(*) version is usually more useful for comparisons, as the generation of output tuples on the client side (say, with pgAdmin) dominates the actual time to complete the query. To isolate query costs, we also test using a simple server-side function (a sketch of such a function appears after the attached queries below). The setup description I have also attached.

I would be happy to help in any way I can. Bryce is currently working on an updated patch according to your suggestions.

--
Dr. Ramon Lawrence
Assistant Professor, Department of Computer Science, University of British Columbia Okanagan
E-mail: ramon.lawre...@ubc.ca

-----Original Message-----
From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
Sent: December 17, 2008 7:54 PM
To: Lawrence, Ramon
Cc: Tom Lane; pgsql-hackers@postgresql.org; Bryce Cutt
Subject: Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets

> Dr. Lawrence:
>
> I'm still working on reviewing this patch. I've managed to load the sample TPCH data from tpch1g1z.zip after changing the line endings to UNIX-style and chopping off the trailing vertical bars. (If anyone is interested, I have the results of pg_dump | bzip2 -9 on the resulting database, which I would be happy to upload if someone has server space. It is about 250MB.)
>
> But I'm not sure quite what to do in terms of generating queries. TPCHSkew contains QGEN.EXE, but that seems to require that you provide template queries as input, and I'm not sure where to get the templates. Any suggestions?
> Thanks,
>
> ...Robert

-- using 10100 as a seed to the RNG

-- QUERY_1
select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from lineitem
where l_shipdate <= date '1998-09-01'
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus;

-- QUERY_2
select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment
from part, supplier, partsupp, nation, region
where p_partkey = ps_partkey
  and s_suppkey = ps_suppkey
  and p_size = 28
  and p_type like '%STEEL'
  and s_nationkey = n_nationkey
  and n_regionkey = r_regionkey
  and r_name = 'MIDDLE EAST'
  and ps_supplycost = (
      select min(ps_supplycost)
      from partsupp, supplier, nation, region
      where p_partkey = ps_partkey
        and s_suppkey = ps_suppkey
        and s_nationkey = n_nationkey
        and n_regionkey = r_regionkey
        and r_name = 'MIDDLE EAST')
order by s_acctbal desc, n_name, s_name, p_partkey
limit 100;

-- QUERY_3
select l_orderkey, sum(l_extendedprice * (1 - l_discount)) as revenue, o_orderdate, o_shippriority
from customer, orders, lineitem
where c_mktsegment = 'BUILDING'
  and c_custkey = o_custkey
  and l_orderkey = o_orderkey
  and o_orderdate < date '1995-03-31'
  and l_shipdate > date '1995-03-31'
group by l_orderkey, o_orderdate, o_shippriority
order by revenue desc, o_orderdate
limit 10;

-- QUERY_4
select o_orderpriority, count(*) as order_count
from orders
where o_orderdate >= date '1997-10-01'
  and o_orderdate < date '1998-02-01'
  and exists (
      select *
      from lineitem
      where l_orderkey = o_orderkey
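The server-side function mentioned above, used to isolate the join cost from client-side result transfer, can be as simple as this sketch (function name hypothetical, not from the patch):

    -- count(*) keeps the result set tiny, so the join itself dominates the timing
    create or replace function histojoin_test() returns bigint as $$
        select count(*) from part, lineitem where p_partkey = l_partkey;
    $$ language sql;

    select histojoin_test();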
Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
I have to admit that I haven't fully grokked what this patch is about just yet, so what follows is mostly a coding style review at this point. It would help a lot if you could add some comments to the new functions that are being added to explain the purpose of each at a very high level. There's clearly been a lot of thought put into some parts of this logic, so it would be worth explaining the reasoning behind that logic.

This patch applies cleanly against CVS HEAD, but does not compile (please fix the warning, too).

    nodeHash.c:88: warning: no previous prototype for 'freezeNextMCVPartiton'
    nodeHash.c: In function 'freezeNextMCVPartiton':
    nodeHash.c:148: error: 'struct HashJoinTableData' has no member named 'inTupIOs'

I commented out the offending line. It errored out again here:

    nodeHashjoin.c: In function 'getMostCommonValues':
    nodeHashjoin.c:136: error: wrong type argument to unary plus

After removing the stray + sign, it compiled, but failed the rangefuncs regression test. If you're going to keep the enable_hashjoin_usestatmcvs GUC around, you need to patch rangefuncs.out so that the regression tests pass. I think, however, that there was some discussion of removing that before the patch is committed; if so, please do that instead. Keeping the GUC would also require patching the documentation, which the current patch does not do.

getMostCommonValues() isn't a good name for a non-static function because there's nothing to tip the reader off to the fact that it has something to do with hash joins; compare with the other function names defined in the same header file. On the flip side, that function has only one call site, so it should probably be made static and not declared in the header file at all. Some of the other new functions need similar treatment.

I am also a little suspicious of this bit of code:

    relid = getrelid(((SeqScan *) ((SeqScanState *) outerPlanState(hjstate))->ps.plan)->scanrelid,
                     estate->es_range_table);
    clause = (FuncExprState *) lfirst(list_head(hjstate->hashclauses));
    argstate = (ExprState *) lfirst(list_head(clause->args));
    variable = (Var *) argstate->expr;

I'm not very familiar with the hash join code, but it seems like there are a lot of assumptions being made there about what things are pointing to what other things. Is this actually safe? And if it is, perhaps a comment explaining why?

getMostCommonValues() also appears to be creating and maintaining a counter called collisionsWhileHashing, but nothing is ever done with the counter. On a similar note, the variables relattnum, atttype, and atttypmod don't appear to be necessary; 2 out of 3 of them are only used once, so maybe inlining the reference and dropping the variable would make more sense. Also, the if (HeapTupleIsValid(statsTuple)) block encompasses the whole rest of the function, maybe if (!HeapTupleIsValid(statsTuple)) return?

I don't understand why hashtable->mostCommonTuplePartition[bucket].tuples and .frozen need to be initialized to 0. It looks to me like those are in a zero-filled array that was just allocated, so it shouldn't be necessary to re-zero them, unless I'm missing something.

freezeNextMCVPartiton is mis-spelled consistently throughout (the last three letters should be "ion"). I also don't think it makes sense to enclose everything but the first two lines of that function in an else-block. There is some initialization code in ExecHashJoin() that looks like it
It appears to me that the interface to isAMostCommonValue() could be simplified by just making it return the partition number. It could perhaps be renamed something like ExecHashGetMCVPartition().

Does ExecHashTableDestroy() need to explicitly pfree hashtable->mostCommonTuplePartition and hashtable->flushOrderedMostCommonTuplePartition? I would think those would be allocated out of hashCxt - if they aren't, they probably should be.

Department of minor nitpicks: (1) C++-style comments are not permitted, (2) function names need to be capitalized like_this() or LikeThis() but not likeThis(), (3) when defining a function, the return type should be placed on the line preceding the actual function name, so that the function name is at the beginning of the line, (4) curly braces should be avoided around a block containing only one statement, (5) excessive blank lines should be avoided (for example, the one in costsize.c is clearly unnecessary, and there's at least one place where you add two consecutive blank lines), and (6) I believe the accepted way to write an empty loop is an indented semicolon on the next line, rather than {} on the same line as the while.

I will try to do some more substantive testing of this as well.

...Robert
Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
-----Original Message-----
From: Tom Lane [mailto:[EMAIL PROTECTED]]

> I'm a tad worried about what happens when the values that are frequently occurring in the outer relation are also frequently occurring in the inner (which hardly seems an improbable case). Don't you stand a severe risk of blowing out the in-memory hash table? It doesn't appear to me that the code has any way to back off once it's decided that a certain set of join key values are to be treated in-memory. Splitting the main join into more batches certainly doesn't help with that.
>
> Also, AFAICS the benefit of this patch comes entirely from avoiding dump and reload of tuples bearing the most common values, which means it's a significant waste of cycles when there's only one batch. It'd be better to avoid doing any of the extra work in the single-batch case.
>
> One thought that might address that point as well as the difficulty of getting stats in nontrivial cases is to wait until we've overrun memory and are forced to start batching, and at that point determine on-the-fly which are the most common hash values from inspection of the hash table as we dump it out. This would amount to optimizing on the basis of frequency in the *inner* relation not the outer, but offhand I don't see any strong theoretical basis why that wouldn't be just as good. It could lose if the first work_mem worth of inner tuples isn't representative of what follows; but this hardly seems more dangerous than depending on MCV stats that are for the whole outer relation rather than the portion of it being selected.
>
> regards, tom lane

You are correct with both observations. The patch only has a benefit when there is more than one batch. Also, there is a potential issue with MCV hash table overflows if the number of tuples that match the MCVs in the build relation is very large.

Bryce has created a patch (attached) that disables the code for one-batch joins. This patch also checks for MCV hash table overflows and handles them by flushing from the MCV hash table back to the main hash table. The main hash table will then resolve overflows as usual. Note that this will cause the worst case of a build table with all the same values to be handled the same as the current hash code, i.e., it will attempt to re-partition until it eventually gives up and then allocates the entire partition in memory. There may be a better way to handle this case, but the new patch will remain consistent with the current hash join implementation.

The issue with determining and using the MCV stats is more challenging than it appears. First, knowing the MCVs of the build table will not help us. What we need are the MCVs of the probe table, because by knowing those values we will keep the tuples with those values in the build relation in memory. For example, consider a join between tables Part and LineItem. Assume 1 popular part accounts for 10% of all LineItems. If Part is the build relation and LineItem is the probe relation, then by keeping that 1 part record in memory, we will guarantee that we do not need to write out 10% of LineItem. If a selection occurs on LineItem before the join, it may change the distribution of LineItem (the MCVs), but it is probable that they are still a good estimate of the MCVs in the derived LineItem relation. (We did experiments on trying to sample the first few thousand tuples of the probe relation to dynamically determine the MCVs, but generally found this was inaccurate due to non-random samples.)
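The probe-side skew described above is easy to inspect directly; for example (a hypothetical check against the TPC-H tables, not part of the patch):

    -- share of lineitem (probe) tuples accounted for by the top part keys
    select l_partkey, count(*) as n,
           round(100.0 * count(*) / (select count(*) from lineitem), 2) as pct
    from lineitem
    group by l_partkey
    order by n desc
    limit 5;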
In essence, the goal is to smartly pick the tuples that remain in the in-memory batch before probing begins. Since the number of MCVs is small, incorrectly selecting build tuples to remain in memory has negligible cost.

If we assume that LineItem has been filtered so much that it is now smaller than Part and is the build relation, then the MCV approach does not apply. There is no skew in Part on partkey (since it is the PK), and knowing the MCV partkeys in LineItem does not help us, because they each only join with a single tuple in Part. In this case, the MCV approach should not be used because no benefit is possible, and it will not be used because there will be no MCVs for Part.partkey. The bad case with MCV hash table overflow requires a many-to-many join between the two relations, which would not occur on the more typical PK-FK joins.

--
Dr. Ramon Lawrence
Assistant Professor, Department of Computer Science, University of British Columbia Okanagan
E-mail: [EMAIL PROTECTED]

[attachment: histojoin_v3.patch]
Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
Lawrence, Ramon [EMAIL PROTECTED] writes:
> We propose a patch that improves hybrid hash join's performance for large multi-batch joins where the probe relation has skew. ... The basic idea is to keep build relation tuples in a small in-memory hash table that have join values that are frequently occurring in the probe relation.

I looked at this patch a little. I'm a tad worried about what happens when the values that are frequently occurring in the outer relation are also frequently occurring in the inner (which hardly seems an improbable case). Don't you stand a severe risk of blowing out the in-memory hash table? It doesn't appear to me that the code has any way to back off once it's decided that a certain set of join key values are to be treated in-memory. Splitting the main join into more batches certainly doesn't help with that.

Also, AFAICS the benefit of this patch comes entirely from avoiding dump and reload of tuples bearing the most common values, which means it's a significant waste of cycles when there's only one batch. It'd be better to avoid doing any of the extra work in the single-batch case.

One thought that might address that point as well as the difficulty of getting stats in nontrivial cases is to wait until we've overrun memory and are forced to start batching, and at that point determine on-the-fly which are the most common hash values from inspection of the hash table as we dump it out. This would amount to optimizing on the basis of frequency in the *inner* relation not the outer, but offhand I don't see any strong theoretical basis why that wouldn't be just as good. It could lose if the first work_mem worth of inner tuples isn't representative of what follows; but this hardly seems more dangerous than depending on MCV stats that are for the whole outer relation rather than the portion of it being selected.

regards, tom lane
Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
On Wed, Nov 05, 2008 at 04:06:11PM -0800, Bryce Cutt wrote:
> The error is caused by me asserting against the wrong variable. I never noticed this, as I apparently did not have assertions turned on on my development machine. That is fixed now, and with the new patch version I have attached, all assertions are passing with your query and my test queries.
>
> I added another assertion to that section of the code so that it is a bit more vigorous in confirming the hash table partition is correct. It does not change the operation of the code. There are two partition counts. One holds the maximum number of buckets in the hash table and the other counts the number of actual buckets created for hash values. I was incorrectly testing against the second one because that was valid before I started using a hash table to store the buckets.
>
> The enable_hashjoin_usestatmcvs flag was valuable for my own research and tests and likely useful for your review, but Tom is correct that it can be removed in the final version.
>
> - Bryce Cutt

Well, this version seems to work as advertised. Skewed data sets tend to hash join more quickly with this turned on, and data sets with deliberately bad statistics don't perform much differently than with the feature turned off. The patch applies cleanly to CVS HEAD.

I don't consider myself qualified to do a decent code review. However, I noticed that the comments are all done with // instead of /* ... */. That should probably be changed. To those familiar with code review: is there more I should do to review this?

- Josh / eggyknap
Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
On Wed, Nov 5, 2008 at 5:06 PM, Bryce Cutt [EMAIL PROTECTED] wrote:
> The error is caused by me asserting against the wrong variable. I never noticed this, as I apparently did not have assertions turned on on my development machine. That is fixed now, and with the new patch version I have attached, all assertions are passing with your query and my test queries.
>
> I added another assertion to that section of the code so that it is a bit more vigorous in confirming the hash table partition is correct. It does not change the operation of the code. There are two partition counts. One holds the maximum number of buckets in the hash table and the other counts the number of actual buckets created for hash values. I was incorrectly testing against the second one because that was valid before I started using a hash table to store the buckets.
>
> The enable_hashjoin_usestatmcvs flag was valuable for my own research and tests and likely useful for your review, but Tom is correct that it can be removed in the final version.
>
> - Bryce Cutt

Well, that builds nicely, lets me import the data, and I've seen a performance improvement with enable_hashjoin_usestatmcvs on vs. off. I plan to test that more formally (though probably not fully to the extent you did in your paper; just enough to feel comfortable that I'm getting similar results). Then I'll spend some time poking in the code, for the relatively little good I feel I can do in that capacity, and I'll also investigate scenarios with particularly inaccurate statistics. Stay tuned.

- Josh
Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
On Thu, 2008-11-06 at 15:33 -0700, Joshua Tolley wrote:
> Stay tuned.

Minor question on this patch. AFAICS there is another patch that seems to be aiming at exactly the same use case: Jonah's Bloom filter patch. Shouldn't we have a dust-off to see which one is best? Or at least a discussion to test whether they overlap? Perhaps you already did that and I missed it because I'm not very tuned in on this thread.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
On Thu, Nov 6, 2008 at 3:52 PM, Simon Riggs [EMAIL PROTECTED] wrote:
> On Thu, 2008-11-06 at 15:33 -0700, Joshua Tolley wrote:
> > Stay tuned.
>
> Minor question on this patch. AFAICS there is another patch that seems to be aiming at exactly the same use case: Jonah's Bloom filter patch. Shouldn't we have a dust-off to see which one is best? Or at least a discussion to test whether they overlap? Perhaps you already did that and I missed it because I'm not very tuned in on this thread.
>
> --
> Simon Riggs
> www.2ndQuadrant.com
> PostgreSQL Training, Services and Support

We haven't had that discussion AFAIK, and definitely should. First glance suggests they could coexist peacefully, with proper coaxing. If I understand things properly, Jonah's patch filters tuples early in the join process, and this patch tries to ensure that hash join batches are kept in RAM when they're most likely to be used. So they're orthogonal in purpose, and the patches actually apply *almost* cleanly together. Jonah, any comments?

If I continue to have some time to devote, and get through all I think I can do to review this patch, I'll gladly look at Jonah's too, FWIW.

- Josh
Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
-----Original Message-----
> > Minor question on this patch. AFAICS there is another patch that seems to be aiming at exactly the same use case: Jonah's Bloom filter patch. Shouldn't we have a dust-off to see which one is best? Or at least a discussion to test whether they overlap? Perhaps you already did that and I missed it because I'm not very tuned in on this thread.
> >
> > --
> > Simon Riggs
> > www.2ndQuadrant.com
> > PostgreSQL Training, Services and Support
>
> We haven't had that discussion AFAIK, and definitely should. First glance suggests they could coexist peacefully, with proper coaxing. If I understand things properly, Jonah's patch filters tuples early in the join process, and this patch tries to ensure that hash join batches are kept in RAM when they're most likely to be used. So they're orthogonal in purpose, and the patches actually apply *almost* cleanly together. Jonah, any comments?
>
> If I continue to have some time to devote, and get through all I think I can do to review this patch, I'll gladly look at Jonah's too, FWIW.
>
> - Josh

The skew patch and Bloom filter patch are orthogonal and can both be applied. The Bloom filter patch is a great idea, and it is used in many other database systems.

You can use the TPC-H data set to demonstrate that the Bloom filter patch will significantly improve performance of multi-batch joins (with or without data skew). Any query that filters a build table before joining on the probe table will show improvements with a Bloom filter. For example:

    select * from customer, orders
    where customer.c_nationkey = 10
      and customer.c_custkey = orders.o_custkey

The Bloom filter on customer would allow us to avoid probing with orders tuples that cannot possibly find a match due to the selection criteria. This is especially beneficial for multi-batch joins, where an orders tuple must be written to disk if its corresponding customer batch is not the in-memory batch (a rough way to quantify this appears below).

I have no experience reviewing patches, but I would be happy to help contribute/review the Bloom filter patch as best I can.

--
Dr. Ramon Lawrence
Assistant Professor, Department of Computer Science, University of British Columbia Okanagan
E-mail: [EMAIL PROTECTED]
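On the TPC-H data, the number of probe tuples such a filter could skip is easy to estimate directly (a hypothetical count, not part of either patch):

    -- orders (probe) tuples with no customer surviving the c_nationkey filter;
    -- these are exactly the tuples a build-side Bloom filter lets the join skip
    select count(*)
    from orders o
    where not exists (select 1
                      from customer c
                      where c.c_nationkey = 10
                        and c.c_custkey = o.o_custkey);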
Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
On Thu, Nov 6, 2008 at 5:31 PM, Lawrence, Ramon [EMAIL PROTECTED] wrote:
> > > Minor question on this patch. AFAICS there is another patch that seems to be aiming at exactly the same use case: Jonah's Bloom filter patch. Shouldn't we have a dust-off to see which one is best? Or at least a discussion to test whether they overlap? Perhaps you already did that and I missed it because I'm not very tuned in on this thread.
> >
> > We haven't had that discussion AFAIK, and definitely should. First glance suggests they could coexist peacefully, with proper coaxing. If I understand things properly, Jonah's patch filters tuples early in the join process, and this patch tries to ensure that hash join batches are kept in RAM when they're most likely to be used. So they're orthogonal in purpose, and the patches actually apply *almost* cleanly together. Jonah, any comments?
> >
> > If I continue to have some time to devote, and get through all I think I can do to review this patch, I'll gladly look at Jonah's too, FWIW.
> >
> > - Josh
>
> The skew patch and Bloom filter patch are orthogonal and can both be applied. The Bloom filter patch is a great idea, and it is used in many other database systems.
>
> You can use the TPC-H data set to demonstrate that the Bloom filter patch will significantly improve performance of multi-batch joins (with or without data skew). Any query that filters a build table before joining on the probe table will show improvements with a Bloom filter. For example:
>
>     select * from customer, orders
>     where customer.c_nationkey = 10
>       and customer.c_custkey = orders.o_custkey
>
> The Bloom filter on customer would allow us to avoid probing with orders tuples that cannot possibly find a match due to the selection criteria. This is especially beneficial for multi-batch joins, where an orders tuple must be written to disk if its corresponding customer batch is not the in-memory batch.
>
> I have no experience reviewing patches, but I would be happy to help contribute/review the Bloom filter patch as best I can.
>
> --
> Dr. Ramon Lawrence
> Assistant Professor, Department of Computer Science, University of British Columbia Okanagan
> E-mail: [EMAIL PROTECTED]

I've no patch review experience, either -- this is my first one. See http://wiki.postgresql.org/wiki/Reviewing_a_Patch for details on what a reviewer ought to do in general; various patch review discussions on the -hackers list have also proven helpful.

As regards this patch specifically, it seems we could merge the two patches into one and consider them together. However, the Bloom filter patch is listed as a Work in Progress on http://wiki.postgresql.org/wiki/CommitFest_2008-11. Perhaps it needs more work before being considered seriously? Jonah, what do you think would be most helpful?

- Josh / eggyknap
Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
On Mon, Oct 20, 2008 at 03:42:49PM -0700, Lawrence, Ramon wrote:
> We propose a patch that improves hybrid hash join's performance for large multi-batch joins where the probe relation has skew.

I'm running into problems with this patch. It applies cleanly, and the technique you provided for generating sample data works just fine (though I admit I haven't verified that the expected skew exists in the data). But the server crashes when I try to load the data. The backtrace is below, labeled "Backtrace 1"; since it happens in ExecScanHashMostCommonTuples, I figure it's because of the patch and not something else odd (unless perhaps my hardware is flaky -- I'll try it on other hardware as soon as I can, to verify). Note that I'm running this on Ubuntu 8.10, 32-bit x86, running a kernel Ubuntu labels as 2.6.27-7-generic #1 SMP. The statement in execution at the time was:

    ALTER TABLE SUPPLIER ADD CONSTRAINT SUPPLIER_FK1
        FOREIGN KEY (S_NATIONKEY) REFERENCES NATION (N_NATIONKEY);

Further, when I go back into the database in psql, simply issuing a \d command crashes the backend with a similar backtrace, labeled "Backtrace 2", below. The query underlying \d and its EXPLAIN output are also included, just for kicks.

- Josh

* BACKTRACE 1

Core was generated by `postgres: jtolley jtolley [local] ALTE'.
Program terminated with signal 6, Aborted.
[New process 20407]
#0  0xb80b0430 in __kernel_vsyscall ()
(gdb) bt
#0  0xb80b0430 in __kernel_vsyscall ()
#1  0xb7f22880 in raise () from /lib/tls/i686/cmov/libc.so.6
#2  0xb7f24248 in abort () from /lib/tls/i686/cmov/libc.so.6
#3  0x0831540e in ExceptionalCondition (
    conditionName=0x8433274 "!(hjstate->hj_OuterTupleMostCommonValuePartition < hashtable->nMostCommonTuplePartitions)",
    errorType=0x834b66d "FailedAssertion", fileName=0x84331d9 "nodeHash.c",
    lineNumber=880) at assert.c:57
#4  0x081b457b in ExecScanHashMostCommonTuples (hjstate=0x8720a6c, econtext=0x8720af8) at nodeHash.c:880
#5  0x081b60de in ExecHashJoin (node=0x8720a6c) at nodeHashjoin.c:357
#6  0x081a4748 in ExecProcNode (node=0x8720a6c) at execProcnode.c:406
#7  0x081a242b in standard_ExecutorRun (queryDesc=0x870957c, direction=ForwardScanDirection, count=1) at execMain.c:1343
#8  0x081c2036 in _SPI_execute_plan (plan=0x87181bc, paramLI=0x0, snapshot=0x8485300, crosscheck_snapshot=0x0, read_only=1 '\001', fire_triggers=0 '\0', tcount=1) at spi.c:1976
#9  0x081c2350 in SPI_execute_snapshot (plan=0x87181bc, Values=0x0, Nulls=0x0, snapshot=0x8485300, crosscheck_snapshot=0x0, read_only=<value optimized out>, fire_triggers=<value optimized out>, tcount=1) at spi.c:408
#10 0x082e1921 in RI_Initial_Check (trigger=0xbfeb0afc, fk_rel=0xb5a21938, pk_rel=0xb5a20754) at ri_triggers.c:2763
#11 0x08178613 in ATRewriteTables (wqueue=0xbfeb0d88) at tablecmds.c:5026
#12 0x0817ef36 in ATController (rel=0xb5a21938, cmds=<value optimized out>, recurse=<value optimized out>) at tablecmds.c:2294
#13 0x08261dd5 in ProcessUtility (parsetree=0x86ca17c, queryString=0x86c96ec "ALTER TABLE SUPPLIER\nADD CONSTRAINT SUPPLIER_FK1 FOREIGN KEY (S_NATIONKEY) references NATION (N_NATIONKEY);", params=0x0, isTopLevel=1 '\001', dest=0x86ca2b4, completionTag=0xbfeb0fc8 "") at utility.c:569
#14 0x0825e2ae in PortalRunUtility (portal=0x86fadfc, utilityStmt=0x86ca17c, isTopLevel=<value optimized out>, dest=0x86ca2b4, completionTag=0xbfeb0fc8 "") at pquery.c:1176
#15 0x0825f2c0 in PortalRunMulti (portal=0x86fadfc, isTopLevel=<value optimized out>, dest=0x86ca2b4, altdest=0x86ca2b4, completionTag=0xbfeb0fc8 "") at pquery.c:1281
#16 0x0825fb54 in PortalRun (portal=0x86fadfc, count=2147483647, isTopLevel=6 '\006', dest=0x86ca2b4, altdest=0x86ca2b4, completionTag=0xbfeb0fc8 "") at pquery.c:812
#17 0x0825a757 in exec_simple_query (query_string=0x86c96ec "ALTER TABLE SUPPLIER\nADD CONSTRAINT SUPPLIER_FK1 FOREIGN KEY (S_NATIONKEY) references NATION (N_NATIONKEY);") at postgres.c:992
#18 0x0825bfff in PostgresMain (argc=4, argv=0x8667b08, username=0x8667ae0 "jtolley") at postgres.c:3569
#19 0x082261cf in ServerLoop () at postmaster.c:3258
#20 0x08227190 in PostmasterMain (argc=1, argv=0x8664250) at postmaster.c:1031
#21 0x081cc126 in main (argc=1, argv=0x8664250) at main.c:188
(gdb)

* BACKTRACE 2

Core was generated by `postgres: jtolley jtolley [local] SELE'.
Program terminated with signal 6, Aborted.
[New process 20967]
#0  0xb80b0430 in __kernel_vsyscall ()
(gdb) bt
#0  0xb80b0430 in __kernel_vsyscall ()
#1  0xb7f22880 in raise () from /lib/tls/i686/cmov/libc.so.6
#2  0xb7f24248 in abort () from /lib/tls/i686/cmov/libc.so.6
#3  0x0831540e in ExceptionalCondition (
    conditionName=0x8433274 "!(hjstate->hj_OuterTupleMostCommonValuePartition < hashtable->nMostCommonTuplePartitions)",
    errorType=0x834b66d "FailedAssertion", fileName=0x84331d9 "nodeHash.c",
    lineNumber=880) at assert.c:57
#4
Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
On Mon, Oct 20, 2008 at 03:42:49PM -0700, Lawrence, Ramon wrote:
> We propose a patch that improves hybrid hash join's performance for large multi-batch joins where the probe relation has skew.

I also recommend modifying docs/src/sgml/config.sgml to include the enable_hashjoin_usestatmcvs option.

- Josh / eggyknap
Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
Joshua Tolley [EMAIL PROTECTED] writes:
> On Mon, Oct 20, 2008 at 03:42:49PM -0700, Lawrence, Ramon wrote:
> > We propose a patch that improves hybrid hash join's performance for large multi-batch joins where the probe relation has skew.
>
> I also recommend modifying docs/src/sgml/config.sgml to include the enable_hashjoin_usestatmcvs option.

If the patch is actually a win, why would we bother with such a GUC at all?

regards, tom lane
Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
On Wed, Nov 5, 2008 at 8:20 AM, Tom Lane wrote:
> Joshua Tolley writes:
> > On Mon, Oct 20, 2008 at 03:42:49PM -0700, Lawrence, Ramon wrote:
> > > We propose a patch that improves hybrid hash join's performance for large multi-batch joins where the probe relation has skew.
> >
> > I also recommend modifying docs/src/sgml/config.sgml to include the enable_hashjoin_usestatmcvs option.
>
> If the patch is actually a win, why would we bother with such a GUC at all?
>
> regards, tom lane

Good point. Leaving it in place for patch review purposes is useful, but we can probably lose it in the end.

- Josh / eggyknap
Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
The error is caused by me asserting against the wrong variable. I never noticed this, as I apparently did not have assertions turned on on my development machine. That is fixed now, and with the new patch version I have attached, all assertions are passing with your query and my test queries.

I added another assertion to that section of the code so that it is a bit more vigorous in confirming the hash table partition is correct. It does not change the operation of the code. There are two partition counts. One holds the maximum number of buckets in the hash table and the other counts the number of actual buckets created for hash values. I was incorrectly testing against the second one because that was valid before I started using a hash table to store the buckets.

The enable_hashjoin_usestatmcvs flag was valuable for my own research and tests and likely useful for your review, but Tom is correct that it can be removed in the final version.

- Bryce Cutt

On Wed, Nov 5, 2008 at 7:22 AM, Joshua Tolley [EMAIL PROTECTED] wrote:
> On Wed, Nov 5, 2008 at 8:20 AM, Tom Lane wrote:
> > Joshua Tolley writes:
> > > On Mon, Oct 20, 2008 at 03:42:49PM -0700, Lawrence, Ramon wrote:
> > > > We propose a patch that improves hybrid hash join's performance for large multi-batch joins where the probe relation has skew.
> > >
> > > I also recommend modifying docs/src/sgml/config.sgml to include the enable_hashjoin_usestatmcvs option.
> >
> > If the patch is actually a win, why would we bother with such a GUC at all?
> >
> > regards, tom lane
>
> Good point. Leaving it in place for patch review purposes is useful, but we can probably lose it in the end.
>
> - Josh / eggyknap

[attachment: histojoin_v2.patch]
Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
On Wed, Nov 05, 2008 at 04:06:11PM -0800, Bryce Cutt wrote:
> That is fixed now, and with the new patch version I have attached, all assertions pass with your query and my test queries.

Thanks for the new patch; I'll take a look as soon as I can (prolly tomorrow).

- Josh
Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
Joshua,

Thank you for offering to review the patch. The easiest way to test would be to generate your own TPC-H data and load it into a database for testing. I have posted the TPC-H generator at: http://people.ok.ubc.ca/rlawrenc/TPCHSkew.zip

The generator, produced by Microsoft Research, can produce skewed data sets. After unzipping, on a Windows machine, you can just run the command:

dbgen -s 1 -z 1

This will produce a TPC-H database of scale 1 GB with a Zipfian skew of z=1. More information on the generator is in the document README-S.DOC. Source is provided for the generator, so you should be able to run it on other operating systems as well.

The schema DDL is at: http://people.ok.ubc.ca/rlawrenc/tpch_pg_ddl.txt

Note that the load time for the 1 GB data set is 1-2 hours, and for the 10 GB data set about 24 hours. I recommend you do not add the foreign keys until after the data is loaded.

The other alternative is for us to do a pg_dump of our data sets. However, the download would be quite large, and it would take a couple of days for us to get you the data in that form.

--
Dr. Ramon Lawrence
Assistant Professor, Department of Computer Science, University of British Columbia Okanagan
E-mail: [EMAIL PROTECTED]

-----Original Message-----
From: Joshua Tolley [mailto:[EMAIL PROTECTED]]
Sent: November 1, 2008 3:42 PM
To: Lawrence, Ramon
Cc: pgsql-hackers@postgresql.org; Bryce Cutt
Subject: Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets

> I'm interested in trying to review this patch. Having not done patch review before, I can't exactly promise grand results, but I'd appreciate it if you could provide me with the data needed to check your results. In the meantime I'll go read the paper.
Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
On Sun, Nov 2, 2008 at 4:48 PM, Lawrence, Ramon <[EMAIL PROTECTED]> wrote:
> Thank you for offering to review the patch. The easiest way to test would be to generate your own TPC-H data and load it into a database for testing.

I'll try out the TPC-H generator first :) Thanks.

- Josh
Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
Lawrence, Ramon <[EMAIL PROTECTED]> writes:
> The easiest way to test would be to generate your own TPC-H data and load it into a database for testing. I have posted the TPC-H generator at: http://people.ok.ubc.ca/rlawrenc/TPCHSkew.zip
> The generator can produce skewed data sets. It was produced by Microsoft Research.

What alternatives are there for people who do not run Windows?

			regards, tom lane
Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
From: Tom Lane [mailto:[EMAIL PROTECTED]]
> What alternatives are there for people who do not run Windows?

The TPC-H generator is a standard code base provided at http://www.tpc.org/tpch/. We have been able to compile this code on Linux. However, we were unable to get the Microsoft modifications to this code to compile on Linux (although they are supposed to be portable), so we just used the Windows version with wine on our test Debian machine.

I have also posted the text files for the TPC-H 1G 1Z data set at: http://people.ok.ubc.ca/rlawrenc/tpch1g1z.zip

Note that you need to trim the extra characters at the end of the lines for PostgreSQL to read them properly; a small filter that does this is sketched below.

Since the data takes a while to generate and load, we can also provide a compressed version of the PostgreSQL data directory with the data already loaded.

-- Ramon Lawrence
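For illustration, here is a small filter that strips those extra characters, assuming (as is the case for standard dbgen output) that each line ends with a trailing '|' field delimiter, possibly preceded by a carriage return on Windows-generated files:

#include <stdio.h>
#include <string.h>

/* Strip the trailing '|' delimiter (and any CR/LF) from each line of
 * dbgen output so PostgreSQL's COPY can read the files. Assumes lines
 * fit in the buffer; a sketch, not production code.
 * Usage: ./trimpipe < lineitem.tbl > lineitem.dat */
int
main(void)
{
    char line[65536];

    while (fgets(line, sizeof(line), stdin) != NULL)
    {
        size_t len = strlen(line);

        /* Drop the newline and any carriage return. */
        while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r'))
            line[--len] = '\0';

        /* Drop the trailing field delimiter dbgen appends. */
        if (len > 0 && line[len - 1] == '|')
            line[--len] = '\0';

        puts(line);  /* puts re-appends a single newline */
    }
    return 0;
}

Run it over each generated table file before loading the result with COPY.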
Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
On Mon, Oct 20, 2008 at 4:42 PM, Lawrence, Ramon <[EMAIL PROTECTED]> wrote:
> We propose a patch that improves hybrid hash join's performance for large multi-batch joins where the probe relation has skew.
> [...]

I'm interested in trying to review this patch. Having not done patch review before, I can't exactly promise grand results, but I'd appreciate it if you could provide me with the data needed to check your results. In the meantime I'll go read the paper.

- Josh / eggyknap
[HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
We propose a patch that improves hybrid hash join's performance for large multi-batch joins where the probe relation has skew.

Project name: Histojoin
Patch file: histojoin_v1.patch

This patch implements the Histojoin join algorithm as an optional feature added to the standard Hybrid Hash Join (HHJ). A flag is used to enable or disable the Histojoin features; when Histojoin is disabled, HHJ behaves as normal. The Histojoin features allow HHJ to use PostgreSQL's statistics to do skew-aware partitioning. The basic idea is to keep in a small in-memory hash table the build relation tuples whose join values occur frequently in the probe relation. This improves the performance of HHJ by 10% to 50% for skewed data sets when multiple batches are used. The performance improvements of this patch can be seen in the paper (pages 25-30) at: http://people.ok.ubc.ca/rlawrenc/histojoin2.pdf

All generators and materials needed to verify these results can be provided.

This is a patch against the HEAD of the repository. It does not contain platform-specific code; it compiles and has been tested on our machines in both Windows (MSVC++) and Linux (GCC).

Currently the Histojoin feature is enabled by default and is used whenever HHJ is used and there are Most Common Value (MCV) statistics available on the probe-side base relation of the join. To disable this feature, simply set the enable_hashjoin_usestatmcvs flag to off in the database configuration file or at run time with the 'set' command.

One potential improvement not included in the patch: MCV statistics are only determined when the probe relation is produced by a scan operator. There is a benefit to using MCVs even when the probe relation is not a base scan, but we were unable to determine how to find statistics for a base relation after other operators have been applied.

This patch was created by Bryce Cutt as part of his work on his M.Sc. thesis.

--
Dr. Ramon Lawrence
Assistant Professor, Department of Computer Science, University of British Columbia Okanagan
E-mail: [EMAIL PROTECTED]

histojoin_v1.patch
Description: histojoin_v1.patch
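As a rough illustration of the skew-aware partitioning idea (not the patch's actual code; all names below are hypothetical), the build-side routing decision looks like this: tuples whose join-key hash matches a probe-side MCV stay in a small in-memory table, and all others fall through to the normal batch partitioning.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch of skew-aware partitioning; names are
 * illustrative assumptions, not the patch's actual identifiers. */

#define NUM_MCV_SLOTS 8                 /* small, memory-resident table */

typedef struct MCVTable
{
    uint32_t hashes[NUM_MCV_SLOTS];     /* hash values of probe-side MCVs */
    bool     used[NUM_MCV_SLOTS];
} MCVTable;

/* Return the in-memory slot holding this hash value, or -1 if the
 * value is not one of the probe side's most common values. */
static int
mcv_lookup(const MCVTable *t, uint32_t hashvalue)
{
    for (int i = 0; i < NUM_MCV_SLOTS; i++)
        if (t->used[i] && t->hashes[i] == hashvalue)
            return i;
    return -1;
}

/* Decide where a build-relation tuple goes: the in-memory MCV table if
 * its join value is frequent on the probe side, otherwise the batch
 * file chosen by the usual hash partitioning. */
static void
route_build_tuple(const MCVTable *t, uint32_t hashvalue, uint32_t nbatch)
{
    int slot = mcv_lookup(t, hashvalue);

    if (slot >= 0)
        printf("hash %u -> in-memory MCV bucket %d\n",
               (unsigned) hashvalue, slot);
    else
        printf("hash %u -> batch file %u\n",
               (unsigned) hashvalue, (unsigned) (hashvalue % nbatch));
}

int
main(void)
{
    MCVTable t = {0};

    /* Pretend hash value 42 belongs to a frequent probe-side join
     * value, as identified from the optimizer's MCV statistics. */
    t.hashes[0] = 42;
    t.used[0] = true;

    route_build_tuple(&t, 42, 4);   /* frequent value: stays in memory */
    route_build_tuple(&t, 7, 4);    /* ordinary value: spills to a batch */
    return 0;
}

At probe time, tuples whose hash hits the MCV table are joined immediately against the in-memory bucket instead of being written to and re-read from a batch file, which is where the savings on skewed data come from.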