On Thu, 2003-09-18 at 20:20, Tom Lane wrote:
> Jenny Zhang <[EMAIL PROTECTED]> writes:
> > ...  It seems to me that small 
> > effective_cache_size favors the choice of nested loop joins (NLJ) 
> > while the big effective_cache_size is in favor of merge joins (MJ).  
> 
> No, I wouldn't think that, because a nestloop plan will involve repeated
> fetches of the same tuples whereas a merge join doesn't (at least not
> when it sorts its inner input, as this plan does).  Larger cache
> improves the odds of a repeated fetch not having to do I/O.  In practice
> a larger cache area would also have some effects on access costs for the
> sort's temp file, but I don't think the planner's cost model for sorting
> takes that into account.
I think there is some misunderstanding here.  What I meant to say is:
>From the plans we got, the optimizer favors the choice of nested loop
joins (NLJ) while the big effective_cache_size is in favor of merge
joins (MJ).  Which we think is not appropriate.  We verified that
sort_mem has no impact on the plans.  Though it would be nice to take
that into account.
> 
> As Matt Clark points out nearby, the real question is whether these
> planner estimates have anything to do with reality.  EXPLAIN ANALYZE
> results would be far more interesting than plain EXPLAIN.
> 
> > However, within the same run set consist of 6 runs, we see 2-3% 
> > standard deviation for the run metrics associated with the multiple
> > stream part of the test (as opposed to the single stream part).
> 
> <python> Och, laddie, we useta *dream* of 2-3% variation </python>
> 
BTW, I am a she :-)
> > We would like to reduce the variation to be less than 1% so that a 
> > 2% change between two different kernels would be significant. 
> 
> I think this is a pipe dream.  Variation in where the data gets laid
> down on your disk drive would alone create more than that kind of delta.
> I'm frankly amazed you could get repeatability within 2-3%.
> 
Greg is right.  The repeatability is due to the aggregate results for a
whole test run.  As for individual query, the power test(single stream)
is very consistent, and the throughput test(multiple streams), any given
query execution time varies up to 15% if no swapping.  If we set
sort_mem too high and swapping occurs, the variation is bigger.

Jenny


---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Reply via email to