Don Baccus wrote:
>
> ...
> I expect TOAST to work even better).  Users will still be able to change to
> larger blocksizes (perhaps a wise thing to do if a large percentage of their
> data won't fit into a single PG block).   Users using the default will
> be able to store rows of *awesome* length, efficiently.

    Depends...

    Actually, the toaster already jumps in once a tuple exceeds
    BLKSZ/4, so with the default of 8K blocks it tries to keep
    all tuples smaller than 2K. The reasons behind that are:

    1.  An average tuple size of 8K means an average of 4K unused
        space at the end of each block, since a tuple cannot span
        block boundaries. Wasting space means wasting IO bandwidth.

    2.  Since big items are unlikely to be search criteria, having
        to read them into memory for every check for a match on the
        other columns is a waste again. So the more big items are
        moved out of the main tuple, the smaller the main table
        becomes, the more likely it is that the main tuples
        (holding the keys) stay cached, and the cheaper a
        sequential scan becomes (the small sketch below illustrates
        this).
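
    As a sketch of that point (the table and column names are only
    invented for illustration, but the catalog column reltoastrelid is
    real): once a table has a potentially large column, PostgreSQL
    creates a companion toast relation for it, and the big values live
    there while the main heap keeps only the small key columns plus
    pointers.

        CREATE TABLE article (id integer, title text, body text);

        -- The toast relation created for "article" shows up in the catalog:
        SELECT relname, reltoastrelid
          FROM pg_class
         WHERE relname = 'article';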

    Of course, especially for point 2 there is a break-even point.
    That is reached when the extra fetches needed to send toasted
    values to the client cost more than was saved by not reading
    them during the main scan in the first place. A full-table
    SELECT * definitely costs more if TOAST is involved. But who
    runs an unqualified SELECT * against a multi-gigabyte table
    without problems anyway? Usually you pick a single row, or a
    few, based on some other key attributes - don't you?

    Let's take an example. You have a forum server that displays
    one article plus the date and sender of all follow-ups. The
    article bodies are usually big (1-10K). So you do a SELECT *
    to fetch the article actually displayed, and another SELECT
    sender, date_sent just to get the info for the follow-ups. If
    we assume a uniform distribution of body sizes and an average
    of 10 follow-ups, that means we save about 52K of IO and cache
    usage for each article displayed, because the follow-up bodies
    never have to be read at all.
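
    In SQL, the two queries might look roughly like this (the table and
    column names - article, thread_id, sender, date_sent, body - are
    just invented for the example):

        CREATE TABLE article (
            id        serial PRIMARY KEY,
            thread_id integer,
            sender    text,
            date_sent timestamp,
            body      text    -- big bodies get moved out of line by TOAST
        );

        -- Fetch the one article actually displayed (pulls the toasted body):
        SELECT * FROM article WHERE id = 42;

        -- Fetch only sender and date of the follow-ups; their big bodies
        -- stay out of line and are never read:
        SELECT sender, date_sent FROM article
         WHERE thread_id = 7 AND id <> 42;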


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== [EMAIL PROTECTED] #

