Hi,

Some quick review comments:

On 2014-02-13 18:14:54 +0530, Amit Kapila wrote:
> +     /*
> +      * EWT can be generated for all new tuple versions created by Update
> +      * operation. Currently we do it when both the old and new tuple 
> versions
> +      * are on same page, because during recovery if the page containing old
> +      * tuple is corrupt, it should not cascade that corruption to other 
> pages.
> +      * Under the general assumption that for long runs most updates tend to
> +      * create new tuple version on same page, there should not be 
> significant
> +      * impact on WAL reduction or performance.
> +      *
> +      * We should not generate EWT when we need to backup the whole block in
> +      * WAL as in that case there is no saving by reduced WAL size.
> +      */
> +
> +     if (RelationIsEnabledForWalCompression(reln) &&
> +             (oldbuf == newbuf) &&
> +             !XLogCheckBufferNeedsBackup(newbuf))
> +     {
> +             uint32          enclen;

You should note that thew check for RelationIsEnabledForWalCompression()
here is racy and that that's ok because the worst that can happen is
that a uselessly generated delta.
>       xlrec.target.node = reln->rd_node;
>       xlrec.target.tid = oldtup->t_self;
>       xlrec.old_xmax = HeapTupleHeaderGetRawXmax(oldtup->t_data);
> @@ -6619,6 +6657,8 @@ log_heap_update(Relation reln, Buffer oldbuf,
>       xlrec.newtid = newtup->t_self;
>       if (new_all_visible_cleared)
>               xlrec.flags |= XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED;
> +     if (compressed)
> +             xlrec.flags |= XLOG_HEAP_DELTA_ENCODED;

I think this also needs to unset XLOG_HEAP_CONTAINS_NEW_TUPLE and
conditional on !need_tuple_data.



>  /*
> + * Determine whether the buffer referenced has to be backed up. Since we 
> don't
> + * yet have the insert lock, fullPageWrites and forcePageWrites could change
> + * later, but will not cause any problem because this function is used only 
> to
> + * identify whether EWT is required for update.
> + */
> +bool
> +XLogCheckBufferNeedsBackup(Buffer buffer)
> +{

Should note very, very boldly that this can only be used in contexts
where a race is acceptable.

> diff --git a/src/backend/utils/adt/pg_rbcompress.c 
> b/src/backend/utils/adt/pg_rbcompress.c
> new file mode 100644
> index 0000000..877ccd7
> --- /dev/null
> +++ b/src/backend/utils/adt/pg_rbcompress.c
> @@ -0,0 +1,355 @@
> +/* ----------
> + * pg_rbcompress.c -
> + *
> + *           This is a delta encoding scheme specific to PostgreSQL and 
> designed
> + *           to compress similar tuples. It can be used as it is or extended 
> for
> + *           other purpose in PostgrSQL if required.
> + *
> + *           Currently, this just checks for a common prefix and/or suffix, 
> but
> + *           the output format is similar to the LZ format used in 
> pg_lzcompress.c.
> + *
> + * Copyright (c) 1999-2014, PostgreSQL Global Development Group
> + *
> + * src/backend/utils/adt/pg_rbcompress.c
> + * ----------
> + */

This needs significantly more explanations about the algorithm and the
reasoning behind it.


> +static const PGRB_Strategy strategy_default_data = {
> +     32,                                                     /* Data chunks 
> less than 32 bytes are not
> +                                                              * compressed */
> +     INT_MAX,                                        /* No upper limit on 
> what we'll try to
> +                                                              * compress */
> +     35,                                                     /* Require 25% 
> compression rate, or not worth
> +                                                              * it */
> +};

compression rate looks like it's mismatch between comment and code.

> +/* ----------
> + * pgrb_out_ctrl -
> + *
> + *           Outputs the last and allocates a new control byte if needed.
> + * ----------
> + */
> +#define pgrb_out_ctrl(__ctrlp,__ctrlb,__ctrl,__buf) \
> +do { \
> +     if ((__ctrl & 0xff) == 0)                                               
>                                                 \
> +     {                                                                       
>                                                                         \
> +             *(__ctrlp) = __ctrlb;                                           
>                                                 \
> +             __ctrlp = (__buf)++;                                            
>                                                 \
> +             __ctrlb = 0;                                                    
>                                                         \
> +             __ctrl = 1;                                                     
>                                                                 \
> +     }                                                                       
>                                                                         \
> +} while (0)
> +

double underscore variables are reserved for the compiler and os.

> +/* ----------
> + * pgrb_out_literal -
> + *
> + *           Outputs a literal byte to the destination buffer including the
> + *           appropriate control bit.
> + * ----------
> + */
> +#define pgrb_out_literal(_ctrlp,_ctrlb,_ctrl,_buf,_byte) \
> +do { \
> +     pgrb_out_ctrl(_ctrlp,_ctrlb,_ctrl,_buf);                                
>                                 \
> +     *(_buf)++ = (unsigned char)(_byte);                                     
>                                         \
> +     _ctrl <<= 1;                                                            
>                                                         \
> +} while (0)
> +
> +
> +/* ----------
> + * pgrb_out_tag -
> + *
> + *           Outputs a backward reference tag of 2-4 bytes (depending on
> + *           offset and length) to the destination buffer including the
> + *           appropriate control bit.
> + * ----------
> + */
> +#define pgrb_out_tag(_ctrlp,_ctrlb,_ctrl,_buf,_len,_off) \
> +do { \
> +     pgrb_out_ctrl(_ctrlp,_ctrlb,_ctrl,_buf);                                
>                                 \
> +     _ctrlb |= _ctrl;                                                        
>                                                         \
> +     _ctrl <<= 1;                                                            
>                                                         \
> +     if (_len > 17)                                                          
>                                                         \
> +     {                                                                       
>                                                                         \
> +             (_buf)[0] = (unsigned char)((((_off) & 0xf00) >> 4) | 0x0f);    
>         \
> +             (_buf)[1] = (unsigned char)(((_off) & 0xff));                   
>                         \
> +             (_buf)[2] = (unsigned char)((_len) - 18);                       
>                                 \
> +             (_buf) += 3;                                                    
>                                                         \
> +     } else {                                                                
>                                                                 \
> +             (_buf)[0] = (unsigned char)((((_off) & 0xf00) >> 4) | ((_len) - 
> 3)); \
> +             (_buf)[1] = (unsigned char)((_off) & 0xff);                     
>                                 \
> +             (_buf) += 2;                                                    
>                                                         \
> +     }                                                                       
>                                                                         \
> +} while (0)
> +

What's the reason to use macros here? Just use inline functions when
dealing with file-local stuff.

> +/* ----------
> + * pgrb_delta_encode - find common prefix/suffix between inputs and encode.
> + *
> + *   source is the input data to be compressed
> + *   slen is the length of source data
> + *  history is the data which is used as reference for compression
> + *   hlen is the length of history data
> + *   The encoded result is written to dest, and its length is returned in
> + *   finallen.
> + *   The return value is TRUE if compression succeeded,
> + *   FALSE if not; in the latter case the contents of dest
> + *   are undefined.
> + *   ----------
> + */
> +bool
> +pgrb_delta_encode(const char *source, int32 slen,
> +                               const char *history, int32 hlen,
> +                               char *dest, uint32 *finallen,
> +                               const PGRB_Strategy *strategy)
> +{
> +     unsigned char *bp = ((unsigned char *) dest);
> +     unsigned char *bstart = bp;
> +     const char *dp = source;
> +     const char *dend = source + slen;
> +     const char *hp = history;
> +     unsigned char ctrl_dummy = 0;
> +     unsigned char *ctrlp = &ctrl_dummy;
> +     unsigned char ctrlb = 0;
> +     unsigned char ctrl = 0;
> +     int32           result_size;
> +     int32           result_max;
> +     int32           need_rate;
> +     int                     prefixlen;
> +     int                     suffixlen;
> +
> +     /*
> +      * Tuples of length greater than PGRB_HISTORY_SIZE are not allowed for
> +      * delta encode as this is the maximum size of history offset.
> +      * XXX: still true?
> +      */

Why didn't you define a maximum tuple size in the strategy definition
above then?

> +     if (hlen >= PGRB_HISTORY_SIZE || hlen < PGRB_MIN_MATCH)
> +             return false;
> +
> +     /*
> +      * Our fallback strategy is the default.
> +      */
> +     if (strategy == NULL)
> +             strategy = PGRB_strategy_default;
>
> +     /*
> +      * If the strategy forbids compression (at all or if source chunk size 
> out
> +      * of range), fail.
> +      */
> +     if (slen < strategy->min_input_size ||
> +             slen > strategy->max_input_size)
> +             return false;
> +
> +     need_rate = strategy->min_comp_rate;
> +     if (need_rate < 0)
> +             need_rate = 0;
> +     else if (need_rate > 99)
> +             need_rate = 99;

Is there really need for all this stuff here? This is so specific to the
usecase that I have significant doubts that all the pglz boiler plate
makes much sense.

> +     /*
> +      * Compute the maximum result size allowed by the strategy, namely the
> +      * input size minus the minimum wanted compression rate.  This had 
> better
> +      * be <= slen, else we might overrun the provided output buffer.
> +      */
> +     /*if (slen > (INT_MAX / 100))
> +     {
> +             /* Approximate to avoid overflow */
> +             /*result_max = (slen / 100) * (100 - need_rate);
> +     }
> +     else
> +     {
> +             result_max = (slen * (100 - need_rate)) / 100;
> +     }*/

err?

> +--
> +-- Test to update continuos and non continuos columns
> +--

*continuous

I have to admit, I have serious doubts about this approach. I have a
very hard time believing this won't cause performance regression in many
common cases... More importantly I don't think doing the compression on
this level is that interesting. I know Heikki argued for it, but I think
extending the bitmap that's computed for HOT to cover all columns and
doing this on a column level sounds much more sensible to me.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to