Re: [HACKERS] pgsql: Optimize pglz compressor for small inputs.

2013-07-17 Thread Heikki Linnakangas

On 14.07.2013 20:12, Stephen Frost wrote:

* Heikki Linnakangas (heikki.linnakan...@iki.fi) wrote:

This patch alleviates that in two ways. First, instead of storing pointers
in the hash table, store 16-bit indexes into the hist_entries array. That
slashes the size of the hash table to 1/2 or 1/4 of the original, depending
on the pointer width. Secondly, adjust the size of the hash table based on
input size. For very small inputs, you don't need a large hash table to
avoid collisions.


   The coverity scanner has a bit of an issue with this patch which, at
   least on first blush, looks like a valid concern.

   While the change in pg_lzcompress.c:pglz_find_match() to loop on:

   while (hent != INVALID_ENTRY_PTR)
   {
  const char *ip = input;
  const char *hp = hent-pos;

   looks good, and INVALID_ENTRY_PTR is the address of the first entry in
   the array (and can't be NULL), towards the end of the loop we do:

   hent = hent-next;
   if (hent)
  ...

   Should we really be checking for 'hent != INVALID_ENTRY_PTR' here?  If
   not, and hent really can end up as NULL, then we're going to segfault
   on the next loop due to the unchecked 'hent-pos' early in the loop.
   If hent can never be NULL, then we probably don't need this check at
   all.


hent can never be NULL, the code should indeed check for 'hent != 
INVALID_ENTRY_PTR'. The check is not required from a correctness point 
of view, the idea is just to avoid calculating the 'good_match' 
variable, if you're going to fall out of the loop anyway.


I'm actually a bit surprised the compiler doesn't optimize it that way 
anyway, without the explicit if-block, but at least gcc -O2 (version 
4.7.3) seem to do that. So, I guess we should keep the check.


Committed '(hent)' - '(hent != INVALID_ENTRY_PTR)'. Thanks for the report!

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgsql: Optimize pglz compressor for small inputs.

2013-07-14 Thread Stephen Frost
Heikki,

* Heikki Linnakangas (heikki.linnakan...@iki.fi) wrote:
 This patch alleviates that in two ways. First, instead of storing pointers
 in the hash table, store 16-bit indexes into the hist_entries array. That
 slashes the size of the hash table to 1/2 or 1/4 of the original, depending
 on the pointer width. Secondly, adjust the size of the hash table based on
 input size. For very small inputs, you don't need a large hash table to
 avoid collisions.

  The coverity scanner has a bit of an issue with this patch which, at
  least on first blush, looks like a valid concern.
  
  While the change in pg_lzcompress.c:pglz_find_match() to loop on:
  
  while (hent != INVALID_ENTRY_PTR)
  {
  const char *ip = input;
  const char *hp = hent-pos;

  looks good, and INVALID_ENTRY_PTR is the address of the first entry in
  the array (and can't be NULL), towards the end of the loop we do:

  hent = hent-next;
  if (hent)
  ...

  Should we really be checking for 'hent != INVALID_ENTRY_PTR' here?  If
  not, and hent really can end up as NULL, then we're going to segfault
  on the next loop due to the unchecked 'hent-pos' early in the loop.
  If hent can never be NULL, then we probably don't need this check at
  all.

Thanks,

Stephen


signature.asc
Description: Digital signature