Re: [pdf-devel] Bug in LZW filter?

Aleksander Morgado Tue, 11 Oct 2011 14:17:11 -0700

Hi Georg & JP,

JP: Any help on this really appreciated...


I spent some time last week analyzing the issue, and couldn't understand
why the need of some of the changes. See my comments below to see if
they suggest you any idea.

> Changes sorted by line number (as in patch):
> [3]:
> -#define LZW_MAX_DICTSIZE  (1 << LZW_MAX_BITSIZE)
> +#define LZW_MAX_DICTSIZE  ((1 << LZW_MAX_BITSIZE) + 1)
>  #define LZW_NULL_INDEX    ~0U
> --------------------------------------------------------------------

I really don't get this +1. I understand that currently the code needs
the extra size, or we will write out of bounds, but the logic tells me
that it shouldn't. The maximum size of the dictionary will be (should
be) 2^12 = 4096, so therefore the fix is not to have the +1, but to try
to avoid writing 4097 items.

> 
> [2]:
> @@ -419,7 +421,7 @@
>            lzw_buffer_put_code (&st->buffer, st->string.prefix);
>            st->string.prefix = st->string.suffix;
> 
> -          if (st->buffer.maxval - st->early_change == st->dict.size)
> +          if (st->buffer.maxval - st->early_change + 2 == st->dict.size)
>              {
>                if (!lzw_buffer_inc_bitsize (&st->buffer))
>                  {
> 
> @@ -434,7 +436,7 @@
>    if (finish)
>      {
>        lzw_buffer_put_code (&st->buffer, st->string.prefix);
> -      if ((st->buffer.maxval - st->early_change) == st->dict.size)
> +      if ((st->buffer.maxval + st->early_change) == st->dict.size)
>          {
>            lzw_buffer_inc_bitsize (&st->buffer);
>          }
> --------------------------------------------------------------------
> 

I also don't get the +2 or the sign change here.

buffer.maxval is the maximum value allowed with a given bitsize. For the
initial case of having 9 bits, maxval will be (2^9)-1 = 511; for the max
case of 12 bits, maxval will be (2^12)-1 = 4095.

If EarlyChange = 0; I would assume that we increase bitsize when the
dictionary size (dict.size) is equal to maxval, this is 511 for the
initial case of the change from 9 to 10 bits.

If EarlyChange = 1; I would assume that we increase the bitsize one code
earlier, this is when dict.size is equal to maxval-1 = (2^12)-2 = 510.

So from the logic of the algorithm, and considering EarlyChange
behavior, I would say that the algorithm is ok as it is, without any +2
and without the sign change. But, but, the code won't work unless we do
those changes (I'm assuming your test data files TD0006,7,8 are ok).

What it really seems to me is that the logic in the code is as it should
be, but maybe we are not updating dict.size properly, or we are assuming
that the dict.size holds the size of the dictionary, while it really
isn't. With your changes on, we increase bitsize if dict.size is 513
(EarlyChange=0) or if it is 512 (EarlyChange=1). This doesn't go with
the algorithm logic, there's an offset of 2 somewhere applied before to
dict.size, which shouldn't have been applied, or something. Possible
suspects could be the two additional codes added initially; initial
dictionary size when starting to encode is 256+2=258; but those should
really be considered in dict.size, I believe.

> [4],[5]:
> @@ -530,7 +532,7 @@
>    lzw_buffer_init (&filter_state->buffer, LZW_MIN_BITSIZE);
>    lzw_dict_init (&filter_state->dict);
>    filter_state->old_code = LZW_NULL_INDEX;
> -  filter_state->decoded = filter_state->dec_buf + (LZW_MAX_DICTSIZE-2);
> +  filter_state->decoded = filter_state->dec_buf + (LZW_MAX_DICTSIZE - 3);
>    filter_state->dec_size = 0;
>    filter_state->state_pos = LZWDEC_STATE_START;
>    filter_state->tmp_error = NULL;
> @@ -664,7 +666,7 @@
>        while (st->new_code != LZW_EOD_CODE &&
>               st->new_code != LZW_RESET_CODE)
>          {
> -          st->decoded = st->dec_buf + (LZW_MAX_DICTSIZE - 2);
> +          st->decoded = st->dec_buf + (LZW_MAX_DICTSIZE - 3);
> 
>            /* Is new code in the dict? */
>            if (st->new_code < st->dict.size)
> --------------------------------------------------------------------
> 

These changes, needed because of the new LZW_MAX_DICTSIZE, shouldn't
also be needed, as that value shouldn't really be modified at the end.
But there is one interesting thing in here; what is the -2 for in these
line? Any hint?:
        filter_state->decoded = filter_state->dec_buf +
        (LZW_MAX_DICTSIZE-2);


> [1]:
> @@ -687,7 +689,7 @@
>            if (!lzwdec_put_decoded (st, out))
>              return PDF_STM_FILTER_APPLY_STATUS_NO_OUTPUT; /* No more
> output */
> 
> -          if (st->dict.size == st->buffer.maxval - 1 - st->early_change)
> +          if (st->dict.size == st->buffer.maxval + 1 - st->early_change)
>              {
>                lzw_buffer_inc_bitsize (&st->buffer);
> 
> 

As you said, encoding and decoding should be symmetrical; as soon as we
know how the encoder should be fixed, we'll know what to do in the
decoder, so I will not comment this sign change.

Cheers!

-- 
Aleksander

signature.asc
Description: This is a digitally signed message part

Re: [pdf-devel] Bug in LZW filter?

Reply via email to