Hi Georg & JP, JP: Any help on this really appreciated...
I spent some time last week analyzing the issue, and couldn't understand why the need of some of the changes. See my comments below to see if they suggest you any idea. > Changes sorted by line number (as in patch): > [3]: > -#define LZW_MAX_DICTSIZE (1 << LZW_MAX_BITSIZE) > +#define LZW_MAX_DICTSIZE ((1 << LZW_MAX_BITSIZE) + 1) > #define LZW_NULL_INDEX ~0U > -------------------------------------------------------------------- I really don't get this +1. I understand that currently the code needs the extra size, or we will write out of bounds, but the logic tells me that it shouldn't. The maximum size of the dictionary will be (should be) 2^12 = 4096, so therefore the fix is not to have the +1, but to try to avoid writing 4097 items. > > [2]: > @@ -419,7 +421,7 @@ > lzw_buffer_put_code (&st->buffer, st->string.prefix); > st->string.prefix = st->string.suffix; > > - if (st->buffer.maxval - st->early_change == st->dict.size) > + if (st->buffer.maxval - st->early_change + 2 == st->dict.size) > { > if (!lzw_buffer_inc_bitsize (&st->buffer)) > { > > @@ -434,7 +436,7 @@ > if (finish) > { > lzw_buffer_put_code (&st->buffer, st->string.prefix); > - if ((st->buffer.maxval - st->early_change) == st->dict.size) > + if ((st->buffer.maxval + st->early_change) == st->dict.size) > { > lzw_buffer_inc_bitsize (&st->buffer); > } > -------------------------------------------------------------------- > I also don't get the +2 or the sign change here. buffer.maxval is the maximum value allowed with a given bitsize. For the initial case of having 9 bits, maxval will be (2^9)-1 = 511; for the max case of 12 bits, maxval will be (2^12)-1 = 4095. If EarlyChange = 0; I would assume that we increase bitsize when the dictionary size (dict.size) is equal to maxval, this is 511 for the initial case of the change from 9 to 10 bits. If EarlyChange = 1; I would assume that we increase the bitsize one code earlier, this is when dict.size is equal to maxval-1 = (2^12)-2 = 510. So from the logic of the algorithm, and considering EarlyChange behavior, I would say that the algorithm is ok as it is, without any +2 and without the sign change. But, but, the code won't work unless we do those changes (I'm assuming your test data files TD0006,7,8 are ok). What it really seems to me is that the logic in the code is as it should be, but maybe we are not updating dict.size properly, or we are assuming that the dict.size holds the size of the dictionary, while it really isn't. With your changes on, we increase bitsize if dict.size is 513 (EarlyChange=0) or if it is 512 (EarlyChange=1). This doesn't go with the algorithm logic, there's an offset of 2 somewhere applied before to dict.size, which shouldn't have been applied, or something. Possible suspects could be the two additional codes added initially; initial dictionary size when starting to encode is 256+2=258; but those should really be considered in dict.size, I believe. > [4],[5]: > @@ -530,7 +532,7 @@ > lzw_buffer_init (&filter_state->buffer, LZW_MIN_BITSIZE); > lzw_dict_init (&filter_state->dict); > filter_state->old_code = LZW_NULL_INDEX; > - filter_state->decoded = filter_state->dec_buf + (LZW_MAX_DICTSIZE-2); > + filter_state->decoded = filter_state->dec_buf + (LZW_MAX_DICTSIZE - 3); > filter_state->dec_size = 0; > filter_state->state_pos = LZWDEC_STATE_START; > filter_state->tmp_error = NULL; > @@ -664,7 +666,7 @@ > while (st->new_code != LZW_EOD_CODE && > st->new_code != LZW_RESET_CODE) > { > - st->decoded = st->dec_buf + (LZW_MAX_DICTSIZE - 2); > + st->decoded = st->dec_buf + (LZW_MAX_DICTSIZE - 3); > > /* Is new code in the dict? */ > if (st->new_code < st->dict.size) > -------------------------------------------------------------------- > These changes, needed because of the new LZW_MAX_DICTSIZE, shouldn't also be needed, as that value shouldn't really be modified at the end. But there is one interesting thing in here; what is the -2 for in these line? Any hint?: filter_state->decoded = filter_state->dec_buf + (LZW_MAX_DICTSIZE-2); > [1]: > @@ -687,7 +689,7 @@ > if (!lzwdec_put_decoded (st, out)) > return PDF_STM_FILTER_APPLY_STATUS_NO_OUTPUT; /* No more > output */ > > - if (st->dict.size == st->buffer.maxval - 1 - st->early_change) > + if (st->dict.size == st->buffer.maxval + 1 - st->early_change) > { > lzw_buffer_inc_bitsize (&st->buffer); > > As you said, encoding and decoding should be symmetrical; as soon as we know how the encoder should be fixed, we'll know what to do in the decoder, so I will not comment this sign change. Cheers! -- Aleksander
signature.asc
Description: This is a digitally signed message part