On Sat, Aug 3, 2019 at 11:11 PM Binguo Bao <djydew...@gmail.com> wrote: > > John Naylor <john.nay...@2ndquadrant.com> 于2019年8月2日周五 下午3:12写道: >> >> I like the use of a macro here. However, I think we can find a better >> location for the definition. See the header comment of fmgr.h: >> "Definitions for the Postgres function manager and function-call >> interface." Maybe tuptoaster.h is as good a place as any? > > PG_DETOAST_ITERATE isn't a sample function-call interface, > But I notice that PG_FREE_IF_COPY is also defined in fmgr.h, whose logic is > similar to PG_DETOAST_ITERATE, make condition check first and then > decide whether to call the function. Besides, PG_DETOAST_DATUM, > PG_DETOAST_DATUM_COPY, PG_DETOAST_DATUM_SLICE, > PG_DETOAST_DATUM_PACKED are all defined in fmgr.h, it is reasonable > to put all the de-TOAST interface together.
Hmm, it's strange that those macros ended up there, but now I see why it makes sense to add new ones there also. >> Okay, and see these functions now return void. The orignal >> pglz_decompress() returned a value that was check against corruption. >> Is there a similar check we can do for the iterative version? > > As far as I know, we can just do such check after all compressed data is > decompressed. > If we are slicing, we can't do the check. Okay. >> >> 3). Speaking of pglz_decompress_iterate(), I diff'd it with >> >> pglz_decompress(), and I have some questions on it: >> >> >> >> a). >> >> + srcend = (const unsigned char *) (source->limit == source->capacity >> >> ? source->limit : (source->limit - 4)); >> >> >> >> What does the 4 here mean in this expression? >> > >> > Since we fetch chunks one by one, if we make srcend equals to the source >> > buffer limit, >> > In the while loop "while (sp < srcend && dp < destend)", sp may exceed the >> > source buffer limit and read unallocated bytes. >> >> Why is this? That tells me the limit is incorrect. Can the setter not >> determine the right value? > > There are three statments change `sp` value in the while loop `while (sp < > srcend && dp < destend)`: > `ctrl = *sp++;` > `off = ((sp[0]) & 0xf0) << 4) | sp[1]; sp += 2;` > `len += *sp++` > Although we make sure `sp` is less than `srcend` when enter while loop, `sp` > is likely to > go beyond the `srcend` in the loop, and we should ensure that `sp` is always > smaller than `buf->limit` to avoid > reading unallocated data. So, `srcend` can't be initialized to `buf->limit`. > Only one case is exceptional, > we've fetched all data chunks and 'buf->limit' reaches 'buf->capacity', it's > imposisble to read unallocated > data via `sp`. Thank you for the detailed explanation and the comment. >> Please explain further. Was the "sp + 1" correct behavior (and why), >> or only for debugging setting ctrl/c correctly? > > In patch v5, If the condition is `sp < srcend`, suppose `sp = srcend - 1` > before > entering the loop `while (sp < srcend && dp < destend)`, when entering the > loop > and read a control byte(sp equals to `srcend` now), the program can't enter > the > loop `for (; ctrlc < 8 && sp < srcend && dp < destend; ctrlc++)`, then set > `iter->ctrlc` to 0, > exit the first loop and then this iteration is over. At the next iteration, > the control byte will be reread since `iter->ctrlc` equals to 0, but the > previous control byte > is not used. Changing the condition to `sp + 1 < srcend` avoid only one > control byte is read > then the iterator is over. Okay, that's quite subtle. I agree the v6/7 way is more clear in this regard. >> Also, I don't think >> the new logic for the ctrl/c variables is an improvement: >> >> 1. iter->ctrlc is intialized with '8' (even in the uncompressed case, >> which is confusing). Any time you initialize with something not 0 or >> 1, it's a magic number, and here it's far from where the loop variable >> is used. This is harder to read. > > `iter->ctrlc` is used to record the value of `ctrl` in pglz_decompress at the > end of > the last iteration(or loop). In the pglz_decompress, `ctrlc`’s valid value is > 0~7, > When `ctrlc` reaches 8, a control byte is read from the source > buffer to `ctrl` then set `ctrlc` to 0. And a control bytes should be read > from the > source buffer to `ctrlc` on the first iteration. So `iter->ctrlc` should be > intialized with '8'. My point here is it looks strange out of context, but "0" looked normal. Maybe a comment in init_detoast_buffer(), something like "8 means read a control byte from the source buffer on the first iteration, see pg_lzdecompress_iterate()". Or, possibly, we could have a macro like INVALID_CTRLC. That might even improve the readability of the original function. This is just an idea, and maybe others would disagree, so you don't need to change it for now. >> 3. At the end of the loop, iter->ctrl/c are unconditionally set. In >> v5, there was a condition which would usually avoid this copying of >> values through pointers. > > Patch v6 just records the value of `ctrlc` at the end of each iteration(or > loop) > whether it is valid (0~7) or 8, and initializes `ctrlc` on the next > iteration(or loop) correctly. > I think it is more concise in patch v6. And, in the case mentioned above where we enter the while loop with sp = src_end - 1 , we can read the control byte and still store the correct value for ctrlc. >>> [varlena.c api] >> That's better, and I think we can take it a little bit farther. >> >> 1. Notice that TextPositionState is allocated on the stack in >> text_position(), which passes both the "state" pointer and the "iter" >> pointer to text_position_setup(), and only then sets state->iter = >> iter. We can easily set this inside text_position(). That would get >> rid of the need for other callers to pass NULL iter to >> text_position_setup(). > > > Done. That looks much cleaner, thanks. I've repeated my performance test to make sure there's no additional regression in my suffix tests: master patch v7 comp. end 1560s 1600ms uncomp. end 896ms 890ms The regression from master in the compressed case is about 2.5%, which is no different from the last patch I tested, so that's good. At this point, there are no functional things that I think we need to change. It's close to ready-for-committer. For the next version, I'd like you go through the comments and edit for grammar, spelling, and clarity as you see fit. I know you're not a native speaker of English, so I can help you with anything that remains. Also note we use braces on their own lines { like this } We do have a source formatting tool (pgindent), but it helps readability for committers to have it mostly standard beforehand. -- John Naylor https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services