Since the patch was prepared months ago, it needs to be rebased.

    -Filip-


On Sun, Apr 13, 2025 at 9:53 PM Dmitry Dolgov <9erthali...@gmail.com>
wrote:

> > On Fri, Mar 28, 2025 at 09:23:13AM GMT, Filip Janus wrote:
> > > +    else
> > > +        if (nbytesOriginal - file->pos != 0)
> > > +            /* curOffset must be corrected also if compression is
> > > +             * enabled, nbytes was changed by compression but we
> > > +             * have to use the original value of nbytes
> > > +             */
> > > +            file->curOffset-=bytestowrite;
> > >
> > > It's not something introduced by the compression patch - the first part
> > > is what we used to do before. But I find it a bit confusing - isn't it
> > > mixing the "logical file position" adjustment we did before with the
> > > adjustment possibly needed due to compression?
> > >
> > > In fact, isn't it going to fail if the code goes through multiple
> > > iterations of
> > >
> > >     while (wpos < file->nbytes)
> > >     {
> > >        ...
> > >     }
> > >
> > > because bytestowrite will be the value from the last iteration? I haven't
> > > tried, but I guess writing wide tuples (more than 8k) might fail.
> > >
> >
> > I will definitely test it with tuples larger than 8K.
> >
> > Maybe I don't understand it correctly: the adjustment is performed
> > when file->nbytes and file->pos differ, so it must also be applied
> > when we are working with compressed data. The problem is that the
> > data stored compressed on disk has a different size than the incoming
> > uncompressed data, so the question is what the correction value
> > should be. By debugging, I realized that the correction should
> > correspond to the size of bytestowrite from the last iteration of
> > the loop.
>
> I agree, this looks strange. If the idea is to set curOffset to its
> original value + pos, and the original value was advanced multiple times
> by bytestowrite, it seems incorrect to adjust it only once by the
> bytestowrite from the last iteration. From what I can see, the current
> tests do not exercise a case where the while loop runs more than once,
> so everything still looks fine.
>
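
Just to spell out the arithmetic behind the multiple-iterations concern,
here is a trivial stand-alone illustration (the names only mirror the
patch, this is not the actual buffile.c code, and the chunk sizes are
made up):

    #include <stdio.h>

    int
    main(void)
    {
        long    curOffset = 0;                  /* offset in the temp file */
        long    chunks[] = {3000, 2800, 1900};  /* per-iteration write sizes */
        long    written_total = 0;
        long    bytestowrite = 0;

        /* the loop advances curOffset once per chunk it writes out */
        for (int i = 0; i < 3; i++)
        {
            bytestowrite = chunks[i];
            curOffset += bytestowrite;
            written_total += bytestowrite;
        }

        /* subtracting only the last chunk does not undo all the advances */
        printf("corrected by last bytestowrite: %ld\n", curOffset - bytestowrite);
        /* subtracting the accumulated total does */
        printf("corrected by the total:         %ld\n", curOffset - written_total);
        return 0;
    }

So whatever offset the correction is supposed to end up at, using only the
bytestowrite of the last iteration cannot account for the earlier ones.
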
> At the same time maybe I'm missing something, but what exactly should
> such a test for 8k tuples and multiple loops in the while block look
> like? E.g. when I force a hash join on a table with a single wide text
> column, the minimal tuple that gets written to the temporary file still
> has a rather small length, I assume due to toasting. Is there some
> other way to achieve that?
>
>
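
As for your question how to get a wide (> 8 kB) minimal tuple into the
batch files: one idea I want to try, completely untested, is to generate
the wide value in the query itself instead of reading it from a table, so
it never goes through TOAST and the datum spilled with the minimal tuple
should keep its full width. The concatenated md5 strings should also be
close to incompressible. The work_mem value and the row counts below are
just guesses:

    SET work_mem = '64kB';
    SET enable_mergejoin = off;
    SET enable_nestloop = off;

    EXPLAIN (ANALYZE, COSTS OFF)
    SELECT sum(length(a.payload))
    FROM (SELECT i,
                 (SELECT string_agg(md5(i::text || '-' || g::text), '')
                    FROM generate_series(1, 300) g) AS payload  -- ~9.6 kB
            FROM generate_series(1, 5000) i) a
    JOIN generate_series(1, 5000) AS b(i) USING (i);

Whether this really makes the while loop run more than once is exactly
what I still need to verify.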

Attachment: 0002-Add-test-for-temporary-files-compression-this-commit.patch
Description: Binary data

Attachment: 0001-This-commit-adds-support-for-temporary-files-compres.patch
Description: Binary data
