On Tue, Jul 16, 2024 at 12:58 AM Tomas Vondra
<tomas.von...@enterprisedb.com> wrote:
>
> On 7/15/24 20:50, Julien Tachoires wrote:
> > Hi,
> >
> > Le ven. 7 juin 2024 à 06:18, Julien Tachoires <jul...@gmail.com> a écrit :
> >>
> >> Le ven. 7 juin 2024 à 05:59, Tomas Vondra
> >> <tomas.von...@enterprisedb.com> a écrit :
> >>>
> >>> On 6/6/24 12:58, Julien Tachoires wrote:
> >>>> ...
> >>>>
> >>>> When compiled with LZ4 support (--with-lz4), this patch enables data
> >>>> compression/decompression of these temporary files. Each transaction
> >>>> change that must be written on disk (ReorderBufferDiskChange) is now
> >>>> compressed and encapsulated in a new structure.
> >>>>
> >>>
> >>> I'm a bit confused, but why tie this to having lz4? Why shouldn't this
> >>> be supported even for pglz, or whatever algorithms we add in the future?
> >>
> >> That's right, reworking this patch in that sense.
> >
> > Please find a new version of this patch adding support for LZ4, pglz
> > and ZSTD. It introduces the new GUC logical_decoding_spill_compression,
> > which is used to set the compression method. In order to stay aligned
> > with the other server-side GUCs related to compression methods
> > (wal_compression, default_toast_compression), the compression level is
> > not exposed to users.
> >
>
> Sounds reasonable. I wonder if it might be useful to allow specifying
> the compression level in those places, but that's clearly not something
> this patch needs to do.
>
> > The last patch of this set is still a WIP; it adds the machinery
> > required for setting the compression method as a subscription option:
> > CREATE SUBSCRIPTION ... WITH (spill_compression = ...);
> > I think there is a major problem with this approach: the logical
> > decoding context is tied to one replication slot, but multiple
> > subscriptions can use the same replication slot. How should this work
> > if 2 subscriptions want to use the same replication slot but different
> > compression methods?
> >
>
> Do we really support multiple subscriptions sharing the same slot? I
> don't think we do, but maybe I'm missing something.
>
> > At this point, compression is only available for the changes spilled
> > on disk. It is still not clear to me if the compression of data
> > transiting through the streaming protocol should be addressed by this
> > patch set or by another one. Thoughts?
> >
>
> I'd stick to only compressing the data spilled to disk. It might be
> useful to compress the streamed data too, but why shouldn't we compress
> the regular (non-streamed) transactions too? Yeah, it's more efficient
> to compress larger chunks, but we can fit quite large transactions into
> logical_decoding_work_mem without spilling.
>
> FWIW I'd expect that to be handled at the libpq level - there's already
> a patch for that, but I haven't checked if it would handle this. But
> maybe more importantly, I think compressing streamed data might need to
> handle some sort of negotiation of the compression algorithm, which
> seems fairly complex.
>
> To conclude, I'd leave this out of scope for this patch.
>
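For illustration, here is a minimal sketch of what the compressed
on-disk wrapper described above could look like, with an LZ4 path and a
fallback to storing the raw bytes when compression does not help. All
names here (SpillChangeHeader, spill_pack_change, etc.) are hypothetical
and chosen only for illustration; the actual patch may structure this
differently:

    /*
     * Hypothetical sketch of a compressed spill-file record, loosely
     * modeled on the design quoted above. Requires liblz4.
     */
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>
    #include <lz4.h>

    typedef enum SpillCompressionMethod
    {
        SPILL_COMPRESSION_NONE = 0,
        SPILL_COMPRESSION_PGLZ,
        SPILL_COMPRESSION_LZ4,
        SPILL_COMPRESSION_ZSTD
    } SpillCompressionMethod;

    /* Header written before each compressed change on disk. */
    typedef struct SpillChangeHeader
    {
        uint32_t raw_size;        /* size of change before compression */
        uint32_t stored_size;     /* bytes actually written after header */
        uint8_t  method;          /* SpillCompressionMethod used */
    } SpillChangeHeader;

    /*
     * Compress one change and prepend the header; fall back to storing
     * the data uncompressed when compression fails or does not shrink
     * it. Returns the total number of bytes placed in 'dst', which must
     * hold the header plus the uncompressed payload.
     */
    static size_t
    spill_pack_change(const char *src, uint32_t raw_size,
                      char *dst, SpillCompressionMethod method)
    {
        SpillChangeHeader hdr;
        char   *payload = dst + sizeof(hdr);
        int     csize = 0;

        if (method == SPILL_COMPRESSION_LZ4)
            csize = LZ4_compress_default(src, payload, (int) raw_size,
                                         (int) raw_size);

        if (csize <= 0 || (uint32_t) csize >= raw_size)
        {
            /* compression failed or did not help: store raw bytes */
            memcpy(payload, src, raw_size);
            csize = (int) raw_size;
            method = SPILL_COMPRESSION_NONE;
        }

        hdr.raw_size = raw_size;
        hdr.stored_size = (uint32_t) csize;
        hdr.method = (uint8_t) method;
        memcpy(dst, &hdr, sizeof(hdr));

        return sizeof(hdr) + (size_t) csize;
    }

One nice property of a per-change header like this is that it makes the
spill files self-describing: on read-back, the header says how many
bytes to consume and which method to decompress with, regardless of the
GUC setting in effect when the file is later read.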
Your point sounds reasonable to me. OTOH, if we want to support
compression for the spill case, shouldn't we first ask how often such
an option would be needed? Users already have the option to stream
large transactions (for parallel apply or otherwise), in which case no
spilling is required. I feel that sooner or later we will make such
behavior (streaming=parallel) the default, and then spilling should
happen in very few cases. Is it worth adding this new option and GUC if
that is true?

--
With Regards,
Amit Kapila.