On Saturday, October 27, 2012 11:34 PM Noah Misch wrote:
> On Sat, Oct 27, 2012 at 04:57:46PM +0530, Amit Kapila wrote:
> > On Saturday, October 27, 2012 4:03 AM Noah Misch wrote:
> > > Could you elaborate on your reason for continuing to treat TOAST as a
> > > special case?  As best I recall, the only reason to do so before was
> > > the fact that TOAST can change the physical representation of a column
> > > even when the executor did not change its logical content.  Since
> > > you're no longer relying on the executor's opinion of what changed, a
> > > TOASTed value is not special.
> >
> > I thought for the initial version of the patch, without this change, the
> > patch will have less impact and need less testing.
>
> Not that I'm aware.  If you still think so, please explain.
>
> > For this patch I am interested to go with the delta encoding approach
> > based on column boundaries.
>
> Fair enough.
>
> > > If you conclude that finding sub-column similarity is not worthwhile,
> > > at least teach your algorithm to aggregate runs of changing or
> > > unchanging columns into fewer delta instructions.  If a table contains
> > > twenty unchanging bool columns, you currently use at least 80 bytes to
> > > encode that fact.  By treating the run of columns as a unit for delta
> > > encoding purposes, you could encode it in 23 bytes.
> >
> > Do you mean to say handle for non-continuous unchanged columns?
>
> My statement above was a mess.
>
> > I believe for continuous unchanged columns it's already handled unless
> > there are any alignment changes.
> > Example:
> >
> > create table tbl(f1 int, f2 bool, f3 bool, f4 bool, f5 bool, f6 bool,
> >                  f7 bool, f8 bool, f9 bool, f10 bool, f11 bool,
> >                  f12 bool, f13 bool, f14 bool, f15 bool, f16 bool,
> >                  f17 bool, f18 bool, f19 bool, f20 bool, f21 bool);
> >
> > insert into tbl values(10,
> > 't','t','t','t','t','t','t','t','t','t','t','t','t','t','t','t','t',
> > 't','t','t');
> >
> > update tbl set f1 = 20;
> >
> > The delta algorithm for the above operation reduced the size of the
> > tuple from 24 bytes to 12 bytes:
> >
> > 4 bytes - IGN command and LEN
> > 4 bytes - ADD command and LEN
> > 4 bytes - Data block
>
> I now see that this case is already handled.  Sorry for the noise.
> Incidentally, I tried this variant:
>
> create table tbl(f1 int, f2 bool, f3 bool, f4 bool, f5 bool, f6 bool,
>                  f7 bool, f8 bool, f9 bool, f10 bool, f11 bool,
>                  f12 bool, f13 bool, f14 bool, f15 bool, f16 bool,
>                  f17 bool, f18 bool, f19 bool, f20 bool, f21 bool,
>                  f22 int, f23 int);
> insert into tbl values(1,
> 't','t','t','t','t','t','t','t','t','t','t','t','t','t','t','t','t',
> 't','t','t', 2, 3);
> update tbl set f1 = 2, f22 = 4, f23 = 6;
>
> It yielded an erroneous delta: IGN 4, ADD 4, COPY 24, IGN 4, ADD 4,
> COPY 28, IGN 4, ADD 4.  (The delta happens to be longer than the data
> and goes unused.)
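For illustration, here is a minimal sketch of how IGN/ADD/COPY-style delta instructions of the kind discussed above could be applied to reconstruct the new tuple from the old one. The instruction meanings (IGN skips bytes of the old tuple, ADD emits literal bytes carried in the delta, COPY copies bytes from the old tuple) follow the thread; the in-memory representation, the explicit trailing COPY, and the helper names are assumptions, not the patch's actual encoding.

```python
def apply_delta(old: bytes, delta) -> bytes:
    """Reconstruct a new tuple from 'old' and a delta given as a
    list of (command, argument) pairs."""
    out = bytearray()
    pos = 0  # current read position in the old tuple
    for cmd, arg in delta:
        if cmd == "IGN":        # skip 'arg' bytes of the old tuple
            pos += arg
        elif cmd == "COPY":     # copy 'arg' bytes from the old tuple
            out += old[pos:pos + arg]
            pos += arg
        elif cmd == "ADD":      # emit literal bytes from the delta
            out += arg
        else:
            raise ValueError("unknown delta command: %s" % cmd)
    return bytes(out)

# The first example in the thread: a 24-byte tuple where only the
# leading 4-byte int changes (10 -> 20); the 20 unchanged bool
# columns are covered by a single run.
old = (10).to_bytes(4, "little") + b"\x01" * 20
delta = [("IGN", 4),                          # skip old f1
         ("ADD", (20).to_bytes(4, "little")), # new f1 value
         ("COPY", 20)]                        # f2..f21 unchanged
new = apply_delta(old, delta)
assert new == (20).to_bytes(4, "little") + b"\x01" * 20
```

Under this reading, the erroneous delta from the second example is visible by inspection: its IGN/ADD pairs alternate with COPY runs that together cover more bytes than the 32-byte tuple holds, which is consistent with it coming out longer than the raw data.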
I think with a new algorithm based on your inputs, this case will be
handled in a much better way.  I am planning to try two approaches:

1. Use LZ compression in the manner suggested by Heikki; if it works, it
   can be simpler.
2. Devise a new algorithm based on your suggestions, referring to the
   LZ/VCDiff algorithms.

With Regards,
Amit Kapila.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
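[For reference, approach 1 above - using an LZ-style compressor with the old tuple as history, so unchanged runs become back-references and only changed bytes are stored literally - can be sketched as follows. This is an illustration of the general idea, not PostgreSQL's pglz code or Heikki's proposal; the greedy matcher, opcode names, and `min_match` threshold are assumptions.]

```python
def lz_delta(old: bytes, new: bytes, min_match: int = 4):
    """Encode 'new' as a list of literal bytes and (offset, length)
    matches into 'old', which serves as the compression history."""
    ops = []
    i = 0
    while i < len(new):
        best_off, best_len = -1, 0
        # Naive O(len(old) * len(new)) search for the longest match
        # starting at new[i]; a real implementation would use a hash
        # table over the history, as pglz-style compressors do.
        for off in range(len(old)):
            l = 0
            while (off + l < len(old) and i + l < len(new)
                   and old[off + l] == new[i + l]):
                l += 1
            if l > best_len:
                best_off, best_len = off, l
        if best_len >= min_match:
            ops.append(("MATCH", best_off, best_len))
            i += best_len
        else:
            ops.append(("LIT", new[i]))
            i += 1
    return ops

def lz_apply(old: bytes, ops) -> bytes:
    """Decode: replay matches against 'old' and emit literals."""
    out = bytearray()
    for op in ops:
        if op[0] == "MATCH":
            _, off, length = op
            out += old[off:off + length]
        else:
            out.append(op[1])
    return bytes(out)

# Round-trip on the shape of the second example in the thread
# (f1: 1 -> 2, f22: 2 -> 4, f23: 3 -> 6, with 20 unchanged bools):
old = (bytes([1, 0, 0, 0]) + b"\x01" * 20
       + bytes([2, 0, 0, 0]) + bytes([3, 0, 0, 0]))
new = (bytes([2, 0, 0, 0]) + b"\x01" * 20
       + bytes([4, 0, 0, 0]) + bytes([6, 0, 0, 0]))
assert lz_apply(old, lz_delta(old, new)) == new
```

The 20 unchanged bool columns collapse into a single MATCH here, which is what makes the LZ-style approach attractive for runs of unchanged columns without any column-boundary bookkeeping.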