On Mon, Jul 1, 2013 at 3:49 PM, Andres Freund <and...@2ndquadrant.com> wrote:
>> I must be missing something.  At that point, yes, you'd like to avoid
>> re-toasting unnecessarily, but ISTM you've already bought the farm.
>> Unless I'm misunderstanding the code as written, you'd just end up
>> writing the indirect pointer back out to disk in that scenario.
>> That's not gonna work...
>
> You didn't misunderstand anything unfortunately. I broke that somewhere
> along the road, probably when factoring things out to
> toast_datum_differs(). Fixed...
> Which shows that we need tests. I've added some using a function in
> regress.c that makes all datums indirect. Together with a triggers that
> allows some testing.

OK, that's nifty.

+                               elog(ERROR, "deleteing a tuple indirect datums 
doesn't make sense");

That's spelled wrong.

+       Assert(VARATT_IS_EXTERNAL_ONDISK(old_value));
+
+       /* fast path for the common case where we have the toast oid available 
*/
+       if (VARATT_IS_EXTERNAL_ONDISK(old_value) &&
+               VARATT_IS_EXTERNAL_ONDISK(new_value))
+               return memcmp((char *) old_value, (char *) new_value,
+                                         VARSIZE_EXTERNAL(old_value)) != 0;

You've already asserted VARATT_IS_EXTERNAL_ONDISK(old_value), so there
can't be any need to test that condition again.

+       changed = memcmp(VARDATA_ANY(old_data), VARDATA_ANY(new_data),
+                                 VARSIZE_ANY_EXHDR(old_data)) != 0;
+       if (!changed)
+               *needs_rewrite = true;
+
+       /* heap_tuple_untoast_attr copied data */
+       pfree(old_data);
+       if (new_value != new_data)
+               pfree(new_data);
+       return changed;

So... basically, we always set needs_rewrite to !changed, except when
VARATT_IS_EXTERNAL_ONDISK(new_value).  In that latter case we return
after doing memcmp().  That's kind of byzantine and could probably be
simplified by pushing the first if-clause of this function up into the
caller and then making the function's job to simply compare the datums
on the assumption that the new datum is not external on-disk.  Then
you wouldn't need this funny three-way return value done as two
Booleans.

But backing up a minute, this is really a significant behavior change
that is independent of the purpose of the rest of this patch.  What
you're proposing here is that every time we consider toasting a value
on update, we should first check whether it's byte-for-byte equivalent
to the old value.  That may or may not be a good idea - it will suck
if, for example, a user repeatedly updates a very long string by
changing only the last character thereof, but it will win in other
cases.  Whether it's a good idea or not, I think it deserves to be a
separate patch.  For purposes of this patch, I think you should just
assume that any external-indirect value needs to be retoasted, just as
we currently assume for untoasted values.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to