Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-28 Thread Andres Freund
On 2014-04-27 18:44:16 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  Just for some clarity, that also happens with expressions like:
  WHERE
  ROW(ev_class, rulename, ev_action) = ROW('pg_rewrite'::regclass, 
  '_RETURN', NULL)
  ORDER BY ROW(ev_class, rulename, ev_action);
 
  Previously we wouldn't detoast ev_action here, but now we do.
 ...
 The extra detoast you're seeing in this example is caused by the ROW()
 in the ORDER BY.  Which, as I said, is just bad SQL coding.

Good point.

  I agree that this needs to get backpatched at some point. But people
  also get very upset if queries gets slower by a magnitude or two after a
  minor version upgrade. And this does have the potential to do that, no?
 
 They don't get nearly as upset as they do if the database loses their
 data.  Yes, in an ideal world, we could fix this and not suffer any
 performance loss anywhere.  But no such option is on the table, and
 waiting is not going to make one appear.  What waiting *will* do is
 afford more opportunities for this bug to eat people's data.

We make tradeoffs between performance and safety *all the time*. A new
performance regression that has the potential of affecting a fair number
of installations very well can be worse than a decade old correctness
bug. A bug that only will hit in some specific usage scenarios and will
only affect individual datums.

And you *have* come up with an alternative approach. It might very well
be inferior, but that certainly doesn't make this approach one without
alternatives.

Anyway, I've now voiced my doubts about immediate backpatching. Any
other opinions?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-28 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-04-27 18:44:16 -0400, Tom Lane wrote:
 They don't get nearly as upset as they do if the database loses their
 data.  Yes, in an ideal world, we could fix this and not suffer any
 performance loss anywhere.  But no such option is on the table, and
 waiting is not going to make one appear.  What waiting *will* do is
 afford more opportunities for this bug to eat people's data.

 We make tradeoffs between performance and safety *all the time*.

Really?  I'm not aware of any case where we've left unrecoverable data
corruption go unfixed.  It's true that we've rejected fixing some cases
where plpgsql code might transiently try to use a stale toast pointer ---
but that's a lot different from losing stored data permanently.

 And you *have* come up with an alternative approach.

No, I haven't.  I experimented with a different approach, at Noah's
insistence.  But I did not and will not try to extend it to a complete
solution, first because I'm unsure that a 100% solution can reasonably
be reached that way, and second because that approach *also* entails
performance penalties --- unavoidable ones, unlike this approach where
people can modify their SQL code to avoid fetching unnecessary columns.

If somebody *else* is sufficiently hell-bent on doing it that way that
they will take responsibility for completing and back-patching that
approach, I will stand aside and let them find out the downsides.
Otherwise, I propose to commit and back-patch this version.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-27 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-04-25 12:05:17 -0400, Tom Lane wrote:
 Meh ... is it likely that the columns involved in an ordering comparison
 would be so wide as to be toasted out-of-line?  Such a query would only be
 fast if the row value were indexed, which would pretty much preclude use
 of wide columns.

 Just for some clarity, that also happens with expressions like:
 WHERE
 ROW(ev_class, rulename, ev_action) = ROW('pg_rewrite'::regclass, 
 '_RETURN', NULL)
 ORDER BY ROW(ev_class, rulename, ev_action);

 which is what is generated by such query generators - where the leading
 columns *are* indexed but not necessarily unique.

Ah, I see.  Well, we're pretty darn stupid about such queries anyway :-(.
Your first example could be greatly improved by expanding the whole-row
Var into a ROW() construct (so that RowCompareExpr could be used), and
the second one by exploding the ROW() order-by into separate order-by
columns.  Maybe someday we can do that, or persuade the query generators
not to generate such brain-dead SQL in the first place.  But in the
meantime these coding techniques lead to highly suboptimal plans anyway,
with or without TOAST considerations.  It's also worth noting that
it's merest luck that the existing code isn't *slower* about such
queries; if there were any significant number of comparisons of the
toasted columns occurring during the sort step, it could come out far
behind.  So I'm not finding myself terribly concerned here.

Also, I did a bit more research and verified that my patch doesn't cause
any extra detoasting activity for simple set-returning-function cases,
for example:

regression=# create or replace function pgr() returns setof pg_rewrite as
'declare r pg_rewrite; 
begin
for r in select * from pg_rewrite loop
  return next r;
end loop;
end' language plpgsql;
CREATE FUNCTION

regression=# explain (analyze, buffers) select r.* from pgr() r;
 QUERY PLAN 

 Function Scan on pgr r  (cost=0.25..10.25 rows=1000 width=135) (actual 
time=0.881..0.911 rows=177 loops=1)
   Buffers: shared hit=36
 Planning time: 0.059 ms
 Execution time: 0.986 ms

The same for SQL-language functions, either inlined or not.  It's not so
good if you insist on putting the SRF call in the targetlist:

explain (analyze, buffers) select pgr();  
QUERY PLAN  
--
 Result  (cost=0.00..5.25 rows=1000 width=0) (actual time=0.941..10.575 
rows=177 loops=1)
   Buffers: shared hit=179
 Planning time: 0.029 ms
 Execution time: 10.677 ms

On the other hand, in real-world usage (not EXPLAIN), a query like that is
certainly going to be detoasting all the fields anyway to return them to
the client.

On the whole I feel fairly good about the opinion that this change won't
be disastrous for mainstream usages, and will be beneficial for
performance some of the time.  Since I'm not hearing any volunteers to
try to convert the other approach into a complete patch, I plan to push
forward with this one.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-27 Thread Andres Freund
On 2014-04-27 14:18:46 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-04-25 12:05:17 -0400, Tom Lane wrote:
  Meh ... is it likely that the columns involved in an ordering comparison
  would be so wide as to be toasted out-of-line?  Such a query would only be
  fast if the row value were indexed, which would pretty much preclude use
  of wide columns.
 
  Just for some clarity, that also happens with expressions like:
  WHERE
  ROW(ev_class, rulename, ev_action) = ROW('pg_rewrite'::regclass, 
  '_RETURN', NULL)
  ORDER BY ROW(ev_class, rulename, ev_action);
 
  which is what is generated by such query generators - where the leading
  columns *are* indexed but not necessarily unique.
 
 Ah, I see.  Well, we're pretty darn stupid about such queries anyway :-(.
 Your first example could be greatly improved by expanding the whole-row
 Var into a ROW() construct (so that RowCompareExpr could be used), and
 the second one by exploding the ROW() order-by into separate order-by
 columns.

The problem is that - at least to my knowledge - it's not possible to do
the WHERE part as an indexable clause using individual columns.

 On the whole I feel fairly good about the opinion that this change won't
 be disastrous for mainstream usages, and will be beneficial for
 performance some of the time.

I am less convinced of that. But I don't have a better idea. How about
letting it stew in HEAD for a while? It's not like it's affecting all
that many people, given the amount of reports over the last couple
years.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-27 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-04-27 14:18:46 -0400, Tom Lane wrote:
 Ah, I see.  Well, we're pretty darn stupid about such queries anyway :-(.
 Your first example could be greatly improved by expanding the whole-row
 Var into a ROW() construct (so that RowCompareExpr could be used), and
 the second one by exploding the ROW() order-by into separate order-by
 columns.

 The problem is that - at least to my knowledge - it's not possible to do
 the WHERE part as an indexable clause using individual columns.

You mean like this?

regression=# EXPLAIN verbose SELECT * FROM pg_rewrite r WHERE r  ('x'::name, 
'11854'::oid, NULL, NULL, NULL, NULL, null);
   QUERY PLAN   
 
-
 Seq Scan on pg_catalog.pg_rewrite r  (cost=0.00..46.21 rows=59 width=983)
   Output: rulename, ev_class, ev_type, ev_enabled, is_instead, ev_qual, 
ev_action
   Filter: (r.*  ROW('x'::name, 11854::oid, NULL::unknown, NULL::unknown, 
NULL::unknown, NULL::unknown, NULL::unknown))
 Planning time: 0.119 ms
(4 rows)

regression=# EXPLAIN verbose SELECT * FROM pg_rewrite r WHERE row(rulename, 
ev_class, ev_type, ev_enabled, is_instead, ev_qual, ev_action)  ('x'::name, 
'11854'::oid, NULL, NULL, NULL, NULL, null);

   QUERY PLAN   


 Index Scan using pg_rewrite_rel_rulename_index on pg_catalog.pg_rewrite r  
(cost=0.15..13.50 rows=1 width=983)
   Output: rulename, ev_class, ev_type, ev_enabled, is_instead, ev_qual, 
ev_action
   Index Cond: (ROW(r.rulename, r.ev_class) = ROW('x'::name, 11854::oid))
   Filter: (ROW(r.rulename, r.ev_class, r.ev_type, r.ev_enabled, r.is_instead, 
(r.ev_qual)::text, (r.ev_action)::text)  ROW('x'::name, 11854::oid, 
NULL::char, NULL::char, NULL::boolean, NULL::text, NULL::text))
 Planning time: 0.201 ms
(5 rows)

The code for extracting prefixes of RowCompareExprs like that has existed
for quite some time.  But it doesn't understand about whole-row variables.

 On the whole I feel fairly good about the opinion that this change won't
 be disastrous for mainstream usages, and will be beneficial for
 performance some of the time.

 I am less convinced of that. But I don't have a better idea. How about
 letting it stew in HEAD for a while? It's not like it's affecting all
 that many people, given the amount of reports over the last couple
 years.

Well, mumble.  It's certainly true that it took a long time for someone to
produce a reproducible test case.  But it's not like we don't regularly
hear reports of corrupted data with missing chunk number 0 for toast
value   Are you really going to argue that few of those reports can
be blamed on this class of bug?  If so, on what evidence?  Of course
I have no evidence either to claim that this *is* biting people; we don't
know, since it never previously occurred to us to ask complainants if they
were using arrays-of-composites or one of the other risk cases.  But it
seems to me that letting a known data-loss bug go unfixed on the grounds
that the fix might create performance issues for some people is not a
prudent way to proceed.  People expect a database to store their data
reliably, period.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-27 Thread Andres Freund
On 2014-04-27 17:49:53 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-04-27 14:18:46 -0400, Tom Lane wrote:
  Ah, I see.  Well, we're pretty darn stupid about such queries anyway :-(.
  Your first example could be greatly improved by expanding the whole-row
  Var into a ROW() construct (so that RowCompareExpr could be used), and
  the second one by exploding the ROW() order-by into separate order-by
  columns.
 
  The problem is that - at least to my knowledge - it's not possible to do
  the WHERE part as an indexable clause using individual columns.
 
 You mean like this?
 
 ..
 The code for extracting prefixes of RowCompareExprs like that has existed
 for quite some time.  But it doesn't understand about whole-row variables.

Yea, but:

 Just for some clarity, that also happens with expressions like:
 WHERE
 ROW(ev_class, rulename, ev_action) = ROW('pg_rewrite'::regclass, 
 '_RETURN', NULL)
 ORDER BY ROW(ev_class, rulename, ev_action);

Previously we wouldn't detoast ev_action here, but now we do.

e.g.
set enable_seqscan = false; set enable_bitmapscan = false;
EXPLAIN (ANALYZE, BUFFERS)
SELECT oid
FROM pg_rewrite
WHERE ROW(ev_class, rulename, ev_action) = ROW('pg_user_mappings'::regclass, 
'_RETURN',NULL)
ORDER BY ROW(ev_class, rulename, ev_action);

goes from 0.527ms to 9.698ms. And that's damn few rows.

  On the whole I feel fairly good about the opinion that this change won't
  be disastrous for mainstream usages, and will be beneficial for
  performance some of the time.
 
  I am less convinced of that. But I don't have a better idea. How about
  letting it stew in HEAD for a while? It's not like it's affecting all
  that many people, given the amount of reports over the last couple
  years.
 
 Well, mumble.  It's certainly true that it took a long time for someone to
 produce a reproducible test case.  But it's not like we don't regularly
 hear reports of corrupted data with missing chunk number 0 for toast
 value   Are you really going to argue that few of those reports can
 be blamed on this class of bug?  If so, on what evidence?

No, I am not claiming that.

 Of course
 I have no evidence either to claim that this *is* biting people; we don't
 know, since it never previously occurred to us to ask complainants if they
 were using arrays-of-composites or one of the other risk cases.  But it
 seems to me that letting a known data-loss bug go unfixed on the grounds
 that the fix might create performance issues for some people is not a
 prudent way to proceed.  People expect a database to store their data
 reliably, period.

I agree that this needs to get backpatched at some point. But people
also get very upset if queries gets slower by a magnitude or two after a
minor version upgrade. And this does have the potential to do that, no?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-27 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 Just for some clarity, that also happens with expressions like:
 WHERE
 ROW(ev_class, rulename, ev_action) = ROW('pg_rewrite'::regclass, '_RETURN', 
 NULL)
 ORDER BY ROW(ev_class, rulename, ev_action);

 Previously we wouldn't detoast ev_action here, but now we do.

No, we don't; not in that WHERE expression, because it's a RowCompareExpr
which does not involve forming any composite datums.  We'd only detoast if
the row comparison actually gets to that column --- which is the same
behavior as before.

The extra detoast you're seeing in this example is caused by the ROW()
in the ORDER BY.  Which, as I said, is just bad SQL coding.

 I agree that this needs to get backpatched at some point. But people
 also get very upset if queries gets slower by a magnitude or two after a
 minor version upgrade. And this does have the potential to do that, no?

They don't get nearly as upset as they do if the database loses their
data.  Yes, in an ideal world, we could fix this and not suffer any
performance loss anywhere.  But no such option is on the table, and
waiting is not going to make one appear.  What waiting *will* do is
afford more opportunities for this bug to eat people's data.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-25 Thread Heikki Linnakangas

On 04/25/2014 02:40 AM, Tom Lane wrote:

* The patch changes HeapTupleGetDatum from a simple inline macro into
a function call.  This means that third-party extensions will not get
protection against creation of toast-pointer-containing composite Datums
until they recompile.


One consequence of that is that an extension compiled with headers from 
new minor version won't work with binaries from an older minor version. 
Packagers beware.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-25 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 On 04/25/2014 02:40 AM, Tom Lane wrote:
 * The patch changes HeapTupleGetDatum from a simple inline macro into
 a function call.  This means that third-party extensions will not get
 protection against creation of toast-pointer-containing composite Datums
 until they recompile.

 One consequence of that is that an extension compiled with headers from 
 new minor version won't work with binaries from an older minor version. 
 Packagers beware.

Yeah, that's a possible issue, though I think we've done such things
before.  In any case, alternative approaches to fixing this would likely
also involve extensions needing to call core functions that don't exist
today; though maybe the number of affected extensions would be smaller.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-25 Thread Andres Freund
Hi,

On 2014-04-24 19:40:30 -0400, Tom Lane wrote:
 * Because HeapTupleGetDatum might allocate a new tuple, the wrong thing
 might happen if the caller changes CurrentMemoryContext between
 heap_form_tuple and HeapTupleGetDatum.

It's fscking ugly to allocate memory in a PG_RETURN_... But I don't
really have a better backward compatible idea :(

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-25 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-04-24 19:40:30 -0400, Tom Lane wrote:
 * Because HeapTupleGetDatum might allocate a new tuple, the wrong thing
 might happen if the caller changes CurrentMemoryContext between
 heap_form_tuple and HeapTupleGetDatum.

 It's fscking ugly to allocate memory in a PG_RETURN_... But I don't
 really have a better backward compatible idea :(

It's hardly without precedent; see PG_RETURN_INT64 or PG_RETURN_FLOAT8 on
a 32-bit machine, for starters.  There's never been an assumption that
these macros couldn't do that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-25 Thread Andres Freund
On 2014-04-25 11:22:09 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-04-24 19:40:30 -0400, Tom Lane wrote:
  * Because HeapTupleGetDatum might allocate a new tuple, the wrong thing
  might happen if the caller changes CurrentMemoryContext between
  heap_form_tuple and HeapTupleGetDatum.
 
  It's fscking ugly to allocate memory in a PG_RETURN_... But I don't
  really have a better backward compatible idea :(
 
 It's hardly without precedent; see PG_RETURN_INT64 or PG_RETURN_FLOAT8 on
 a 32-bit machine, for starters.  There's never been an assumption that
 these macros couldn't do that.

There's a fair bit of difference between allocating 8 bytes and
allocation of nearly unbounded size... But as I said, I don't really
have a better idea.

I agree that the risk from this patch seems more manageable than your
previous approach.

The case I am worried most about is queries like:
SELECT a, b FROM f WHERE f  ROW(38, 'whatever') ORDER BY f;
I've seen such generated by a some query generators for paging. But I
guess that's something we're going to have to accept.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-25 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 The case I am worried most about is queries like:
 SELECT a, b FROM f WHERE f  ROW(38, 'whatever') ORDER BY f;
 I've seen such generated by a some query generators for paging. But I
 guess that's something we're going to have to accept.

Meh ... is it likely that the columns involved in an ordering comparison
would be so wide as to be toasted out-of-line?  Such a query would only be
fast if the row value were indexed, which would pretty much preclude use
of wide columns.

I'm actually more worried about the function-returning-tuple case, as that
might bite people who thought they'd use some cute functional notation or
other and it wouldn't cost 'em anything.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-25 Thread Andres Freund
On 2014-04-25 12:05:17 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  The case I am worried most about is queries like:
  SELECT a, b FROM f WHERE f  ROW(38, 'whatever') ORDER BY f;
  I've seen such generated by a some query generators for paging. But I
  guess that's something we're going to have to accept.
 
 Meh ... is it likely that the columns involved in an ordering comparison
 would be so wide as to be toasted out-of-line?  Such a query would only be
 fast if the row value were indexed, which would pretty much preclude use
 of wide columns.

In the cases I've seen it it was usually used in addition to a indexable
condition, just for paging across different http requests.

As completely ridiculous example:
before:
postgres=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM pg_rewrite r WHERE r  
('x'::name, '11854'::oid, NULL, NULL, NULL, NULL);
QUERY PLAN  
  
--
 Seq Scan on pg_rewrite r  (cost=0.00..12.36 rows=36 width=720) (actual 
time=0.425..0.425 rows=0 loops=1)
   Filter: (r.*  ROW('x'::name, 11854::oid, NULL::unknown, NULL::unknown, 
NULL::unknown, NULL::unknown))
   Rows Removed by Filter: 109
   Buffers: shared hit=11
 Planning time: 0.141 ms
 Execution time: 0.485 ms

after:
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM pg_rewrite r WHERE r  ('x'::name, 
'11854'::oid, NULL, NULL, NULL, NULL);
 QUERY PLAN 


 Seq Scan on pg_rewrite r  (cost=0.00..12.36 rows=36 width=720) (actual 
time=14.257..14.257 rows=0 loops=1)
   Filter: (r.*  ROW('x'::name, 11854::oid, NULL::unknown, NULL::unknown, 
NULL::unknown, NULL::unknown))
   Rows Removed by Filter: 109
   Buffers: shared hit=152
 Planning time: 0.139 ms
 Execution time: 14.310 ms
(6 rows)


 I'm actually more worried about the function-returning-tuple case, as that
 might bite people who thought they'd use some cute functional notation or
 other and it wouldn't cost 'em anything.

Right, that's not actually all that infrequent :/.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-25 Thread Andres Freund
On 2014-04-25 18:25:44 +0200, Andres Freund wrote:
 On 2014-04-25 12:05:17 -0400, Tom Lane wrote:
  Andres Freund and...@2ndquadrant.com writes:
   The case I am worried most about is queries like:
   SELECT a, b FROM f WHERE f  ROW(38, 'whatever') ORDER BY f;
   I've seen such generated by a some query generators for paging. But I
   guess that's something we're going to have to accept.
  
  Meh ... is it likely that the columns involved in an ordering comparison
  would be so wide as to be toasted out-of-line?  Such a query would only be
  fast if the row value were indexed, which would pretty much preclude use
  of wide columns.
 
 In the cases I've seen it it was usually used in addition to a indexable
 condition, just for paging across different http requests.
 
 As completely ridiculous example:

 before:
 postgres=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM pg_rewrite r WHERE r  
 ('x'::name, '11854'::oid, NULL, NULL, NULL, NULL);
 QUERY PLAN
 

Just for some clarity, that also happens with expressions like:
WHERE
ROW(ev_class, rulename, ev_action) = ROW('pg_rewrite'::regclass, 
'_RETURN', NULL)
ORDER BY ROW(ev_class, rulename, ev_action);

which is what is generated by such query generators - where the leading
columns *are* indexed but not necessarily unique.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-24 Thread Tom Lane
I wrote:
 I'm actually planning to set this patch on the shelf for a bit and go
 investigate the other alternative, ie, not generating composite Datums
 containing toast pointers in the first place.

Here's a draft patch along those lines.  It turned out to be best to
leave heap_form_tuple() alone and instead put the dirty work into
HeapTupleGetDatum().  That has some consequences worth discussing:

* The patch changes HeapTupleGetDatum from a simple inline macro into
a function call.  This means that third-party extensions will not get
protection against creation of toast-pointer-containing composite Datums
until they recompile.  I'm not sure that this is a big deal, though.
After looking through the existing core and contrib code, it seems that
nearly all users of HeapTupleGetDatum are building their tuples from
locally-sourced data that could not possibly be toasted anyway.  (This is
true a-priori for users of BuildTupleFromCStrings, for example, and seems
to be true of all uses of HeapTupleGetDatum in SQL-callable functions.)
So it's entirely possible that there would be no live bug anywhere even
without recompiles.

* If we were sufficiently worried about the previous point, we could
get some protection against unfixed extension code by not removing
toast_flatten_tuple_attribute() in the back branches, only in HEAD.
I'm doubtful that it's worth the cycles though.

* Because HeapTupleGetDatum might allocate a new tuple, the wrong thing
might happen if the caller changes CurrentMemoryContext between
heap_form_tuple and HeapTupleGetDatum.  I could only find two places
that did that, though, both in plpgsql.  I thought about having
HeapTupleGetDatum try to identify the context the passed tuple had been
allocated in, but the problem with that is the passed tuple isn't
necessarily heap-allocated at all --- in fact, one of the two problematic
places in plpgsql passes a pointer to a stack-local variable, so we'd
actually crash if we tried to apply GetMemoryChunkContext() to it.
Of course we can (and I did) change plpgsql, but the question is whether
any third-party code has copied either coding pattern.  On balance it
seems best to just use palloc; that's unlikely to cause anything worse
than a memory leak.

* I was quite pleased with the size of the patch: under 100 net new lines,
half of that being a new regression test case.  And it's worth emphasizing
that this is a *complete* fix, modulo the question of third-party code
recompiles.  The patch I showed a few days ago added several times that
much just to fix arrays of composites, and I wasn't too confident that it
fixed every case even for arrays.

* The cases where an extra detoast would be incurred are pretty narrow:
basically, whole-row-Var references, ROW() constructs, and the outputs of
functions returning tuples.  As discussed earlier, whether this would cost
anything or save anything would depend on the number of subsequent uses of
the composite Datum's previously-toasted fields.  But I'm thinking that
the amount of application code that would actually be impacted in either
direction is probably pretty small.  Moreover, we aren't adding any
noticeable overhead in cases where no detoasting occurs, unlike the
situation with the previous patch.  (In fact, we're saving some overhead
by getting rid of syscache lookups in toast_flatten_tuple_attribute.)

So, despite Noah's misgivings, I'm thinking this is the way to proceed.

Comments?

regards, tom lane

diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c
index aea9d40..c64ede9 100644
*** a/src/backend/access/common/heaptuple.c
--- b/src/backend/access/common/heaptuple.c
*** heap_copytuple_with_tuple(HeapTuple src,
*** 617,622 
--- 617,657 
  	memcpy((char *) dest-t_data, (char *) src-t_data, src-t_len);
  }
  
+ /* 
+  *		heap_copy_tuple_as_datum
+  *
+  *		copy a tuple as a composite-type Datum
+  * 
+  */
+ Datum
+ heap_copy_tuple_as_datum(HeapTuple tuple, TupleDesc tupleDesc)
+ {
+ 	HeapTupleHeader td;
+ 
+ 	/*
+ 	 * If the tuple contains any external TOAST pointers, we have to inline
+ 	 * those fields to meet the conventions for composite-type Datums.
+ 	 */
+ 	if (HeapTupleHasExternal(tuple))
+ 		return toast_flatten_tuple_to_datum(tuple-t_data,
+ 			tuple-t_len,
+ 			tupleDesc);
+ 
+ 	/*
+ 	 * Fast path for easy case: just make a palloc'd copy and insert the
+ 	 * correct composite-Datum header fields (since those may not be set if
+ 	 * the given tuple came from disk, rather than from heap_form_tuple).
+ 	 */
+ 	td = (HeapTupleHeader) palloc(tuple-t_len);
+ 	memcpy((char *) td, (char *) tuple-t_data, tuple-t_len);
+ 
+ 	HeapTupleHeaderSetDatumLength(td, tuple-t_len);
+ 	HeapTupleHeaderSetTypeId(td, tupleDesc-tdtypeid);
+ 	HeapTupleHeaderSetTypMod(td, tupleDesc-tdtypmod);
+ 
+ 	return PointerGetDatum(td);
+ }
+ 
  /*
   * heap_form_tuple
   *		construct a tuple from 

Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-22 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 On Mon, Apr 21, 2014 at 10:57:34AM -0400, Tom Lane wrote:
 I'm actually planning to set this patch on the shelf for a bit and go
 investigate the other alternative, ie, not generating composite Datums
 containing toast pointers in the first place.

 I maintain that the potential slowdown is too great to consider adopting that
 for the sake of a cleaner patch.  Your last message examined a 67% performance
 regression.  The strategy you're outlining now can slow a query by 1,000,000%.

[ shrug... ]  It could also speed up a query by similar factors.  I see
no good reason to suppose that it would be a net loss overall.  I agree
that it might change performance characteristics in a way that we'd
ideally not do in the back branches.  But the fact remains that we've
got a bad bug to fix, and absent a reasonably trustworthy functional fix,
arguing about performance characteristics is a waste of breath.  I can
make it arbitrarily fast if it's not required to give the right answer.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-21 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 I wonder how it would work out to instead delay this new detoast effort until
 toast_insert_or_update().

That would require toast_insert_or_update() to know about every container
datatype.  I doubt it could lead to an extensible or maintainable
solution.

I'm actually planning to set this patch on the shelf for a bit and go
investigate the other alternative, ie, not generating composite Datums
containing toast pointers in the first place.  We now know that the idea
that we aren't going to take performance hits *somewhere* is an illusion,
and I still suspect that the other way is going to lead to a smaller and
cleaner patch.  The main performance downside for plpgsql might be
addressable by making sure that plpgsql record variables fall on the heap
tuple rather than the composite Datum side of the line.  I'm also quite
concerned about correctness: I don't have a lot of confidence that this
patch has closed every loophole with respect to arrays, and it hasn't even
touched ranges or any of the related one-off bugs that I believe exist.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-21 Thread Andres Freund
Hi,

On 2014-04-20 15:38:23 -0400, Tom Lane wrote:
 Some poking around with oprofile says that much of the added time is
 coming from added syscache and typcache lookups, as I suspected.
 
 I did some further work on the patch to make it possible to cache
 the element tuple descriptor across calls for these two functions.
 With the attached patch, the first test case comes down to about 335 ms
 and the second to about 1475 ms (plpgsql is still leaving some extra
 cache lookups on the table).  More could be done with some further API
 changes, but I think this is about as much as is safe to back-patch.

I unfortunately haven't followed this in detail, but shouldn't it be
relatively easily to make this even cheaper by checking for
HEAP_HASEXTERNAL?  If it's not set we don't need to iterate over the
composite's columns, right?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-21 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 I unfortunately haven't followed this in detail, but shouldn't it be
 relatively easily to make this even cheaper by checking for
 HEAP_HASEXTERNAL?  If it's not set we don't need to iterate over the
 composite's columns, right?

That's the point I made further down: we could do that if we were willing
to abandon the principle that nested fields shouldn't be compressed.
It's not very clear what it'd cost us to give that up.  (Too bad we didn't
define a HEAP_HASCOMPRESSED flag bit ...)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-21 Thread Andres Freund
On 2014-04-21 11:30:57 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  I unfortunately haven't followed this in detail, but shouldn't it be
  relatively easily to make this even cheaper by checking for
  HEAP_HASEXTERNAL?  If it's not set we don't need to iterate over the
  composite's columns, right?
 
 That's the point I made further down:

Oh, sorry. I started reading this thread from the end just now.

 we could do that if we were willing
 to abandon the principle that nested fields shouldn't be compressed.
 It's not very clear what it'd cost us to give that up.

I don't think the cost of that would be all that high. As you argue,
without that trick the cost of iterating over all columns will be paid
all the time, whereas double compression will take effect much less
often. And might even end up being beneficial.
The risk of significant performance regressions due to backpatching
seems significantly less likely if we pay heed to HASEXTERNAL.

 (Too bad we didn't define a HEAP_HASCOMPRESSED flag bit ...)

And too bad that infomask bits are so scarce :(. We really need to
reclaim HEAP_MOVED_OFF and HEAP_MOVED_IN.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-21 Thread Merlin Moncure
On Mon, Apr 21, 2014 at 10:40 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-04-21 11:30:57 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  I unfortunately haven't followed this in detail, but shouldn't it be
  relatively easily to make this even cheaper by checking for
  HEAP_HASEXTERNAL?  If it's not set we don't need to iterate over the
  composite's columns, right?

 That's the point I made further down:

 Oh, sorry. I started reading this thread from the end just now.

 we could do that if we were willing
 to abandon the principle that nested fields shouldn't be compressed.
 It's not very clear what it'd cost us to give that up.

 I don't think the cost of that would be all that high.

I think it's pretty reasonable too.

 And too bad that infomask bits are so scarce :(. We really need to
 reclaim HEAP_MOVED_OFF and HEAP_MOVED_IN.

The only consequence of that is losing support for in-place update for
pre-9.0 (of which the only supported version is 8.4).  I figure it's
also pretty reasonable to drop support for IPU for out of support
versions for new versions going forward.  That would recover the bits
and yield some nice cleanups in tqual.c.

8.4 is scheduled to go out of support in July, so if you agree with my
reasoning 9.4 would be a candidate for not honoring 8.4 upgrade
support.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-21 Thread Alvaro Herrera
Merlin Moncure wrote:
 On Mon, Apr 21, 2014 at 10:40 AM, Andres Freund and...@2ndquadrant.com 
 wrote:

  And too bad that infomask bits are so scarce :(. We really need to
  reclaim HEAP_MOVED_OFF and HEAP_MOVED_IN.
 
 The only consequence of that is losing support for in-place update for
 pre-9.0 (of which the only supported version is 8.4).  I figure it's
 also pretty reasonable to drop support for IPU for out of support
 versions for new versions going forward.  That would recover the bits
 and yield some nice cleanups in tqual.c.

There's no way to be sure that the bits have been removed from tables
that were upgraded from a 8.4 server to a 9.0 server and from there to a
newer server.  We'd need some way to catalogue which tables have been
cleaned prior to an upgrade to a release that no longer accepts those
bits; pg_upgrade would then say table xyz needs to be vacuumed before
the upgrade in check mode, or something like that.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-21 Thread Noah Misch
On Mon, Apr 21, 2014 at 10:57:34AM -0400, Tom Lane wrote:
 Noah Misch n...@leadboat.com writes:
  I wonder how it would work out to instead delay this new detoast effort 
  until
  toast_insert_or_update().
 
 That would require toast_insert_or_update() to know about every container
 datatype.  I doubt it could lead to an extensible or maintainable
 solution.

If that's its worst drawback, it's excellent.

 I'm actually planning to set this patch on the shelf for a bit and go
 investigate the other alternative, ie, not generating composite Datums
 containing toast pointers in the first place.  We now know that the idea
 that we aren't going to take performance hits *somewhere* is an illusion,
 and I still suspect that the other way is going to lead to a smaller and
 cleaner patch.  The main performance downside for plpgsql might be
 addressable by making sure that plpgsql record variables fall on the heap
 tuple rather than the composite Datum side of the line.  I'm also quite
 concerned about correctness: I don't have a lot of confidence that this
 patch has closed every loophole with respect to arrays, and it hasn't even
 touched ranges or any of the related one-off bugs that I believe exist.

I maintain that the potential slowdown is too great to consider adopting that
for the sake of a cleaner patch.  Your last message examined a 67% performance
regression.  The strategy you're outlining now can slow a query by 1,000,000%.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-20 Thread Tom Lane
I wrote:
 The main problem with this patch, as I see it, is that it'll introduce
 extra syscache lookup overhead even when there are no toasted fields
 anywhere.  I've not really tried to quantify how much, since that would
 require first agreeing on a benchmark case --- anybody have a thought
 about what that ought to be?  But in at least some of these functions,
 we'll be paying a cache lookup or two per array element, which doesn't
 sound promising.

I created a couple of test cases that I think are close to worst-case for
accumArrayResult() and array_set(), which are the two functions that seem
most worth worrying about.  The accumArrayResult test case is

create table manycomplex as
select row(random(),random())::complex as c
from generate_series(1,100);
vacuum analyze manycomplex;

then time:

select array_dims(array_agg(c)) from manycomplex;

On HEAD, this takes about 295-300 msec on my machine in a non-cassert
build.  With the patch I sent previously, the time goes to 495-500 msec.
Ouch.

The other test case is

create or replace function timearrayset(n int) returns void language plpgsql as
$$
declare a complex[];
begin
a := array[row(1,2), row(3,4), row(5,6)];
for i in 1..$1 loop
   a[2] := row(5,6)::complex;
end loop;
end;
$$;

select timearrayset(100);

This goes from circa 1250 ms in HEAD to around 1680 ms.

Some poking around with oprofile says that much of the added time is
coming from added syscache and typcache lookups, as I suspected.

I did some further work on the patch to make it possible to cache
the element tuple descriptor across calls for these two functions.
With the attached patch, the first test case comes down to about 335 ms
and the second to about 1475 ms (plpgsql is still leaving some extra
cache lookups on the table).  More could be done with some further API
changes, but I think this is about as much as is safe to back-patch.

At least in the accumArrayResult test case, it would be possible to
bring the runtime back to very close to HEAD if we were willing to
abandon the principle that compressed fields within a tuple should
always get decompressed when the tuple goes into a larger structure.
That would allow flatten_composite_element to quit early if the
tuple doesn't have the HEAP_HASEXTERNAL infomask bit set.  I'm not
sure that this would be a good tradeoff however: if we fail to remove nested
compression, we could end up paying for that with double compression.
On the other hand, it's unclear that that case would come up often,
while the overhead of disassembling the tuple is definitely going to
happen every time we assign to an array-of-composites element.  (Also,
there is no question here of regression relative to previous releases,
since the existing code fails to prevent nested compression.)

Comments?

regards, tom lane

diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c
index aea9d40..48b09b8 100644
*** a/src/backend/access/common/heaptuple.c
--- b/src/backend/access/common/heaptuple.c
*** heap_form_tuple(TupleDesc tupleDescripto
*** 649,674 
  	 * Check for nulls and embedded tuples; expand any toasted attributes in
  	 * embedded tuples.  This preserves the invariant that toasting can only
  	 * go one level deep.
- 	 *
- 	 * We can skip calling toast_flatten_tuple_attribute() if the attribute
- 	 * couldn't possibly be of composite type.  All composite datums are
- 	 * varlena and have alignment 'd'; furthermore they aren't arrays. Also,
- 	 * if an attribute is already toasted, it must have been sent to disk
- 	 * already and so cannot contain toasted attributes.
  	 */
  	for (i = 0; i  numberOfAttributes; i++)
  	{
  		if (isnull[i])
  			hasnull = true;
! 		else if (att[i]-attlen == -1 
!  att[i]-attalign == 'd' 
!  att[i]-attndims == 0 
!  !VARATT_IS_EXTENDED(DatumGetPointer(values[i])))
! 		{
! 			values[i] = toast_flatten_tuple_attribute(values[i],
! 	  att[i]-atttypid,
! 	  att[i]-atttypmod);
! 		}
  	}
  
  	/*
--- 649,661 
  	 * Check for nulls and embedded tuples; expand any toasted attributes in
  	 * embedded tuples.  This preserves the invariant that toasting can only
  	 * go one level deep.
  	 */
  	for (i = 0; i  numberOfAttributes; i++)
  	{
  		if (isnull[i])
  			hasnull = true;
! 		else
! 			values[i] = toast_flatten_tuple_attribute(values[i], att[i]);
  	}
  
  	/*
*** heap_form_minimal_tuple(TupleDesc tupleD
*** 1403,1428 
  	 * Check for nulls and embedded tuples; expand any toasted attributes in
  	 * embedded tuples.  This preserves the invariant that toasting can only
  	 * go one level deep.
- 	 *
- 	 * We can skip calling toast_flatten_tuple_attribute() if the attribute
- 	 * couldn't possibly be of composite type.  All composite datums are
- 	 * varlena and have alignment 'd'; furthermore they aren't arrays. Also,
- 	 * if an attribute is already toasted, it must have been sent to disk
- 	

Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-20 Thread Noah Misch
I lack time to give this a solid review, but here's a preliminary reaction:

On Sun, Apr 20, 2014 at 03:38:23PM -0400, Tom Lane wrote:
 On HEAD, this takes about 295-300 msec on my machine in a non-cassert
 build.  With the patch I sent previously, the time goes to 495-500 msec.

 This goes from circa 1250 ms in HEAD to around 1680 ms.
 
 Some poking around with oprofile says that much of the added time is
 coming from added syscache and typcache lookups, as I suspected.
 
 I did some further work on the patch to make it possible to cache
 the element tuple descriptor across calls for these two functions.
 With the attached patch, the first test case comes down to about 335 ms
 and the second to about 1475 ms (plpgsql is still leaving some extra
 cache lookups on the table).  More could be done with some further API
 changes, but I think this is about as much as is safe to back-patch.

That particular performance drop seems acceptable.  As I hinted upthread, the
large performance risk arises for test cases that do not visit a toast table
today but will do so after the change.

 At least in the accumArrayResult test case, it would be possible to
 bring the runtime back to very close to HEAD if we were willing to
 abandon the principle that compressed fields within a tuple should
 always get decompressed when the tuple goes into a larger structure.
 That would allow flatten_composite_element to quit early if the
 tuple doesn't have the HEAP_HASEXTERNAL infomask bit set.  I'm not
 sure that this would be a good tradeoff however: if we fail to remove nested
 compression, we could end up paying for that with double compression.
 On the other hand, it's unclear that that case would come up often,
 while the overhead of disassembling the tuple is definitely going to
 happen every time we assign to an array-of-composites element.  (Also,
 there is no question here of regression relative to previous releases,
 since the existing code fails to prevent nested compression.)

I wonder how it would work out to instead delay this new detoast effort until
toast_insert_or_update().  That timing has the major advantage of not adding
any toast table visits beyond those actually needed to avoid improperly
writing an embedded toast pointer to disk.  It would give us the flexibility
to detoast array elements more lazily than we do today, though I don't propose
any immediate change there.  To reduce syscache traffic, make the TupleDesc
record whether any column requires recursive detoast work.  Perhaps also use
the TupleDesc as a place to cache the recursive-detoast syscache lookups for
tables that do need them.  Thoughts?

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-18 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 Interesting bug.
 On Fri, Mar 28, 2014 at 04:34:41PM -0400, Tom Lane wrote:
 I think we might be better off to get rid of toast_flatten_tuple_attribute
 and instead insist that composite Datums never contain any external toast
 pointers in the first place.

 Performance is the dominant issue here; the hacker-centric considerations you
 outlined vanish next to it.

I'm not exactly convinced by that line of argument, especially when it's
entirely unsupported by any evidence as to which implementation approach
will actually perform worse.

However, I have done some investigation into what it'd take to keep the
current approach and teach the array code to detoast composite-type array
elements properly.  The attached draft patch only fixes arrays; ranges
need work too, and I'm not sure what else might need adjustment.  But this
patch does fix the test case provided by Jan Pecek.

The main problem with this patch, as I see it, is that it'll introduce
extra syscache lookup overhead even when there are no toasted fields
anywhere.  I've not really tried to quantify how much, since that would
require first agreeing on a benchmark case --- anybody have a thought
about what that ought to be?  But in at least some of these functions,
we'll be paying a cache lookup or two per array element, which doesn't
sound promising.

This could be alleviated by caching the lookup results over longer
intervals, but I don't see any way to do that without invasive changes
to the APIs of a number of exported array functions, which doesn't seem
like a good thing for a bug fix that we need to back-patch.  I think the
back-branch fix would have to be pretty much what you see here, even if
we then make changes to reduce the costs in HEAD.

Comments?

regards, tom lane

diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c
index aea9d40..48b09b8 100644
*** a/src/backend/access/common/heaptuple.c
--- b/src/backend/access/common/heaptuple.c
*** heap_form_tuple(TupleDesc tupleDescripto
*** 649,674 
  	 * Check for nulls and embedded tuples; expand any toasted attributes in
  	 * embedded tuples.  This preserves the invariant that toasting can only
  	 * go one level deep.
- 	 *
- 	 * We can skip calling toast_flatten_tuple_attribute() if the attribute
- 	 * couldn't possibly be of composite type.  All composite datums are
- 	 * varlena and have alignment 'd'; furthermore they aren't arrays. Also,
- 	 * if an attribute is already toasted, it must have been sent to disk
- 	 * already and so cannot contain toasted attributes.
  	 */
  	for (i = 0; i  numberOfAttributes; i++)
  	{
  		if (isnull[i])
  			hasnull = true;
! 		else if (att[i]-attlen == -1 
!  att[i]-attalign == 'd' 
!  att[i]-attndims == 0 
!  !VARATT_IS_EXTENDED(DatumGetPointer(values[i])))
! 		{
! 			values[i] = toast_flatten_tuple_attribute(values[i],
! 	  att[i]-atttypid,
! 	  att[i]-atttypmod);
! 		}
  	}
  
  	/*
--- 649,661 
  	 * Check for nulls and embedded tuples; expand any toasted attributes in
  	 * embedded tuples.  This preserves the invariant that toasting can only
  	 * go one level deep.
  	 */
  	for (i = 0; i  numberOfAttributes; i++)
  	{
  		if (isnull[i])
  			hasnull = true;
! 		else
! 			values[i] = toast_flatten_tuple_attribute(values[i], att[i]);
  	}
  
  	/*
*** heap_form_minimal_tuple(TupleDesc tupleD
*** 1403,1428 
  	 * Check for nulls and embedded tuples; expand any toasted attributes in
  	 * embedded tuples.  This preserves the invariant that toasting can only
  	 * go one level deep.
- 	 *
- 	 * We can skip calling toast_flatten_tuple_attribute() if the attribute
- 	 * couldn't possibly be of composite type.  All composite datums are
- 	 * varlena and have alignment 'd'; furthermore they aren't arrays. Also,
- 	 * if an attribute is already toasted, it must have been sent to disk
- 	 * already and so cannot contain toasted attributes.
  	 */
  	for (i = 0; i  numberOfAttributes; i++)
  	{
  		if (isnull[i])
  			hasnull = true;
! 		else if (att[i]-attlen == -1 
!  att[i]-attalign == 'd' 
!  att[i]-attndims == 0 
!  !VARATT_IS_EXTENDED(values[i]))
! 		{
! 			values[i] = toast_flatten_tuple_attribute(values[i],
! 	  att[i]-atttypid,
! 	  att[i]-atttypmod);
! 		}
  	}
  
  	/*
--- 1390,1402 
  	 * Check for nulls and embedded tuples; expand any toasted attributes in
  	 * embedded tuples.  This preserves the invariant that toasting can only
  	 * go one level deep.
  	 */
  	for (i = 0; i  numberOfAttributes; i++)
  	{
  		if (isnull[i])
  			hasnull = true;
! 		else
! 			values[i] = toast_flatten_tuple_attribute(values[i], att[i]);
  	}
  
  	/*
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 9a821d3..2d1a922 100644
*** a/src/backend/access/heap/tuptoaster.c
--- 

Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-03-29 Thread Noah Misch
Interesting bug.

On Fri, Mar 28, 2014 at 04:34:41PM -0400, Tom Lane wrote:
 I think we might be better off to get rid of toast_flatten_tuple_attribute
 and instead insist that composite Datums never contain any external toast
 pointers in the first place.  That is, places that call heap_form_tuple
 to make a composite Datum (rather than a tuple that's due to be stored
 to disk) would be responsible for detoasting any fields with external
 values first.  We could make a wrapper routine for heap_form_tuple to
 take care of this rather than duplicating the code in lots of places.
 
 From a performance standpoint this is probably a small win.  In cases
 where a composite Datum is formed and ultimately saved to disk, it should
 be a win, since we'd have to detoast those fields anyway, and we can avoid
 the overhead of an extra disassembly and reassembly of the composite
 value.  If the composite value is never sent to disk, it's a bit harder
 to tell: we lose if the specific field value is never extracted from the
 composite, but on the other hand we win if it's extracted more than once.

Performance is the dominant issue here; the hacker-centric considerations you
outlined vanish next to it.  Adding a speculative detoast can regress by a
million-fold the performance of a query that passes around, without ever
dereferencing, toast pointers to max-size values.  Passing around a record
without consulting all fields is a credible thing to do, so I'd scarcely
consider taking the performance risk of more-aggressively detoasting every
composite.  That other cases win is little comfort.  Today's PostgreSQL
applications either suffer little enough to care or have already taken
measures to avoid duplicate detoasts.  Past discussions have examined general
ways to avoid repetitive detoast traffic; we'll have something good when it
can do that without imposing open-ended penalties on another usage pattern.

Making the array construction functions use toast_flatten_tuple_attribute()
carries a related performance risk, more elusive yet just as costly when
encountered.  That much risk seems tolerable for 9.4, though.  I won't worry
about performance regressions for range types; using a range to marshal huge
values you'll never detoast is a stretch.

 In any case, adding the extra syscache lookups involved in doing
 toast_flatten_tuple_attribute in lots more places isn't appealing.

True; alas.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-03-28 Thread Tom Lane
Way way back in commit ae93e5fd6e8a7e2321e87d23165d9d7660cde598,
we established a coding rule that it was okay for composite Datums
to contain external (out-of-line) toasted fields, as long as such
toasting didn't go more than one level deep in any tuple.  This meant
that heap_form_tuple had to go through nontrivial pushups to maintain
that invariant: each composite field has to be inspected to see if any
of its component fields are external datums.  Surprisingly, no one has
complained about the cost of the lookups that are required to see
whether fields are composite in the first place.

However, in view of today's bug report from Jan Pecek, I'm wondering
if we shouldn't rethink this.  Jan pointed out that the array code was
failing to prevent composites-with-external-fields from getting into
arrays, and after a bit of looking around I'm afraid there are more such
bugs elsewhere.  One example is in the planner's evaluate_expr(), which
supposes that just PG_DETOAST_DATUM() is enough to make a value safe for
long-term storage in a plan tree.  Range types are making the same sort
of assumption as arrays (hm, can you have a range over a composite type?
Probably, considering we have sort operators for composites).  And there
are places in the index AMs that seem to assume likewise, which is an
issue for AMs in which an indexed value could be composite.

I think we might be better off to get rid of toast_flatten_tuple_attribute
and instead insist that composite Datums never contain any external toast
pointers in the first place.  That is, places that call heap_form_tuple
to make a composite Datum (rather than a tuple that's due to be stored
to disk) would be responsible for detoasting any fields with external
values first.  We could make a wrapper routine for heap_form_tuple to
take care of this rather than duplicating the code in lots of places.

From a performance standpoint this is probably a small win.  In cases
where a composite Datum is formed and ultimately saved to disk, it should
be a win, since we'd have to detoast those fields anyway, and we can avoid
the overhead of an extra disassembly and reassembly of the composite
value.  If the composite value is never sent to disk, it's a bit harder
to tell: we lose if the specific field value is never extracted from the
composite, but on the other hand we win if it's extracted more than once.
In any case, adding the extra syscache lookups involved in doing
toast_flatten_tuple_attribute in lots more places isn't appealing.

From a code correctness standpoint, the question really is whether we can
find all the places that build composite datums more easily than we can
find all the places that ought to be calling toast_flatten_tuple_attribute
and aren't.  I have to admit I'm not sure about that.  There seem to be
basically two places to fix in the main executor (ExecEvalRow and
ExecEvalFieldStore), and roughly half a dozen calls of heap_form_tuple in
the various PLs, but I'm not sure I've not missed some cases.

One thing we could do to try to flush out missed cases is to remove
heap_form_tuple's setting of the tuple-Datum header fields, pushing
that functionality into the new wrapper routine.  Then, any un-updated
code would generate clearly invalid composite datums, rather than only
failing in infrequent corner cases.

Another issue is what about third-party code.  There seems to be risk
either way, but it would accrue to completely different code depending
on which way we try to fix this.

Thoughts?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-03-28 Thread Merlin Moncure
On Fri, Mar 28, 2014 at 3:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Way way back in commit ae93e5fd6e8a7e2321e87d23165d9d7660cde598,
 we established a coding rule that it was okay for composite Datums
 to contain external (out-of-line) toasted fields, as long as such
 toasting didn't go more than one level deep in any tuple.  This meant
 that heap_form_tuple had to go through nontrivial pushups to maintain
 that invariant: each composite field has to be inspected to see if any
 of its component fields are external datums.  Surprisingly, no one has
 complained about the cost of the lookups that are required to see
 whether fields are composite in the first place.

 However, in view of today's bug report from Jan Pecek, I'm wondering
 if we shouldn't rethink this.  Jan pointed out that the array code was
 failing to prevent composites-with-external-fields from getting into
 arrays, and after a bit of looking around I'm afraid there are more such
 bugs elsewhere.  One example is in the planner's evaluate_expr(), which
 supposes that just PG_DETOAST_DATUM() is enough to make a value safe for
 long-term storage in a plan tree.  Range types are making the same sort
 of assumption as arrays (hm, can you have a range over a composite type?
 Probably, considering we have sort operators for composites).  And there
 are places in the index AMs that seem to assume likewise, which is an
 issue for AMs in which an indexed value could be composite.

 I think we might be better off to get rid of toast_flatten_tuple_attribute
 and instead insist that composite Datums never contain any external toast
 pointers in the first place.  That is, places that call heap_form_tuple
 to make a composite Datum (rather than a tuple that's due to be stored
 to disk) would be responsible for detoasting any fields with external
 values first.  We could make a wrapper routine for heap_form_tuple to
 take care of this rather than duplicating the code in lots of places.

 From a performance standpoint this is probably a small win.  In cases
 where a composite Datum is formed and ultimately saved to disk, it should
 be a win, since we'd have to detoast those fields anyway, and we can avoid
 the overhead of an extra disassembly and reassembly of the composite
 value.  If the composite value is never sent to disk, it's a bit harder
 to tell: we lose if the specific field value is never extracted from the
 composite, but on the other hand we win if it's extracted more than once.
 In any case, adding the extra syscache lookups involved in doing
 toast_flatten_tuple_attribute in lots more places isn't appealing.

 From a code correctness standpoint, the question really is whether we can
 find all the places that build composite datums more easily than we can
 find all the places that ought to be calling toast_flatten_tuple_attribute
 and aren't.  I have to admit I'm not sure about that.  There seem to be
 basically two places to fix in the main executor (ExecEvalRow and
 ExecEvalFieldStore), and roughly half a dozen calls of heap_form_tuple in
 the various PLs, but I'm not sure I've not missed some cases.

 One thing we could do to try to flush out missed cases is to remove
 heap_form_tuple's setting of the tuple-Datum header fields, pushing
 that functionality into the new wrapper routine.  Then, any un-updated
 code would generate clearly invalid composite datums, rather than only
 failing in infrequent corner cases.

 Another issue is what about third-party code.  There seems to be risk
 either way, but it would accrue to completely different code depending
 on which way we try to fix this.

 Thoughts?

Trying to follow along here.  Am I correct in saying that if you make
these changes any SQL level functionality (say, creating a composite
type containing a large array) that used to work will continue to
work?

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-03-28 Thread Tom Lane
Merlin Moncure mmonc...@gmail.com writes:
 Trying to follow along here.  Am I correct in saying that if you make
 these changes any SQL level functionality (say, creating a composite
 type containing a large array) that used to work will continue to
 work?

This wouldn't change any SQL-level functionality ... as long as we don't
introduce new bugs :-(

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers