[HACKERS] Truncated tuples for tuple hash tables

2006-06-26 Thread Tom Lane
While looking at the recently-noticed problem that HashAggregate nodes
store more columns of the input than they need to, I couldn't help
noticing how much of the hashtable space goes into HeapTuple header
overhead.  A couple months ago we were able to get a useful improvement
in sorting by not storing unnecessary header fields in sort files, and
I'm strongly tempted to do the same in tuple hash tables.

Unlike the case with sort temp files, it's important to be able to
access the stored data without moving/copying it.  So, not wishing to
duplicate all the tuple access machinery we have already, I'm
envisioning a compromise design that leaves a couple bytes on the table
but looks enough like a standard tuple to be directly usable.
Specifically, something like this:

typedef struct TruncatedTupleData
{
uint32t_len;/* length of tuple */

char  pad[...]; /* see below */

int16t_natts;   /* number of attributes */
... the rest matching HeapTupleHeaderData ...
}

The padding would be chosen such that the offset of t_natts would have
the same value modulo MAXIMUM_ALIGNOF as it has in HeapTupleHeaderData.
This ensures that if a TruncatedTuple is stored starting on a MAXALIGN
boundary, data within it is correctly aligned the same as it would be
in a normal tuple.  With the current struct definitions, 2 bytes of
padding would be needed on all supported platforms, and a
TruncatedTuple would be 16 bytes shorter than a regular tuple.
However, because we are also eliminating a HeapTupleData struct, the
total savings in tuple hash tables would be 36 to 40 bytes per tuple.

To make use of a TruncatedTuple, we'd set up a temporary HeapTupleData
struct with its t_data field pointing 16 bytes before the start of the
TruncatedTuple.  As long as the code using it never tries to access any
of the missing fields (t_xmin through t_ctid), this would work exactly
like a normal HeapTuple.

Going forward, we'd have to be careful to preserve the existing
field-ordering separation between transaction status fields and data
content fields in tuple headers, but that's just a matter of adding some
comments to htup.h.

It's tempting to think about using this same representation for tuples
stored in memory by tuplesort.c and tuplestore.c.  That'd probably
require some changes in their APIs, but I think it's doable.

Comments?  Anyone think this is too ugly for words?

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Truncated tuples for tuple hash tables

2006-06-26 Thread Martijn van Oosterhout
On Mon, Jun 26, 2006 at 10:36:00AM -0400, Tom Lane wrote:
 While looking at the recently-noticed problem that HashAggregate nodes
 store more columns of the input than they need to, I couldn't help
 noticing how much of the hashtable space goes into HeapTuple header
 overhead.  A couple months ago we were able to get a useful improvement
 in sorting by not storing unnecessary header fields in sort files, and
 I'm strongly tempted to do the same in tuple hash tables.
 
 Unlike the case with sort temp files, it's important to be able to
 access the stored data without moving/copying it.  So, not wishing to
 duplicate all the tuple access machinery we have already, I'm
 envisioning a compromise design that leaves a couple bytes on the table
 but looks enough like a standard tuple to be directly usable.

I considered this, but ran into the problem that heap_getattr fell back
to fastgetattr, which wouldn't know what kind of tuple it was given.
Now, if you're going to add a special heap_getattr for these tuples,
then ofcourse there's no problem.

Maybe create a version of heap_getattr that takes the fallback function
as a parameter?

Anyway, I think it's a good idea. Most places in the backend after the
SeqScan/IndexScan node really don't care about most of the header
fields and being able to drop them would be nice.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] Truncated tuples for tuple hash tables

2006-06-26 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 On Mon, Jun 26, 2006 at 10:36:00AM -0400, Tom Lane wrote:
 Unlike the case with sort temp files, it's important to be able to
 access the stored data without moving/copying it.  So, not wishing to
 duplicate all the tuple access machinery we have already, I'm
 envisioning a compromise design that leaves a couple bytes on the table
 but looks enough like a standard tuple to be directly usable.

 I considered this, but ran into the problem that heap_getattr fell back
 to fastgetattr, which wouldn't know what kind of tuple it was given.
 Now, if you're going to add a special heap_getattr for these tuples,
 then ofcourse there's no problem.

No, I'm specifically *not* going to do that.  AFAICS the proposed
representation requires no changes in heap_getattr; if it did, it'd
be too invasive to contemplate :-(.  It should look exactly like any
other HeapTuple structure, except that the transaction status fields
will contain garbage.  Do you see something I missed?

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Truncated tuples for tuple hash tables

2006-06-26 Thread Bort, Paul
Tom Lane said:
 
 To make use of a TruncatedTuple, we'd set up a temporary HeapTupleData
 struct with its t_data field pointing 16 bytes before the start of the
 TruncatedTuple.  As long as the code using it never tries to 
 access any
 of the missing fields (t_xmin through t_ctid), this would work exactly
 like a normal HeapTuple.
 

This sounds like a security risk. What's the worst thing that could be
in
those 16 bytes, and could that be used to bite (sorry) us? 

If those 16 bytes could be user data in another tuple, there might be an
attack there.


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Truncated tuples for tuple hash tables

2006-06-26 Thread Tom Lane
Bort, Paul [EMAIL PROTECTED] writes:
 Tom Lane said:
 As long as the code using it never tries to access any
 of the missing fields (t_xmin through t_ctid), this would work exactly
 like a normal HeapTuple.

 This sounds like a security risk.

No more than any other wild-pointer problem.  At the level of C code
it's always trivial to break anything ;-).  The reason we don't need
to worry is that in the upper levels of the executor there is no such
thing as access to system columns --- any attempt to fetch a system
column is converted to a Var reference that appears in the initial
table-scan node, and thereafter it's an ordinary user column.  This
must be so because trying to keep the system columns in their normal
privileged positions breaks down as soon as you consider a join; there
would only be room for one, and the query might well be trying to fetch
(say) ctid from more than one table.  So any code that was trying to
fetch these fields would be buggy anyway.

In the case of the TupleHashTable code, the only access that need be
provided is through a TupleTableSlot.  We could get a little bit of
protection against programming mistakes by using the virtual tuple
feature of slots to disallow attempts to fetch any system columns from
a truncated tuple.  I'm not sure if this would be feasible for tuplesort
though; haven't looked at how it's used yet.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Truncated tuples for tuple hash tables

2006-06-26 Thread Simon Riggs
On Mon, 2006-06-26 at 16:48 +0200, Martijn van Oosterhout wrote:
 On Mon, Jun 26, 2006 at 10:36:00AM -0400, Tom Lane wrote:
  While looking at the recently-noticed problem that HashAggregate nodes
  store more columns of the input than they need to, I couldn't help
  noticing how much of the hashtable space goes into HeapTuple header
  overhead.  A couple months ago we were able to get a useful improvement
  in sorting by not storing unnecessary header fields in sort files, and
  I'm strongly tempted to do the same in tuple hash tables.

 Anyway, I think it's a good idea. Most places in the backend after the
 SeqScan/IndexScan node really don't care about most of the header
 fields and being able to drop them would be nice.

I understood Tom meant to do this only for HashAgg and Tuplestore. Tom,
is it possible to extend this further across the executor as Martijn
suggests? That would be useful, even if it is slightly harder to measure
the benefit than it is with the can-spill-to-disk cases.

IMHO it would be better to call the format TupleWithoutVisibilityData so
its very clear that accessing the visibility fields aren't present and
any attempt to access them would be a mistake. TruncatedTuple isn't
clear about what's missing or its consequences.

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Truncated tuples for tuple hash tables

2006-06-26 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 On Mon, 2006-06-26 at 16:48 +0200, Martijn van Oosterhout wrote:
 Anyway, I think it's a good idea. Most places in the backend after the
 SeqScan/IndexScan node really don't care about most of the header
 fields and being able to drop them would be nice.

 I understood Tom meant to do this only for HashAgg and Tuplestore. Tom,
 is it possible to extend this further across the executor as Martijn
 suggests? That would be useful, even if it is slightly harder to measure
 the benefit than it is with the can-spill-to-disk cases.

There isn't any benefit except where we collect lots of tuples, which is
to say tuplesort/tuplestore/tuplehashtable.  In other places in the
executor, there's basically only one transient tuple in existence per
plan node; jumping through hoops to save 16 bytes per plan node is just
silly.  (What's more, as of 8.1 most of those tuples will be in virtual
tuple format anyway, and so the optimization wouldn't make any
difference at all...)

 IMHO it would be better to call the format TupleWithoutVisibilityData so
 its very clear that accessing the visibility fields aren't present and
 any attempt to access them would be a mistake. TruncatedTuple isn't
 clear about what's missing or its consequences.

I'm not wedded to TruncatedTuple, but I don't like your suggestion
better; it presumes too much about what the difference might be between
truncated and full tuples.  Even today, I don't think
TupleWithoutVisibilityData makes it clear that t_ctid is missing;
and down the road there might be other header fields that we don't need
to have in in-memory tuples.  Another small problem is that given our
naming conventions for structs vs pointers to structs, using Data as
the last word of a struct name is a bad idea --- for consistency,
pointers to it would be typedef TupleWithoutVisibility, which seems a
bit weird.  For consistency with the existing struct names, I think we
need to choose a name of the form adjectiveTuple.

I thought for awhile about MemoryTuple (as contrasted to HeapTuple) but
that seems too generic.  Any other thoughts?

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Truncated tuples for tuple hash tables

2006-06-26 Thread Simon Riggs
On Mon, 2006-06-26 at 14:36 -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  On Mon, 2006-06-26 at 16:48 +0200, Martijn van Oosterhout wrote:
  Anyway, I think it's a good idea. Most places in the backend after the
  SeqScan/IndexScan node really don't care about most of the header
  fields and being able to drop them would be nice.
 
  I understood Tom meant to do this only for HashAgg and Tuplestore. Tom,
  is it possible to extend this further across the executor as Martijn
  suggests? That would be useful, even if it is slightly harder to measure
  the benefit than it is with the can-spill-to-disk cases.
 
 There isn't any benefit 

OK, see that... 

 I thought for awhile about MemoryTuple (as contrasted to HeapTuple) but
 that seems too generic.  Any other thoughts?

I like MemoryTuple but since we only use it when we go to disk...

ExecutorTuple, MinimalTuple, DataOnlyTuple, MultTuple, TempFileTuple

Pick one: I'm sorry I opined.

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Truncated tuples for tuple hash tables

2006-06-26 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 On Mon, 2006-06-26 at 14:36 -0400, Tom Lane wrote:
 I thought for awhile about MemoryTuple (as contrasted to HeapTuple) but
 that seems too generic.  Any other thoughts?

 I like MemoryTuple but since we only use it when we go to disk...

 ExecutorTuple, MinimalTuple, DataOnlyTuple, MultTuple, TempFileTuple

MinimalTuple seems good to me ...

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Truncated tuples for tuple hash tables

2006-06-26 Thread Tom Lane
I wrote:
 There isn't any benefit except where we collect lots of tuples, which is
 to say tuplesort/tuplestore/tuplehashtable.  In other places in the
 executor, there's basically only one transient tuple in existence per
 plan node; jumping through hoops to save 16 bytes per plan node is just
 silly.  (What's more, as of 8.1 most of those tuples will be in virtual
 tuple format anyway, and so the optimization wouldn't make any
 difference at all...)

After further study of the code, here's my hit-list of places that could
make worthwhile use of MinimalTuples:

tuplesort.c (in-memory, on-disk case done already)
tuplestore.c (in-memory and on-disk)
TupleHashTable (execGrouping.c --- used by nodeAgg and nodeSubplan)
hash joins (in-memory hash table and tuple batch files)
analyze.c (tuples collected in-memory for stats analysis)

It looks like there is actually not anyplace else in the executor where
we materialize tuples anymore, except for execMain.c's INSERT/UPDATE
code, which of course is going to want full tuples it can stash on disk.
Everything else is dealing in TupleTableSlots that probably contain
virtual tuples.

So in one sense this *is* all across the executor.  But the amount of
code to touch seems pretty limited.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Truncated tuples for tuple hash tables

2006-06-26 Thread Simon Riggs
On Mon, 2006-06-26 at 16:43 -0400, Tom Lane wrote:
 I wrote:
  There isn't any benefit except where we collect lots of tuples, which is
  to say tuplesort/tuplestore/tuplehashtable.  In other places in the
  executor, there's basically only one transient tuple in existence per
  plan node; jumping through hoops to save 16 bytes per plan node is just
  silly.  (What's more, as of 8.1 most of those tuples will be in virtual
  tuple format anyway, and so the optimization wouldn't make any
  difference at all...)
 
 After further study of the code, here's my hit-list of places that could
 make worthwhile use of MinimalTuples:
 
   tuplesort.c (in-memory, on-disk case done already)
   tuplestore.c (in-memory and on-disk)
   TupleHashTable (execGrouping.c --- used by nodeAgg and nodeSubplan)
   hash joins (in-memory hash table and tuple batch files)

Thats the list I thought you meant.

   analyze.c (tuples collected in-memory for stats analysis)

Do we save enough there for us to care?

Will that allow us to increase the sample size for larger tables?

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Truncated tuples for tuple hash tables

2006-06-26 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 On Mon, 2006-06-26 at 16:43 -0400, Tom Lane wrote:
 analyze.c (tuples collected in-memory for stats analysis)

 Do we save enough there for us to care?

Possibly not --- it's certainly low-priority, but I listed it for
completeness.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq