Re: [HACKERS] Terrible performance on wide selects

2003-02-14 Thread Bruce Momjian

Added to TODO:

* Cache last known per-tuple offsets to speed long tuple access


---

Tom Lane wrote:
 Hannu Krosing [EMAIL PROTECTED] writes:
  i.e. for tuple with 100 cols, allocate an array of 100 pointers, plus
  keep count of how many are actually valid,
 
  Additionally, this should also make repeated determination of NULL fields
  faster - just put a NULL pointer in and voila - no more bit-shifting and
  AND-ing to find out whether the field is null.
 
 Right, the output of the operation would be a pair of arrays: Datum
 values and is-null flags.  (NULL pointers don't work for pass-by-value
 datatypes.)
 
 I like the idea of keeping track of a last-known-column position and
 incrementally extending that as needed.
 
 I think the way to manage this is to add the overhead data (the output
 arrays and last-column state) to TupleTableSlots.  Then we'd have
 a routine similar to heap_getattr except that it takes a TupleTableSlot
 and makes use of the extra state data.  The infrastructure to manage
 the state data is already in place: for example, ExecStoreTuple would
 reset the last-known-column to 0, ExecSetSlotDescriptor would be
 responsible for allocating the output arrays using the natts value from
 the provided tupdesc, etc.
 
 This wouldn't help for accesses that are not in the context of a slot,
 but certainly all the ones from ExecEvalVar are.  The executor always
 works with tuples stored in slots, so I think we could fix all the
 high-traffic cases this way.
 
   regards, tom lane
 
 ---(end of broadcast)---
 TIP 6: Have you searched our list archives?
 
 http://archives.postgresql.org
 

-- 
  Bruce Momjian                     |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                 |  (610) 359-1001
  +  If your life is a hard drive,  |  13 Roberts Road
  +  Christ can be your backup.     |  Newtown Square, Pennsylvania 19073




Re: [HACKERS] Terrible performance on wide selects

2003-01-23 Thread Hannu Krosing
Tom Lane wrote on Thu, 23.01.2003 at 02:18:
 Dann Corbit [EMAIL PROTECTED] writes:
  Why not waste a bit of memory and make the row buffer the maximum
  possible length?
  E.g. for varchar(2000) allocate 2000 characters + size element and point
  to the start of that thing.
 
 Surely you're not proposing that we store data on disk that way.
 
 The real issue here is avoiding overhead while extracting columns out of
 a stored tuple.  We could perhaps use a different, less space-efficient
 format for temporary tuples in memory than we do on disk, but I don't
 think that will help a lot.  The nature of O(N^2) bottlenecks is you
 have to kill them all --- for example, if we fix printtup and don't do
 anything with ExecEvalVar, we can't do more than double the speed of
 Steve's example, so it'll still be slow.  So we must have a solution for
 the case where we are disassembling a stored tuple, anyway.
 
 I have been sitting here toying with a related idea, which is to use the
 heap_deformtuple code I suggested before to form an array of pointers to
 Datums in a specific tuple (we could probably use the TupleTableSlot
 mechanisms to manage the memory for these).  Then subsequent accesses to
 individual columns would just need an array-index operation, not a
 nocachegetattr call.  The trick with that would be that if only a few
 columns are needed out of a row, it might be a net loss to compute the
 Datum values for all columns.  How could we avoid slowing that case down
 while making the wide-tuple case faster?

make the pointer array incrementally for O(N) performance:

i.e. for tuple with 100 cols, allocate an array of 100 pointers, plus
keep count of how many are actually valid,

so the first call to get col[5] will fill the first 5 positions in the
array, record that 5 are valid, and then access tuple[ptrarray[5]]

the next call to get col[75] will start from col[5] and fill up to col[75]

the next call to col[76] will start from col[75] and fill up to col[76]

the next call to col[60] will just get tuple[ptrarray[60]]

the above description assumes 1-based non-C arrays ;)
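
A rough standalone sketch of this scheme in C (the struct, function names,
and the one-byte length-prefix field encoding are invented for illustration;
real PostgreSQL tuples are considerably more involved):

```c
#include <assert.h>
#include <string.h>

#define MAXCOLS 100

/* Toy tuple: packed variable-width fields, each a 1-byte length
 * prefix followed by that many data bytes. */
typedef struct
{
    const unsigned char *data;          /* packed fields */
    int     natts;                      /* number of columns */
    int     offsets[MAXCOLS];           /* cached start offset per column */
    int     nvalid;                     /* how many cache entries are filled */
} CachedTuple;

/* Reset the cache whenever a new tuple is stored. */
static void
cache_reset(CachedTuple *t, const unsigned char *data, int natts)
{
    t->data = data;
    t->natts = natts;
    t->nvalid = 0;
}

/* Return a pointer to column attnum (0-based), extending the offset
 * cache only as far as needed; earlier columns are never re-scanned. */
static const unsigned char *
cache_getattr(CachedTuple *t, int attnum, int *len)
{
    if (t->nvalid == 0)
    {
        t->offsets[0] = 0;
        t->nvalid = 1;
    }
    /* walk forward from the last known column */
    while (t->nvalid <= attnum)
    {
        int prev = t->offsets[t->nvalid - 1];

        t->offsets[t->nvalid] = prev + 1 + t->data[prev];
        t->nvalid++;
    }
    *len = t->data[t->offsets[attnum]];
    return t->data + t->offsets[attnum] + 1;
}
```

Any mix of forward and backward accesses then costs at most one
left-to-right scan per tuple in total, which is the O(N) behaviour
described above.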

-- 
Hannu Krosing [EMAIL PROTECTED]




Re: [HACKERS] Terrible performance on wide selects

2003-01-23 Thread Hannu Krosing
Dann Corbit wrote on Thu, 23.01.2003 at 02:22:
 [snip]
  So (for instance) if you have 12 variable fields, you would 
  store 12 integers at the start of the record.
 
 Additionally, you could implicitly size the integers from the properties
 of the column.  A varchar(255) would only need an unsigned char to store
 the offset, but a varchar(8) would require an unsigned int.

I guess that the pointer could always be 16-bit, as the offset inside a
tuple will never be more than that (other issues constrain max page size
to 32K).

varchar(8) will use TOAST (another file) anyway, but this will be
hidden inside the field storage in the page.

-- 
Hannu Krosing [EMAIL PROTECTED]




Re: [HACKERS] Terrible performance on wide selects

2003-01-23 Thread Hannu Krosing
Tom Lane wrote on Thu, 23.01.2003 at 02:04:
 Dann Corbit [EMAIL PROTECTED] writes:
  Maybe I don't really understand the problem, but it seems simple enough
  to do it once for the whole query.
 
 We already do cache column offsets when they are fixed.  The code that's
 the problem executes when there's a variable-width column in the table
 --- which means that all columns to its right are not at fixed offsets,
 and have to be scanned for separately in each tuple, AFAICS.

Not only varlen columns, but also NULL columns forbid knowing the
offsets beforehand.

-- 
Hannu Krosing [EMAIL PROTECTED]




Re: [HACKERS] Terrible performance on wide selects

2003-01-23 Thread Hannu Krosing
Hannu Krosing wrote on Thu, 23.01.2003 at 12:11:

 make the pointer array incrementally for O(N) performance:
 
 i.e. for tuple with 100 cols, allocate an array of 100 pointers, plus
 keep count of how many are actually valid,

Additionally, this should also make repeated determination of NULL fields
faster - just put a NULL pointer in and voila - no more bit-shifting and
AND-ing to find out whether the field is null.

One has to check the NULL bitmap on the first pass anyway.
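
The bit test being avoided looks roughly like this (modeled on
PostgreSQL's att_isnull() macro, where a clear bit means the column is
NULL; the function form here is for illustration):

```c
#include <assert.h>
#include <stdbool.h>

/* True if column attnum (0-based) is NULL according to the tuple's
 * null bitmap: bit i clear = column i is NULL. */
static bool
att_is_null(int attnum, const unsigned char *bits)
{
    return !(bits[attnum >> 3] & (1 << (attnum & 0x07)));
}
```

With a cached is-null array, this shift-and-mask is paid once per column
per tuple instead of once per access.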

 so the first call to get col[5] will fill the first 5 positions in the
 array, record that 5 are valid, and then access tuple[ptrarray[5]]
 
 the next call to get col[75] will start from col[5] and fill up to col[75]
 
 the next call to col[76] will start from col[75] and fill up to col[76]
 
 the next call to col[60] will just get tuple[ptrarray[60]]
 
 the above description assumes 1-based non-C arrays ;)
-- 
Hannu Krosing [EMAIL PROTECTED]




Re: [HACKERS] Terrible performance on wide selects

2003-01-23 Thread Daniel Kalchev
Hannu Krosing said:
  Tom Lane wrote on Thu, 23.01.2003 at 02:04:
   We already do cache column offsets when they are fixed.  The code that's
   the problem executes when there's a variable-width column in the table
   --- which means that all columns to its right are not at fixed offsets,
   and have to be scanned for separately in each tuple, AFAICS.
  
  Not only varlen columns, but also NULL columns forbid knowing the
  offsets beforehand.

Does this mean that constructing tables where fixed-length fields are
'before' variable-length fields and 'possibly null' fields might increase
performance?

Daniel





Re: [HACKERS] Terrible performance on wide selects

2003-01-23 Thread Tom Lane
Hannu Krosing [EMAIL PROTECTED] writes:
 i.e. for tuple with 100 cols, allocate an array of 100 pointers, plus
 keep count of how many are actually valid,

 Additionally, this should also make repeated determination of NULL fields
 faster - just put a NULL pointer in and voila - no more bit-shifting and
 AND-ing to find out whether the field is null.

Right, the output of the operation would be a pair of arrays: Datum
values and is-null flags.  (NULL pointers don't work for pass-by-value
datatypes.)

I like the idea of keeping track of a last-known-column position and
incrementally extending that as needed.

I think the way to manage this is to add the overhead data (the output
arrays and last-column state) to TupleTableSlots.  Then we'd have
a routine similar to heap_getattr except that it takes a TupleTableSlot
and makes use of the extra state data.  The infrastructure to manage
the state data is already in place: for example, ExecStoreTuple would
reset the last-known-column to 0, ExecSetSlotDescriptor would be
responsible for allocating the output arrays using the natts value from
the provided tupdesc, etc.

This wouldn't help for accesses that are not in the context of a slot,
but certainly all the ones from ExecEvalVar are.  The executor always
works with tuples stored in slots, so I think we could fix all the
high-traffic cases this way.
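
A minimal sketch of the extra per-slot state this describes (the struct
and function names here are hypothetical; the real TupleTableSlot fields
and executor routines differ, and error handling is omitted):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef unsigned long Datum;    /* stand-in for PostgreSQL's Datum */

typedef struct
{
    int     natts;          /* from the slot's tuple descriptor */
    Datum  *values;         /* extracted column values */
    bool   *isnull;         /* per-column null flags */
    int     lastattnum;     /* columns extracted so far; 0 = none */
} SlotDeformState;

/* cf. ExecSetSlotDescriptor: size the output arrays from natts. */
static void
slot_set_descriptor(SlotDeformState *s, int natts)
{
    s->natts = natts;
    s->values = malloc(natts * sizeof(Datum));
    s->isnull = malloc(natts * sizeof(bool));
    s->lastattnum = 0;
}

/* cf. ExecStoreTuple: a new tuple invalidates the cached columns. */
static void
slot_store_tuple(SlotDeformState *s)
{
    s->lastattnum = 0;
}
```

A slot-aware getattr would then consult lastattnum and extend
values[]/isnull[] incrementally, falling back to a full scan only for
columns beyond the cached prefix.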

regards, tom lane




Re: [HACKERS] Terrible performance on wide selects

2003-01-23 Thread Tom Lane
Daniel Kalchev [EMAIL PROTECTED] writes:
 Does this mean that constructing tables where fixed-length fields are
 'before' variable-length fields and 'possibly null' fields might increase
 performance?

There'd have to be no nulls, period, to get any useful performance
difference --- but yes, in theory putting fixed-length columns before
variable-length ones is a win.

I wouldn't bother going out to rearrange your schemas though ... at
least not before you do some tests to prove that it's worthwhile.

regards, tom lane




Re: [HACKERS] Terrible performance on wide selects

2003-01-22 Thread Tom Lane
Steve Crawford sent me some profiling results for queries involving wide
tuples (hundreds of columns).

 Done, results attached. nocachegetattr seems to be the likely suspect.

Yipes, you can say that again.

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 93.38     26.81    26.81   885688     0.03     0.03  nocachegetattr

                0.00    0.00       1/885688     heapgettup [159]
                0.00    0.00       1/885688     CatalogCacheComputeTupleHashValue [248]
                0.00    0.00       5/885688     SearchCatCache [22]
               13.40    0.00  442840/885688     ExecEvalVar [20]
               13.40    0.00  442841/885688     printtup [12]
[11]    93.4   26.81    0.00   885688           nocachegetattr [11]


Half of the calls are coming from printtup(), which seems relatively easy
to fix.

    /*
     * send the attributes of this tuple
     */
    for (i = 0; i < natts; ++i)
    {
        ...
        origattr = heap_getattr(tuple, i + 1, typeinfo, &isnull);
        ...
    }

The trouble here is that in the presence of variable-width fields,
heap_getattr requires a linear scan over the tuple --- and so the total
time spent in it is O(N^2) in the number of fields.

What we could do is reinstitute heap_deformtuple() as the inverse of
heap_formtuple() --- but make it extract Datums for all the columns in
a single pass over the tuple.  This would reduce the time in printtup()
from O(N^2) to O(N), which would pretty much wipe out that part of the
problem.
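
A toy illustration of the single-pass idea, using an invented format of
a null bitmap plus one-byte length-prefixed fields rather than the real
heap tuple layout (names and encoding are assumptions, not PostgreSQL's):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* One pass over the packed data fills every column's pointer, length,
 * and null flag: O(total tuple size), not O(N^2).  A set bit in the
 * bitmap means the column is present; NULLs occupy no storage. */
static void
deform_all(const unsigned char *bitmap,
           const unsigned char *data,
           int natts,
           const unsigned char *values[], int lens[], bool isnull[])
{
    int off = 0;

    for (int i = 0; i < natts; i++)
    {
        if (!(bitmap[i / 8] & (1 << (i % 8))))
        {
            isnull[i] = true;
            values[i] = NULL;
            lens[i] = 0;
        }
        else
        {
            isnull[i] = false;
            lens[i] = data[off];
            values[i] = data + off + 1;
            off += 1 + data[off];       /* skip length byte plus data */
        }
    }
}
```

printtup() could then call this once per tuple and index the arrays,
instead of paying a fresh left-to-right scan inside heap_getattr for
every column.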

The other half of the calls are coming from ExecEvalVar, which is a
harder problem to solve, since those calls are scattered all over the
place.  It's harder to see how to get them to share work.  Any ideas
out there?

regards, tom lane




Re: [HACKERS] Terrible performance on wide selects

2003-01-22 Thread Dann Corbit
 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED]] 
 Sent: Wednesday, January 22, 2003 3:15 PM
 To: Steve Crawford
 Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
 Subject: Re: [HACKERS] Terrible performance on wide selects 
 
 
 [snip]
 
 The other half of the calls are coming from ExecEvalVar, 
 which is a harder problem to solve, since those calls are 
 scattered all over the place.  It's harder to see how to get 
 them to share work.  Any ideas out there?

Is it possible that the needed information could be retrieved by
querying the system metadata to collect the column information?

Once the required tuple attributes are described, it could form a
binding list that allocates a buffer of sufficient size with pointers to
the required column start points.

Maybe I don't really understand the problem, but it seems simple enough
to do it once for the whole query.

If this is utter stupidity, please disregard and have a hearty laugh at
my expense.
;-)




Re: [HACKERS] Terrible performance on wide selects

2003-01-22 Thread Tom Lane
Dann Corbit [EMAIL PROTECTED] writes:
 Maybe I don't really understand the problem, but it seems simple enough
 to do it once for the whole query.

We already do cache column offsets when they are fixed.  The code that's
the problem executes when there's a variable-width column in the table
--- which means that all columns to its right are not at fixed offsets,
and have to be scanned for separately in each tuple, AFAICS.

regards, tom lane




Re: [HACKERS] Terrible performance on wide selects

2003-01-22 Thread Dann Corbit
 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED]] 
 Sent: Wednesday, January 22, 2003 4:04 PM
 To: Dann Corbit
 Cc: Steve Crawford; [EMAIL PROTECTED]; 
 [EMAIL PROTECTED]
 Subject: Re: [HACKERS] Terrible performance on wide selects 
 
 
 Dann Corbit [EMAIL PROTECTED] writes:
  Maybe I don't really understand the problem, but it seems simple 
  enough to do it once for the whole query.
 
 We already do cache column offsets when they are fixed.  The 
 code that's the problem executes when there's a 
 variable-width column in the table
 --- which means that all columns to its right are not at 
 fixed offsets, and have to be scanned for separately in each 
 tuple, AFAICS.

Why not waste a bit of memory and make the row buffer the maximum
possible length?
E.g. for varchar(2000) allocate 2000 characters + size element and point
to the start of that thing.

If we have 64K rows, even at that it is a pittance.  If someone designs
10,000 row tables, then it will allocate an annoyingly large block of
memory, but bad designs are always going to cause a fuss.




Re: [HACKERS] Terrible performance on wide selects

2003-01-22 Thread Tom Lane
Dann Corbit [EMAIL PROTECTED] writes:
 Why not waste a bit of memory and make the row buffer the maximum
 possible length?
 E.g. for varchar(2000) allocate 2000 characters + size element and point
 to the start of that thing.

Surely you're not proposing that we store data on disk that way.

The real issue here is avoiding overhead while extracting columns out of
a stored tuple.  We could perhaps use a different, less space-efficient
format for temporary tuples in memory than we do on disk, but I don't
think that will help a lot.  The nature of O(N^2) bottlenecks is you
have to kill them all --- for example, if we fix printtup and don't do
anything with ExecEvalVar, we can't do more than double the speed of
Steve's example, so it'll still be slow.  So we must have a solution for
the case where we are disassembling a stored tuple, anyway.

I have been sitting here toying with a related idea, which is to use the
heap_deformtuple code I suggested before to form an array of pointers to
Datums in a specific tuple (we could probably use the TupleTableSlot
mechanisms to manage the memory for these).  Then subsequent accesses to
individual columns would just need an array-index operation, not a
nocachegetattr call.  The trick with that would be that if only a few
columns are needed out of a row, it might be a net loss to compute the
Datum values for all columns.  How could we avoid slowing that case down
while making the wide-tuple case faster?

regards, tom lane




Re: [HACKERS] Terrible performance on wide selects

2003-01-22 Thread Dann Corbit
 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED]] 
 Sent: Wednesday, January 22, 2003 4:18 PM
 To: Dann Corbit
 Cc: Steve Crawford; [EMAIL PROTECTED]; 
 [EMAIL PROTECTED]
 Subject: Re: [HACKERS] Terrible performance on wide selects 
 
 
 [snip]
 
 I have been sitting here toying with a related idea, which is 
 to use the heap_deformtuple code I suggested before to form 
 an array of pointers to Datums in a specific tuple (we could 
 probably use the TupleTableSlot mechanisms to manage the 
 memory for these).  Then subsequent accesses to individual 
 columns would just need an array-index operation, not a 
 nocachegetattr call.  The trick with that would be that if 
 only a few columns are needed out of a row, it might be a net 
 loss to compute the Datum values for all columns.  How could 
 we avoid slowing that case down while making the wide-tuple 
 case faster?

For the disk case, why not have the start of the record contain an array
of offsets to the start of the data for each column?  It would only be
necessary to have a list for variable fields.

So (for instance) if you have 12 variable fields, you would store 12
integers at the start of the record.




Re: [HACKERS] Terrible performance on wide selects

2003-01-22 Thread Dann Corbit
[snip]
 So (for instance) if you have 12 variable fields, you would 
 store 12 integers at the start of the record.

Additionally, you could implicitly size the integers from the properties
of the column.  A varchar(255) would only need an unsigned char to store
the offset, but a varchar(8) would require an unsigned int.
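
The width selection Dann describes could be sketched as follows (assuming
the decision is driven purely by the column's maximum declared byte
length; the function and thresholds are illustrative, and Hannu's point
elsewhere in the thread is that 2 bytes may always suffice in-page):

```c
#include <assert.h>

/* Smallest unsigned integer width able to hold any offset within a
 * field of the given maximum declared length.  Illustrative only. */
static int
offset_width(unsigned int max_len)
{
    if (max_len <= 0xFF)
        return 1;               /* unsigned char */
    if (max_len <= 0xFFFF)
        return 2;               /* unsigned short */
    return 4;                   /* unsigned int */
}
```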
