Re: [HACKERS] avoiding tuple copying in btree index builds

2014-07-01 Thread Robert Haas
On Tue, Jun 17, 2014 at 10:08 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Jun 16, 2014 at 8:10 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 On a micro-optimization level, it might be worth passing the TID as
 ItemPointer not ItemPointerData (ie, pass a pointer until we get to
 the point of actually inserting the TID into the index tuple).
 I'm not sure that copying odd-size structs should be assumed to be
 efficient.

 Yeah, true.  Checking existing precedent, it looks like we usually
 pass ItemPointer rather than ItemPointerData, so it's probably a good
 idea to do this that way too for reasons of style if nothing else.  I
 kind of wonder whether it's really more efficient to pass an 8-byte
 pointer to a 6-byte structure than to just pass the structure itself,
 but it might be.

 The pointer will certainly be passed in a register, or whatever passes for
 registers on the particular machine architecture.  Weird-size structs,
 though, tend to have arcane and not-so-efficient rules for being passed
 by value.  It's not unlikely that what the compiler will do under the hood
 is pass a pointer anyhow, and then do a memcpy to make a local copy in
 the called function.

OK, committed with the suggested changes.

(Thanks to Abhijit for pinging me about this, and more generally for
his active and effective management of this CommitFest!)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] avoiding tuple copying in btree index builds

2014-06-17 Thread Robert Haas
On Mon, Jun 16, 2014 at 8:10 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On further review, this is definitely the way to go: it's a
 straight-up win.  The isnull array is never more than one element in
 length, so testing the single element is quite trivial.   The
 attached, revised patch provides a modest but useful speedup for both
 hash and btree index builds.

 Anyone see any reason NOT to do this?  So far it looks like a
 slam-dunk from here.

 If index_form_tuple leaks any memory in the sort context, this would be
 not so much a slam-dunk win as a complete disaster.  I'm not sure that
 no-leak is a safe assumption, especially in cases where we do toast
 compression of the index datums.  (In fact, I'm pretty sure that the
 reason it was coded like this originally was exactly to avoid that
 assumption.)

Yes, that would be bad.  The reason I think it's probably OK is that
index_form_tuple seems to have had, from ancient times (cf.
f67e79045), code of non-trivial to free any memory that's allocated as
a result of de-TOASTing, which we presumably would not bother to do if
we were counting on memory context resets to clean things up.  It
calls toast_compress_datum(), which is similarly careful.  The most
complicated code path appears to be the VARATT_IS_EXTERNAL() case,
where we call heap_tuple_fetch_attr(), which calls
toast_fetch_datum().  I'm kind of wondering whether that can really
happen, though, because I didn't think indexes could use TOAST
storage.  Even if it does happen I don't see any obvious reason why it
should leak, but it'd be harder to be certain that it doesn't.

 Assuming that you inspect the code and determine it's safe, the patch
 had better decorate index_form_tuple with dire warnings about not leaking
 memory; because even if it's safe today, somebody might break it tomorrow.

 On a micro-optimization level, it might be worth passing the TID as
 ItemPointer not ItemPointerData (ie, pass a pointer until we get to
 the point of actually inserting the TID into the index tuple).
 I'm not sure that copying odd-size structs should be assumed to be
 efficient.

Yeah, true.  Checking existing precedent, it looks like we usually
pass ItemPointer rather than ItemPointerData, so it's probably a good
idea to do this that way too for reasons of style if nothing else.  I
kind of wonder whether it's really more efficient to pass an 8-byte
pointer to a 6-byte structure than to just pass the structure itself,
but it might be.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] avoiding tuple copying in btree index builds

2014-06-17 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Jun 16, 2014 at 8:10 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 On a micro-optimization level, it might be worth passing the TID as
 ItemPointer not ItemPointerData (ie, pass a pointer until we get to
 the point of actually inserting the TID into the index tuple).
 I'm not sure that copying odd-size structs should be assumed to be
 efficient.

 Yeah, true.  Checking existing precedent, it looks like we usually
 pass ItemPointer rather than ItemPointerData, so it's probably a good
 idea to do this that way too for reasons of style if nothing else.  I
 kind of wonder whether it's really more efficient to pass an 8-byte
 pointer to a 6-byte structure than to just pass the structure itself,
 but it might be.

The pointer will certainly be passed in a register, or whatever passes for
registers on the particular machine architecture.  Weird-size structs,
though, tend to have arcane and not-so-efficient rules for being passed
by value.  It's not unlikely that what the compiler will do under the hood
is pass a pointer anyhow, and then do a memcpy to make a local copy in
the called function.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] avoiding tuple copying in btree index builds

2014-06-16 Thread Robert Haas
On Tue, Jun 3, 2014 at 4:38 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Jun 1, 2014 at 3:26 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Tue, May 6, 2014 at 12:04 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, May 5, 2014 at 2:13 PM, Andres Freund and...@2ndquadrant.com
 wrote:
  On 2014-05-05 13:52:39 -0400, Robert Haas wrote:
  Today, I discovered that when building a btree index, the btree code
  uses index_form_tuple() to create an index tuple from the heap tuple,
  calls tuplesort_putindextuple() to copy that tuple into the sort's
  memory context, and then frees the original one it built.  This seemed
  inefficient, so I wrote a patch to eliminate the tuple copying.  It
  works by adding a function tuplesort_putindextuplevalues(), which
  builds the tuple in the sort's memory context and thus avoids the need
  for a separate copy.  I'm not sure if that's the best approach, but
  the optimization seems wortwhile.
 
  Hm. It looks like we could quite easily just get rid of
  tuplesort_putindextuple(). The hash usage doesn't look hard to convert.

 I glanced at that, but it wasn't obvious to me how to convert the hash
 usage.  If you have an idea, I'm all ears.

 I also think it's possible to have similar optimization for hash index
 incase it has to spool the tuple for sorting.

 In function hashbuildCallback(), when buildstate-spool is true, we
 can avoid to form index tuple. To check for nulls before calling

 _h_spool(), we can traverse the isnull array.

 Hmm, that might work.  Arguably it's less efficient, but on the other
 hand if it avoids forming the tuple sometimes it might be MORE
 efficient.  And anyway the difference might not be enough to matter.

On further review, this is definitely the way to go: it's a
straight-up win.  The isnull array is never more than one element in
length, so testing the single element is quite trivial.   The
attached, revised patch provides a modest but useful speedup for both
hash and btree index builds.

Anyone see any reason NOT to do this?  So far it looks like a
slam-dunk from here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 7abb7a4..c30c6f9 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -142,26 +142,23 @@ hashbuildCallback(Relation index,
 	HashBuildState *buildstate = (HashBuildState *) state;
 	IndexTuple	itup;
 
-	/* form an index tuple and point it at the heap tuple */
-	itup = _hash_form_tuple(index, values, isnull);
-	itup-t_tid = htup-t_self;
-
 	/* Hash indexes don't index nulls, see notes in hashinsert */
-	if (IndexTupleHasNulls(itup))
-	{
-		pfree(itup);
+	if (isnull[0])
 		return;
-	}
 
 	/* Either spool the tuple for sorting, or just put it into the index */
 	if (buildstate-spool)
-		_h_spool(itup, buildstate-spool);
+		_h_spool(buildstate-spool, htup-t_self, values, isnull);
 	else
+	{
+		/* form an index tuple and point it at the heap tuple */
+		itup = _hash_form_tuple(index, values, isnull);
+		itup-t_tid = htup-t_self;
 		_hash_doinsert(index, itup);
+		pfree(itup);
+	}
 
 	buildstate-indtuples += 1;
-
-	pfree(itup);
 }
 
 /*
@@ -184,10 +181,6 @@ hashinsert(PG_FUNCTION_ARGS)
 #endif
 	IndexTuple	itup;
 
-	/* generate an index tuple */
-	itup = _hash_form_tuple(rel, values, isnull);
-	itup-t_tid = *ht_ctid;
-
 	/*
 	 * If the single index key is null, we don't insert it into the index.
 	 * Hash tables support scans on '='. Relational algebra says that A = B
@@ -197,11 +190,12 @@ hashinsert(PG_FUNCTION_ARGS)
 	 * NOTNULL scans, but that's an artifact of the strategy map architecture
 	 * chosen in 1986, not of the way nulls are handled here.
 	 */
-	if (IndexTupleHasNulls(itup))
-	{
-		pfree(itup);
+	if (isnull[0])
 		PG_RETURN_BOOL(false);
-	}
+
+	/* generate an index tuple */
+	itup = _hash_form_tuple(rel, values, isnull);
+	itup-t_tid = *ht_ctid;
 
 	_hash_doinsert(rel, itup);
 
diff --git a/src/backend/access/hash/hashsort.c b/src/backend/access/hash/hashsort.c
index c0d6fec..3a70bc1 100644
--- a/src/backend/access/hash/hashsort.c
+++ b/src/backend/access/hash/hashsort.c
@@ -90,9 +90,10 @@ _h_spooldestroy(HSpool *hspool)
  * spool an index entry into the sort file.
  */
 void
-_h_spool(IndexTuple itup, HSpool *hspool)
+_h_spool(HSpool *hspool, ItemPointerData self, Datum *values, bool *isnull)
 {
-	tuplesort_putindextuple(hspool-sortstate, itup);
+	tuplesort_putindextuplevalues(hspool-sortstate, hspool-index,
+  self, values, isnull);
 }
 
 /*
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 36dc6c2..1ea4a2c 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -171,28 +171,21 @@ btbuildCallback(Relation index,
 void *state)
 {
 	BTBuildState *buildstate = (BTBuildState *) state;
-	IndexTuple	itup;
-
-	/* form an index tuple and point it at the heap tuple 

Re: [HACKERS] avoiding tuple copying in btree index builds

2014-06-16 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On further review, this is definitely the way to go: it's a
 straight-up win.  The isnull array is never more than one element in
 length, so testing the single element is quite trivial.   The
 attached, revised patch provides a modest but useful speedup for both
 hash and btree index builds.

 Anyone see any reason NOT to do this?  So far it looks like a
 slam-dunk from here.

If index_form_tuple leaks any memory in the sort context, this would be
not so much a slam-dunk win as a complete disaster.  I'm not sure that
no-leak is a safe assumption, especially in cases where we do toast
compression of the index datums.  (In fact, I'm pretty sure that the
reason it was coded like this originally was exactly to avoid that
assumption.)

Assuming that you inspect the code and determine it's safe, the patch
had better decorate index_form_tuple with dire warnings about not leaking
memory; because even if it's safe today, somebody might break it tomorrow.

On a micro-optimization level, it might be worth passing the TID as
ItemPointer not ItemPointerData (ie, pass a pointer until we get to
the point of actually inserting the TID into the index tuple).
I'm not sure that copying odd-size structs should be assumed to be
efficient.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] avoiding tuple copying in btree index builds

2014-06-08 Thread Amit Kapila
On Wed, Jun 4, 2014 at 2:08 AM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Jun 1, 2014 at 3:26 AM, Amit Kapila amit.kapil...@gmail.com
wrote:

  I also think it's possible to have similar optimization for hash index
  incase it has to spool the tuple for sorting.
 
  In function hashbuildCallback(), when buildstate-spool is true, we
  can avoid to form index tuple. To check for nulls before calling
 
  _h_spool(), we can traverse the isnull array.

 Hmm, that might work.  Arguably it's less efficient, but on the other
 hand if it avoids forming the tuple sometimes it might be MORE
 efficient.  And anyway the difference might not be enough to matter.

Considering the fact that for hash indexes, this optimization would
only get triggered for large indexes (atleast greater than shared buffer's),
I agree that it will not be as useful as it will be for btree indexes.  The
another minor advantage could have been that we can remove

tuplesort_putindextuple() from code, if this optimization is done for hash

indexes.  However we can leave it as it is for now as there doesn't seem

to be any gain by doing so.


You seem to have removed puttuple_common() call from function

tuplesort_putindextuple(), is there any reason for doing so?


Apart from that patch looks good to me.


Below performance data on various size of indexes shows that this

patch can improve performance upto ~7%.  The performance

difference becomes lesser when the index size is too big and I think

that is probably due to the reason that we have to write all the data

at end of operation, so if the data is big the improvement is not shown

due to large I/O.  The performance improvement is shown considering

median value of 3 runs and time is taken by enabling \timing option

of psql.


Performance Data

---

Configuration:

IBM POWER-7 16 cores, 64 hardware threads

RAM = 64GB

shared_buffers=8GB


pgbench_accounts

No. of Records -10million

Index - on integer

Operation - Reindex


Master

16043.500 ms

16058.723 ms

15941.057 ms


Patch

15525.054 ms

15551.935 ms

15492.879 ms


Perf Improvement: 3.43%


pgbench_accounts

No. of Records -30million

Index - on integer

Operation - Reindex


Master

51258.338 ms

50520.328 ms

50562.022 ms


Patch

49610.710 ms

49302.571 ms

49301.390 ms


Perf Improvement: 2.41%


table (c1 int, c2 char(10))

No. of records = 300,000

Index -  on char(10)

Operation - Reindex


Master

443.584 ms

444.798 ms

452.888 ms


Patch

421.554 ms

430.528 ms

447.558 ms


Performance Improvement: 3.2%


table (c1 int, c2 char(10))

No. of records = 500,000

Index -  on char(10)

Operation - Reindex


Master

663.621 ms

661.299 ms

657.754 ms


Patch

652.325 ms

644.782 ms

643.218 ms


Performance Improvement: 2.5%


table (c1 int, c2 char(10))

No. of records = 1000,000

Index -  on char(10)

Operation - Reindex


Master

16554.076 ms

16686.528 ms

16571.129 ms


Patch

16556.852 ms

16513.543 ms

16610.615 ms


Performance Improvement: less than 1%



table (c1 int, c2 char(20))

No. of records = 300,000

Index -  on char(20)

Operation - Reindex


Master

429.670 ms

441.445 ms

411.539 ms


Patch

401.801 ms

412.716 ms

395.002 ms


Performance Improvement: 6.48%


table (c1 int, c2 char(20))

No. of records = 500,000

Index -  on char(20)

Operation - Reindex


Master

724.541 ms

731.582 ms

704.934 ms


Patch

686.004 ms

677.361 ms

686.915 ms


Performance Improvement: 5.31%


table (c1 int, c2 char(20))

No. of records = 1000,000

Index -  on char(20)

Operation - Reindex


Master

20728.758 ms

20665.289 ms

20656.375 ms


Patch

20594.022 ms

20617.383 ms

20628.181 ms


Performance Improvement: 0.2%

Apart from this, I have taken data for unlogged tables and much
larger indexes and all the data shows similar results as above.
So I think above data is quite good representation of value for
this patch, however if you or anybody else still feels that more
data is required for any other configuration, please do let me
know about the same.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] avoiding tuple copying in btree index builds

2014-06-03 Thread Robert Haas
On Sun, Jun 1, 2014 at 3:26 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Tue, May 6, 2014 at 12:04 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, May 5, 2014 at 2:13 PM, Andres Freund and...@2ndquadrant.com
 wrote:
  On 2014-05-05 13:52:39 -0400, Robert Haas wrote:
  Today, I discovered that when building a btree index, the btree code
  uses index_form_tuple() to create an index tuple from the heap tuple,
  calls tuplesort_putindextuple() to copy that tuple into the sort's
  memory context, and then frees the original one it built.  This seemed
  inefficient, so I wrote a patch to eliminate the tuple copying.  It
  works by adding a function tuplesort_putindextuplevalues(), which
  builds the tuple in the sort's memory context and thus avoids the need
  for a separate copy.  I'm not sure if that's the best approach, but
  the optimization seems wortwhile.
 
  Hm. It looks like we could quite easily just get rid of
  tuplesort_putindextuple(). The hash usage doesn't look hard to convert.

 I glanced at that, but it wasn't obvious to me how to convert the hash
 usage.  If you have an idea, I'm all ears.

 I also think it's possible to have similar optimization for hash index
 incase it has to spool the tuple for sorting.

 In function hashbuildCallback(), when buildstate-spool is true, we
 can avoid to form index tuple. To check for nulls before calling

 _h_spool(), we can traverse the isnull array.

Hmm, that might work.  Arguably it's less efficient, but on the other
hand if it avoids forming the tuple sometimes it might be MORE
efficient.  And anyway the difference might not be enough to matter.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] avoiding tuple copying in btree index builds

2014-06-01 Thread Amit Kapila
On Tue, May 6, 2014 at 12:04 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, May 5, 2014 at 2:13 PM, Andres Freund and...@2ndquadrant.com
wrote:
  On 2014-05-05 13:52:39 -0400, Robert Haas wrote:
  Today, I discovered that when building a btree index, the btree code
  uses index_form_tuple() to create an index tuple from the heap tuple,
  calls tuplesort_putindextuple() to copy that tuple into the sort's
  memory context, and then frees the original one it built.  This seemed
  inefficient, so I wrote a patch to eliminate the tuple copying.  It
  works by adding a function tuplesort_putindextuplevalues(), which
  builds the tuple in the sort's memory context and thus avoids the need
  for a separate copy.  I'm not sure if that's the best approach, but
  the optimization seems wortwhile.
 
  Hm. It looks like we could quite easily just get rid of
  tuplesort_putindextuple(). The hash usage doesn't look hard to convert.

 I glanced at that, but it wasn't obvious to me how to convert the hash
 usage.  If you have an idea, I'm all ears.

I also think it's possible to have similar optimization for hash index
incase it has to spool the tuple for sorting.

In function hashbuildCallback(), when buildstate-spool is true, we
can avoid to form index tuple. To check for nulls before calling

_h_spool(), we can traverse the isnull array.
It seems converting hash index usage is not as straightforward as
btree index, but doesn't look too complex either.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] avoiding tuple copying in btree index builds

2014-05-05 Thread Andres Freund
Hi,

On 2014-05-05 13:52:39 -0400, Robert Haas wrote:
 Today, I discovered that when building a btree index, the btree code
 uses index_form_tuple() to create an index tuple from the heap tuple,
 calls tuplesort_putindextuple() to copy that tuple into the sort's
 memory context, and then frees the original one it built.  This seemed
 inefficient, so I wrote a patch to eliminate the tuple copying.  It
 works by adding a function tuplesort_putindextuplevalues(), which
 builds the tuple in the sort's memory context and thus avoids the need
 for a separate copy.  I'm not sure if that's the best approach, but
 the optimization seems wortwhile.

Hm. It looks like we could quite easily just get rid of
tuplesort_putindextuple(). The hash usage doesn't look hard to convert.

 I tested it by repeatedly executing REINDEX INDEX
 pgbench_accounts_pkey on a PPC64 machine.  pgbench_accounts contains
 10 million records.  With unpatched master as of
 b2f7bd72c4d3e80065725c72e85778d5f4bdfd4a, I got times of 6.159s,
 6.177s, and 6.201s.  With the attached patch, I got times of 5.787s,
 5.972s, and 5.913s, a savings of almost 5%.  Not bad considering the
 amount of work involved.

Yes, that's certainly worthwile. Nice.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] avoiding tuple copying in btree index builds

2014-05-05 Thread Robert Haas
On Mon, May 5, 2014 at 2:13 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-05-05 13:52:39 -0400, Robert Haas wrote:
 Today, I discovered that when building a btree index, the btree code
 uses index_form_tuple() to create an index tuple from the heap tuple,
 calls tuplesort_putindextuple() to copy that tuple into the sort's
 memory context, and then frees the original one it built.  This seemed
 inefficient, so I wrote a patch to eliminate the tuple copying.  It
 works by adding a function tuplesort_putindextuplevalues(), which
 builds the tuple in the sort's memory context and thus avoids the need
 for a separate copy.  I'm not sure if that's the best approach, but
 the optimization seems wortwhile.

 Hm. It looks like we could quite easily just get rid of
 tuplesort_putindextuple(). The hash usage doesn't look hard to convert.

I glanced at that, but it wasn't obvious to me how to convert the hash
usage.  If you have an idea, I'm all ears.

 I tested it by repeatedly executing REINDEX INDEX
 pgbench_accounts_pkey on a PPC64 machine.  pgbench_accounts contains
 10 million records.  With unpatched master as of
 b2f7bd72c4d3e80065725c72e85778d5f4bdfd4a, I got times of 6.159s,
 6.177s, and 6.201s.  With the attached patch, I got times of 5.787s,
 5.972s, and 5.913s, a savings of almost 5%.  Not bad considering the
 amount of work involved.

 Yes, that's certainly worthwile. Nice.

Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers