Re: [HACKERS] Bugs/slowness inserting and indexing cubes

2012-03-02 Thread Heikki Linnakangas

On 20.02.2012 10:54, Alexander Korotkov wrote:

On Wed, Feb 15, 2012 at 7:28 PM, Tom Lane t...@sss.pgh.pa.us wrote:


Alexander Korotkov aekorot...@gmail.com writes:

On Wed, Feb 15, 2012 at 4:26 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

So, I think we should go with your original fix and simply do nothing in
gistRelocateBuildBuffersOnSplit() if the page doesn't have a buffer.



I agree.


OK, I won't object.


So, I think we can just commit the first version of the fix now.


Thanks, committed. Sorry for the delay..

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs/slowness inserting and indexing cubes

2012-02-20 Thread Alexander Korotkov
On Wed, Feb 15, 2012 at 7:28 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Alexander Korotkov aekorot...@gmail.com writes:
  On Wed, Feb 15, 2012 at 4:26 PM, Heikki Linnakangas 
  heikki.linnakan...@enterprisedb.com wrote:
  So, I think we should go with your original fix and simply do nothing in
  gistRelocateBuildBuffersOnSplit() if the page doesn't have a buffer.

  I agree.

 OK, I won't object.


So, I think we can just commit the first version of the fix now.

--
With best regards,
Alexander Korotkov.
*** a/src/backend/access/gist/gistbuildbuffers.c
--- b/src/backend/access/gist/gistbuildbuffers.c
***
*** 607,617  gistRelocateBuildBuffersOnSplit(GISTBuildBuffers *gfbb, GISTSTATE *giststate,
  	if (!found)
  	{
  		/*
! 		 * Node buffer should exist at this point. If it didn't exist before,
! 		 * the insertion that caused the page to split should've created it.
  		 */
! 		elog(ERROR, "node buffer of page being split (%u) does not exist",
! 			 blocknum);
  	}
  
  	/*
--- 607,616 
  	if (!found)
  	{
  		/*
! 		 * A page without a buffer can be produced by a split of the root
! 		 * page. So we just have nothing to do here when there is no buffer.
  		 */
! 		return;
  	}
  
  	/*

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs/slowness inserting and indexing cubes

2012-02-15 Thread Alexander Korotkov
On Wed, Feb 15, 2012 at 2:54 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Alexander Korotkov aekorot...@gmail.com writes:
  ISTM I found the problem. This piece of code is triggering an error. It
  assumes each corresponding page has an initialized buffer. That should
  be true because we're inserting index tuples from the top down while
  splits propagate from the bottom up.
  But this assumption becomes false when we turn buffering off on the root
  page. So, the root page can produce pages without initialized buffers
  when it splits.

 Hmm ... can we tighten the error check rather than just remove it?  It
 feels less than safe to assume that a hash-entry-not-found condition
 *must* reflect a corner-case situation like that.  At the very least
 I'd like to see it verify that we'd turned off buffering before deciding
 this is OK.  Better, would it be practical to make dummy entries in the
 hash table even after turning buffers off, so that the logic here
 becomes

if (!found) error;
else if (entry is dummy) return without doing anything;
else proceed;

regards, tom lane


OK, here is another patch that fixes this problem. Instead of removing the
error trigger, it adds empty buffers on root page split where needed.

--
With best regards,
Alexander Korotkov.
*** a/src/backend/access/gist/gistbuild.c
--- b/src/backend/access/gist/gistbuild.c
***
*** 668,677  gistbufferinginserttuples(GISTBuildState *buildstate, Buffer buffer,
  	if (is_split && BufferGetBlockNumber(buffer) == GIST_ROOT_BLKNO)
  	{
  		GISTBufferingInsertStack *oldroot = gfbb->rootitem;
! 		Page		page = BufferGetPage(buffer);
! 		ItemId		iid;
! 		IndexTuple	idxtuple;
! 		BlockNumber leftmostchild;
  
  		gfbb->rootitem = (GISTBufferingInsertStack *) MemoryContextAlloc(
  			gfbb->context, sizeof(GISTBufferingInsertStack));
--- 668,678 
  	if (is_split && BufferGetBlockNumber(buffer) == GIST_ROOT_BLKNO)
  	{
  		GISTBufferingInsertStack *oldroot = gfbb->rootitem;
! 		Page		 page = BufferGetPage(buffer);
! 		ItemId		 iid;
! 		IndexTuple	 idxtuple;
! 		BlockNumber  leftmostchild;
! 		OffsetNumber maxoff, i;
  
  		gfbb->rootitem = (GISTBufferingInsertStack *) MemoryContextAlloc(
  			gfbb->context, sizeof(GISTBufferingInsertStack));
***
*** 694,699  gistbufferinginserttuples(GISTBuildState *buildstate, Buffer buffer,
--- 695,719 
  		oldroot->parent = gfbb->rootitem;
  		oldroot->blkno = leftmostchild;
  		oldroot->downlinkoffnum = InvalidOffsetNumber;
+ 
+ 		/*
+ 		 * If the root page split produces new pages on a level which has
+ 		 * buffers, then initialize empty buffers there.
+ 		 */
+ 		if (LEVEL_HAS_BUFFERS(oldroot->level, gfbb))
+ 		{
+ 			maxoff = PageGetMaxOffsetNumber(page);
+ 			for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
+ 			{
+ 				iid = PageGetItemId(page, i);
+ 				idxtuple = (IndexTuple) PageGetItem(page, iid);
+ 				gistGetNodeBuffer(gfbb,
+ 								  buildstate->giststate,
+ 								  ItemPointerGetBlockNumber(&(idxtuple->t_tid)),
+ 								  i,
+ 								  gfbb->rootitem);
+ 			}
+ 		}
  	}
  
  	if (splitinfo)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs/slowness inserting and indexing cubes

2012-02-15 Thread Heikki Linnakangas

On 15.02.2012 10:18, Alexander Korotkov wrote:

On Wed, Feb 15, 2012 at 2:54 AM, Tom Lane t...@sss.pgh.pa.us wrote:


Alexander Korotkov aekorot...@gmail.com writes:

ISTM I found the problem. This piece of code is triggering an error. It
assumes each corresponding page has an initialized buffer. That should
be true because we're inserting index tuples from the top down while
splits propagate from the bottom up.
But this assumption becomes false when we turn buffering off on the root
page. So, the root page can produce pages without initialized buffers
when it splits.


Hmm ... can we tighten the error check rather than just remove it?  It
feels less than safe to assume that a hash-entry-not-found condition
*must* reflect a corner-case situation like that.  At the very least
I'd like to see it verify that we'd turned off buffering before deciding
this is OK.  Better, would it be practical to make dummy entries in the
hash table even after turning buffers off, so that the logic here
becomes

if (!found) error;
else if (entry is dummy) return without doing anything;
else proceed;

regards, tom lane



OK, here is another patch that fixes this problem. Instead of removing the
error trigger, it adds empty buffers on root page split where needed.


Actually, I think it made sense to simply do nothing if the buffer 
doesn't exist. The algorithm doesn't require that all the buffers must 
exist at all times. It is quite accidental that whenever we call 
gistRelocateBuildBuffersOnSplit(), the page must already have its buffer 
created (and as we found out, the assumption doesn't hold after a root 
split, anyway). Also, we talked earlier that it would be good to destroy 
buffers that become completely empty, to save memory. If we do that, 
we'd have to remove that check anyway.


So, I think we should go with your original fix and simply do nothing in 
gistRelocateBuildBuffersOnSplit() if the page doesn't have a buffer. 
Moreover, if the page has a buffer but it's empty, 
gistRelocateBuildBuffersOnSplit() doesn't need to create buffers for the 
new sibling pages. In the final emptying phase, that's a waste of time, 
the buffers we create will never be used, and even before that I think 
it's better to create the buffers lazily.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs/slowness inserting and indexing cubes

2012-02-15 Thread Alexander Korotkov
On Wed, Feb 15, 2012 at 4:26 PM, Heikki Linnakangas 
heikki.linnakan...@enterprisedb.com wrote:

 Actually, I think it made sense to simply do nothing if the buffer doesn't
 exist. The algorithm doesn't require that all the buffers must exist at all
 times. It is quite accidental that whenever we call
 gistRelocateBuildBuffersOnSplit(), the page must already have its
 buffer created (and as we found out, the assumption doesn't hold after a
 root split, anyway). Also, we talked earlier that it would be good to
 destroy buffers that become completely empty, to save memory. If we do
 that, we'd have to remove that check anyway.

 So, I think we should go with your original fix and simply do nothing in
  gistRelocateBuildBuffersOnSplit() if the page doesn't have a buffer.
 Moreover, if the page has a buffer but it's empty,
  gistRelocateBuildBuffersOnSplit() doesn't need to create buffers for
 the new sibling pages. In the final emptying phase, that's a waste of time,
 the buffers we create will never be used, and even before that I think it's
 better to create the buffers lazily.


I agree.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Bugs/slowness inserting and indexing cubes

2012-02-15 Thread Tom Lane
Alexander Korotkov aekorot...@gmail.com writes:
 On Wed, Feb 15, 2012 at 4:26 PM, Heikki Linnakangas 
 heikki.linnakan...@enterprisedb.com wrote:
 So, I think we should go with your original fix and simply do nothing in
  gistRelocateBuildBuffersOnSplit() if the page doesn't have a buffer.

 I agree.

OK, I won't object.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs/slowness inserting and indexing cubes

2012-02-14 Thread Alexander Korotkov
ISTM I found the problem. This piece of code is triggering an error. It
assumes each corresponding page has an initialized buffer. That should
be true because we're inserting index tuples from the top down while
splits propagate from the bottom up.

if (!found)
{
/*
 * Node buffer should exist at this point. If it didn't exist before,
 * the insertion that caused the page to split should've created it.
 */
elog(ERROR, "node buffer of page being split (%u) does not exist",
     blocknum);
}

But this assumption becomes false when we turn buffering off on the root
page. So, the root page can produce pages without initialized buffers when
it splits.

/*
 * Does specified level have buffers? (Beware of multiple evaluation of
 * arguments.)
 */
#define LEVEL_HAS_BUFFERS(nlevel, gfbb) \
((nlevel) != 0 && (nlevel) % (gfbb)->levelStep == 0 && \
 (nlevel) != (gfbb)->rootitem->level)

So, I think we should just do a silent return from the function instead of
triggering an error. Patch is attached.

--
With best regards,
Alexander Korotkov.
*** a/src/backend/access/gist/gistbuildbuffers.c
--- b/src/backend/access/gist/gistbuildbuffers.c
***
*** 607,617  gistRelocateBuildBuffersOnSplit(GISTBuildBuffers *gfbb, GISTSTATE *giststate,
  	if (!found)
  	{
  		/*
! 		 * Node buffer should exist at this point. If it didn't exist before,
! 		 * the insertion that caused the page to split should've created it.
  		 */
! 		elog(ERROR, "node buffer of page being split (%u) does not exist",
! 			 blocknum);
  	}
  
  	/*
--- 607,616 
  	if (!found)
  	{
  		/*
! 		 * A page without a buffer can be produced by a split of the root
! 		 * page. So we just have nothing to do here when there is no buffer.
  		 */
! 		return;
  	}
  
  	/*

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs/slowness inserting and indexing cubes

2012-02-14 Thread Tom Lane
Alexander Korotkov aekorot...@gmail.com writes:
 ISTM I found the problem. This piece of code is triggering an error. It
 assumes each corresponding page has an initialized buffer. That should
 be true because we're inserting index tuples from the top down while
 splits propagate from the bottom up.
 But this assumption becomes false when we turn buffering off on the root
 page. So, the root page can produce pages without initialized buffers
 when it splits.

Hmm ... can we tighten the error check rather than just remove it?  It
feels less than safe to assume that a hash-entry-not-found condition
*must* reflect a corner-case situation like that.  At the very least
I'd like to see it verify that we'd turned off buffering before deciding
this is OK.  Better, would it be practical to make dummy entries in the
hash table even after turning buffers off, so that the logic here
becomes

if (!found) error;
else if (entry is dummy) return without doing anything;
else proceed;

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs/slowness inserting and indexing cubes

2012-02-13 Thread Robert Haas
On Thu, Feb 9, 2012 at 3:37 PM, Jay Levitt jay.lev...@gmail.com wrote:
 So my pre-built 9.1.2 takes 434s, my source-built 9.2 takes 509s, and
 (probably both of our) 9.1-HEAD takes 1918s... is that something to worry
 about, and if so, are there any tests I can run to assist? That bug doesn't
 affect me personally, but y'know, community and all that.  Also, I wonder if
 it's something like "9.2 got way faster doing X, but meanwhile, HEAD got way
 slower doing Y", and this is a canary in the coal mine.

This might be a lame hypothesis, but... is it possible that you built
your 9.1-tip binaries with --enable-cassert?  Or with different
optimization options?

There's been some work done on GiST in 9.2 which, as Alexander
Korotkov (who did the work) mentioned upthread, might have some issue.
But I can't see how there can be a 4x regression between minor
releases, though maybe it wouldn't hurt to test.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs/slowness inserting and indexing cubes

2012-02-13 Thread Robert Haas
On Mon, Feb 13, 2012 at 7:45 AM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Feb 9, 2012 at 3:37 PM, Jay Levitt jay.lev...@gmail.com wrote:
 So my pre-built 9.1.2 takes 434s, my source-built 9.2 takes 509s, and
 (probably both of our) 9.1-HEAD takes 1918s... is that something to worry
 about, and if so, are there any tests I can run to assist? That bug doesn't
 affect me personally, but y'know, community and all that.  Also, I wonder if
 it's something like "9.2 got way faster doing X, but meanwhile, HEAD got way
 slower doing Y", and this is a canary in the coal mine.

 This might be a lame hypothesis, but... is it possible that you built
 your 9.1-tip binaries with --enable-cassert?  Or with different
 optimization options?

 There's been some work done on GiST in 9.2 which, as Alexander
 Korotkov (who did the work) mentioned upthread, might have some issue.
 But I can't see how there can be a 4x regression between minor
 releases, though maybe it wouldn't hurt to test.

So I tested.  On my MacBook Pro, your test script builds the index in
just over 25 s on both REL9_1_2 and this morning's REL9_1_STABLE.
This is with the following non-default configuration settings:

shared_buffers = 400MB
maintenance_work_mem = 1GB
checkpoint_segments = 30
checkpoint_timeout = 10min
checkpoint_completion_target = 0.9
checkpoint_warning = 60s

I then tested with master, which also showed similar performance.
Based on this comment from your original email:

 [***] never completed after 10-20 minutes; nothing in server.log at default 
 logging levels, postgres process consuming about 1 CPU in IOWAIT, 
 checkpoints every 7-8 seconds

...I wonder if you have left checkpoint_segments set to the default
value of 3, which would account for the very frequent checkpoints.

At any rate, I can't measure a difference between the branches on this
test.  That doesn't mean there isn't one, but in my test setup I'm not
seeing it.  As an afterthought, I also retested with wal_level=archive
added to the config, but I still don't see any significant difference
between 9.1.2, 9.1-stable, and 9.2-devel.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs/slowness inserting and indexing cubes

2012-02-13 Thread Jay Levitt

Robert Haas wrote:

On Mon, Feb 13, 2012 at 7:45 AM, Robert Haas robertmh...@gmail.com wrote:

On Thu, Feb 9, 2012 at 3:37 PM, Jay Levitt jay.lev...@gmail.com wrote:

So my pre-built 9.1.2 takes 434s, my source-built 9.2 takes 509s, and
(probably both of our) 9.1-HEAD takes 1918s... is that something to worry
about, and if so, are there any tests I can run to assist? That bug doesn't
affect me personally, but y'know, community and all that.  Also, I wonder if
it's something like "9.2 got way faster doing X, but meanwhile, HEAD got way
slower doing Y", and this is a canary in the coal mine.

This might be a lame hypothesis, but... is it possible that you built
your 9.1-tip binaries with --enable-cassert?  Or with different
optimization options?


No, I think I/O just varies more than my repeated tests on 1M rows 
indicated.  I ran the 10M-row test four times on the same server, 
alternating between packaged 9.1.2 and source-built 9.1.2 (default configure 
options), and saw these times:


        INSERT  INDEX
apt         76    578
source     163    636
apt         73    546
source      80    473

EBS has no performance guarantees at all; you share your disks with an 
arbitrary number of other users, so if someone in the neighborhood decides 
to do some heavy disk I/O, you lose. Let this be a lesson to me: run 
benchmarks locally!



So I tested.  On my MacBook Pro, your test script builds the index in
just over 25 s on both REL9_1_2 and this morning's REL9_1_STABLE.


I think that's the 1-million version I emailed; try adding a zero and see if 
it doesn't take a little longer.


Jay

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs/slowness inserting and indexing cubes

2012-02-09 Thread Jay Levitt

Tom Lane wrote:

Jay Levitt jay.lev...@gmail.com writes:

[Posted at Andres's request]
TL;DR: Inserting and indexing cubes is slow and/or broken in various ways in
various builds.



1. In 9.1.2, inserting 10x rows takes 19x the time.
 - 9.1-HEAD and 9.2 fix this; it now slows down linearly
 - but: 10s > 8s > 5s!
 - but: comparing Ubuntu binary w/vanilla source build on virtual disks,
might not be significant


FWIW, I find it really hard to believe that there is any real difference
between 9.1.2 and 9.1 branch tip on this.  There have been no
significant changes in either the gist or contrib/cube code in that
branch.  I suspect you have a measurement issue there.


I suspect you're right, given that five runs in a row produced times from 7s 
to 10s.  I just wanted to include it for completeness and in case it 
triggered any a-ha moments.



4. 9.1-HEAD never successfully indexes 10 million rows ("never" = at least
20 minutes on two runs; I will follow up in a few hours)


Works for me (see above), though it's slower than you might've expected.


So my pre-built 9.1.2 takes 434s, my source-built 9.2 takes 509s, and 
(probably both of our) 9.1-HEAD takes 1918s... is that something to worry 
about, and if so, are there any tests I can run to assist? That bug doesn't 
affect me personally, but y'know, community and all that.  Also, I wonder if 
it's something like "9.2 got way faster doing X, but meanwhile, HEAD got way 
slower doing Y", and this is a canary in the coal mine.


Jay

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs/slowness inserting and indexing cubes

2012-02-08 Thread Tom Lane
Jay Levitt jay.lev...@gmail.com writes:
 [Posted at Andres's request]
 TL;DR: Inserting and indexing cubes is slow and/or broken in various ways in 
 various builds.

Not sure yet about most of these, but I know the reason for this one:

 2. In both 9.1 and 9.2, there is a long delay before CREATE INDEX realizes 
 it can't work on an unlogged table

That error is thrown in gistbuildempty, which is not called until after
we have finished building the main-fork index.  This is a tad unfriendly
when the table already contains lots of data.

ISTM there are two ways we could fix this:

1. Introduce a duplicative test at the start of gistbuild().

2. Rearrange the order of operations in index_build() so that the init
fork is made first.

Both of these are kinda ugly, but #2 puts the ugliness into someplace
that shouldn't have to know about it, and furthermore someplace that's
unlikely to get reverted if/when gist is fixed to not have this problem.
So I think I favor #1.  Other opinions anyone?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs/slowness inserting and indexing cubes

2012-02-08 Thread Robert Haas
On Wed, Feb 8, 2012 at 2:15 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Jay Levitt jay.lev...@gmail.com writes:
 [Posted at Andres's request]
 TL;DR: Inserting and indexing cubes is slow and/or broken in various ways in
 various builds.

 Not sure yet about most of these, but I know the reason for this one:

 2. In both 9.1 and 9.2, there is a long delay before CREATE INDEX realizes
 it can't work on an unlogged table

 That error is thrown in gistbuildempty, which is not called until after
 we have finished building the main-fork index.  This is a tad unfriendly
 when the table already contains lots of data.

 ISTM there are two ways we could fix this:

 1. Introduce a duplicative test at the start of gistbuild().

 2. Rearrange the order of operations in index_build() so that the init
 fork is made first.

 Both of these are kinda ugly, but #2 puts the ugliness into someplace
 that shouldn't have to know about it, and furthermore someplace that's
 unlikely to get reverted if/when gist is fixed to not have this problem.
 So I think I favor #1.  Other opinions anyone?

I don't think I understand your objection to #2.  It appears to be a
trivial rearrangement?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs/slowness inserting and indexing cubes

2012-02-08 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Feb 8, 2012 at 2:15 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 ISTM there are two ways we could fix this:
 
 1. Introduce a duplicative test at the start of gistbuild().
 
 2. Rearrange the order of operations in index_build() so that the init
 fork is made first.
 
 Both of these are kinda ugly, but #2 puts the ugliness into someplace
 that shouldn't have to know about it, and furthermore someplace that's
 unlikely to get reverted if/when gist is fixed to not have this problem.
 So I think I favor #1.  Other opinions anyone?

 I don't think I understand your objection to #2.  It appears to be a
 trivial rearrangement?

Yeah, but then we are wiring into index_build the idea that ambuildempty
is more important, or more likely to throw an error, or something, than
ambuild is.  It seems weird.  And fragile, since somebody could decide
to re-order those two steps again for reasons unrelated to gist.
Basically, I think this problem is gist's to deal with and so that's
where the fix should be.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs/slowness inserting and indexing cubes

2012-02-08 Thread Robert Haas
On Wed, Feb 8, 2012 at 2:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, Feb 8, 2012 at 2:15 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 ISTM there are two ways we could fix this:

 1. Introduce a duplicative test at the start of gistbuild().

 2. Rearrange the order of operations in index_build() so that the init
 fork is made first.

 Both of these are kinda ugly, but #2 puts the ugliness into someplace
 that shouldn't have to know about it, and furthermore someplace that's
 unlikely to get reverted if/when gist is fixed to not have this problem.
 So I think I favor #1.  Other opinions anyone?

 I don't think I understand your objection to #2.  It appears to be a
 trivial rearrangement?

 Yeah, but then we are wiring into index_build the idea that ambuildempty
 is more important, or more likely to throw an error, or something, than
 ambuild is.  It seems weird.  And fragile, since somebody could decide
 to re-order those two steps again for reasons unrelated to gist.

I guess.  I think the compelling reason to do ambuildempty first is
that it's fast.  So might as well.  I think you're just going to end up
hard-wiring the assumption that ambuild happens before ambuildempty,
which doesn't seem any better than the other way around, but I don't
care enough to argue about if you feel strongly about it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs/slowness inserting and indexing cubes

2012-02-08 Thread Alexander Korotkov
On Tue, Feb 7, 2012 at 11:26 PM, Jay Levitt jay.lev...@gmail.com wrote:

 [*] psql:slowcube.sql:20: ERROR:  node buffer of page being split (121550)
 does not exist

This looks like a bug in the buffering GiST index build I implemented during
my GSoC 2011 project. It looks especially strange with the following setting:

 effective_cache_size = 3734MB

because the buffering GiST index build just shouldn't turn on in this case,
when the index fits in cache. I'm going to take a detailed look at this.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Bugs/slowness inserting and indexing cubes

2012-02-08 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I guess.  I think the compelling reason to do ambuildempty first is
 that it's fast.  So might as well.  I think you're just going to end up
 hard-wiring the assumption that ambuild happens before ambuildempty,

Well, no, because I'm proposing that both functions throw this error.

 which doesn't seem any better than the other way around, but I don't
 care enough to argue about if you feel strongly about it.

What's ugly about this solution is the duplicative ereport calls.  But
at least the ugliness is confined to its source, ie gist.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs/slowness inserting and indexing cubes

2012-02-08 Thread Tom Lane
Jay Levitt jay.lev...@gmail.com writes:
 [Posted at Andres's request]
 TL;DR: Inserting and indexing cubes is slow and/or broken in various ways in 
 various builds.

 1. In 9.1.2, inserting 10x rows takes 19x the time.
 - 9.1-HEAD and 9.2 fix this; it now slows down linearly
 - but: 10s > 8s > 5s!
 - but: comparing Ubuntu binary w/vanilla source build on virtual disks, 
 might not be significant

FWIW, I find it really hard to believe that there is any real difference
between 9.1.2 and 9.1 branch tip on this.  There have been no
significant changes in either the gist or contrib/cube code in that
branch.  I suspect you have a measurement issue there.

On my not-at-all-virtual Fedora 16 workstation, with 9.1 tip, your test
case shows index build times of
    100000 rows       3650 ms
    1000000 rows     48400 ms
    10000000 rows  1917800 ms
which confirms the nonlinear scaling in 9.1, though I'm not sure it's
not just running out of RAM and having to do a lot of I/O in the last
case.  (This is an entirely untuned debug build, which probably doesn't
help.)  It's hard to guess how much available RAM you were working with
on your box -- mine's got 4GB.

 2. In both 9.1 and 9.2, there is a long delay before CREATE INDEX realizes 
 it can't work on an unlogged table

Fixed.

 3. In 9.2, creating the 10-million-row index always fails

As Alexander noted, this is probably a bug in his recent patch.  We'll
look at it.  (I duplicated it here, so it's plenty real.)

 4. 9.1-HEAD never successfully indexes 10 million rows ("never" = at least 
 20 minutes on two runs; I will follow up in a few hours)

Works for me (see above), though it's slower than you might've expected.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Bugs/slowness inserting and indexing cubes

2012-02-07 Thread Jay Levitt

[Posted at Andres's request]

TL;DR: Inserting and indexing cubes is slow and/or broken in various ways in 
various builds.


NOTABLE PROBLEMS

1. In 9.1.2, inserting 10x rows takes 19x the time.
   - 9.1-HEAD and 9.2 fix this; it now slows down linearly
   - but: 10s > 8s > 5s!
   - but: comparing Ubuntu binary w/vanilla source build on virtual disks, 
might not be significant


2. In both 9.1 and 9.2, there is a long delay before CREATE INDEX realizes 
it can't work on an unlogged table

3. In 9.2, creating the 10-million-row index always fails
4. 9.1-HEAD never successfully indexes 10 million rows ("never" = at least 
20 minutes on two runs; I will follow up in a few hours)


DETAILS

Times are in seconds, single run.

+---+-+-+--+--+
| Platform  | 1m rows | 1m rows | 10m rows | 10m rows |
|   | INSERT  | CR NDX  | INSERT   | CR NDX   |
+---+-+-+--+--+
| 9.1.2 logged  | 5   | 35  | 98   | 434  |
| 9.1.2 unlogged| 2   | 34[**]  | 22   | 374[**]  |
| 9.1-HEAD logged   | 10  | 65  | 89   | [***]|
| 9.1-HEAD unlogged | 2   | 39  | 20   | 690[**]  |
| 9.2 logged| 8   | 57  | 87   | 509[*]   |
| 9.2 unlogged  | 2   | 33[**]  | 21   | 327[*]   |
+---+-+-+--+--+

[*] psql:slowcube.sql:20: ERROR:  node buffer of page being split (121550) 
does not exist

[**] psql:slowcube.sql:21: ERROR:  unlogged GiST indexes are not supported
[***] never completed after 10-20 minutes; nothing in server.log at default 
logging levels, postgres process consuming about 1 CPU in IOWAIT, 
checkpoints every 7-8 seconds


VARIABILITY

A few runs in a row on 9.1-HEAD, 1 million rows, logged:

++--+
| INSERT | CREATE INDEX |
++--+
| 10 |   65 |
|  8 |   61 |
|  7 |   59 |
|  8 |   61 |
|  7 |   55 |
++--+

SYSTEM SPECS

Amazon EC2, EBS-backed, m1.large
7.5GB RAM, 2 cores
Intel(R) Xeon(R) CPU   E5645  @ 2.40GHz

shared_buffers = 1867MB
checkpoint_segments = 32
effective_cache_size = 3734MB

9.1.2: installed binaries from Ubuntu's oneiric repo
9.1-HEAD: REL9_1_STABLE, ef19c9dfaa99a2b78ed0f78aa4a44ed31636fdc4, built 
with simple configure/make/make install
9.2: master, 1631598ea204a3b05104f25d008b510ff5a5c94a, built with simple 
configure/make/make install


9.1.2 and 9.1-HEAD were run on different (but identically configured) 
instances.  9.1-HEAD and 9.2 were run on the same instance, but EBS 
performance is unpredictable. YMMV.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs/slowness inserting and indexing cubes

2012-02-07 Thread Jay Levitt

Jay Levitt wrote:

[Posted at Andres's request]

TL;DR: Inserting and indexing cubes is slow and/or broken in various ways in
various builds.


And I bet you'll want the test script... sigh.  attached.
\c postgres
drop database if exists slowcube;
create database slowcube;
\c slowcube
\timing
create schema slowcube;
set search_path to slowcube;

create extension cube;

set work_mem to '1GB';
set maintenance_work_mem to '1GB';

create table cubetest (
  position cube
  );

insert into cubetest (position)
  select cube(array[random() * 1000, random() * 1000, random() * 1000]) from 
generate_series(1,1000000);
select now();
create index q on cubetest using gist(position);
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers