Re: [HACKERS] Relation extension scalability

2016-04-08 Thread Dilip Kumar
On Fri, Apr 8, 2016 at 11:38 AM, Robert Haas  wrote:

> Yeah.  I've committed the patch now, with some cosmetic cleanup.
>

Thanks Robert !!!


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Relation extension scalability

2016-04-08 Thread Robert Haas
On Tue, Apr 5, 2016 at 1:05 PM, Peter Geoghegan  wrote:
> On Tue, Apr 5, 2016 at 10:02 AM, Robert Haas  wrote:
>> So the first thing here is that the patch seems to be a clear win in
>> this test.  For a single copy, it seems to be pretty much a wash.
>> When running 4 copies in parallel, it is about 20-25% faster with both
>> logged and unlogged tables.  The second thing that is interesting is
>> that we are getting super-linear scalability even without the patch:
>> if 1 copy takes 20 seconds, you might expect 4 to take 80 seconds, but
>> it really takes 60 unpatched or 45 patched.  If 1 copy takes 30
>> seconds, you might expect 4 to take 120 seconds, but it really takes
>> 105 unpatched or 80 patched.  So we're not actually I/O constrained on
>> this test, I think, perhaps because this machine has an SSD.
>
> It's not unusual for COPY to not be I/O constrained, I believe.

Yeah.  I've committed the patch now, with some cosmetic cleanup.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Relation extension scalability

2016-04-05 Thread Peter Geoghegan
On Tue, Apr 5, 2016 at 10:02 AM, Robert Haas  wrote:
> So the first thing here is that the patch seems to be a clear win in
> this test.  For a single copy, it seems to be pretty much a wash.
> When running 4 copies in parallel, it is about 20-25% faster with both
> logged and unlogged tables.  The second thing that is interesting is
> that we are getting super-linear scalability even without the patch:
> if 1 copy takes 20 seconds, you might expect 4 to take 80 seconds, but
> it really takes 60 unpatched or 45 patched.  If 1 copy takes 30
> seconds, you might expect 4 to take 120 seconds, but it really takes
> 105 unpatched or 80 patched.  So we're not actually I/O constrained on
> this test, I think, perhaps because this machine has an SSD.

It's not unusual for COPY to not be I/O constrained, I believe.

-- 
Peter Geoghegan




Re: [HACKERS] Relation extension scalability

2016-04-05 Thread Robert Haas
On Thu, Mar 31, 2016 at 9:03 AM, Dilip Kumar  wrote:
> If you need some more information please let me know ?

I repeated the testing described in
http://www.postgresql.org/message-id/ca+tgmoyouqf9cgcpgygngzqhcy-gcckryaqqtdu8kfe4n6h...@mail.gmail.com
on a MacBook Pro (OS X 10.8.5, 2.4 GHz Intel Core i7, 8GB, early 2013)
and got the following results.  Note that I did not adjust
*_flush_delay in this test because that's always 0, apparently, on
MacOS.

master, unlogged tables, 1 copy: 0m18.928s, 0m20.276s, 0m18.040s
patched, unlogged tables, 1 copy: 0m20.499s, 0m20.879s, 0m18.912s
master, unlogged tables, 4 parallel copies: 0m57.301s, 0m58.045s, 0m57.556s
patched, unlogged tables, 4 parallel copies: 0m47.994s, 0m45.586s, 0m44.440s

master, logged tables, 1 copy: 0m29.353s, 0m29.693s, 0m31.840s
patched, logged tables, 1 copy: 0m30.837s, 0m31.567s, 0m36.843s
master, logged tables, 4 parallel copies: 1m45.691s, 1m53.085s, 1m35.674s
patched, logged tables, 4 parallel copies: 1m21.137s, 1m20.678s, 1m22.419s

So the first thing here is that the patch seems to be a clear win in
this test.  For a single copy, it seems to be pretty much a wash.
When running 4 copies in parallel, it is about 20-25% faster with both
logged and unlogged tables.  The second thing that is interesting is
that we are getting super-linear scalability even without the patch:
if 1 copy takes 20 seconds, you might expect 4 to take 80 seconds, but
it really takes 60 unpatched or 45 patched.  If 1 copy takes 30
seconds, you might expect 4 to take 120 seconds, but it really takes
105 unpatched or 80 patched.  So we're not actually I/O constrained on
this test, I think, perhaps because this machine has an SSD.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Relation extension scalability

2016-03-31 Thread Dilip Kumar
On Thu, Mar 31, 2016 at 4:28 PM, Robert Haas  wrote:

> Yeah, kind of.  But obviously if we could make the limit smaller
> without hurting performance, that would be better.
>
> Per my note yesterday about performance degradation with parallel
> COPY, I wasn't able to demonstrate that this patch gives a consistent
> performance benefit on hydra - the best result I got was speeding up a
> 9.5 minute load to 8 minutes where linear scalability would have been
> 2 minutes.  And I found cases where it was actually slower with the
> patch.  Now maybe hydra is just a crap machine, but I'm feeling
> nervous.
>


I see the performance gain when either the complete data is on SSD, or the
data is on MD and the WAL is on SSD, or the table is unlogged.


> What machines did you use to test this?  Have you tested really large
> data loads, like many MB or even GB of data?


With the INSERT script, a 2-minute run produces about 18GB of data; I am
running for 5-10 minutes, which means at least 85GB of data.
(Each transaction inserts 1000 1KB tuples.)

With the COPY script, a 2-minute run produces about 23GB of data; I am
running for 5-10 minutes, which means at least 100GB of data.
(Each transaction inserts 1 tuples; tuple size is 1 byte to 5 bytes.)

Machine Details
---------------
I tested on an 8-socket NUMA machine with 64 cores:
[dilip.kumar@cthulhu ~]$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                128
On-line CPU(s) list:   0-127
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             8
NUMA node(s):          8
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 47
Model name:            Intel(R) Xeon(R) CPU E7- 8830  @ 2.13GHz
Stepping:              2
CPU MHz:               1064.000
BogoMIPS:              4266.62


If you need more information, please let me know.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Relation extension scalability

2016-03-31 Thread Amit Kapila
On Thu, Mar 31, 2016 at 4:28 PM, Robert Haas  wrote:

> On Thu, Mar 31, 2016 at 12:59 AM, Dilip Kumar 
> wrote:
> > On Tue, Mar 29, 2016 at 10:08 AM, Amit Kapila 
> > wrote:
> >> Yes, that makes sense.  One more point is that if the reason for v13
> >> giving better performance is extra blocks (which we believe in certain cases
> >> can leak till the time Vacuum updates the FSM tree), do you think it makes
> >> sense to once test by increasing lockWaiters * 20 limit to may be
> >> lockWaiters * 25 or lockWaiters * 30.
> >
> >
> > I tested COPY 1 record, by increasing the number of blocks just to find
> > out why we are not as good as V13
> >  with extraBlocks = Min( lockWaiters * 40, 2048) and got below results..
> >
> > COPY 1
> > 
> > Client  Patch(extraBlocks = Min( lockWaiters * 40, 2048))
> >    -
> > 16 752
> > 32 708
> >
> > This proves that main reason of v13 being better is its adding extra blocks
> > without control.
> > though v13 is better than these results, I think we can get that also by
> > changing multiplier and max limit .
> >
> > But I think we are ok with the max size as 4MB (512 blocks) right?
>
> Yeah, kind of.  But obviously if we could make the limit smaller
> without hurting performance, that would be better.
>
> Per my note yesterday about performance degradation with parallel
> COPY, I wasn't able to demonstrate that this patch gives a consistent
> performance benefit on hydra - the best result I got was speeding up a
> 9.5 minute load to 8 minutes where linear scalability would have been
> 2 minutes.


Is this test for unlogged tables?  As far as I understand, this patch will
show a benefit if data and WAL are on separate disks or if you test with
unlogged tables; otherwise the WAL contention supersedes the benefit of
this patch.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Relation extension scalability

2016-03-31 Thread Robert Haas
On Thu, Mar 31, 2016 at 12:59 AM, Dilip Kumar  wrote:
> On Tue, Mar 29, 2016 at 10:08 AM, Amit Kapila 
> wrote:
>> Yes, that makes sense.  One more point is that if the reason for v13
>> giving better performance is extra blocks (which we believe in certain cases
>> can leak till the time Vacuum updates the FSM tree), do you think it makes
>> sense to once test by increasing lockWaiters * 20 limit to may be
>> lockWaiters * 25 or lockWaiters * 30.
>
>
> I tested COPY 1 record, by increasing the number of blocks just to find
> out why we are not as good as V13
>  with extraBlocks = Min( lockWaiters * 40, 2048) and got below results..
>
> COPY 1
> 
> Client  Patch(extraBlocks = Min( lockWaiters * 40, 2048))
>    -
> 16 752
> 32 708
>
> This proves that main reason of v13 being better is its adding extra blocks
> without control.
> though v13 is better than these results, I think we can get that also by
> changing multiplier and max limit .
>
> But I think we are ok with the max size as 4MB (512 blocks) right?

Yeah, kind of.  But obviously if we could make the limit smaller
without hurting performance, that would be better.

Per my note yesterday about performance degradation with parallel
COPY, I wasn't able to demonstrate that this patch gives a consistent
performance benefit on hydra - the best result I got was speeding up a
9.5 minute load to 8 minutes where linear scalability would have been
2 minutes.  And I found cases where it was actually slower with the
patch.  Now maybe hydra is just a crap machine, but I'm feeling
nervous.

What machines did you use to test this?  Have you tested really large
data loads, like many MB or even GB of data?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Relation extension scalability

2016-03-31 Thread Amit Kapila
On Thu, Mar 31, 2016 at 10:29 AM, Dilip Kumar  wrote:

>
> On Tue, Mar 29, 2016 at 10:08 AM, Amit Kapila 
> wrote:
>
>> Yes, that makes sense.  One more point is that if the reason for v13
>> giving better performance is extra blocks (which we believe in certain
>> cases can leak till the time Vacuum updates the FSM tree), do you think it
>> makes sense to once test by increasing lockWaiters * 20 limit to may
>> be lockWaiters * 25 or lockWaiters * 30.
>
>
> I tested COPY 1 record, by increasing the number of blocks just to
> find out why we are not as good as V13
>  with extraBlocks = Min( lockWaiters * 40, 2048) and got below results..
>
> COPY 1
> 
> Client  Patch(extraBlocks = Min( lockWaiters * 40, 2048))
>    -
> 16 752
> 32 708
>
> This proves that main reason of v13 being better is its adding extra
> blocks without control.
> though v13 is better than these results, I think we can get that also by
> changing multiplier and max limit .
>
> But I think we are ok with the max size as 4MB (512 blocks) right?.
>
>
This shows a performance increase of ~26% (599 to 752) at 16 clients if we
raise the maximum extension size from 4MB to 16MB, which is a good boost,
but I am not sure it is worth extending the relation in cases where the
newly added pages won't get used.  I think it should be okay to go with 4MB
as the limit for now; if during beta testing or in the future there are use
cases where tuning this maximum helps, we can come back to it.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Relation extension scalability

2016-03-30 Thread Dilip Kumar
On Tue, Mar 29, 2016 at 10:08 AM, Amit Kapila 
wrote:

> Yes, that makes sense.  One more point is that if the reason for v13
> giving better performance is extra blocks (which we believe in certain
> cases can leak till the time Vacuum updates the FSM tree), do you think it
> makes sense to once test by increasing lockWaiters * 20 limit to may
> be lockWaiters * 25 or lockWaiters * 30.


I tested COPY 1 record, increasing the number of blocks, just to find
out why we are not as good as v13: with extraBlocks = Min(lockWaiters * 40, 2048)
I got the results below.

COPY 1
----------
Client   Patch (extraBlocks = Min(lockWaiters * 40, 2048))
------   ------------------------------------------------
16       752
32       708

This shows that the main reason v13 is better is that it adds extra blocks
without any control.  Although v13 is still better than these results, I
think we can get there as well by changing the multiplier and the max limit.

But I think we are OK with the max size of 4MB (512 blocks), right?

Does this test make sense?

Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Relation extension scalability

2016-03-29 Thread Dilip Kumar
On Wed, Mar 30, 2016 at 7:51 AM, Dilip Kumar  wrote:

> + if (lockWaiters)
> + /*
> +  * Here we use the same freespace value for all the blocks, but that is
> +  * OK because they are all newly added blocks with the same free space.
> +  * Even if some block we just added to the FreeSpaceMap above has
> +  * already been used by another backend, so its free space is no longer
> +  * the same, that does no harm, because the caller recalculates the
> +  * actual free space after getting the page.
> +  */
> + UpdateFreeSpaceMap(relation, firstBlock, blockNum, freespace);
>
>
> Does this look good?


Done in a better way:

+ lockWaiters = RelationExtensionLockWaiterCount(relation);
+
+ if (lockWaiters == 0)
+ return;
+


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


multi_extend_v21.patch
Description: Binary data



Re: [HACKERS] Relation extension scalability

2016-03-29 Thread Dilip Kumar
On Wed, Mar 30, 2016 at 7:19 AM, Robert Haas  wrote:

Thanks for the review and the improved comments.


> hio.c: In function ‘RelationGetBufferForTuple’:
> hio.c:231:20: error: ‘freespace’ may be used uninitialized in this
> function [-Werror=uninitialized]
> hio.c:185:7: note: ‘freespace’ was declared here
> hio.c:231:20: error: ‘blockNum’ may be used uninitialized in this
> function [-Werror=uninitialized]
> hio.c:181:14: note: ‘blockNum’ was declared here
>

I have fixed those in v20



>
> There's nothing whatsoever to prevent RelationExtensionLockWaiterCount
> from returning 0.
>
> It's also rather ugly that the call to UpdateFreeSpaceMap() assumes
> that the last value returned by PageGetHeapFreeSpace() is as good as
> any other, but maybe we can just install a comment explaining that
> point; there's not an obviously better approach that I can see.
>

Added comments:

+ if (lockWaiters)
+ /*
+  * Here we use the same freespace value for all the blocks, but that is
+  * OK because they are all newly added blocks with the same free space.
+  * Even if some block we just added to the FreeSpaceMap above has
+  * already been used by another backend, so its free space is no longer
+  * the same, that does no harm, because the caller recalculates the
+  * actual free space after getting the page.
+  */
+ UpdateFreeSpaceMap(relation, firstBlock, blockNum, freespace);


Does this look good?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


multi_extend_v20.patch
Description: Binary data



Re: [HACKERS] Relation extension scalability

2016-03-29 Thread Robert Haas
On Tue, Mar 29, 2016 at 1:29 PM, Dilip Kumar  wrote:
> Attaching new version v18
>
> - Some cleanup work on v17.
> - Improved  UpdateFreeSpaceMap function.
> - Performance and space utilization are same as V17

Looks better.  Here's a v19 that I hacked on a bit.

Unfortunately, one compiler I tried this with had a pretty legitimate complaint:

hio.c: In function ‘RelationGetBufferForTuple’:
hio.c:231:20: error: ‘freespace’ may be used uninitialized in this
function [-Werror=uninitialized]
hio.c:185:7: note: ‘freespace’ was declared here
hio.c:231:20: error: ‘blockNum’ may be used uninitialized in this
function [-Werror=uninitialized]
hio.c:181:14: note: ‘blockNum’ was declared here

There's nothing whatsoever to prevent RelationExtensionLockWaiterCount
from returning 0.

It's also rather ugly that the call to UpdateFreeSpaceMap() assumes
that the last value returned by PageGetHeapFreeSpace() is as good as
any other, but maybe we can just install a comment explaining that
point; there's not an obviously better approach that I can see.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 8140418..aeed40b 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -169,6 +169,69 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
 }
 
 /*
+ * Extend a relation by multiple blocks to avoid future contention on the
+ * relation extension lock.  Our goal is to pre-extend the relation by an
+ * amount which ramps up as the degree of contention ramps up, but limiting
+ * the result to some sane overall value.
+ */
+static void
+RelationAddExtraBlocks(Relation relation, BulkInsertState bistate)
+{
+	Page		page;
+	BlockNumber	blockNum,
+firstBlock;
+	int			extraBlocks = 0;
+	int			lockWaiters = 0;
+	Size		freespace;
+	Buffer		buffer;
+
+	/*
+	 * We use the length of the lock wait queue to judge how much to extend.
+	 * It might seem like multiplying the number of lock waiters by as much
+	 * as 20 is too aggressive, but benchmarking revealed that smaller numbers
+	 * were insufficient.  512 is just an arbitrary cap to prevent pathological
+	 * results (and excessive wasted disk space).
+	 */
+	lockWaiters = RelationExtensionLockWaiterCount(relation);
+	extraBlocks = Min(512, lockWaiters * 20);
+
+	firstBlock = InvalidBlockNumber;
+
+	while (extraBlocks-- >= 0)
+	{
+		/* Ouch - an unnecessary lseek() each time through the loop! */
+		buffer = ReadBufferBI(relation, P_NEW, bistate);
+
+		/* Extend by one page. */
+		LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+		page = BufferGetPage(buffer);
+		PageInit(page, BufferGetPageSize(buffer), 0);
+		MarkBufferDirty(buffer);
+		blockNum = BufferGetBlockNumber(buffer);
+		freespace = PageGetHeapFreeSpace(page);
+		UnlockReleaseBuffer(buffer);
+
+		if (firstBlock == InvalidBlockNumber)
+			firstBlock = blockNum;
+
+		/*
+		 * Immediately update the bottom level of the FSM.  This has a good
+		 * chance of making this page visible to other concurrently inserting
+		 * backends, and we want that to happen without delay.
+		 */
+		RecordPageWithFreeSpace(relation, blockNum, freespace);
+	}
+
+	/*
+	 * Updating the upper levels of the free space map is too expensive
+	 * to do for every block, but it's worth doing once at the end to make
+	 * sure that subsequent insertion activity sees all of those nifty free
+	 * pages we just inserted.
+	 */
+	UpdateFreeSpaceMap(relation, firstBlock, blockNum, freespace);
+}
+
+/*
  * RelationGetBufferForTuple
  *
  *	Returns pinned and exclusive-locked buffer of a page in given relation
@@ -233,8 +296,8 @@ RelationGetBufferForTuple(Relation relation, Size len,
 	bool		use_fsm = !(options & HEAP_INSERT_SKIP_FSM);
 	Buffer		buffer = InvalidBuffer;
 	Page		page;
-	Size		pageFreeSpace,
-saveFreeSpace;
+	Size		pageFreeSpace = 0,
+saveFreeSpace = 0;
 	BlockNumber targetBlock,
 otherBlock;
 	bool		needLock;
@@ -308,6 +371,7 @@ RelationGetBufferForTuple(Relation relation, Size len,
 		}
 	}
 
+loop:
 	while (targetBlock != InvalidBlockNumber)
 	{
 		/*
@@ -440,10 +504,46 @@ RelationGetBufferForTuple(Relation relation, Size len,
 	 */
 	needLock = !RELATION_IS_LOCAL(relation);
 
+	/*
+	 * If we need the lock but are not able to acquire it immediately, we'll
+	 * consider extending the relation by multiple blocks at a time to manage
+	 * contention on the relation extension lock.  However, this only makes
+	 * sense if we're using the FSM; otherwise, there's no point.
+	 */
 	if (needLock)
-		LockRelationForExtension(relation, ExclusiveLock);
+	{
+		if (!use_fsm)
+			LockRelationForExtension(relation, ExclusiveLock);
+		else if (!ConditionalLockRelationForExtension(relation, ExclusiveLock))
+		{
+			/* Couldn't get the lock immediately; wait for it. */
+			LockRelationForExtension(relation, ExclusiveLock);
+
+			/*
+			 * Check if some other backend has 

Re: [HACKERS] Relation extension scalability

2016-03-29 Thread Dilip Kumar
On Tue, Mar 29, 2016 at 2:09 PM, Dilip Kumar  wrote:

>
> Attaching new version v18

- Some cleanup work on v17.
- Improved the UpdateFreeSpaceMap function.
- Performance and space utilization are the same as v17.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


multi_extend_v18.patch
Description: Binary data



Re: [HACKERS] Relation extension scalability

2016-03-29 Thread Dilip Kumar
On Tue, Mar 29, 2016 at 7:26 AM, Robert Haas  wrote:

> Well, it's less important in that case, but I think it's still worth
> doing.  Some people are going to do just plain GetPageWithFreeSpace().
>

I am attaching new version v17.

It works like this:

In RelationAddExtraBlocks:
{
  -- add blocks one by one to the FSM;
  -- update the FSM tree all the way to the root.
}

In RelationGetBufferForTuple:
  -- same as v14: search the FSM tree from the root, via GetPageWithFreeSpace.

Summary:
--------
1. By adding blocks to the FSM tree one by one, it solves the unused-blocks
problem of v14.
2. It updates the FSM tree all the way up to the root, so anybody searching
from the root can find the blocks.
3. It also searches for blocks from the root, so it doesn't have the problem
v15 has (exactly identifying which two FSM pages to search).
4. It solves the performance problem of v14 with some optimizations in the
logic that updates the FSM tree up to the root.

Performance Data:
-----------------
INSERT:
Client   base   v17
1        117    120
2        111    123
4         51    124
8         43    135
16        40    145
32        35    144
64        --    140

COPY:
Client   base   v17
1        118    117
2        217    220
4        210    379
8        166    574
16       145    594
32       124    599
64              609


Notes:
------
If I change this patch's strategy for searching for a block, performance
remains almost the same:
 1. Searching in two blocks, as v15 or v17 does.
 2. Searching first using the target block and, if nothing is found, then
searching from the top.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


multi_extend_v17.patch
Description: Binary data



Re: [HACKERS] Relation extension scalability

2016-03-28 Thread Amit Kapila
On Tue, Mar 29, 2016 at 3:21 AM, Petr Jelinek  wrote:

> On 28/03/16 14:46, Dilip Kumar wrote:
>
>>
>> Conclusion:
>> ---
>> 1. I think v15 is solving the problem exist with v13 and performance
>> is significantly high compared to base, and relation size is also
>> stable, So IMHO V15 is winner over other solution, what other thinks ?
>>
>>
> It seems so, do you have ability to reasonably test with 64 clients? I am
> mostly wondering if we see the performance going further down or just
> plateau.
>
>
Yes, that makes sense.  One more point: if the reason v13 gives better
performance is the extra blocks (which we believe can, in certain cases,
leak until VACUUM updates the FSM tree), do you think it makes sense to
test once with the lockWaiters * 20 limit increased to maybe
lockWaiters * 25 or lockWaiters * 30?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Relation extension scalability

2016-03-28 Thread Robert Haas
On Mon, Mar 28, 2016 at 8:46 AM, Dilip Kumar  wrote:
> Found one problem with V15, so sending the new version.
> In V15 I am taking prev_blkno as targetBlock instead it should be the last
> block of the relation at that time. Attaching new patch.

 BlockNumber targetBlock,
-otherBlock;
+otherBlock,
+prev_blkno = RelationGetNumberOfBlocks(relation);

Absolutely not.  There is no way it's acceptable to introduce an
unconditional call to RelationGetNumberOfBlocks() into every call to
RelationGetBufferForTuple().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Relation extension scalability

2016-03-28 Thread Robert Haas
On Sun, Mar 27, 2016 at 9:51 PM, Dilip Kumar  wrote:
> I think this is better option, Since we will search last two pages of FSM
> tree, then no need to update the upper level of the FSM tree. Right ?

Well, it's less important in that case, but I think it's still worth
doing.  Some people are going to do just plain GetPageWithFreeSpace().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Relation extension scalability

2016-03-28 Thread Petr Jelinek

On 28/03/16 14:46, Dilip Kumar wrote:


Conclusion:
---
1. I think v15 is solving the problem exist with v13 and performance
is significantly high compared to base, and relation size is also
stable, So IMHO V15 is winner over other solution, what other thinks ?



It seems so.  Do you have the ability to reasonably test with 64 clients? I 
am mostly wondering whether the performance goes further down or just 
plateaus.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Relation extension scalability

2016-03-28 Thread Dilip Kumar
On Mon, Mar 28, 2016 at 3:02 PM, Dilip Kumar  wrote:

> 1. Relation Size : No change in size, its same as base and v13
>
> 2. INSERT 1028 Byte 1000 tuple performance
> ---
> Client base v13 v15
> 1 117 124 122
> 2 111 126 123
> 4 51 128 125
> 8 43 149 135
> 16 40 217 147
> 32 35 263 141
>
> 3. COPY 1 Tuple performance.
> --
> Client base v13 v15
> 1 118 147 155
> 2 217 276 273
> 4 210 421 457
> 8 166 630 643
> 16 145 813 595
> 32 124 985 598
>
> Conclusion:
> ---
> 1. I think v15 is solving the problem exist with v13 and performance is
> significantly high compared to base, and relation size is also stable, So
> IMHO V15 is winner over other solution, what other thinks ?
>
> 2. And no point in thinking that V13 is better than V15 because, V13 has
> bug of sometime extending more than expected pages and that is uncontrolled
> and same can be the reason also of v13 performing better.
>

Found one problem with v15, so sending a new version.
In v15 I was taking prev_blkno as targetBlock; instead it should be the last
block of the relation at that time. Attaching the new patch.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


multi_extend_v16.patch
Description: Binary data



Re: [HACKERS] Relation extension scalability

2016-03-28 Thread Dilip Kumar
On Mon, Mar 28, 2016 at 7:21 AM, Dilip Kumar  wrote:

>> I agree with that conclusion.  I'm not quite sure where that leaves
>> us, though.  We can go back to v13, but why isn't that producing extra
>> pages?  It seems like it should: whenever a bulk extend rolls over to
>> a new FSM page, concurrent backends will search either the old or the
>> new one but not both.
>>
>
Our open question was why v13 is not producing extra pages.  I added some
logging and debugged it.  It seems to me that when blocks spill over to the
next FSM page, the backends already waiting on the lock will not find those
blocks, because they search the old FSM page.  But the backend that is
extending the pages will set RelationSetTargetBlock to the latest block,
and that makes the new FSM page available for search by new requesters.

1. So this is why v13 (in the normal case, see *1) does not produce unused pages.
2. But it will produce extra pages (which are then consumed by new
requesters), because all the waiters come one by one and each extends 512 pages.


*1: Above I said the normal case; there are cases where v13 can leave unused
pages.  For example, one by one each waiter gets the lock and extends pages,
but none of them reaches RelationSetTargetBlock yet, so the new pages are
still not visible to new requesters; eventually the blocks spill to a third
FSM page, and then the backends reach RelationSetTargetBlock one by one.  If
the last one sets it to a block on the third FSM page, the pages on the
second FSM page are left unused.


>> Maybe we could do this - not sure if it's what you were suggesting above:
>>
>> 1. Add the pages one at a time, and do RecordPageWithFreeSpace after each
>> one.
>> 2. After inserting them all, go back and update the upper levels of
>> the FSM tree up the root.
>>
> I think this is better option, Since we will search last two pages of FSM
> tree, then no need to update the upper level of the FSM tree. Right ?
>
> I will test and post the result with this option.
>

I have created this patch and the results are below.

* All test scripts are the same as those attached upthread.

1. Relation size: no change in size; it is the same as base and v13.

2. INSERT 1028-byte tuple performance (1000 tuples per transaction)
-------------------------------------------------------------------
Client   base   v13   v15
1        117    124   122
2        111    126   123
4         51    128   125
8         43    149   135
16        40    217   147
32        35    263   141

3. COPY 1 tuple performance
---------------------------
Client   base   v13   v15
1        118    147   155
2        217    276   273
4        210    421   457
8        166    630   643
16       145    813   595
32       124    985   598

Conclusion:
-----------
1. I think v15 solves the problem that exists with v13, performance is
significantly higher than base, and the relation size is also stable.  So
IMHO v15 is the winner over the other solutions; what do others think?

2. There is no point in concluding that v13 is better than v15, because v13
has a bug that sometimes extends more pages than expected, in an
uncontrolled way, and that may also be the reason v13 performs better.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


multi_extend_v15.patch
Description: Binary data



Re: [HACKERS] Relation extension scalability

2016-03-28 Thread Dilip Kumar
On Mon, Mar 28, 2016 at 11:00 AM, Amit Kapila 
wrote:

> I have not debugged the flow, but by looking at v13 code, it looks like it
> will search both old and new.   In
> function 
> GetPageWithFreeSpaceExtended()->fsm_search_from_addr()->fsm_search_avail(),
> the basic idea of search is: Start the search from the target slot.  At
> every step, move one
> node to the right, then climb up to the parent.  Stop when we reach a node
> with enough free space (as we must, since the root has enough space).
> So shouldn't it be able to find the new FSM page where the bulk extend
> rolls over?
>

This is actually a multi-level tree, and each FSM page contains one slot tree.

So fsm_search_avail() only searches the slot tree inside one FSM page, but we
want to move on to the next FSM page.
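
For reference, a tiny standalone illustration of why a search confined to one
leaf FSM page misses blocks that spilled onto the next one.  The 4096
blocks-per-FSM-page figure is the approximation quoted elsewhere in this
thread, not the exact constant; block numbers are made up for the example.

#include <stdio.h>

#define SLOTS_PER_FSM_PAGE 4096     /* approximate figure used in this thread */

int
main(void)
{
	unsigned int target_blk = 4090; /* block an inserter is currently near */
	unsigned int new_blk = 4100;    /* first freshly bulk-extended block */

	printf("target block is tracked by leaf FSM page %u\n",
		   target_blk / SLOTS_PER_FSM_PAGE);
	printf("new blocks start on leaf FSM page %u\n",
		   new_blk / SLOTS_PER_FSM_PAGE);

	/*
	 * Different leaf FSM pages, so a search limited to the slot tree of the
	 * first FSM page never sees the newly extended blocks.
	 */
	return 0;
}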


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Relation extension scalability

2016-03-27 Thread Amit Kapila
On Mon, Mar 28, 2016 at 1:55 AM, Robert Haas  wrote:
>
> On Sun, Mar 27, 2016 at 8:00 AM, Dilip Kumar wrote:
>
> > Results:
> > --
> > 1. With this performance is little less than v14 but the problem of extra
> > relation size is solved.
> > 2. With this we can conclude that extra size of relation in v14 is because
> > some while extending the pages, its not immediately available and at end
> > some of the pages are left unused.
>
> I agree with that conclusion.  I'm not quite sure where that leaves
> us, though.  We can go back to v13, but why isn't that producing extra
> pages?  It seems like it should: whenever a bulk extend rolls over to
> a new FSM page, concurrent backends will search either the old or the
> new one but not both.
>

I have not debugged the flow, but looking at the v13 code, it looks like it
will search both the old and the new one.  In
GetPageWithFreeSpaceExtended() -> fsm_search_from_addr() -> fsm_search_avail(),
the basic idea of the search is: start the search from the target slot; at
every step, move one node to the right, then climb up to the parent; stop
when we reach a node with enough free space (as we must, since the root has
enough space).  So shouldn't it be able to find the new FSM page where the
bulk extend rolls over?
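
For reference, a simplified, self-contained sketch of the in-page search just
described.  This is a toy, not PostgreSQL's actual fsm_search_avail(), which
additionally handles the fp_next_slot hint, wraparound, and an exclusive-lock
mode; here the slots form a complete binary tree stored in an array, with the
root at index 0 and the children of node i at 2*i+1 and 2*i+2.

#include <stdio.h>

#define LEAVES 8
#define NODES  (2 * LEAVES - 1)

/* Return the leaf slot (>= 0) of a page with at least "min" free space,
 * starting the search at target_leaf, or -1 if none qualifies. */
static int
search_avail(const unsigned char tree[NODES], int target_leaf, int min)
{
	int			node = (LEAVES - 1) + target_leaf;	/* array index of that leaf */

	/* Climb: step to the right sibling, then up to the parent, until we
	 * stand on a subtree whose maximum free space is large enough. */
	while (tree[node] < min)
	{
		if (node & 1)				/* left child: try the right sibling */
			node++;
		else if (node == 0)
			return -1;				/* even the root is too small */
		else
			node = (node - 1) / 2;	/* climb to the parent */
	}

	/* Descend back down to a leaf below the qualifying node. */
	while (node < LEAVES - 1)
	{
		int			left = 2 * node + 1;

		node = (tree[left] >= min) ? left : left + 1;
	}
	return node - (LEAVES - 1);
}

int
main(void)
{
	/* Leaf slots 0..7 hold per-heap-page free-space values; every inner
	 * node holds the maximum of its two children. */
	unsigned char leaves[LEAVES] = {10, 0, 0, 0, 0, 0, 0, 200};
	unsigned char tree[NODES];
	int			i;

	for (i = 0; i < LEAVES; i++)
		tree[LEAVES - 1 + i] = leaves[i];
	for (i = LEAVES - 2; i >= 0; i--)
		tree[i] = tree[2 * i + 1] > tree[2 * i + 2] ? tree[2 * i + 1]
													: tree[2 * i + 2];

	printf("slot with >= 100 free: %d\n", search_avail(tree, 1, 100));
	return 0;
}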


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Relation extension scalability

2016-03-27 Thread Dilip Kumar
On Mon, Mar 28, 2016 at 1:55 AM, Robert Haas  wrote:

> > One is like below-->
> > -
> > In AddExtraBlock
> > {
> >I add page to FSM one by one like v13 does.
> >then update the full FSM tree up till root
> > }
>
> Not following this.  Did you attach this version?
>

No, I did not attach this.  It was a rough experiment; I tried it but did
not produce a patch.  I will send it.


> > Results:
> > --
> > 1. With this performance is little less than v14 but the problem of extra
> > relation size is solved.
> > 2. With this we can conclude that extra size of relation in v14 is because
> > some while extending the pages, its not immediately available and at end
> > some of the pages are left unused.
>
> I agree with that conclusion.  I'm not quite sure where that leaves
> us, though.  We can go back to v13, but why isn't that producing extra
> pages?  It seems like it should: whenever a bulk extend rolls over to
> a new FSM page, concurrent backends will search either the old or the
> new one but not both.
>
> Maybe we could do this - not sure if it's what you were suggesting above:
>
> 1. Add the pages one at a time, and do RecordPageWithFreeSpace after each
> one.
> 2. After inserting them all, go back and update the upper levels of
> the FSM tree up the root.
>

Yes, exactly; that is what I was trying to explain above.


> Another idea is:
>
> If ConditionalLockRelationForExtension fails to get the lock
> immediately, search the last *two* pages of the FSM for a free page.
>
> Just brainstorming here.


I think this is the better option.  Since we will search the last two pages
of the FSM tree, there is no need to update the upper levels of the FSM
tree, right?

I will test and post the results with this option.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Relation extension scalability

2016-03-27 Thread Robert Haas
On Sun, Mar 27, 2016 at 8:00 AM, Dilip Kumar  wrote:
> Relation Size
> -
> INSERT : 16000 transaction from 32 Client
>
>  Base  v13 v14_1
> -  -  
> TPS 37  255 112
> Rel Size   17GB 17GB 18GB
>
> COPY: 32000 transaction from 32 client
>  Base  v13v14_1
>---
> TPS 121   823427
> Rel Size   11GB11GB  11GB
>
>
> Script are attached in the mail
> =
> test_size_ins.sh  --> run insert test and calculate relation size.
> test_size_copy   --> run copy test and relation size
> copy_script  -> copy pg_bench script used by test_size_copy.sh
> insert_script --> insert pg_bench script used by test_size_ins.sh
> multi_extend_v14_poc_v1.patch --> modified patch of v14.
>
> I also tried modifying v14 from different different angle.
>
> One is like below-->
> -
> In AddExtraBlock
> {
>I add page to FSM one by one like v13 does.
>then update the full FSM tree up till root
> }

Not following this.  Did you attach this version?

> Results:
> --
> 1. With this performance is little less than v14 but the problem of extra
> relation size is solved.
> 2. With this we can conclude that extra size of relation in v14 is because
> some while extending the pages, its not immediately available and at end
> some of the pages are left unused.

I agree with that conclusion.  I'm not quite sure where that leaves
us, though.  We can go back to v13, but why isn't that producing extra
pages?  It seems like it should: whenever a bulk extend rolls over to
a new FSM page, concurrent backends will search either the old or the
new one but not both.

Maybe we could do this - not sure if it's what you were suggesting above:

1. Add the pages one at a time, and do RecordPageWithFreeSpace after each one.
2. After inserting them all, go back and update the upper levels of
the FSM tree up the root.

Another idea is:

If ConditionalLockRelationForExtension fails to get the lock
immediately, search the last *two* pages of the FSM for a free page.

Just brainstorming here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Relation extension scalability

2016-03-27 Thread Dilip Kumar
On Sat, Mar 26, 2016 at 3:18 PM, Dilip Kumar  wrote:

>> We could go further still and have GetPageWithFreeSpace() always
>> search the last, say, two pages of the FSM in all cases.  But that
>> might be expensive. The extra call to RelationGetNumberOfBlocks seems
>> cheap enough here because the alternative is to wait for a contended
>> heavyweight lock.
>>
>
> I will try the test with this also and post the results.
>

** Something went wrong with the last mail (it seems to have become a
separate thread), so I am resending the same mail. **

I have changed v14 as per this suggestion, and the results are the same as v14.

I have measured the relation size again, this time directly using the size
function, so the results are easier to understand.

Relation Size
-------------
INSERT: 16000 transactions from 32 clients

           Base    v13     v14_1
TPS        37      255     112
Rel size   17GB    17GB    18GB

COPY: 32000 transactions from 32 clients

           Base    v13     v14_1
TPS        121     823     427
Rel size   11GB    11GB    11GB


Scripts attached to this mail:
==============================
test_size_ins.sh  --> runs the insert test and calculates relation size.
test_size_copy.sh --> runs the copy test and calculates relation size.
copy_script       --> COPY pgbench script used by test_size_copy.sh
insert_script     --> INSERT pgbench script used by test_size_ins.sh
multi_extend_v14_poc_v1.patch --> modified v14 patch.

I also tried modifying v14 from a few different angles.

One variant is like this:

In AddExtraBlock
{
   add pages to the FSM one by one, as v13 does;
   then update the full FSM tree up to the root
}

Results:
--------
1. With this, performance is a little less than v14, but the problem of
extra relation size is solved.
2. From this we can conclude that the extra relation size in v14 arises
because, while extending, some pages are not immediately visible to other
backends, and in the end some of those pages are left unused.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


copy_script
Description: Binary data


insert_script
Description: Binary data


multi_extend_v14_poc_v1.patch
Description: Binary data


test_size_copy.sh
Description: Bourne shell script


test_size_ins.sh
Description: Bourne shell script



Re: [HACKERS] Relation extension scalability

2016-03-27 Thread Dilip Kumar
On Sat, Mar 26, 2016 at 3:18 PM, Dilip Kumar  wrote:

>> search the last, say, two pages of the FSM in all cases.  But that
>> might be expensive. The extra call to RelationGetNumberOfBlocks seems
>> cheap enough here because the alternative is to wait for a contended
>> heavyweight lock.
>>
>
> I will try the test with this also and post the results.
>

I have changed v14 as per this suggestion, and the results are the same as v14.

I have measured the relation size again, this time directly using the size
function, so the results are easier to understand.

Relation Size
-------------
INSERT: 16000 transactions from 32 clients

           Base    v13     v14_1
TPS        37      255     112
Rel size   17GB    17GB    18GB

COPY: 32000 transactions from 32 clients

           Base    v13     v14_1
TPS        121     823     427
Rel size   11GB    11GB    11GB


Scripts attached to this mail:
==============================
test_size_ins.sh  --> runs the insert test and calculates relation size.
test_size_copy.sh --> runs the copy test and calculates relation size.
copy_script       --> COPY pgbench script used by test_size_copy.sh
insert_script     --> INSERT pgbench script used by test_size_ins.sh
multi_extend_v14_poc_v1.patch --> modified v14 patch.

I also tried modifying v14 from a few different angles.

One variant is like this:

In AddExtraBlock
{
   add pages to the FSM one by one, as v13 does;
   then update the full FSM tree up to the root
}

Results:
--------
1. With this, performance is a little less than v14, but the problem of
extra relation size is solved.
2. From this we can conclude that the extra relation size in v14 arises
because, while extending, some pages are not immediately visible to other
backends, and in the end some of those pages are left unused.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


test_size_copy.sh
Description: Bourne shell script


test_size_ins.sh
Description: Bourne shell script


copy_script
Description: Binary data


insert_script
Description: Binary data


multi_extend_v14_poc_v1.patch
Description: Binary data



Re: [HACKERS] Relation extension scalability

2016-03-26 Thread Dilip Kumar
On Sat, Mar 26, 2016 at 8:07 AM, Robert Haas  wrote:

> I think we need to start testing these patches not only in terms of
> how *fast* they are but how *large* the relation ends up being when
> we're done.  A patch that inserts the rows slower but the final
> relation is smaller may be better overall.  Can you retest v13, v14,
> and master, and post not only the timings but the relation size
> afterwards?  And maybe post the exact script you are using?
>


I have tested both size and performance; the scripts are attached to this mail.

COPY 1-10 byte tuples from 32 clients:
               Base     V13       V14
TPS            123      874       446
No. of tuples  14827    104998    53637
Relpages       656089   4652593   2485482

INSERT 1028-byte tuples from 16 clients:
               Base     V13       V14
TPS            42       211       120
No. of tuples  5149000  25343000  14524000
Relpages       735577   3620765   2140612

From the above results, if we compare the number of tuples with the
corresponding relpages, neither v13 nor v14 ends up with many extra unused
pages.

By my calculation, for INSERT (1028-byte tuples) each page holds 7 tuples,
so the number of pages required is:
Base: 5149000/7 = 735571  (from relpages we can see 6 pages are extra)
V13:  25343000/7 = 3620428 (from relpages we can see ~300 pages are extra)
V14:  14524000/7 = 2074857 (from relpages we can see ~7 pages are extra)

V14 has the largest number of extra pages.  I expected v13 to have the most
unused pages, but it is v14, and I tested this over multiple runs; v13 is
always the winner.  I also tested with other client counts, like 8 and 32,
and v13 always has only ~60-300 extra pages out of a total of ~2-4 million
pages.


Attached files:
---
test_size_ins.sh  --> automated script to run insert test and calculate
tuple and relpages.
test_size_copy   --> automated script to run copy test and calculate tuple
and relpages.
copy_script  -> copy pg_bench script used by test_size_copy.sh
insert_script --> insert pg_bench script used by test_size_ins.sh



> Maybe something like this would help:
>
> if (needLock)
> {
> if (!use_fsm)
> LockRelationForExtension(relation, ExclusiveLock);
> else if (!ConditionalLockRelationForExtension(relation,
> ExclusiveLock))
> {
> BlockNumber last_blkno =
> RelationGetNumberOfBlocks(relation);
>
> targetBlock = GetPageWithFreeSpaceExtended(relation,
> last_blkno, len + saveFreeSpace);
> if (targetBlock != InvalidBlockNumber)
> goto loop;
>
> LockRelationForExtension(relation, ExclusiveLock);
> targetBlock = GetPageWithFreeSpace(relation, len +
> saveFreeSpace);
> if (targetBlock != InvalidBlockNumber)
> {
> UnlockRelationForExtension(relation, ExclusiveLock);
> goto loop;
> }
> RelationAddExtraBlocks(relation, bistate);
> }
> }
>
> I think this is better than what you had before with lastValidBlock,
> because we're actually interested in searching the free space map at
> the *very end* of the relation, not wherever the last target block
> happens to have been.
>
> We could go further still and have GetPageWithFreeSpace() always
> search the last, say, two pages of the FSM in all cases.  But that
> might be expensive. The extra call to RelationGetNumberOfBlocks seems
> cheap enough here because the alternative is to wait for a contended
> heavyweight lock.
>

I will try the test with this also and post the results.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


copy_script
Description: Binary data


insert_script
Description: Binary data


test_size_copy.sh
Description: Bourne shell script


test_size_ins.sh
Description: Bourne shell script



Re: [HACKERS] Relation extension scalability

2016-03-25 Thread Robert Haas
On Fri, Mar 25, 2016 at 1:05 PM, Dilip Kumar  wrote:
> On Fri, Mar 25, 2016 at 3:04 AM, Robert Haas  wrote:
>> 1. Callers who use GetPageWithFreeSpace() rather than
>> GetPageFreeSpaceExtended() will fail to find the new pages if the
>> upper map levels haven't been updated by VACUUM.
>>
>> 2. Even callers who use GetPageFreeSpaceExtended() may fail to find
>> the new pages.  This can happen in two separate ways, namely (a) the
>
> Yeah, that's the issue, if extended pages spills to next FSM page, then
> other waiters will not find those page, and one by one all waiters will end
> up adding extra pages.
> for example,  if there are ~30 waiters then
> total blocks extended = (25(25+1)/2) *20 =~ 6500 pages.
>
> This is not the case every time but whenever heap block go to new FSM page
> this will happen.

I think we need to start testing these patches not only in terms of
how *fast* they are but how *large* the relation ends up being when
we're done.  A patch that inserts the rows slower but the final
relation is smaller may be better overall.  Can you retest v13, v14,
and master, and post not only the timings but the relation size
afterwards?  And maybe post the exact script you are using?

> performance of patch v14 is significantly low compared to v13, mainly I
> guess below reasons
> 1. As per above calculation v13 extend ~6500 block (after every 8th extend),
> and that's why it's performing well.

That should be completely unnecessary, though.  I mean, if the problem
is that it's expensive to repeatedly acquire and release the relation
extension lock, then bulk-extending even 100 blocks at a time should
be enough to fix that, because you've reduced the number of times that
has to be done by 99%. There's no way we should need to extend by
thousands of blocks to get good performance.

Maybe something like this would help:

if (needLock)
{
if (!use_fsm)
LockRelationForExtension(relation, ExclusiveLock);
else if (!ConditionalLockRelationForExtension(relation, ExclusiveLock))
{
BlockNumber last_blkno = RelationGetNumberOfBlocks(relation);

targetBlock = GetPageWithFreeSpaceExtended(relation,
last_blkno, len + saveFreeSpace);
if (targetBlock != InvalidBlockNumber)
goto loop;

LockRelationForExtension(relation, ExclusiveLock);
targetBlock = GetPageWithFreeSpace(relation, len + saveFreeSpace);
if (targetBlock != InvalidBlockNumber)
{
UnlockRelationForExtension(relation, ExclusiveLock);
goto loop;
}
RelationAddExtraBlocks(relation, bistate);
}
}

I think this is better than what you had before with lastValidBlock,
because we're actually interested in searching the free space map at
the *very end* of the relation, not wherever the last target block
happens to have been.

We could go further still and have GetPageWithFreeSpace() always
search the last, say, two pages of the FSM in all cases.  But that
might be expensive. The extra call to RelationGetNumberOfBlocks seems
cheap enough here because the alternative is to wait for a contended
heavyweight lock.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Relation extension scalability

2016-03-25 Thread Dilip Kumar
On Fri, Mar 25, 2016 at 3:04 AM, Robert Haas  wrote:

> 1. Callers who use GetPageWithFreeSpace() rather than
> GetPageFreeSpaceExtended() will fail to find the new pages if the
> upper map levels haven't been updated by VACUUM.
>
> 2. Even callers who use GetPageFreeSpaceExtended() may fail to find
> the new pages.  This can happen in two separate ways, namely (a) the
>

Yeah, that's the issue: if the extended pages spill over to the next FSM
page, the other waiters will not find those pages, and one by one all the
waiters end up adding extra pages.
For example, if there are ~30 waiters then
total blocks extended = (25 * (25+1) / 2) * 20 =~ 6500 pages.

This is not the case every time, but it happens whenever the heap blocks
roll over to a new FSM page (see the toy calculation after the points below).

- An FSM page can hold free-space info for 4096 heap blocks, so after
roughly every 8th extension (assuming 512 blocks are extended at a time) it
will extend ~6500 pages.

- Any new request to RelationGetBufferForTuple will be able to find those
pages, because by that time the backend which extended the pages will have
set the new block using RelationSetTargetBlock.
(There is still a chance that some blocks are left completely unused until
vacuum comes.)
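
A standalone toy calculation of the worst case sketched above, assuming 25
waiters (matching the 25 used in the formula) drain one by one, with each
lock holder extending by Min(512, waiters * 20) as the queue shrinks; it
reproduces the ~6500-page figure.

#include <stdio.h>

#define Min(x, y) ((x) < (y) ? (x) : (y))	/* same definition as PostgreSQL's c.h */

int
main(void)
{
	int			total = 0;
	int			waiters;

	/* Each waiter takes the extension lock in turn; the queue shrinks by
	 * one each time, and every holder extends by Min(512, waiters * 20). */
	for (waiters = 25; waiters >= 1; waiters--)
		total += Min(512, waiters * 20);

	printf("%d\n", total);		/* prints 6500 */
	return 0;
}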

I have changed the patch as per the suggestion (just a POC, because the
performance numbers are not that great).

Below is a performance comparison of base, the previous patch (v13), and the
latest patch (v14).

The performance of patch v14 is significantly lower than v13, I guess mainly
for the reasons below:
1. As per the calculation above, v13 extends ~6500 blocks (after every 8th
extension), and that's why it performs well.

2. In v13, as soon as we extend a block we add it to the FSM, so it is
immediately available to new requesters.  (In this patch I also tried adding
blocks one by one to the FSM and updating the FSM tree up to the root after
all pages were added, but there was no significant improvement.)

3. fsm_update_recursive doesn't seem like the problem to me.  Does it?


Copy 1 tuples, of 4 bytes each:
-------------------------------
Client   base   patch v13   patch v14
1        118    147         126
2        217    276         269
4        210    421         347
8        166    630         375
16       145    813         415
32       124    985         451
64              974         455

Insert 1000 tuples of 1K size each:
-----------------------------------
Client   base   patch v13   patch v14
1        117    124         119
2        111    126         119
4         51    128         124
8         43    149         131
16        40    217         120
32              263         115
64              248         109

Note: I think the number at one of the thread counts may just be run-to-run variance.

Does anyone see a problem with updating the FSM tree this way?  I have
debugged it and saw that we are able to get pages properly from the tree,
and the same is visible in the performance numbers of v14 compared to base.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 8140418..9ac1eb7 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -169,6 +169,62 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
 }
 
 /*
+ * Extend a relation by multiple blocks to avoid future contention on the
+ * relation extension lock.  Our goal is to pre-extend the relation by an
+ * amount which ramps up as the degree of contention ramps up, but limiting
+ * the result to some sane overall value.
+ */
+static void
+RelationAddExtraBlocks(Relation relation, BulkInsertState bistate)
+{
+	Page		page;
+	BlockNumber blockNum,
+firstBlock,
+lastBlock;
+	int			extraBlocks = 0;
+	int			lockWaiters = 0;
+	Buffer		buffer;
+
+	/*
+	 * We use the length of the lock wait queue to judge how much to extend.
+	 * It might seem like multiplying the number of lock waiters by as much
+	 * as 20 is too aggressive, but benchmarking revealed that smaller numbers
+	 * were insufficient.  512 is just an arbitrary cap to prevent pathological
+	 * results (and excessive wasted disk space).
+	 */
+	lockWaiters = RelationExtensionLockWaiterCount(relation);
+	extraBlocks = Min(512, lockWaiters * 20);
+
+	firstBlock = lastBlock = InvalidBlockNumber;
+
+	while (extraBlocks-- >= 0)
+	{
+		/* Ouch - an unnecessary lseek() each time through the loop! */
+		buffer = ReadBufferBI(relation, P_NEW, bistate);
+
+		/* Extend by one page. */
+		LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+		page = BufferGetPage(buffer);
+		PageInit(page, BufferGetPageSize(buffer), 0);
+		MarkBufferDirty(buffer);
+		blockNum = BufferGetBlockNumber(buffer);
+		UnlockReleaseBuffer(buffer);
+
+		if (firstBlock == InvalidBlockNumber)
+			firstBlock = blockNum;
+
+		lastBlock = blockNum;
+	}
+
+	/*
+	 * Put the page in the freespace map so other backends can find it.
+	 * This is what will keep those 

Re: [HACKERS] Relation extension scalability

2016-03-24 Thread Robert Haas
On Thu, Mar 24, 2016 at 7:17 AM, Dilip Kumar  wrote:
>> Yet another possibility could be to call it as
>> GetPageWithFreeSpaceExtended and call it from GetPageWithFreeSpace with
>> value of oldPage as InvalidBlockNumber.
>
> Yes I like this.. Changed the same.

After thinking about this some more, I don't think this is the right
approach.  I finally understand what's going on here:
RecordPageWithFreeSpace updates the FSM lazily, only adjusting the
leaves and not the upper levels.  It relies on VACUUM to update the
upper levels.  This seems like it might be a bad policy in general,
because VACUUM on a very large relation may be quite infrequent, and
you could lose track of a lot of space for a long time, leading to a
lot of extra bloat.  However, it's a particularly bad policy for bulk
relation extension, because you're stuffing a large number of totally
free pages in there in a way that doesn't make them particularly easy
for anybody else to discover.  There are two ways we can fail here:

1. Callers who use GetPageWithFreeSpace() rather than
GetPageWithFreeSpaceExtended() will fail to find the new pages if the
upper map levels haven't been updated by VACUUM.

2. Even callers who use GetPageWithFreeSpaceExtended() may fail to find
the new pages.  This can happen in two separate ways, namely (a) the
lastValidBlock saved by RelationGetBufferForTuple() can be in the
middle of the relation someplace rather than near the end, or (b) the
bulk-extension performed by some other backend can have overflowed
onto some new FSM page that won't be searched even though a relatively
plausible lastValidBlock was passed.

It seems to me that since we're adding a whole bunch of empty pages at
once, it's worth the effort to update the upper levels of the FSM.
This isn't a case of discovering a single page with an extra few bytes
of storage available due to a HOT prune or something - this is a case
of putting at least 20 and plausibly hundreds of extra pages into the
FSM.   The extra effort to update the upper FSM pages is trivial by
comparison with the cost of extending the relation by many blocks.

So, I suggest adding a new function FreeSpaceMapBulkExtend(BlockNumber
first_block, BlockNumber last_block) which sets all the FSM entries
for pages between first_block and last_block to 255 and then bubbles
that up to the higher levels of the tree and all the way to the root.
Have the bulk extend code use that instead of repeatedly calling
RecordPageWithFreeSpace.  That should actually be much more efficient,
because it can call fsm_readbuf(), LockBuffer(), and
UnlockReleaseBuffer() just once per FSM page instead of once per FSM
page *per byte modified*.  Maybe that makes no difference in practice,
but it can't hurt.
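
A rough sketch, not PostgreSQL code, of what the FreeSpaceMapBulkExtend()
suggested above might look like.  The Relation argument and the
fsm_bubble_up_to_root() helper are assumptions added for illustration only;
later versions of the patch in this thread expose essentially this idea as
UpdateFreeSpaceMap().

static void
FreeSpaceMapBulkExtend(Relation rel, BlockNumber first_block,
					   BlockNumber last_block)
{
	BlockNumber blkno;

	/*
	 * Leaf level: record every new page as essentially all-free.  A real
	 * implementation would, as described above, lock each affected FSM page
	 * once and set all of its leaf slots in one go, rather than calling
	 * RecordPageWithFreeSpace() per heap block as this sketch does.
	 */
	for (blkno = first_block; blkno <= last_block; blkno++)
		RecordPageWithFreeSpace(rel, blkno, BLCKSZ - SizeOfPageHeaderData);

	/*
	 * Upper levels: bubble the new leaf values up to the FSM root, so that
	 * plain GetPageWithFreeSpace() callers see the new pages without waiting
	 * for VACUUM to fix the upper map levels.
	 */
	fsm_bubble_up_to_root(rel, first_block, last_block);	/* hypothetical helper */
}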

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Relation extension scalability

2016-03-24 Thread Dilip Kumar
On Thu, Mar 24, 2016 at 6:13 PM, Amit Kapila 
wrote:

>
> 1.
> +GetPageWithFreeSpaceUsingOldPage(Relation rel, BlockNumber oldPage,
> + Size spaceNeeded)
> {
> ..
> + /*
> + * If fsm_set_and_search found a suitable new block, return that.
> + * Otherwise, search as usual.
> + */
> ..
> }
>
> In the above comment, you are referring wrong function.
>

Fixed

>
> 2.
> +static int
> +fsm_search_from_addr(Relation rel, FSMAddress addr, uint8 minValue)
> +{
> + Buffer buf;
> + int newslot = -1;
> +
> + buf = fsm_readbuf(rel, addr, true);
> + LockBuffer(buf, BUFFER_LOCK_SHARE);
> +
> + if (minValue != 0)
> + {
> + /* Search while we still hold the lock */
> + newslot = fsm_search_avail(buf, minValue,
> +   addr.level == FSM_BOTTOM_LEVEL,
> +   false);
>
> In this new API, I don't understand why we need minValue != 0 check,
> basically if user of API doesn't want to search for space > 0, then what is
> the need of calling this API?  I think this API should use Assert for
> minValue!=0 unless you see reason for not doing so.
>
>
Agreed, it should be an Assert.
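
So the start of fsm_search_from_addr would become something like this
(sketch only, based on the excerpt quoted above):

	Assert(minValue != 0);

	buf = fsm_readbuf(rel, addr, true);
	LockBuffer(buf, BUFFER_LOCK_SHARE);

	/* Search while we still hold the lock */
	newslot = fsm_search_avail(buf, minValue,
							   addr.level == FSM_BOTTOM_LEVEL,
							   false);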


> >
> > GetNearestPageWithFreeSpace? (although not sure that's accurate
> description, maybe Nearby would be better)
> >
>
> Better than what is used in patch.
>
> Yet another possibility could be to call it as
> GetPageWithFreeSpaceExtended and call it from GetPageWithFreeSpace with
> value of oldPage as InvalidBlockNumber.
>

Yes, I like this; changed accordingly.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


multi_extend_v13.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-24 Thread Amit Kapila
On Thu, Mar 24, 2016 at 1:48 PM, Petr Jelinek  wrote:
>
> On 24/03/16 07:04, Dilip Kumar wrote:
>>
>>
>> On Thu, Mar 24, 2016 at 10:44 AM, Robert Haas > > wrote:
>>
>> On Wed, Mar 23, 2016 at 9:43 PM, Amit Kapila
>> > wrote:
>> > RecordAndGetPageWithFreeSpace() tries to search from the oldPage
passed to
>> > it, rather than from top, so even if  RecordPageWithFreeSpace()
doesn't
>> > update till root, it will be able to search the newly added page.
I agree
>> > with whatever you have said in another mail that we should
introduce a new
>> > API to do a more targeted search for such cases.
>>
>> OK, let's do that, then.
>>
>>
>> Ok, I have added new API which just find the free block from and start
>> search from last given page.
>>
>> 1. I have named the new function as GetPageWithFreeSpaceUsingOldPage, I
>> don't like this name, but I could not come up with any better, Please
>> suggest one.
>>
>

1.
+GetPageWithFreeSpaceUsingOldPage(Relation rel, BlockNumber oldPage,
+ Size spaceNeeded)
{
..
+ /*
+ * If fsm_set_and_search found a suitable new block, return that.
+ * Otherwise, search as usual.
+ */
..
}

In the above comment, you are referring wrong function.

2.
+static int
+fsm_search_from_addr(Relation rel, FSMAddress addr, uint8 minValue)
+{
+ Buffer buf;
+ int newslot = -1;
+
+ buf = fsm_readbuf(rel, addr, true);
+ LockBuffer(buf, BUFFER_LOCK_SHARE);
+
+ if (minValue != 0)
+ {
+ /* Search while we still hold the lock */
+ newslot = fsm_search_avail(buf, minValue,
+   addr.level == FSM_BOTTOM_LEVEL,
+   false);

In this new API, I don't understand why we need minValue != 0 check,
basically if user of API doesn't want to search for space > 0, then what is
the need of calling this API?  I think this API should use Assert for
minValue!=0 unless you see reason for not doing so.

>
> GetNearestPageWithFreeSpace? (although not sure that's accurate
description, maybe Nearby would be better)
>

Better than what is used in patch.

Yet another possibility could be to call it as GetPageWithFreeSpaceExtended
and call it from GetPageWithFreeSpace with value of oldPage
as InvalidBlockNumber.
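
That is, a minimal sketch of the suggestion, with GetPageWithFreeSpaceExtended
being the proposed (hypothetical) new entry point that takes the old page:

/* Sketch only: the existing entry point becomes a thin wrapper. */
BlockNumber
GetPageWithFreeSpace(Relation rel, Size spaceNeeded)
{
	return GetPageWithFreeSpaceExtended(rel, InvalidBlockNumber, spaceNeeded);
}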


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Relation extension scalability

2016-03-24 Thread Petr Jelinek

On 24/03/16 07:04, Dilip Kumar wrote:


On Thu, Mar 24, 2016 at 10:44 AM, Robert Haas > wrote:

On Wed, Mar 23, 2016 at 9:43 PM, Amit Kapila
> wrote:
> RecordAndGetPageWithFreeSpace() tries to search from the oldPage passed to
> it, rather than from top, so even if  RecordPageWithFreeSpace() doesn't
> update till root, it will be able to search the newly added page.  I agree
> with whatever you have said in another mail that we should introduce a new
> API to do a more targeted search for such cases.

OK, let's do that, then.


Ok, I have added new API which just find the free block from and start
search from last given page.

1. I have named the new function as GetPageWithFreeSpaceUsingOldPage, I
don't like this name, but I could not come up with any better, Please
suggest one.



GetNearestPageWithFreeSpace? (although not sure that's accurate 
description, maybe Nearby would be better)


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-24 Thread Dilip Kumar
On Thu, Mar 24, 2016 at 10:44 AM, Robert Haas  wrote:

> On Wed, Mar 23, 2016 at 9:43 PM, Amit Kapila 
> wrote:
> > RecordAndGetPageWithFreeSpace() tries to search from the oldPage passed
> to
> > it, rather than from top, so even if  RecordPageWithFreeSpace() doesn't
> > update till root, it will be able to search the newly added page.  I
> agree
> > with whatever you have said in another mail that we should introduce a
> new
> > API to do a more targeted search for such cases.
>
> OK, let's do that, then.


Ok, I have added a new API which just finds a free block, starting the
search from the last given page.

1. I have named the new function GetPageWithFreeSpaceUsingOldPage. I
don't like this name, but I could not come up with anything better; please
suggest one.

Also, the body of GetPageWithFreeSpaceUsingOldPage looks very similar to
RecordAndGetPageWithFreeSpace; I tried to merge the two, but for that we
would need to pass an extra parameter to the function.

2. I also had to write one more function, *fsm_search_from_addr*, instead of
using *fsm_set_and_search*, so that we can find a block without updating the
other slot.

I have done a performance test just to confirm the results, and performance
is the same as before with both COPY and INSERT.

3. I have also run the pgbench read-write test Amit suggested upthread. No
regression or improvement with the pgbench workload.

Client     Base      Patch
1           899        914
8          5397       5413
32        18170      18052
64        29850      29941

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


multi_extend_v12.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-23 Thread Robert Haas
On Wed, Mar 23, 2016 at 9:43 PM, Amit Kapila  wrote:
> RecordAndGetPageWithFreeSpace() tries to search from the oldPage passed to
> it, rather than from top, so even if  RecordPageWithFreeSpace() doesn't
> update till root, it will be able to search the newly added page.  I agree
> with whatever you have said in another mail that we should introduce a new
> API to do a more targeted search for such cases.

OK, let's do that, then.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-23 Thread Amit Kapila
On Thu, Mar 24, 2016 at 12:09 AM, Robert Haas  wrote:
>
> On Tue, Mar 22, 2016 at 1:12 PM, Petr Jelinek 
wrote:
>
> I've read this over several times and looked at
> RecordAndGetPageWithFreeSpace() and I'm still confused.  First of all,
> if the lock was acquired by some other backend which did
> RelationAddExtraBlocks(), it *will* have updated the FSM - that's the
> whole point.
>

It doesn't update the FSM up to the root in some cases, as per the comments
on top of RecordPageWithFreeSpace and the code itself.

>
>   Second, if the other backend extended the relation in
> some other manner and did not extend the FSM, how does calling
> RecordAndGetPageWithFreeSpace help?  As far as I can see,
> GetPageWithFreeSpace and RecordAndGetPageWithFreeSpace are both just
> searching the FSM, so if one is stymied the other will be too.  What
> am I missing?
>

RecordAndGetPageWithFreeSpace() tries to search from the oldPage passed to
it, rather than from the top, so even if RecordPageWithFreeSpace() doesn't
update up to the root, it will still be able to find the newly added page.
I agree with what you said in another mail that we should introduce a new
API to do a more targeted search for such cases.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Relation extension scalability

2016-03-23 Thread Petr Jelinek

On 23/03/16 20:43, Robert Haas wrote:

On Wed, Mar 23, 2016 at 3:33 PM, Petr Jelinek  wrote:

Btw thinking about it some more, ISTM that not finding the block and just
doing the extension if the FSM wasn't extended correctly previously is
probably cleaner behavior than what we do now. The reasoning for that
opinion is that if the FSM wasn't extended, we'll fix it by doing relation
extension since we know we do both in this code path and also if we could
not find page before we'll most likely not find one even on retry and if
there was page added at the end by extension that we might reuse partially
here then there is no harm in adding new one anyway as the whole point of
this patch is that it does bigger extension that strictly necessary so
insisting on page reuse for something that seems like only theoretical
possibility that does not even exist in current code does not seem right.


I'm not sure I completely follow this.  The fact that the last
sentence is 9 lines long may be related.  :-)



I tend to do that sometimes :)


I think it's pretty clearly important to re-check the FSM after
acquiring the extension lock.  Otherwise, imagine that 25 backends
arrive at the exact same time.  The first one gets the lock and
extends the relation 500 pages; the next one, 480, and so on.  In
total, they extend the relation by 6500 pages, which is a bit rich.
Rechecking the FSM after acquiring the lock prevents that from
happening, and that's a very good thing.  We'll do one 500-page
extension and that's it.



Right, but that would only happen if all the backends did it using 
different code which does not do the FSM extension, because the current 
code does extend the FSM, and the point of using 
RecordAndGetPageWithFreeSpace seems to be "just in case" somebody is 
doing the extension differently (at least I don't see another reason). So 
basically I am not saying we shouldn't do the search, but that I agree 
GetPageWithFreeSpace should be enough, as the worst that can happen is 
that we overextend the relation in case some theoretical code from 
somewhere else also extended the relation without extending the FSM 
(afaics).


But maybe Dilip had some other reason for using the 
RecordAndGetPageWithFreeSpace that is not documented.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-23 Thread Robert Haas
On Wed, Mar 23, 2016 at 3:33 PM, Petr Jelinek  wrote:
> Btw thinking about it some more, ISTM that not finding the block and just
> doing the extension if the FSM wasn't extended correctly previously is
> probably cleaner behavior than what we do now. The reasoning for that
> opinion is that if the FSM wasn't extended, we'll fix it by doing relation
> extension since we know we do both in this code path and also if we could
> not find page before we'll most likely not find one even on retry and if
> there was page added at the end by extension that we might reuse partially
> here then there is no harm in adding new one anyway as the whole point of
> this patch is that it does bigger extension that strictly necessary so
> insisting on page reuse for something that seems like only theoretical
> possibility that does not even exist in current code does not seem right.

I'm not sure I completely follow this.  The fact that the last
sentence is 9 lines long may be related.  :-)

I think it's pretty clearly important to re-check the FSM after
acquiring the extension lock.  Otherwise, imagine that 25 backends
arrive at the exact same time.  The first one gets the lock and
extends the relation 500 pages; the next one, 480, and so on.  In
total, they extend the relation by 6500 pages, which is a bit rich.
Rechecking the FSM after acquiring the lock prevents that from
happening, and that's a very good thing.  We'll do one 500-page
extension and that's it.
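
(That figure is just the 20-pages-per-waiter heuristic summed over the
shrinking queue: 20 * (25 + 24 + ... + 1) = 20 * 325 = 6500 pages.)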

However, I don't think using RecordAndGetPageWithFreeSpace rather than
GetPageWithFreeSpace is appropriate.  We've already recorded free
space on that page, and recording it again is a bad idea.  It's quite
possible that by the time we get the lock our old value is totally
inaccurate.  If there's some advantage to searching in the more
targeted way that RecordAndGetPageWithFreeSpace does over
GetPageWithFreeSpace then we need a new API into the freespace stuff
that does the more targeted search without updating anything.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-23 Thread Petr Jelinek

On 23/03/16 20:19, Petr Jelinek wrote:

On 23/03/16 20:01, Robert Haas wrote:

On Wed, Mar 23, 2016 at 2:52 PM, Petr Jelinek 
wrote:

Second, if the other backend extended the relation in
some other manner and did not extend the FSM, how does calling
RecordAndGetPageWithFreeSpace help?  As far as I can see,
GetPageWithFreeSpace and RecordAndGetPageWithFreeSpace are both just
searching the FSM, so if one is stymied the other will be too.  What
am I missing?



The RecordAndGetPageWithFreeSpace will extend FSM as it calls
fsm_set_and_search which in turn calls fsm_readbuf with extend = true.


So how does that help?  If I'm reading this right, the new block will
be all zeroes which means no space available on any of those pages.



I am bit confused as to what exactly you are saying, but what will
happen is we get back to the while cycle and try again so eventually we
should find either block with enough free space or add new one (not sure
if this would actually ever happen in practice in heavily concurrent
workload where the FSM would not be correctly extended during relation
extension though, we might just loop here forever).



Btw thinking about it some more, ISTM that not finding the block and 
just doing the extension if the FSM wasn't extended correctly previously 
is probably cleaner behavior than what we do now. The reasoning for that 
opinion is that if the FSM wasn't extended, we'll fix it by doing 
relation extension since we know we do both in this code path and also 
if we could not find page before we'll most likely not find one even on 
retry and if there was page added at the end by extension that we might 
reuse partially here then there is no harm in adding new one anyway as 
the whole point of this patch is that it does bigger extension that 
strictly necessary so insisting on page reuse for something that seems 
like only theoretical possibility that does not even exist in current 
code does not seem right.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-23 Thread Petr Jelinek

On 23/03/16 20:01, Robert Haas wrote:

On Wed, Mar 23, 2016 at 2:52 PM, Petr Jelinek  wrote:

Second, if the other backend extended the relation in
some other manner and did not extend the FSM, how does calling
RecordAndGetPageWithFreeSpace help?  As far as I can see,
GetPageWithFreeSpace and RecordAndGetPageWithFreeSpace are both just
searching the FSM, so if one is stymied the other will be too.  What
am I missing?



The RecordAndGetPageWithFreeSpace will extend FSM as it calls
fsm_set_and_search which in turn calls fsm_readbuf with extend = true.


So how does that help?  If I'm reading this right, the new block will
be all zeroes which means no space available on any of those pages.



I am bit confused as to what exactly you are saying, but what will 
happen is we get back to the while cycle and try again so eventually we 
should find either block with enough free space or add new one (not sure 
if this would actually ever happen in practice in heavily concurrent 
workload where the FSM would not be correctly extended during relation 
extension though, we might just loop here forever).


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-23 Thread Robert Haas
On Wed, Mar 23, 2016 at 2:52 PM, Petr Jelinek  wrote:
>> Second, if the other backend extended the relation in
>> some other manner and did not extend the FSM, how does calling
>> RecordAndGetPageWithFreeSpace help?  As far as I can see,
>> GetPageWithFreeSpace and RecordAndGetPageWithFreeSpace are both just
>> searching the FSM, so if one is stymied the other will be too.  What
>> am I missing?
>>
>
> The RecordAndGetPageWithFreeSpace will extend FSM as it calls
> fsm_set_and_search which in turn calls fsm_readbuf with extend = true.

So how does that help?  If I'm reading this right, the new block will
be all zeroes which means no space available on any of those pages.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-23 Thread Petr Jelinek

On 23/03/16 19:39, Robert Haas wrote:

On Tue, Mar 22, 2016 at 1:12 PM, Petr Jelinek  wrote:

I also think the code simplicity makes this worth it.


Agreed.  I went over this patch and did a cleanup pass today.   I
discovered that the LockWaiterCount() function was broken if you try
to tried to use it on a lock that you didn't hold or a lock that you
held in any mode other than exclusive, so I tried to fix that.  I
rewrote a lot of the comments and tightened some other things up.  The
result is attached.

I'm baffled by the same code Amit asked about upthread, even though
there's now a comment:

+   /*
+* Here we are calling
RecordAndGetPageWithFreeSpace
+* instead of GetPageWithFreeSpace,
because other backend
+* who have got the lock might have
added extra blocks in
+* the FSM and its possible that FSM
tree might not have
+* been updated so far and we will not
get the page by
+* searching from top using
GetPageWithFreeSpace, so we
+* need to search the leaf page
directly using our last
+* valid target block.
+*
+* XXX. I don't understand what is
happening here. -RMH
+*/

I've read this over several times and looked at
RecordAndGetPageWithFreeSpace() and I'm still confused.  First of all,
if the lock was acquired by some other backend which did
RelationAddExtraBlocks(), it *will* have updated the FSM - that's the
whole point.


That's a good point; maybe this coding is a bit too defensive.


Second, if the other backend extended the relation in
some other manner and did not extend the FSM, how does calling
RecordAndGetPageWithFreeSpace help?  As far as I can see,
GetPageWithFreeSpace and RecordAndGetPageWithFreeSpace are both just
searching the FSM, so if one is stymied the other will be too.  What
am I missing?



The RecordAndGetPageWithFreeSpace will extend FSM as it calls 
fsm_set_and_search which in turn calls fsm_readbuf with extend = true.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-23 Thread Robert Haas
On Tue, Mar 22, 2016 at 1:12 PM, Petr Jelinek  wrote:
> I also think the code simplicity makes this worth it.

Agreed.  I went over this patch and did a cleanup pass today.   I
discovered that the LockWaiterCount() function was broken if you try
to tried to use it on a lock that you didn't hold or a lock that you
held in any mode other than exclusive, so I tried to fix that.  I
rewrote a lot of the comments and tightened some other things up.  The
result is attached.

I'm baffled by the same code Amit asked about upthread, even though
there's now a comment:

+   /*
+* Here we are calling
RecordAndGetPageWithFreeSpace
+* instead of GetPageWithFreeSpace,
because other backend
+* who have got the lock might have
added extra blocks in
+* the FSM and its possible that FSM
tree might not have
+* been updated so far and we will not
get the page by
+* searching from top using
GetPageWithFreeSpace, so we
+* need to search the leaf page
directly using our last
+* valid target block.
+*
+* XXX. I don't understand what is
happening here. -RMH
+*/

I've read this over several times and looked at
RecordAndGetPageWithFreeSpace() and I'm still confused.  First of all,
if the lock was acquired by some other backend which did
RelationAddExtraBlocks(), it *will* have updated the FSM - that's the
whole point.  Second, if the other backend extended the relation in
some other manner and did not extend the FSM, how does calling
RecordAndGetPageWithFreeSpace help?  As far as I can see,
GetPageWithFreeSpace and RecordAndGetPageWithFreeSpace are both just
searching the FSM, so if one is stymied the other will be too.  What
am I missing?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 8140418..3ab911f 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -169,6 +169,55 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
 }
 
 /*
+ * Extend a relation by multiple blocks to avoid future contention on the
+ * relation extension lock.  Our goal is to pre-extend the relation by an
+ * amount which ramps up as the degree of contention ramps up, but limiting
+ * the result to some sane overall value.
+ */
+static void
+RelationAddExtraBlocks(Relation relation, BulkInsertState bistate)
+{
+	Page		page;
+	Size		freespace;
+	BlockNumber blockNum;
+	int			extraBlocks = 0;
+	int			lockWaiters = 0;
+	Buffer		buffer;
+
+	/*
+	 * We use the length of the lock wait queue to judge how much to extend.
+	 * It might seem like multiplying the number of lock waiters by as much
+	 * as 20 is too aggressive, but benchmarking revealed that smaller numbers
+	 * were insufficient.  512 is just an arbitrary cap to prevent pathological
+	 * results (and excessive wasted disk space).
+	 */
+	lockWaiters = RelationExtensionLockWaiterCount(relation);
+	extraBlocks = Min(512, lockWaiters * 20);
+
+	while (extraBlocks-- >= 0)
+	{
+		/* Ouch - an unnecessary lseek() each time through the loop! */
+		buffer = ReadBufferBI(relation, P_NEW, bistate);
+
+		/* Extend by one page. */
+		LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+		page = BufferGetPage(buffer);
+		PageInit(page, BufferGetPageSize(buffer), 0);
+		freespace = PageGetHeapFreeSpace(page);
+		MarkBufferDirty(buffer);
+		blockNum = BufferGetBlockNumber(buffer);
+		UnlockReleaseBuffer(buffer);
+
+		/*
+		 * Put the page in the freespace map so other backends can find it.
+		 * This is what will keep those other backends from also queueing up
+		 * on the relation extension lock.
+		 */
+		RecordPageWithFreeSpace(relation, blockNum, freespace);
+	}
+}
+
+/*
  * RelationGetBufferForTuple
  *
  *	Returns pinned and exclusive-locked buffer of a page in given relation
@@ -233,10 +282,11 @@ RelationGetBufferForTuple(Relation relation, Size len,
 	bool		use_fsm = !(options & HEAP_INSERT_SKIP_FSM);
 	Buffer		buffer = InvalidBuffer;
 	Page		page;
-	Size		pageFreeSpace,
-saveFreeSpace;
+	Size		pageFreeSpace = 0,
+saveFreeSpace = 0;
 	BlockNumber targetBlock,
-otherBlock;
+otherBlock,
+lastValidBlock = InvalidBlockNumber;
 	bool		needLock;
 
 	len = MAXALIGN(len);		/* be conservative */
@@ -308,6 +358,7 @@ RelationGetBufferForTuple(Relation relation, Size len,
 		}
 	}
 
+loop:
 	while (targetBlock != InvalidBlockNumber)
 	{
 		/*
@@ -388,6 +439,8 @@ RelationGetBufferForTuple(Relation relation, Size len,
  otherBlock, targetBlock, vmbuffer_other,
  vmbuffer);
 
+		lastValidBlock = targetBlock;
+
 		/*
 		 * Now we can 

Re: [HACKERS] Relation extension scalability

2016-03-22 Thread Petr Jelinek

On 22/03/16 10:15, Dilip Kumar wrote:


On Mon, Mar 21, 2016 at 8:10 PM, Amit Kapila > wrote:
11.
+{
+/*
+* First try to get the lock in no-wait mode, if succeed extend one
+
  * block, else get the lock in normal mode and after we get the lock
-LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+buffer =
RelationAddOneBlock(relation, otherBuffer, bistate);

Won't this cause one extra block addition after backend extends the
relation for multiple blocks, what is the need of same?


This is the block for this backend, Extra extend for future request and
already added to FSM. I could have added this count along with extra
block in RelationAddExtraBlocks, But In that case I need to put some
extra If for saving one buffer for this bakend and then returning that
the specific buffer to caller, and In caller also need to distinguish
between who wants to add one block or who have got one block added in
along with extra block.

I think this way code is simple.. That everybody comes down will add one
block for self use. and all other functionality and logic is above, i.e.
wether to take lock or not, whether to add extra blocks or not..




I also think the code simplicity makes this worth it.

--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-22 Thread Dilip Kumar
On Mon, Mar 21, 2016 at 8:10 PM, Amit Kapila 
wrote:

> Review comments:
>
>
Thanks for the review. Please find my responses inline.


> 1.
>  /*
> + * RelationAddOneBlock
> + *
> + * Extend relation by one block and lock the buffer
> + */
> +static Buffer
> +RelationAddOneBlock(Relation relation, Buffer otherBuffer,
> BulkInsertState bistate)
>
> Shall we mention in comments that this API returns locked buffer and it's
> the responsibility of caller to unlock it.
>

Fixed


> 2.
> + /* To avoid the cases when there are huge number of lock waiters, and
> + * extend file size by big amount at a
> time, put some limit on the
>
> first line in multi-line comments should not contain anything.
>

Fixed


>
> 3.
> + extraBlocks = Min(512, lockWaiters * 20);
>
> I think we should explain in comments about the reason of choosing 20 in
> above calculation.
>

Fixed. Please check whether the explanation is enough or we need to add
something more.


>
> 4. Sometime back [1], you agreed on doing some analysis for the overhead
> that XLogFlush can cause during buffer eviction,  but I don't see the
> results of same, are you planing to complete the same?
>

Ok, I will test this.


>
> 5.
> + if (LOCKACQUIRE_OK
> + != RelationExtensionLockConditional(relation,
> ExclusiveLock))
>
> I think the coding style is to keep constant on right side of condition,
> did you see any other place in code which uses the check in a similar way?
>

Fixed.

Not sure about any other place. (This is a convention I used to follow,
keeping the constant on the left side to catch the case where = is typed by
mistake instead of ==, which would otherwise silently do an assignment
instead of throwing an error.)

But in PG style the constant should be on the right, so I will take care of it.


>
> 6.
> - /*
> - * Now acquire lock on the new page.
> + /* For all case we need to add at least one block to satisfy
> our
> + * own request.
>   */
>
> Same problem as in point 2.
>

Fixed.

>
> 7.
> @@ -472,6 +581,8 @@ RelationGetBufferForTuple(Relation relation, Size len,
>   if (needLock)
>
> UnlockRelationForExtension(relation, ExclusiveLock);
>
> +
> +
>
> Spurious newline addition.
>

Fixed

>
> 8.
> +int
> +RelationExtensionLockWaiter(Relation relation)
>
> How about naming this function as RelationExtensionLockWaiterCount?
>
>
Done



> 9.
> + /* Other waiter has extended the block for us*/
>
> Provide an extra space before ending the comment.
>

Fixed

>
> 10.
> + if (use_fsm)
> + {
> + if (lastValidBlock !=
> InvalidBlockNumber)
> + {
> + targetBlock =
> RecordAndGetPageWithFreeSpace(relation,
> +
> lastValidBlock,
> +
> pageFreeSpace,
> +
> len + saveFreeSpace);
> + }
>
> Are you using RecordAndGetPageWithFreeSpace() instead of
> GetPageWithFreeSpace() to get the page close to the previous target page?
> If yes, then do you see enough benefit of the same that it can compensate
> the additional write operation which Record* API might cause due to
> additional dirtying of buffer?
>

Here we are calling RecordAndGetPageWithFreeSpace instead of
GetPageWithFreeSpace because another backend that got the lock might have
added extra blocks to the FSM, and it is possible that the FSM tree has not
been updated all the way up yet, so we would not find the page by searching
from the top with GetPageWithFreeSpace; we need to search the leaf page
directly using our last valid target block.

I have explained the same in the comments.


> 11.
> + {
> + /*
> + * First try to get the lock in no-wait mode, if succeed extend one
> +
>  * block, else get the lock in normal mode and after we get the lock
> - LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
> + buffer =
> RelationAddOneBlock(relation, otherBuffer, bistate);
>
> Won't this cause one extra block addition after backend extends the
> relation for multiple blocks, what is the need of same?
>

This is the block for this backend; the extra extension is for future
requests and is already added to the FSM. I could have added this count
along with the extra blocks in RelationAddExtraBlocks, but in that case I
would need some extra logic to save one buffer for this backend and return
that specific buffer to the caller, and the caller would also need to
distinguish between wanting one block and having got one block added along
with the extra blocks.

I think this way the code is simpler: everybody who comes down here adds one
block for their own use, and all other functionality and logic is above,
i.e. whether to take the lock or not, whether to add extra blocks or not.


>
> 12. I think it is good to once test pgbench read-write tests to ensure
> that this doesn't introduce any new regression.
>

I will test this and post the results.


Latest patch attached.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


multi_extend_v10.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-21 Thread Amit Kapila
On Fri, Mar 18, 2016 at 2:38 PM, Dilip Kumar  wrote:
>
>
> On Thu, Mar 17, 2016 at 1:31 PM, Petr Jelinek 
wrote:
>>
>> Great.
>>
>> Just small notational thing, maybe this would be simpler?:
>> extraBlocks = Min(512, lockWaiters * 20);
>
>
> Done, new patch attached.
>

Review comments:

1.
 /*
+ * RelationAddOneBlock
+ *
+ * Extend relation by one block and lock the buffer
+ */
+static Buffer
+RelationAddOneBlock(Relation relation, Buffer otherBuffer, BulkInsertState
bistate)

Shall we mention in comments that this API returns locked buffer and it's
the responsibility of caller to unlock it.

2.
+ /* To avoid the cases when there are huge number of lock waiters, and
+ * extend file size by big amount at a
time, put some limit on the

first line in multi-line comments should not contain anything.

3.
+ extraBlocks = Min(512, lockWaiters * 20);

I think we should explain in comments about the reason of choosing 20 in
above calculation.

4. Sometime back [1], you agreed on doing some analysis for the overhead
that XLogFlush can cause during buffer eviction,  but I don't see the
results of same, are you planing to complete the same?

5.
+ if (LOCKACQUIRE_OK
+ != RelationExtensionLockConditional(relation,
ExclusiveLock))

I think the coding style is to keep constant on right side of condition,
did you see any other place in code which uses the check in a similar way?

6.
- /*
- * Now acquire lock on the new page.
+ /* For all case we need to add at least one block to satisfy
our
+ * own request.
  */

Same problem as in point 2.

7.
@@ -472,6 +581,8 @@ RelationGetBufferForTuple(Relation relation, Size len,
  if (needLock)

UnlockRelationForExtension(relation, ExclusiveLock);

+
+

Spurious newline addition.

8.
+int
+RelationExtensionLockWaiter(Relation relation)

How about naming this function as RelationExtensionLockWaiterCount?

9.
+ /* Other waiter has extended the block for us*/

Provide an extra space before ending the comment.

10.
+ if (use_fsm)
+ {
+ if (lastValidBlock !=
InvalidBlockNumber)
+ {
+ targetBlock =
RecordAndGetPageWithFreeSpace(relation,
+
lastValidBlock,
+
pageFreeSpace,
+
len + saveFreeSpace);
+ }

Are you using RecordAndGetPageWithFreeSpace() instead of
GetPageWithFreeSpace() to get the page close to the previous target page?
If yes, then do you see enough benefit of the same that it can compensate
the additional write operation which Record* API might cause due to
additional dirtying of buffer?

11.
+ {
+ /*
+ * First try to get the lock in no-wait mode, if succeed extend one
+
 * block, else get the lock in normal mode and after we get the lock
+ * extend some extra blocks, extra
blocks will be added to satisfy
+ * request of other waiters and avoid future contention. Here instead
+
* of directly taking lock we try no-wait mode, this is to handle the
+ * case, when there is no
contention - it should not find the lock
+ * waiter and execute extra instructions.
+ */
+
if (LOCKACQUIRE_OK
+ != RelationExtensionLockConditional(relation, ExclusiveLock))
+
{
+ LockRelationForExtension(relation, ExclusiveLock);
+
+ if (use_fsm)
+
{
+ if (lastValidBlock != InvalidBlockNumber)
+
{
+ targetBlock = RecordAndGetPageWithFreeSpace(relation,
+
lastValidBlock,
+
pageFreeSpace,
+
len + saveFreeSpace);
+
}
+
+ /* Other waiter has extended the block for us*/
+
if (targetBlock != InvalidBlockNumber)
+ {
+
UnlockRelationForExtension(relation, ExclusiveLock);
+ goto loop;
+
}
+
+ RelationAddExtraBlocks(relation, bistate);
+ }
+
}
+ }

- /*
- * Now acquire lock on the new page.
+ /* For all case we need to add at least one block to
satisfy our
+ * own request.
  */
- LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+ buffer =
RelationAddOneBlock(relation, otherBuffer, bistate);

Won't this cause one extra block addition after backend extends the
relation for multiple blocks, what is the need of same?

12. I think it is good to once test pgbench read-write tests to ensure that
this doesn't introduce any new regression.


[1] -
http://www.postgresql.org/message-id/caa4ek1lonxz4qa_dquqbanspxsctjxrkexjii8h3gnd9z8u...@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Relation extension scalability

2016-03-19 Thread Dilip Kumar
On Mon, Mar 14, 2016 at 8:26 AM, Petr Jelinek  wrote:

> Well any value we choose will be very arbitrary. If we look at it from the
> point of maximum absolute disk space we allocate for relation at once,
> the 4MB limit would represent 2.5 orders of magnitude change. That sounds
> like enough for one release cycle, I think we can further tune it if the
> need arises in next one. (with my love for round numbers I would have
> suggested 8MB as that's 3 orders of magnitude, but I am fine with 4MB as
> well)
>

I have modified the patch; it now puts a max limit on the extra pages:
512 pages (4MB) is the maximum.

I have also measured the performance, and it looks equally good.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 8140418..fcd42b9 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -169,6 +169,94 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
 }
 
 /*
+ * RelationAddOneBlock
+ *
+ * Extend relation by one block and lock the buffer
+ */
+static Buffer
+RelationAddOneBlock(Relation relation, Buffer otherBuffer, BulkInsertState bistate)
+{
+	Buffer buffer;
+	/*
+	 * XXX This does an lseek - rather expensive - but at the moment it is the
+	 * only way to accurately determine how many blocks are in a relation.  Is
+	 * it worth keeping an accurate file length in shared memory someplace,
+	 * rather than relying on the kernel to do it for us?
+	 */
+	buffer = ReadBufferBI(relation, P_NEW, bistate);
+
+	/*
+	 * We can be certain that locking the otherBuffer first is OK, since
+	 * it must have a lower page number.
+	 */
+	if (otherBuffer != InvalidBuffer)
+		LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
+
+	/*
+	 * Now acquire lock on the new page.
+	 */
+	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+	return buffer;
+}
+
+/*
+ * RelationAddExtraBlocks
+ *
+ * Extend extra blocks for the relations to avoid the future contention
+ * on the relation extension lock.
+ */
+
+static void
+RelationAddExtraBlocks(Relation relation, BulkInsertState bistate)
+{
+	Page		page;
+	Size		freespace;
+	BlockNumber blockNum;
+	int			extraBlocks = 0;
+	int 		lockWaiters = 0;
+	Buffer		buffer;
+
+	/*
+	 * For calculating number of extra blocks to extend, find the level
+	 * of contention on this lock, by getting the requester of this lock
+	 * and add extra blocks in multiple of waiters.
+	 */
+	lockWaiters = RelationExtensionLockWaiter(relation);
+
+	extraBlocks = lockWaiters * 20;
+
+	/* To avoid the cases when there are huge number of lock waiters, and
+	 * extend file size by big amount at a time, put some limit on the
+	 * max number of pages to be extended at a time.
+	 */
+	if (extraBlocks > 512)
+		extraBlocks = 512;
+
+	while (extraBlocks--)
+	{
+		/*
+		 * Here we are adding extra blocks to the relation after
+		 * adding each block update the information in FSM so that
+		 * other backend running parallel can find the block.
+		 */
+		buffer = ReadBufferBI(relation, P_NEW, bistate);
+
+		LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+		page = BufferGetPage(buffer);
+		PageInit(page, BufferGetPageSize(buffer), 0);
+		freespace = PageGetHeapFreeSpace(page);
+		MarkBufferDirty(buffer);
+		blockNum = BufferGetBlockNumber(buffer);
+
+		UnlockReleaseBuffer(buffer);
+
+		RecordPageWithFreeSpace(relation, blockNum, freespace);
+	}
+}
+
+/*
  * RelationGetBufferForTuple
  *
  *	Returns pinned and exclusive-locked buffer of a page in given relation
@@ -233,10 +321,11 @@ RelationGetBufferForTuple(Relation relation, Size len,
 	bool		use_fsm = !(options & HEAP_INSERT_SKIP_FSM);
 	Buffer		buffer = InvalidBuffer;
 	Page		page;
-	Size		pageFreeSpace,
-saveFreeSpace;
+	Size		pageFreeSpace = 0,
+saveFreeSpace = 0;
 	BlockNumber targetBlock,
-otherBlock;
+otherBlock,
+lastValidBlock = InvalidBlockNumber;
 	bool		needLock;
 
 	len = MAXALIGN(len);		/* be conservative */
@@ -308,6 +397,7 @@ RelationGetBufferForTuple(Relation relation, Size len,
 		}
 	}
 
+loop:
 	while (targetBlock != InvalidBlockNumber)
 	{
 		/*
@@ -388,6 +478,8 @@ RelationGetBufferForTuple(Relation relation, Size len,
  otherBlock, targetBlock, vmbuffer_other,
  vmbuffer);
 
+		lastValidBlock = targetBlock;
+
 		/*
 		 * Now we can check to see if there's enough free space here. If so,
 		 * we're done.
@@ -441,27 +533,47 @@ RelationGetBufferForTuple(Relation relation, Size len,
 	needLock = !RELATION_IS_LOCAL(relation);
 
 	if (needLock)
-		LockRelationForExtension(relation, ExclusiveLock);
-
-	/*
-	 * XXX This does an lseek - rather expensive - but at the moment it is the
-	 * only way to accurately determine how many blocks are in a relation.  Is
-	 * it worth keeping an accurate file length in shared memory someplace,
-	 * rather than relying on the kernel to do it for us?
-	 */
-	buffer = ReadBufferBI(relation, P_NEW, bistate);
-
-	/*
-	 * 

Re: [HACKERS] Relation extension scalability

2016-03-19 Thread Dilip Kumar
On Thu, Mar 17, 2016 at 1:31 PM, Petr Jelinek  wrote:

> Great.
>
> Just small notational thing, maybe this would be simpler?:
> extraBlocks = Min(512, lockWaiters * 20);
>

Done, new patch attached.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 8140418..a9e18dc 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -169,6 +169,91 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
 }
 
 /*
+ * RelationAddOneBlock
+ *
+ * Extend relation by one block and lock the buffer
+ */
+static Buffer
+RelationAddOneBlock(Relation relation, Buffer otherBuffer, BulkInsertState bistate)
+{
+	Buffer buffer;
+	/*
+	 * XXX This does an lseek - rather expensive - but at the moment it is the
+	 * only way to accurately determine how many blocks are in a relation.  Is
+	 * it worth keeping an accurate file length in shared memory someplace,
+	 * rather than relying on the kernel to do it for us?
+	 */
+	buffer = ReadBufferBI(relation, P_NEW, bistate);
+
+	/*
+	 * We can be certain that locking the otherBuffer first is OK, since
+	 * it must have a lower page number.
+	 */
+	if (otherBuffer != InvalidBuffer)
+		LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
+
+	/*
+	 * Now acquire lock on the new page.
+	 */
+	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+	return buffer;
+}
+
+/*
+ * RelationAddExtraBlocks
+ *
+ * Extend extra blocks for the relations to avoid the future contention
+ * on the relation extension lock.
+ */
+
+static void
+RelationAddExtraBlocks(Relation relation, BulkInsertState bistate)
+{
+	Page		page;
+	Size		freespace;
+	BlockNumber blockNum;
+	int			extraBlocks = 0;
+	int 		lockWaiters = 0;
+	Buffer		buffer;
+
+	/*
+	 * For calculating number of extra blocks to extend, find the level
+	 * of contention on this lock, by getting the requester of this lock
+	 * and add extra blocks in multiple of waiters.
+	 */
+	lockWaiters = RelationExtensionLockWaiter(relation);
+
+	/* To avoid the cases when there are huge number of lock waiters, and
+	 * extend file size by big amount at a time, put some limit on the
+	 * max number of pages to be extended at a time.
+	 */
+	extraBlocks = Min(512, lockWaiters * 20);
+
+	while (extraBlocks--)
+	{
+		/*
+		 * Here we are adding extra blocks to the relation after
+		 * adding each block update the information in FSM so that
+		 * other backend running parallel can find the block.
+		 */
+		buffer = ReadBufferBI(relation, P_NEW, bistate);
+
+		LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+		page = BufferGetPage(buffer);
+		PageInit(page, BufferGetPageSize(buffer), 0);
+		freespace = PageGetHeapFreeSpace(page);
+		MarkBufferDirty(buffer);
+		blockNum = BufferGetBlockNumber(buffer);
+
+		UnlockReleaseBuffer(buffer);
+
+		RecordPageWithFreeSpace(relation, blockNum, freespace);
+	}
+}
+
+/*
  * RelationGetBufferForTuple
  *
  *	Returns pinned and exclusive-locked buffer of a page in given relation
@@ -233,10 +318,11 @@ RelationGetBufferForTuple(Relation relation, Size len,
 	bool		use_fsm = !(options & HEAP_INSERT_SKIP_FSM);
 	Buffer		buffer = InvalidBuffer;
 	Page		page;
-	Size		pageFreeSpace,
-saveFreeSpace;
+	Size		pageFreeSpace = 0,
+saveFreeSpace = 0;
 	BlockNumber targetBlock,
-otherBlock;
+otherBlock,
+lastValidBlock = InvalidBlockNumber;
 	bool		needLock;
 
 	len = MAXALIGN(len);		/* be conservative */
@@ -308,6 +394,7 @@ RelationGetBufferForTuple(Relation relation, Size len,
 		}
 	}
 
+loop:
 	while (targetBlock != InvalidBlockNumber)
 	{
 		/*
@@ -388,6 +475,8 @@ RelationGetBufferForTuple(Relation relation, Size len,
  otherBlock, targetBlock, vmbuffer_other,
  vmbuffer);
 
+		lastValidBlock = targetBlock;
+
 		/*
 		 * Now we can check to see if there's enough free space here. If so,
 		 * we're done.
@@ -441,27 +530,47 @@ RelationGetBufferForTuple(Relation relation, Size len,
 	needLock = !RELATION_IS_LOCAL(relation);
 
 	if (needLock)
-		LockRelationForExtension(relation, ExclusiveLock);
-
-	/*
-	 * XXX This does an lseek - rather expensive - but at the moment it is the
-	 * only way to accurately determine how many blocks are in a relation.  Is
-	 * it worth keeping an accurate file length in shared memory someplace,
-	 * rather than relying on the kernel to do it for us?
-	 */
-	buffer = ReadBufferBI(relation, P_NEW, bistate);
-
-	/*
-	 * We can be certain that locking the otherBuffer first is OK, since it
-	 * must have a lower page number.
-	 */
-	if (otherBuffer != InvalidBuffer)
-		LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
+	{
+		/*
+		 * First try to get the lock in no-wait mode, if succeed extend one
+		 * block, else get the lock in normal mode and after we get the lock
+		 * extend some extra blocks, extra blocks will be added to satisfy
+		 * request of other waiters and avoid future contention. Here instead
+		 * of directly taking lock 

Re: [HACKERS] Relation extension scalability

2016-03-18 Thread Petr Jelinek

On 17/03/16 04:42, Dilip Kumar wrote:


On Mon, Mar 14, 2016 at 8:26 AM, Petr Jelinek > wrote:

Well any value we choose will be very arbitrary. If we look at it
from the point of maximum absolute disk space we allocate for
relation at once, the 4MB limit would represent 2.5 orders of
magnitude change. That sounds like enough for one release cycle, I
think we can further tune it if the need arises in next one. (with
my love for round numbers I would have suggested 8MB as that's 3
orders of magnitude, but I am fine with 4MB as well)


I have modified the patch, this contains the max limit on extra pages,
512(4MB) pages is the max limit.

I have measured the performance also and that looks equally good.



Great.

Just small notational thing, maybe this would be simpler?:
extraBlocks = Min(512, lockWaiters * 20);

--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-13 Thread Petr Jelinek

On 14/03/16 03:29, Dilip Kumar wrote:


On Mon, Mar 14, 2016 at 5:02 AM, Jim Nasby > wrote:

Well, 16MB is 2K pages, which is what you'd get if 100 connections
were all blocked and we're doing 20 pages per waiter. That seems
like a really extreme scenario, so maybe 4MB is a good compromise.
That's unlikely to be hit in most cases, unlikely to put a ton of
stress on IO, even with magnetic media (assuming the whole 4MB is
queued to write in one shot...). 4MB would still reduce the number
of locks by 500x.


In my performance results given up thread, we are getting max
performance at 32 clients, means at a time we are extending 32*20 ~= max
(600) pages at a time. So now with 4MB limit (max 512 pages) Results
will looks similar. So we need to take a decision whether 4MB is good
limit, should I change it ?




Well any value we choose will be very arbitrary. If we look at it from 
the point of maximum absolute disk space we allocate for relation at 
once, the 4MB limit would represent 2.5 orders of magnitude change. That 
sounds like enough for one release cycle, I think we can further tune it 
if the need arises in next one. (with my love for round numbers I would 
have suggested 8MB as that's 3 orders of magnitude, but I am fine with 
4MB as well)


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-13 Thread Dilip Kumar
On Mon, Mar 14, 2016 at 5:02 AM, Jim Nasby  wrote:

> Well, 16MB is 2K pages, which is what you'd get if 100 connections were
> all blocked and we're doing 20 pages per waiter. That seems like a really
> extreme scenario, so maybe 4MB is a good compromise. That's unlikely to be
> hit in most cases, unlikely to put a ton of stress on IO, even with
> magnetic media (assuming the whole 4MB is queued to write in one shot...).
> 4MB would still reduce the number of locks by 500x.


In my performance results given upthread, we are getting max performance
at 32 clients, which means we are extending at most 32*20 ~= 600 pages at a
time. So with the 4MB limit (max 512 pages) the results should look similar.
We need to decide whether 4MB is a good limit; should I change it?


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Relation extension scalability

2016-03-13 Thread Jim Nasby

On 3/11/16 9:57 PM, Petr Jelinek wrote:

I also think some kind of clamp is a good idea. It's not that
uncommon to run max_connections significantly higher than 100, so
the extension could be way larger than 16MB. In those cases this
patch could actually make things far worse as everyone backs up
waiting on the OS to extend many MB when all you actually needed
were a couple dozen more pages.


I agree, We can have some max limit on number of extra pages, What other
thinks ?



Well, that's what I meant with clamping originally. I don't know what is
a good value though.


Well, 16MB is 2K pages, which is what you'd get if 100 connections were 
all blocked and we're doing 20 pages per waiter. That seems like a 
really extreme scenario, so maybe 4MB is a good compromise. That's 
unlikely to be hit in most cases, unlikely to put a ton of stress on IO, 
even with magnetic media (assuming the whole 4MB is queued to write in 
one shot...). 4MB would still reduce the number of locks by 500x.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-11 Thread Dilip Kumar
On Sat, Mar 12, 2016 at 8:37 AM, Amit Kapila 
wrote:

> Can you post the numbers for 1, 5, 10, 15, 25 or whatever other multiplier
> you have tried, so that it is clear that 20 is best?


I had tried with 1, 10, 20 and 50.

1. With a multiplier of 1 it was almost the same as the base code.

2. With 10, the data matches my previous group-extend patch data posted
upthread:
http://www.postgresql.org/message-id/CAFiTN-tyEu+Wf0-jBc3TGfCoHdEAjNTx=wvuxpoa1vddyst...@mail.gmail.com

3. Beyond 20 I did not see any extra benefit: with 50 the performance
numbers were no better than with 20 (I tested 50 only with 4-byte COPY and
did not test other data sizes with it).

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Relation extension scalability

2016-03-11 Thread Petr Jelinek

On 12/03/16 01:01, Jim Nasby wrote:

On 3/11/16 5:14 PM, Petr Jelinek wrote:

I don't really understand this part about concurrent DDL.  If there
were concurrent DDL going on, presumably other backends would be
blocked on the relation lock, not the relation extension lock - and it
doesn't seem likely that you'd often have a huge pile-up of inserters
waiting on concurrent DDL.  But I guess it could happen.



Yeah I was thinking about the latter part and as I said it's very rare
case, but I did see something similar couple of times in the wild. It's
not objection against committing this patch though, in fact I think it
can be committed as is.


FWIW, this is definitely a real possibility in any shop that has very
high downtime costs and high transaction rates.

I also think some kind of clamp is a good idea. It's not that uncommon
to run max_connections significantly higher than 100, so the extension
could be way larger than 16MB. In those cases this patch could actually
make things far worse as everyone backs up waiting on the OS to extend
many MB when all you actually needed were a couple dozen more pages.



Well yeah, I've seen 10k, but not everything will write to the same table; 
I wanted to stay in the realm of something that has a realistic chance of 
happening.



BTW, how was *20 arrived at? ISTM that if you have a lot of concurrent
demand for extension that means you're running lots of small DML
operations, not really big ones. I'd think that would make *1 more
appropriate.


The benchmarks I've seen showed you want at least *10 and *20 was better.

--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-11 Thread Petr Jelinek

On 12/03/16 03:46, Dilip Kumar wrote:


On Sat, Mar 12, 2016 at 5:31 AM, Jim Nasby > wrote:

FWIW, this is definitely a real possibility in any shop that has
very high downtime costs and high transaction rates.

I also think some kind of clamp is a good idea. It's not that
uncommon to run max_connections significantly higher than 100, so
the extension could be way larger than 16MB. In those cases this
patch could actually make things far worse as everyone backs up
waiting on the OS to extend many MB when all you actually needed
were a couple dozen more pages.


I agree, We can have some max limit on number of extra pages, What other
thinks ?



Well, that's what I meant with clamping originally. I don't know what is 
a good value though.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-11 Thread Amit Kapila
On Sat, Mar 12, 2016 at 8:16 AM, Dilip Kumar  wrote:
>
>
> On Sat, Mar 12, 2016 at 5:31 AM, Jim Nasby 
wrote:
>>
>> FWIW, this is definitely a real possibility in any shop that has very
high downtime costs and high transaction rates.
>>
>> I also think some kind of clamp is a good idea. It's not that uncommon
to run max_connections significantly higher than 100, so the extension
could be way larger than 16MB. In those cases this patch could actually
make things far worse as everyone backs up waiting on the OS to extend many
MB when all you actually needed were a couple dozen more pages.
>
>
> I agree, We can have some max limit on number of extra pages, What other
thinks ?
>
>
>>
>> BTW, how was *20 arrived at? ISTM that if you have a lot of concurrent
demand for extension that means you're running lots of small DML
operations, not really big ones. I'd think that would make *1 more
appropriate.
>
>
> *1 will not solve this problem. Here the main problem was that many backends
sleep/wake up on the extension lock, and that was causing the bottleneck. So
if we do *1 this will satisfy only the current requesters which have already
waited on the lock. But our goal is to keep backends from requesting this
lock at all.
>
> The idea is to find the requesters to get a statistic on this lock (the load
on the lock) and extend in a multiple of the load, so that this situation is
avoided for a long time; when it happens again, we again extend in a multiple
of the load.
>
> How did 20 come about?
>   I tested with multiple client loads 1..64, with multiple load sizes from
4-byte records to 1KB records, with COPY/INSERT, and found 20 works best.
>

Can you post the numbers for 1, 5, 10, 15, 25 or whatever other multiplier
you have tried, so that it is clear that 20 is best?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Relation extension scalability

2016-03-11 Thread Dilip Kumar
On Sat, Mar 12, 2016 at 5:31 AM, Jim Nasby  wrote:

> FWIW, this is definitely a real possibility in any shop that has very high
> downtime costs and high transaction rates.
>
> I also think some kind of clamp is a good idea. It's not that uncommon to
> run max_connections significantly higher than 100, so the extension could
> be way larger than 16MB. In those cases this patch could actually make
> things far worse as everyone backs up waiting on the OS to extend many MB
> when all you actually needed were a couple dozen more pages.
>

I agree, we can have some max limit on the number of extra pages. What do
others think?



> BTW, how was *20 arrived at? ISTM that if you have a lot of concurrent
> demand for extension that means you're running lots of small DML
> operations, not really big ones. I'd think that would make *1 more
> appropriate.
>

*1 will not solve this problem. Here the main problem was that many backends
sleep/wake up on the extension lock, and that was causing the bottleneck. So
if we do *1 this will satisfy only the current requesters which have already
waited on the lock. But our goal is to keep backends from requesting this
lock at all.

The idea is to find the requesters to get a statistic on this lock (the load
on the lock) and extend in a multiple of the load, so that this situation is
avoided for a long time; when it happens again, we again extend in a multiple
of the load.

How did 20 come about?
  I tested with multiple client loads 1..64, with multiple load sizes from
4-byte records to 1KB records, with COPY/INSERT, and found 20 works best.
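
For illustration, here is a minimal standalone sketch of that arithmetic:
extra blocks proportional to the observed lock waiters, with an upper clamp
of the kind being discussed. The multiplier 20 is the value that tested best
above; the 512-block cap (4MB with 8kB pages) is only an assumed example
value, not something settled in this thread.

#include <stdio.h>

#define EXTEND_MULTIPLIER 20    /* multiplier that tested best above */
#define MAX_EXTRA_BLOCKS  512   /* assumed clamp: 4MB of 8kB pages, illustrative only */

/* Number of extra blocks to pre-extend, given the current lock waiters. */
static int
extra_blocks_to_add(int lock_waiters)
{
    int     extra = lock_waiters * EXTEND_MULTIPLIER;

    if (extra > MAX_EXTRA_BLOCKS)
        extra = MAX_EXTRA_BLOCKS;
    return extra;
}

int
main(void)
{
    int     waiters;

    for (waiters = 1; waiters <= 128; waiters *= 2)
        printf("waiters=%3d -> extra blocks=%d\n",
               waiters, extra_blocks_to_add(waiters));
    return 0;
}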


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Relation extension scalability

2016-03-11 Thread Jim Nasby

On 3/11/16 5:14 PM, Petr Jelinek wrote:

I don't really understand this part about concurrent DDL.  If there
were concurrent DDL going on, presumably other backends would be
blocked on the relation lock, not the relation extension lock - and it
doesn't seem likely that you'd often have a huge pile-up of inserters
waiting on concurrent DDL.  But I guess it could happen.



Yeah, I was thinking about the latter part, and as I said it's a very rare
case, but I did see something similar a couple of times in the wild. It's
not an objection against committing this patch though; in fact I think it
can be committed as is.


FWIW, this is definitely a real possibility in any shop that has very 
high downtime costs and high transaction rates.


I also think some kind of clamp is a good idea. It's not that uncommon 
to run max_connections significantly higher than 100, so the extension 
could be way larger than 16MB. In those cases this patch could actually 
make things far worse as everyone backs up waiting on the OS to extend 
many MB when all you actually needed were a couple dozen more pages.


BTW, how was *20 arrived at? ISTM that if you have a lot of concurrent 
demand for extension that means you're running lots of small DML 
operations, not really big ones. I'd think that would make *1 more 
appropriate.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-11 Thread Petr Jelinek

On 11/03/16 22:29, Robert Haas wrote:

On Thu, Mar 10, 2016 at 8:54 PM, Petr Jelinek  wrote:

I am not talking about extension locks, the lock queue can be long because
there is concurrent DDL for example and then once DDL finishes suddenly 100
connections that tried to insert into table will try to get extension lock
and this will add 2000 new pages when much fewer was actually needed. I
guess that's fine as it's corner case and it's only 16MB even in such
extreme case.


I don't really understand this part about concurrent DDL.  If there
were concurrent DDL going on, presumably other backends would be
blocked on the relation lock, not the relation extension lock - and it
doesn't seem likely that you'd often have a huge pile-up of inserters
waiting on concurrent DDL.  But I guess it could happen.



Yeah, I was thinking about the latter part, and as I said it's a very rare
case, but I did see something similar a couple of times in the wild. It's
not an objection against committing this patch though; in fact I think it
can be committed as is.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-11 Thread Robert Haas
On Thu, Mar 10, 2016 at 8:54 PM, Petr Jelinek  wrote:
> I am not talking about extension locks, the lock queue can be long because
> there is concurrent DDL for example and then once DDL finishes suddenly 100
> connections that tried to insert into table will try to get extension lock
> and this will add 2000 new pages when much fewer was actually needed. I
> guess that's fine as it's corner case and it's only 16MB even in such
> extreme case.

I don't really understand this part about concurrent DDL.  If there
were concurrent DDL going on, presumably other backends would be
blocked on the relation lock, not the relation extension lock - and it
doesn't seem likely that you'd often have a huge pile-up of inserters
waiting on concurrent DDL.  But I guess it could happen.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-10 Thread Petr Jelinek

On 11/03/16 02:44, Dilip Kumar wrote:


On Fri, Mar 11, 2016 at 12:04 AM, Petr Jelinek > wrote:

Thanks for looking..

Those look good. The patch looks good in general now. I am bit
scared by the lockWaiters * 20 as it can result in relatively big
changes in rare corner cases when for example a lot of backends were
waiting for lock on relation and suddenly all try to extend it. I
wonder if we should clamp it to something sane (although what's sane
today might be small in couple of years).


But in such a condition, when all are waiting on the lock, only one waiter at
a time will get the lock, and it can easily count the lock waiters and extend
in a multiple of that. And we also have the check that if any waiter gets the
lock, it will first check in the FSM whether anybody else has added blocks or
not.



I am not talking about extension locks; the lock queue can be long because
there is concurrent DDL, for example, and then once the DDL finishes,
suddenly 100 connections that tried to insert into the table will try to get
the extension lock, and this will add 2000 new pages when far fewer were
actually needed. I guess that's fine, as it's a corner case and it's only
16MB even in such an extreme case.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-10 Thread Dilip Kumar
On Fri, Mar 11, 2016 at 12:04 AM, Petr Jelinek  wrote:

Thanks for looking..

Those look good. The patch looks good in general now. I am bit scared by
> the lockWaiters * 20 as it can result in relatively big changes in rare
> corner cases when for example a lot of backends were waiting for lock on
> relation and suddenly all try to extend it. I wonder if we should clamp it
> to something sane (although what's sane today might be small in couple of
> years).


But in such a condition, when all are waiting on the lock, only one waiter at
a time will get the lock, and it can easily count the lock waiters and extend
in a multiple of that. And we also have the check that if any waiter gets the
lock, it will first check in the FSM whether anybody else has added blocks or
not.

And the other waiters will not get the lock until the first waiter has
extended all the blocks and released the lock.

One possible case is that as soon as we extend the blocks, new requesters
find them directly in the FSM and never come for the lock, while an old
waiter, after getting the lock, finds nothing in the FSM. But IMHO in such
cases it is also good that the other waiter extends more blocks (because this
can happen when the request flow is very high).
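
To make that control flow easier to follow, here is a small standalone model
of the scheme described above. The stub functions are stand-ins for the real
lock manager and FSM calls (they are not PostgreSQL APIs), and the waiter*20
sizing is the one benchmarked in this thread; this is an illustration of the
idea, not the patch itself.

#include <stdbool.h>
#include <stdio.h>

/* Stubs standing in for the real lock manager / FSM calls (not PostgreSQL APIs). */
static bool try_extension_lock(void)      { return false; }  /* pretend we hit contention */
static void take_extension_lock(void)     { }
static void release_extension_lock(void)  { }
static int  extension_lock_waiters(void)  { return 8; }      /* pretend 8 backends are waiting */
static bool fsm_has_free_block(void)      { return false; }  /* pretend the FSM is empty */
static void add_blocks_to_fsm(int n)      { printf("added %d extra blocks to the FSM\n", n); }
static void extend_one_block_for_me(void) { printf("extended one block for myself\n"); }

/*
 * Model of the flow: try the lock and extend by one; on contention, take the
 * lock, re-check the FSM (another waiter may have extended while we slept),
 * and only if it is still empty extend in proportion to the waiters.
 */
static void
get_space_for_tuple(void)
{
    if (try_extension_lock())
    {
        extend_one_block_for_me();
        release_extension_lock();
        return;
    }

    take_extension_lock();

    if (fsm_has_free_block())
    {
        /* another waiter already added blocks; just reuse one of them */
        release_extension_lock();
        return;
    }

    add_blocks_to_fsm(extension_lock_waiters() * 20);
    extend_one_block_for_me();
    release_extension_lock();
}

int
main(void)
{
    get_space_for_tuple();
    return 0;
}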


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Relation extension scalability

2016-03-10 Thread Petr Jelinek

On 10/03/16 09:57, Dilip Kumar wrote:


On Thu, Mar 10, 2016 at 7:55 AM, Petr Jelinek > wrote:

Thanks for the comments..

Hmm, why did you remove the comment above the call to
UnlockRelationForExtension?

While refactoring I lost this comment. Fixed it.

It still seems relevant, maybe with some minor modification?

Also there is a bit of whitespace mess inside the conditional lock
block.

Fixed

I got the results of the 10-minute run, so posting them.
Note: Base code results are copied from upthread...

Results for a 10-minute run of COPY 1 records of size 4 bytes; the script and
configuration are the same as used upthread


Client    Base    Patch
1          105      111
2          217      246
4          210      428
8          166      653
16         145      808
32         124      988
64         ---      974


Results for a 10-minute run of INSERT of 1000 records of size 1024 bytes
(data doesn't fit in shared buffers)
--

Client    Base    Patch
1          117      120
2          111      126
4           51      130
8           43      147
16          40      209
32         ---      254
64         ---      205



Those look good. The patch looks good in general now. I am a bit scared by 
the lockWaiters * 20, as it can result in relatively big changes in rare 
corner cases when, for example, a lot of backends were waiting for a lock on 
the relation and suddenly all try to extend it. I wonder if we should clamp 
it to something sane (although what's sane today might be small in a couple 
of years).


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-10 Thread Dilip Kumar
On Thu, Mar 10, 2016 at 7:55 AM, Petr Jelinek  wrote:

Thanks for the comments..


> Hmm, why did you remove the comment above the call to
> UnlockRelationForExtension?

While refactoring I lost this comment. Fixed it.


> It still seems relevant, maybe with some minor modification?
>
> Also there is a bit of whitespace mess inside the conditional lock block.


Fixed

I got the results of the 10-minute run, so posting them.
Note: Base code results are copied from upthread...

Results for a 10-minute run of COPY 1 records of size 4 bytes; the script and
configuration are the same as used upthread


Client    Base    Patch
1          105      111
2          217      246
4          210      428
8          166      653
16         145      808
32         124      988
64         ---      974


Results for a 10-minute run of INSERT of 1000 records of size 1024 bytes
(data doesn't fit in shared buffers)
--

Client    Base    Patch
1          117      120
2          111      126
4           51      130
8           43      147
16          40      209
32         ---      254
64         ---      205

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 8140418..b73535c 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -169,6 +169,87 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
 }
 
 /*
+ * RelationAddOneBlock
+ *
+ * Extend relation by one block and lock the buffer
+ */
+static Buffer
+RelationAddOneBlock(Relation relation, Buffer otherBuffer, BulkInsertState bistate)
+{
+	Buffer buffer;
+	/*
+	 * XXX This does an lseek - rather expensive - but at the moment it is the
+	 * only way to accurately determine how many blocks are in a relation.  Is
+	 * it worth keeping an accurate file length in shared memory someplace,
+	 * rather than relying on the kernel to do it for us?
+	 */
+	buffer = ReadBufferBI(relation, P_NEW, bistate);
+
+	/*
+	 * We can be certain that locking the otherBuffer first is OK, since
+	 * it must have a lower page number.
+	 */
+	if (otherBuffer != InvalidBuffer)
+		LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
+
+	/*
+	 * Now acquire lock on the new page.
+	 */
+	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+	return buffer;
+}
+
+/*
+ * RelationAddExtraBlocks
+ *
+ * Extend extra blocks for the relations to avoid the future contention
+ * on the relation extension lock.
+ */
+
+static void
+RelationAddExtraBlocks(Relation relation, BulkInsertState bistate)
+{
+	Page		page;
+	Size		freespace;
+	BlockNumber blockNum;
+	int			extraBlocks = 0;
+	int 		lockWaiters = 0;
+	Buffer		buffer;
+
+	/*
+	 * For calculating number of extra blocks to extend, find the level
+	 * of contention on this lock, by getting the requester of this lock
+	 * and add extra blocks in multiple of waiters.
+	 */
+	lockWaiters = RelationExtensionLockWaiter(relation);
+
+	extraBlocks = lockWaiters * 20;
+
+	while (extraBlocks--)
+	{
+		/*
+		 * Here we are adding extra blocks to the relation after
+		 * adding each block update the information in FSM so that
+		 * other backend running parallel can find the block.
+		 */
+		buffer = ReadBufferBI(relation, P_NEW, bistate);
+
+		LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+		page = BufferGetPage(buffer);
+		PageInit(page, BufferGetPageSize(buffer), 0);
+		freespace = PageGetHeapFreeSpace(page);
+		MarkBufferDirty(buffer);
+		blockNum = BufferGetBlockNumber(buffer);
+
+		UnlockReleaseBuffer(buffer);
+
+		RecordPageWithFreeSpace(relation, blockNum, freespace);
+	}
+}
+
+/*
  * RelationGetBufferForTuple
  *
  *	Returns pinned and exclusive-locked buffer of a page in given relation
@@ -233,10 +314,11 @@ RelationGetBufferForTuple(Relation relation, Size len,
 	bool		use_fsm = !(options & HEAP_INSERT_SKIP_FSM);
 	Buffer		buffer = InvalidBuffer;
 	Page		page;
-	Size		pageFreeSpace,
-saveFreeSpace;
+	Size		pageFreeSpace = 0,
+saveFreeSpace = 0;
 	BlockNumber targetBlock,
-otherBlock;
+otherBlock,
+lastValidBlock = InvalidBlockNumber;
 	bool		needLock;
 
 	len = MAXALIGN(len);		/* be conservative */
@@ -308,6 +390,7 @@ RelationGetBufferForTuple(Relation relation, Size len,
 		}
 	}
 
+loop:
 	while (targetBlock != InvalidBlockNumber)
 	{
 		/*
@@ -388,6 +471,8 @@ RelationGetBufferForTuple(Relation relation, Size len,
  otherBlock, targetBlock, vmbuffer_other,
  vmbuffer);
 
+		lastValidBlock = targetBlock;
+
 		/*
 		 * Now we can check to see if there's enough free space here. If so,
 		 * we're done.
@@ -441,27 +526,47 @@ RelationGetBufferForTuple(Relation relation, Size len,
 	needLock = !RELATION_IS_LOCAL(relation);
 
 	if (needLock)
-		LockRelationForExtension(relation, ExclusiveLock);
-
-	/*
-	 * XXX 

Re: [HACKERS] Relation extension scalability

2016-03-09 Thread Petr Jelinek

On 10/03/16 02:53, Dilip Kumar wrote:


Attaching a latest patch.



Hmm, why did you remove the comment above the call to 
UnlockRelationForExtension? It still seems relevant, maybe with some 
minor modification?


Also there is a bit of whitespace mess inside the conditional lock block.

--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-09 Thread Dilip Kumar
On Wed, Mar 9, 2016 at 1:39 AM, Robert Haas  wrote:

> LockWaiterCount() bravely accesses a shared memory data structure that
> is mutable with no locking at all.  That might actually be safe for
> our purposes, but I think it would be better to take the partition
> lock in shared mode if it doesn't cost too much performance.  If
> that's too expensive, then it should at least have a comment
> explaining (1) that it is doing this without the lock and (2) why
> that's safe (sketch: the LOCK can't go away because we hold it, and
> nRequested could change but we don't mind a stale value, and a 32-bit
> read won't be torn).
>

With the LWLock also, performance is equally good, so I added the lock.


>
> A few of the other functions in here also lack comments, and perhaps
> should have them.
>
> The changes to RelationGetBufferForTuple need a visit from the
> refactoring police.  Look at the way you are calling
> RelationAddOneBlock.  The logic looks about like this:
>
> if (needLock)
> {
>   if (trylock relation for extension)
>     RelationAddOneBlock();
>   else
>   {
>     lock relation for extension;
>     if (use fsm)
>     {
>       complicated;
>     }
>     else
>       RelationAddOneBlock();
>   }
> }
> else
>   RelationAddOneBlock();
>
> So basically you want to do the RelationAddOneBlock() thing if
> !needLock || !use_fsm || can't trylock.  See if you can rearrange the
> code so that there's only one fallthrough call to
> RelationAddOneBlock() instead of three separate ones.
>

Actually, in every case we need one block, so I have refactored it and
RelationAddOneBlock is now outside of any condition.


>
> Also, consider moving the code that adds multiple blocks at a time to
> its own function instead of including it all in line.
>

Done

Attaching a latest patch.



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 8140418..eb4ee0c 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -169,6 +169,86 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
 }
 
 /*
+ * RelationAddOneBlock
+ *
+ * Extend relation by one block and lock the buffer
+ */
+static Buffer
+RelationAddOneBlock(Relation relation, Buffer otherBuffer, BulkInsertState bistate)
+{
+	Buffer buffer;
+	/*
+	 * XXX This does an lseek - rather expensive - but at the moment it is the
+	 * only way to accurately determine how many blocks are in a relation.  Is
+	 * it worth keeping an accurate file length in shared memory someplace,
+	 * rather than relying on the kernel to do it for us?
+	 */
+	buffer = ReadBufferBI(relation, P_NEW, bistate);
+
+	/*
+	 * We can be certain that locking the otherBuffer first is OK, since
+	 * it must have a lower page number.
+	 */
+	if (otherBuffer != InvalidBuffer)
+		LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
+
+	/*
+	 * Now acquire lock on the new page.
+	 */
+	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+	return buffer;
+}
+/*
+ * RelationAddExtraBlocks
+ *
+ * Extend extra blocks for the relations to avoid the future contention
+ * on the relation extension lock.
+ */
+
+static void
+RelationAddExtraBlocks(Relation relation, BulkInsertState bistate)
+{
+	Page		page;
+	Size		freespace;
+	BlockNumber blockNum;
+	int			extraBlocks = 0;
+	int 		lockWaiters = 0;
+	Buffer		buffer;
+
+	/*
+	 * For calculating number of extra blocks to extend, find the level
+	 * of contention on this lock, by getting the requester of this lock
+	 * and add extra blocks in multiple of waiters.
+	 */
+	lockWaiters = RelationExtensionLockWaiter(relation);
+
+	extraBlocks = lockWaiters * 20;
+
+	while (extraBlocks--)
+	{
+		/*
+		 * Here we are adding extra blocks to the relation after
+		 * adding each block update the information in FSM so that
+		 * other backend running parallel can find the block.
+		 */
+		buffer = ReadBufferBI(relation, P_NEW, bistate);
+
+		LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+		page = BufferGetPage(buffer);
+		PageInit(page, BufferGetPageSize(buffer), 0);
+		freespace = PageGetHeapFreeSpace(page);
+		MarkBufferDirty(buffer);
+		blockNum = BufferGetBlockNumber(buffer);
+
+		UnlockReleaseBuffer(buffer);
+
+		RecordPageWithFreeSpace(relation, blockNum, freespace);
+	}
+}
+
+/*
  * RelationGetBufferForTuple
  *
  *	Returns pinned and exclusive-locked buffer of a page in given relation
@@ -233,10 +313,11 @@ RelationGetBufferForTuple(Relation relation, Size len,
 	bool		use_fsm = !(options & HEAP_INSERT_SKIP_FSM);
 	Buffer		buffer = InvalidBuffer;
 	Page		page;
-	Size		pageFreeSpace,
-saveFreeSpace;
+	Size		pageFreeSpace = 0,
+saveFreeSpace = 0;
 	BlockNumber targetBlock,
-otherBlock;
+otherBlock,
+lastValidBlock = InvalidBlockNumber;
 	bool		needLock;
 
 	len = MAXALIGN(len);		/* be conservative */
@@ -308,6 +389,7 @@ RelationGetBufferForTuple(Relation relation, Size len,
 		}
 	}
 
+loop:
 	while (targetBlock != 

Re: [HACKERS] Relation extension scalability

2016-03-08 Thread Robert Haas
On Tue, Mar 8, 2016 at 11:20 AM, Dilip Kumar  wrote:
> I have come up with this patch..
>
> If this approach looks fine then I will prepare final patch (more comments,
> indentation, and improve some code) and do some long run testing (current
> results are 5 mins run).
>
> Idea is same what Robert and Amit suggested up thread.

So this seems promising, but I think the code needs some work.

LockWaiterCount() bravely accesses a shared memory data structure that
is mutable with no locking at all.  That might actually be safe for
our purposes, but I think it would be better to take the partition
lock in shared mode if it doesn't cost too much performance.  If
that's too expensive, then it should at least have a comment
explaining (1) that it is doing this without the lock and (2) why
that's safe (sketch: the LOCK can't go away because we hold it, and
nRequested could change but we don't mind a stale value, and a 32-bit
read won't be torn).

A few of the other functions in here also lack comments, and perhaps
should have them.

The changes to RelationGetBufferForTuple need a visit from the
refactoring police.  Look at the way you are calling
RelationAddOneBlock.  The logic looks about like this:

if (needLock)
{
  if (trylock relation for extension)
    RelationAddOneBlock();
  else
  {
    lock relation for extension;
    if (use fsm)
    {
      complicated;
    }
    else
      RelationAddOneBlock();
  }
}
else
  RelationAddOneBlock();

So basically you want to do the RelationAddOneBlock() thing if
!needLock || !use_fsm || can't trylock.  See if you can rearrange the
code so that there's only one fallthrough call to
RelationAddOneBlock() instead of three separate ones.

Also, consider moving the code that adds multiple blocks at a time to
its own function instead of including it all in line.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-08 Thread Dilip Kumar
On Tue, Mar 8, 2016 at 8:34 PM, Amit Kapila  wrote:

> > >> Hmm.  Can we drive this off of the heavyweight lock manager's idea of
> > >> how big the relation extension lock wait queue is, instead of adding
> > >> more stuff to PGPROC?
> > >
> > > One idea to make it work without adding additional stuff in PGPROC is
> that
> > > after acquiring relation extension lock, check if there is any
> available
> > > block in fsm, if it founds any block, then release the lock and
> proceed,
> > > else extend the relation by one block and then check lock's wait queue
> size
> > > or number of lock requests (nRequested) and extend the relation
> further in
> > > proportion to wait queue size and then release the lock and proceed.
> Here,
> > > I think we can check for wait queue size even before extending the
> relation
> > > by one block.
>


I have come up with this patch.

If this approach looks fine then I will prepare the final patch (more
comments, indentation, and some code improvements) and do some long-run
testing (the current results are from 5-minute runs).

The idea is the same as what Robert and Amit suggested upthread.

/*
 * First we try the lock, and if we get it we just extend by one block.  This
 * gives two benefits:
 * 1. Single-thread performance is not impacted by checking lock waiters etc.
 * 2. If we count the waiters only in the else part, it gives time for more
 *    waiters to collect, which gives a better estimate of the contention.
 */

if (TryRelExtLock())
{
    extend one block
}
else
{
    RelExtLock();
    if (recheck the FSM: has somebody already added blocks for me?)
        -- don't extend any block, just reuse one
    else
        -- we have to extend blocks
        -- get the waiters = lock->nRequested
        -- add extra blocks to the FSM: extraBlocks = waiters * 20
}

Results look like this
---

Test1 (COPY)
-
./psql -d postgres -c "COPY (select g.i::text FROM generate_series(1,
1) g(i)) TO '/tmp/copybinary' WITH BINARY";
echo COPY data from '/tmp/copybinary' WITH BINARY; > copy_script
./pgbench -f copy_script -T 300 -c$ -j$ postgres

Shared Buffer: 8GB  max_wal_size: 10GB  Storage: Magnetic Disk  WAL: SSD
---
Client    Base    multi_extend by 20 page    lock_waiter patch (waiter*20)
1          105            157                         148
2          217            255                         252
4          210            494                         442
8          166            702                         645
16         145            563                         773
32         124            480                         957

Note: at 1 thread there should not be any improvement, so that may be
run-to-run variance.

Test2 (INSERT)
./psql -d postgres -c "create table test_data(a int, b text)"
./psql -d postgres -c "insert into test_data
values(generate_series(1,1000),repeat('x', 1024))"
./psql -d postgres -c "create table data (a int, b text)"
echo "insert into data select * from test_data;" >> insert_script

 shared_buffers=512GB  max_wal_size=20GB  checkpoint_timeout=10min
./pgbench -c $ -j $ -f insert_script -T 300 postgres

Client    Base    Multi-extend by 1000    lock_waiter patch (waiter*20)
1          117            118                        117
2          111            140                        132
4           51            190                        134
8           43            254                        148
16          40            243                        206
32           -              -                        264


* (waiter*20) -> The first process that gets the lock finds the number of
lock waiters and adds waiters*20 extra blocks.

In the next run I will go beyond 32 clients also; as we can see, even at 32
clients it is still increasing, so it is clear that when it sees more
contention it adapts to the contention and adds more blocks.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


multi_extend_v5.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-08 Thread Amit Kapila
On Tue, Mar 8, 2016 at 7:23 PM, Robert Haas  wrote:
>
> On Tue, Mar 8, 2016 at 4:27 AM, Amit Kapila 
wrote:
> >> Hmm.  Can we drive this off of the heavyweight lock manager's idea of
> >> how big the relation extension lock wait queue is, instead of adding
> >> more stuff to PGPROC?
> >
> > One idea to make it work without adding additional stuff in PGPROC is
that
> > after acquiring relation extension lock, check if there is any available
> > block in fsm, if it founds any block, then release the lock and proceed,
> > else extend the relation by one block and then check lock's wait queue
size
> > or number of lock requests (nRequested) and extend the relation further
in
> > proportion to wait queue size and then release the lock and proceed.
Here,
> > I think we can check for wait queue size even before extending the
relation
> > by one block.
> >
> > The benefit of doing it with PGPROC is that there will be relatively
less
> > number LockAcquire calls as compare to heavyweight lock approach, which
I
> > think should not matter much because we are planing to extend the
relation
> > in proportion to wait queue size (probably wait queue size * 10).
>
> I don't think switching relation extension from heavyweight locks to
> lightweight locks is going to work.
>

Sorry, but I am not suggesting changing it to lightweight locks.  I am
just suggesting how to make batching work with heavyweight locks, as you
asked.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Relation extension scalability

2016-03-08 Thread Robert Haas
On Tue, Mar 8, 2016 at 4:27 AM, Amit Kapila  wrote:
>> Hmm.  Can we drive this off of the heavyweight lock manager's idea of
>> how big the relation extension lock wait queue is, instead of adding
>> more stuff to PGPROC?
>
> One idea to make it work without adding additional stuff in PGPROC is that
> after acquiring relation extension lock, check if there is any available
> block in fsm, if it founds any block, then release the lock and proceed,
> else extend the relation by one block and then check lock's wait queue size
> or number of lock requests (nRequested) and extend the relation further in
> proportion to wait queue size and then release the lock and proceed.  Here,
> I think we can check for wait queue size even before extending the relation
> by one block.
>
> The benefit of doing it with PGPROC is that there will be relatively less
> number LockAcquire calls as compare to heavyweight lock approach, which I
> think should not matter much because we are planing to extend the relation
> in proportion to wait queue size (probably wait queue size * 10).

I don't think switching relation extension from heavyweight locks to
lightweight locks is going to work.  It would mean, for example, that
we lose the ability to service interrupts while extending a relation;
not to mention that we lose scalability if many relations are being
extended at once.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-08 Thread Amit Kapila
On Mon, Mar 7, 2016 at 8:34 PM, Robert Haas  wrote:
>
> On Fri, Mar 4, 2016 at 11:49 PM, Amit Kapila 
wrote:
> > I think one thing which needs more thoughts about this approach is that
we
> > need to maintain some number of slots so that group extend for different
> > relations can happen in parallel.  Do we want to provide simultaneous
> > extension for 1, 2, 3, 4, 5 or more number of relations?  I think
providing
> > it for three or four relations should be okay as higher the number we
want
> > to provide, bigger the size of PGPROC structure will be.
>
> Hmm.  Can we drive this off of the heavyweight lock manager's idea of
> how big the relation extension lock wait queue is, instead of adding
> more stuff to PGPROC?
>

One idea to make it work without adding additional stuff in PGPROC is that
after acquiring the relation extension lock, we check if there is any
available block in the FSM; if we find any block, then release the lock and
proceed, else extend the relation by one block, then check the lock's wait
queue size or number of lock requests (nRequested), extend the relation
further in proportion to the wait queue size, and then release the lock and
proceed.  Here, I think we can check the wait queue size even before
extending the relation by one block.

The benefit of doing it with PGPROC is that there would be relatively fewer
LockAcquire calls as compared to the heavyweight lock approach, which I think
should not matter much because we are planning to extend the relation in
proportion to the wait queue size (probably wait queue size * 10).


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Relation extension scalability

2016-03-07 Thread Robert Haas
On Fri, Mar 4, 2016 at 11:49 PM, Amit Kapila  wrote:
> I think one thing which needs more thoughts about this approach is that we
> need to maintain some number of slots so that group extend for different
> relations can happen in parallel.  Do we want to provide simultaneous
> extension for 1, 2, 3, 4, 5 or more number of relations?  I think providing
> it for three or four relations should be okay as higher the number we want
> to provide, bigger the size of PGPROC structure will be.

Hmm.  Can we drive this off of the heavyweight lock manager's idea of
how big the relation extension lock wait queue is, instead of adding
more stuff to PGPROC?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-04 Thread Amit Kapila
On Fri, Mar 4, 2016 at 9:59 PM, Robert Haas  wrote:
>
> On Fri, Mar 4, 2016 at 12:06 AM, Dilip Kumar 
wrote:
> > I have tried the approach of group extend,
> >
> > 1. We convert the extension lock to TryLock and if we get the lock then
> > extend by one block.2.
> > 2. If don't get the Lock then use the Group leader concep where only one
> > process will extend for all, Slight change from ProcArrayGroupClear is
that
> > here other than satisfying the requested backend we Add some extra
blocks in
> > FSM, say GroupSize*10.
> > 3. So Actually we can not get exact load but still we have some factor
like
> > group size tell us exactly the contention size and we extend in
multiple of
> > that.
>
> This approach seems good to me, and the performance results look very
> positive.  The nice thing about this is that there is not a
> user-configurable knob; the system automatically determines when
> larger extensions are needed, which will mean that real-world users
> are much more likely to benefit from this.
>

I think one thing which needs more thought about this approach is that we
need to maintain some number of slots so that group extend for different
relations can happen in parallel.  Do we want to provide simultaneous
extension for 1, 2, 3, 4, 5 or more relations?  I think providing it for
three or four relations should be okay, as the higher the number we want to
provide, the bigger the size of the PGPROC structure will be.

+GroupExtendRelation(PGPROC *proc, Relation relation, BulkInsertState bistate)
+{
+ volatile PROC_HDR *procglobal = ProcGlobal;
+ uint32 nextidx;
+ uint32 wakeidx;
+ int extraWaits = -1;
+ BlockNumber targetBlock;
+ int count = 0;
+
+ /* Add ourselves to the list of processes needing a group extend. */
+ proc->groupExtendMember = true;
..
..
+ /* We are the leader.  Acquire the lock on behalf of everyone. */
+ LockRelationForExtension(relation, ExclusiveLock);


To provide it for multiple relations, I think you need to advertise the
reloid of the relation in each proc and then get the relation descriptor for
the relation extension lock.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Relation extension scalability

2016-03-04 Thread Tom Lane
Robert Haas  writes:
> This approach seems good to me, and the performance results look very
> positive.  The nice thing about this is that there is not a
> user-configurable knob; the system automatically determines when
> larger extensions are needed, which will mean that real-world users
> are much more likely to benefit from this.  I don't think it matters
> that this is a little faster or slower than an approach with a manual
> knob; what matter is that it is a huge improvement over unpatched
> master, and that it does not need a knob.  The arbitrary constant of
> 10 is a little unsettling but I think we can live with it.

+1.  "No knob" is a huge win.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-04 Thread Robert Haas
On Fri, Mar 4, 2016 at 12:06 AM, Dilip Kumar  wrote:
> I have tried the approach of group extend,
>
> 1. We convert the extension lock to TryLock and if we get the lock then
> extend by one block.2.
> 2. If don't get the Lock then use the Group leader concep where only one
> process will extend for all, Slight change from ProcArrayGroupClear is that
> here other than satisfying the requested backend we Add some extra blocks in
> FSM, say GroupSize*10.
> 3. So Actually we can not get exact load but still we have some factor like
> group size tell us exactly the contention size and we extend in multiple of
> that.

This approach seems good to me, and the performance results look very
positive.  The nice thing about this is that there is not a
user-configurable knob; the system automatically determines when
larger extensions are needed, which will mean that real-world users
are much more likely to benefit from this.  I don't think it matters
that this is a little faster or slower than an approach with a manual
knob; what matter is that it is a huge improvement over unpatched
master, and that it does not need a knob.  The arbitrary constant of
10 is a little unsettling but I think we can live with it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-03 Thread Dilip Kumar
On Wed, Mar 2, 2016 at 10:31 AM, Dilip Kumar  wrote:

> 1. One option can be as you suggested like ProcArrayGroupClearXid, With
> some modification, because when we wait for the request and extend w.r.t
> that, may be again we face the Context Switch problem, So may be we can
> extend in some multiple of the Request.
> (But we need to take care whether to give block directly to requester or
> add it into FSM or do both i.e. give at-least one block to requester and
> put some multiple in FSM)
>
>
> 2. Other option can be that we analyse the current Load on the extend and
> then speed up or slow down the extending speed.
> Simple algorithm can look like this
>

I have tried the approach of group extend:

1. We convert the extension lock to a TryLock, and if we get the lock we
extend by one block.
2. If we don't get the lock, we use the group leader concept, where only one
process extends for all.  A slight change from ProcArrayGroupClear is that
here, besides satisfying the requesting backends, we add some extra blocks
to the FSM, say GroupSize*10.
3. So we cannot actually get the exact load, but we still have some factor,
like the group size, that tells us the contention level, and we extend in a
multiple of that.

Performance Analysis:
-
Performance scales with this approach; it is slightly lower compared to the
previous patch, where we directly give the extend_by_block parameter and
extend by multiple blocks.  I think that's expected, because in the group
extend case we add extra blocks only after contention happens on the lock,
whereas the former case was extending extra blocks optimistically.

Test1(COPY)
-
./psql -d postgres -c "COPY (select g.i::text FROM generate_series(1,
1) g(i)) TO '/tmp/copybinary' WITH BINARY";
echo COPY data from '/tmp/copybinary' WITH BINARY; > copy_script

./pgbench -f copy_script -T 300 -c$ -j$ postgres

Shared Buffer: 8GB  max_wal_size: 10GB  Storage: Magnetic Disk  WAL: SSD
---
Client    Base    multi_extend by 20 page    group_extend_patch (groupsize*10)
1          105            157                          149
2          217            255                          282
4          210            494                          452
8          166            702                          629
16         145            563                          561
32         124            480                          566

Test2(INSERT)

./psql -d postgres -c "create table test_data(a int, b text)"
./psql -d postgres -c "insert into test_data
values(generate_series(1,1000),repeat('x', 1024))"
./psql -d postgres -c "create table data (a int, b text)
echo "insert into data select * from test_data;" >> insert_script

 shared_buffers=512GB  max_wal_size=20GB  checkpoint_timeout=10min
./pgbench -c $ -j $ -f insert_script -T 300 postgres

Client    Base    Multi-extend by 1000    *group extend (group*10)    *group extend (group*100)
1          117            118                     125                         122
2          111            140                     161                         159
4           51            190                     141                         187
8           43            254                     148                         172
16          40            243                     150                         173


* (group*10) -> means that inside the code, the group leader sees how many
members are in the group that want blocks, so we satisfy the request of all
members and also put extra blocks in the FSM; the number of extra blocks to
extend is (group*10), where 10 is just some constant.


Summary:

1. Here, with the group extend patch, there is no configuration to tell how
many blocks to extend; that is decided by the current load on the system
(contention on the extension lock).
2. With a small multiplier, i.e. 10, we can get a fairly good improvement
compared to the base code, but when the load is high, e.g. the record size
is 1K, increasing the multiplier gives better results.


* Note: Currently this is a POC patch. It has only one group extend list, so
currently it can handle group extend for only one relation.



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 8140418..adb82ba 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -23,6 +23,7 @@
 #include "storage/freespace.h"
 #include "storage/lmgr.h"
 #include "storage/smgr.h"
+#include "storage/proc.h"
 
 
 /*
@@ -168,6 +169,160 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
 	}
 }
 
+
+static BlockNumber
+GroupExtendRelation(PGPROC *proc, Relation relation, BulkInsertState bistate)
+{
+	volatile PROC_HDR *procglobal = ProcGlobal;
+	uint32		nextidx;
+	uint32		

Re: [HACKERS] Relation extension scalability

2016-03-01 Thread Dilip Kumar
On Tue, Mar 1, 2016 at 4:36 PM, Amit Kapila  wrote:

> One thing that is slightly unclear is that whether there is any overhead
> due to buffer eviction especially when the buffer to be evicted is already
> dirty and needs XLogFlush().  One reason why it might not hurt is that by
> the time we tries to evict the buffer, corresponding WAL is already flushed
> by WALWriter or other possibility could be that even if it is getting done
> during buffer eviction, the impact for same is much lesser.  Can we try to
> measure the number of flush calls which happen during buffer eviction?
>
>
Good Idea, I will do this test and post the results..


>
>
>> Proc Sleep test for Insert test when data don't fits in shared buffer and
>> inserting big record of 1024 bytes, is currently running
>> once I get the data will post the same.
>>
>>
> Okay.  However, I wonder if the performance data for the case when data
> doesn't fit into shared buffer also shows similar trend, then it might be
> worth to try by doing extend w.r.t load in system.  I mean to say we can
> batch the extension requests (as we have done in ProcArrayGroupClearXid)
> and extend accordingly, if that works out then the benefit could be that we
> don't need any configurable knob for the same.
>

1. One option can be, as you suggested, something like ProcArrayGroupClearXid
with some modification, because if we wait for the requests and extend with
respect to them, we may again face the context switch problem, so maybe we
can extend in some multiple of the requests.
(But we need to take care whether to give a block directly to the requester,
or add it into the FSM, or do both, i.e. give at least one block to the
requester and put some multiple in the FSM.)


2. The other option can be that we analyse the current load on extension and
then speed up or slow down the extension rate.
A simple algorithm can look like this:

If (GotBlockFromFSM())
    Success++      // We got the block from the FSM
Else
    Failure++      // Did not get a block from the FSM and need to
                   // extend by myself

Now while extending
-
Speed up
-
If (failure - success > Threshold)   // Threshold can be some number, assume 10
    ExtendByBlock += failure - success;  --> Many failures mean the load is
                         high, so increase ExtendByBlock
    Failure = success = 0    --> reset after this so that we can measure
                 the latest trend.

Slow down..
--
// Now it is possible that the demand for blocks has reduced but ExtendByBlock
// is still high, so analyse the statistics and slow down the extending pace.

If (success - failure > Threshold)
{
    // Cannot reduce it by a big number, because maybe more requests are being
    // satisfied precisely because this is the correct amount; so gradually
    // decrease the pace and re-analyse the statistics next time.
    ExtendByBlock--;
    Failure = success = 0
}

Any Suggestions are Welcome...
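
As a rough illustration of option 2, here is a standalone toy simulation of
those counters. The threshold of 10 and the synthetic FSM hit rate are
arbitrary assumptions, just to show how ExtendByBlock would drift up under
load and back down once the FSM starts satisfying most requests.

#include <stdio.h>
#include <stdlib.h>

#define THRESHOLD 10            /* assumed value, as in the sketch above */

int
main(void)
{
    int extend_by_block = 1;
    int success = 0, failure = 0;
    int i;

    srand(42);

    for (i = 0; i < 100000; i++)
    {
        /*
         * Synthetic workload: the FSM satisfies more requests as
         * extend_by_block grows (purely made-up numbers).
         */
        int hit_pct = 30 + extend_by_block;

        if (hit_pct > 95)
            hit_pct = 95;

        if (rand() % 100 < hit_pct)
        {
            success++;          /* got a block from the FSM */
            continue;
        }

        failure++;              /* did not get a block; we have to extend */

        if (failure - success > THRESHOLD)
        {
            /* speed up: many failures mean the load is high */
            extend_by_block += failure - success;
            failure = success = 0;
        }
        else if (success - failure > THRESHOLD)
        {
            /* slow down gradually; the current pace may simply be right */
            if (extend_by_block > 1)
                extend_by_block--;
            failure = success = 0;
        }
    }

    printf("final extend_by_block = %d\n", extend_by_block);
    return 0;
}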


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Relation extension scalability

2016-03-01 Thread Amit Kapila
On Mon, Feb 29, 2016 at 3:37 PM, Dilip Kumar  wrote:

>
> On Wed, Feb 10, 2016 at 7:06 PM, Dilip Kumar 
> wrote:
>
>>
>
> Test2: Identify that improvement in case of multiextend is becuase of
> avoiding context switch or some other factor, like reusing blocks b/w
> backend by putting in FSM..
>
> 1. Test by just extending multiple blocks and reuse in it's own backend
> (Don't put in FSM)
> Insert 1K record data don't fits in shared buffer 512MB Shared Buffer
>
>
>
Client    Base    Extend 800 block self use    Extend 1000 Block
> 1          117            131                        118
> 2          111            203                        140
> 3           51            242                        178
> 4           51            231                        190
> 5           52            259                        224
> 6           51            263                        243
> 7           43            253                        254
> 8           43            240                        254
> 16          40            190                        243
>
> We can see the same improvement in case of self using the blocks also, It
> shows that Sharing the blocks between the backend was not the WIN but
> avoiding context switch was the measure win.
>
>
One thing that is slightly unclear is whether there is any overhead due to
buffer eviction, especially when the buffer to be evicted is already dirty
and needs XLogFlush().  One reason why it might not hurt is that by the time
we try to evict the buffer, the corresponding WAL has already been flushed by
the WALWriter; another possibility could be that even if it is getting done
during buffer eviction, the impact of it is much smaller.  Can we try to
measure the number of flush calls which happen during buffer eviction?



> 2. Tested the Number of ProcSleep during the Run.
> This is the simple script of copy 1 record in one transaction of size
> 4 Bytes
>
>                BASE CODE                      PATCH MULTI EXTEND
> Client   Base_TPS   ProcSleep Count   Extend By 10 Block   Proc Sleep Count
> 2           280         457,506              311                62,641
> 3           235       1,098,701              358               141,624
> 4           216       1,155,735              368               188,173
>
> What we can see in above test that, in Base code performance is degrading
> after 2 threads, while Proc Sleep count in increasing with huge amount.
>
> Compared to that in Patch, with extending 10 blocks at a time Proc Sleep
> reduce to ~1/8 and we can see it is constantly scaling.
>
> Proc Sleep test for Insert test when data don't fits in shared buffer and
> inserting big record of 1024 bytes, is currently running
> once I get the data will post the same.
>
>
Okay.  However, if the performance data for the case when the data doesn't
fit into shared buffers also shows a similar trend, then it might be worth
trying to extend with respect to the load on the system.  I mean to say we
can batch the extension requests (as we have done in ProcArrayGroupClearXid)
and extend accordingly; if that works out, then the benefit could be that we
don't need any configurable knob for the same.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Relation extension scalability

2016-02-29 Thread Dilip Kumar
On Wed, Feb 10, 2016 at 7:06 PM, Dilip Kumar  wrote:

>
I have tested the relation extension patch from various aspects; the
performance results and other statistical data are explained in this mail.

Test 1: Identify whether the heavyweight lock is the problem or the actual
context switches.
1. I converted the RelationExtensionLock to a simple LWLock and tested with a
single relation. Results are below.

This is the simple script of copy 1 record in one transaction of size 4
Bytes
client    base    lwlock    multi_extend by 50 block
1          155      156             160
2          282      276             284
4          248      319             428
8          161      267             675
16         143      241             889

LWLock performance is better than base; the obvious reason may be that we
have saved some instructions by converting to an LWLock, but it doesn't scale
any better than the base code.


Test 2: Identify whether the improvement in the multi-extend case is because
of avoiding context switches or some other factor, like reusing blocks
between backends by putting them in the FSM.

1. Test by just extending multiple blocks and reusing them in the same
backend (don't put them in the FSM).
Insert of 1K records; data doesn't fit in shared buffers; 512MB shared buffers.


Client    Base    Extend 800 block self use    Extend 1000 Block
1          117            131                        118
2          111            203                        140
3           51            242                        178
4           51            231                        190
5           52            259                        224
6           51            263                        243
7           43            253                        254
8           43            240                        254
16          40            190                        243

We can see the same improvement when the backend uses the blocks itself,
which shows that sharing the blocks between backends was not the win;
avoiding the context switches was the major win.

2. Tested the Number of ProcSleep during the Run.
This is the simple script of copy 1 record in one transaction of size 4
Bytes

               BASE CODE                      PATCH MULTI EXTEND
Client   Base_TPS   ProcSleep Count   Extend By 10 Block   Proc Sleep Count
2           280         457,506              311                62,641
3           235       1,098,701              358               141,624
4           216       1,155,735              368               188,173

What we can see in the above test is that in the base code performance
degrades after 2 threads, while the ProcSleep count increases by a huge
amount.

Compared to that, in the patch, with extending 10 blocks at a time the
ProcSleep count is reduced to roughly 1/8, and we can see it is constantly
scaling.

The ProcSleep test for the INSERT case, where the data doesn't fit in shared
buffers and we insert big records of 1024 bytes, is currently running; once I
get the data I will post it.

Posting the rebased version and moving it to the next CF.

Open points:
1. After getting the lock, recheck the FSM to see whether some other backend
has already added extra blocks, and reuse them.
2. Is it a good idea to have a user-level parameter for extend_by_block, or
can we try some approach to internally identify how many blocks are needed
and add only that many blocks?  This would make it more flexible.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 86b9ae1..78e81dd 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -268,6 +268,16 @@ static relopt_int intRelOpts[] =
 #endif
 	},
 
+	{
+		{
+			"extend_by_blocks",
+			"Number of blocks to be added to relation in every extend call",
+			RELOPT_KIND_HEAP,
+			AccessExclusiveLock
+		},
+		1, 1, 1
+	},
+
 	/* list terminator */
 	{{NULL}}
 };
@@ -1291,7 +1301,9 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
 		{"autovacuum_analyze_scale_factor", RELOPT_TYPE_REAL,
 		offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, analyze_scale_factor)},
 		{"user_catalog_table", RELOPT_TYPE_BOOL,
-		offsetof(StdRdOptions, user_catalog_table)}
+		offsetof(StdRdOptions, user_catalog_table)},
+		{"extend_by_blocks", RELOPT_TYPE_INT,
+		offsetof(StdRdOptions, extend_by_blocks)}
 	};
 
 	options = parseRelOptions(reloptions, validate, kind, );
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 8140418..eb3ce17 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -238,6 +238,7 @@ RelationGetBufferForTuple(Relation relation, Size len,
 	BlockNumber targetBlock,
 otherBlock;
 	bool		needLock;
+	int			extraBlocks;
 
 	len = MAXALIGN(len);		/* be conservative */
 
@@ -443,25 +444,50 @@ RelationGetBufferForTuple(Relation relation, 

Re: [HACKERS] Relation extension scalability

2016-02-10 Thread Andres Freund
On 2016-02-10 10:32:44 +0530, Dilip Kumar wrote:
> On Fri, Feb 5, 2016 at 4:50 PM, Amit Kapila  wrote:
> 
> > > Could you also measure how this behaves for an INSERT instead of a COPY
> > > workload?
> >
> >
> I think such a test will be useful.
> >
> 
> I have measured the performance with insert to see the behavior when it
> don't use strategy. I have tested multiple option, small tuple, big tuple,
> data fits in shared buffer and doesn't fit in shared buffer.

Could you please also have a look at the influence this has on latency?
I think you unfortunately have to look at the per-transaction logs, and
then see whether the worst case latencies get better or worse.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-02-09 Thread Dilip Kumar
On Fri, Feb 5, 2016 at 4:50 PM, Amit Kapila  wrote:

> > Could you also measure how this behaves for an INSERT instead of a COPY
> > workload?
>
>
I think such a test will be useful.
>

I have measured the performance with INSERT to see the behavior when it
doesn't use a strategy. I have tested multiple options: small tuples, big
tuples, data that fits in shared buffers, and data that doesn't.

Observation:
--
 Apart from this test, I have also used a tool which can take many stack
traces with some delay.
1. I have observed with the base code (when the data doesn't fit in shared
buffers) that almost all the stack traces are waiting on
"LockRelationForExtension", and many on "FlushBuffer" as well (flushing the
dirty buffer).

Total stacks captured: 204, FlushBuffer: 13, LockRelationForExtension: 187

 (This test was run with 8 threads (shared buffers 512MB), and a stack was
captured every 5 seconds.)

2. If I change shared buffers to 48GB then obviously FlushBuffer disappears,
but LockRelationForExtension still remains in a very high number.

3. The performance of the base code, both when the data fits in shared
buffers and when it doesn't, remains very low and non-scaling (we can see
that in the results below).

Test--1 (big record insert and Data fits in shared Buffer)

setup

./psql -d postgres -c "create table test_data(a int, b text)"
./psql -d postgres -c "insert into test_data
values(generate_series(1,1000),repeat('x', 1024))"
./psql -d postgres -c "create table data (a int)
with(extend_by_blocks=$variable)" {create table data (a int) for base code}
echo "insert into data select * from test_data;" >> copy_script

test:
-
 shared_buffers=48GB  max_wal_size=20GB  checkpoint_timeout=10min
./pgbench -c $ -j $ -f copy_script -T 120 postgres



client    base    extend_by_block=50    extend_by_block=1000
1          113          115                   118
4           50          220                   216
8           43          202                   302


Test--2 (big record insert and data doesn't fit in shared buffers)
--
setup:
---
./psql -d postgres -c "create table test_data(a int, b text)"
./psql -d postgres -c "insert into test_data
values(generate_series(1,1000),repeat('x', 1024))"
./psql -d postgres -c "create table data (a int)
with(extend_by_blocks=1000)"
echo "insert into data select * from test_data;" >> copy_script

test:
--
 shared_buffers=512MB  max_wal_size=20GB  checkpoint_timeout=10min
./pgbench -c $ -j $ -f copy_script -T 120 postgres


client    base    extend_by_block=1000
1          125          125
4           49          236
8           41          294
16          39          279


Test--3 (small record insert and Data fits in shared Buffer)
--
setup:

./psql -d postgres -c "create table test_data(a int)"
./psql -d postgres -c "insert into test_data
values(generate_series(1,1))"
./psql -d postgres -c "create table data (a int) with(extend_by_blocks=20)"
 echo "insert into data select * from test_data;" >> copy_script

test:
-
 shared_buffers=48GB -c max_wal_size=20GB -c checkpoint_timeout=10min
./pgbench -c $ -j $ -f copy_script -T 120 postgres


client    base    Patch-extend_by_block=20
1          137          143
2          269          250
4          377          443
8          170          690
16         145          745

* All tests done with data on magnetic disk and WAL on SSD.

Note: The last patch has a max limit of extend_by_block=100, so for taking
performance numbers with extend_by_block=1000 I locally changed it.
I will send the modified patch once we finalize which approach to
proceed with.


> > I'm doubtful that anything that does the victim buffer search while
> > holding the extension lock will actually scale in a wide range of
> > scenarios.
> >
>
> I think the problem for victim buffer could be visible if the blocks
> are dirty and it needs to write the dirty buffer and especially as
> the patch is written where after acquiring the extension lock, it again
> tries to extend the relation without checking if it can get a page with
> space from FSM.  It seems to me that we should re-check the
> availability of page because while one backend is waiting on extension
> lock, other backend might have added pages.  To re-check the
> availability we might want to use something similar to
> LWLockAcquireOrWait() semantics as used during WAL writing.
>

I will work on this in next version...

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Relation extension scalability

2016-02-05 Thread Amit Kapila
On Tue, Feb 2, 2016 at 9:19 PM, Andres Freund  wrote:
>
> On 2016-01-28 16:53:08 +0530, Dilip Kumar wrote:
> > test_script:
> > 
> > ./psql -d postgres -c "truncate table data"
> > ./psql -d postgres -c "checkpoint"
> > ./pgbench -f copy_script -T 120 -c$ -j$ postgres
> >
> > Shared Buffer: 48GB
> > Table: Unlogged Table
> > ./pgbench -c$ -j$ -f -M Prepared postgres
> >
> > Clients    Base    patch
> > 1          178     180
> > 2          337     338
> > 4          265     601
> > 8          167     805
>
> Could you also measure how this behaves for an INSERT instead of a COPY
> workload?
>

I think such a test will be useful.

>
> I'm doubtful that anything that does the victim buffer search while
> holding the extension lock will actually scale in a wide range of
> scenarios.
>

I think the problem for victim buffer could be visible if the blocks
are dirty and it needs to write the dirty buffer and especially as
the patch is written where after acquiring the extension lock, it again
tries to extend the relation without checking if it can get a page with
space from FSM.  It seems to me that we should re-check the
availability of page because while one backend is waiting on extension
lock, other backend might have added pages.  To re-check the
availability we might want to use something similar to
LWLockAcquireOrWait() semantics as used during WAL writing.
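
To make that concrete, a rough, untested sketch of the re-check could look
like the following (ConditionalLockRelationForExtension is assumed here as a
conditional-acquire primitive for the extension lock; it does not exist
today, the other functions do, and "loop" stands for the retry point at the
top of RelationGetBufferForTuple):

    if (!ConditionalLockRelationForExtension(relation, ExclusiveLock))
    {
        /* Couldn't get the lock immediately; wait for it. */
        LockRelationForExtension(relation, ExclusiveLock);

        /*
         * Re-check the FSM: another backend may have extended the relation
         * while we were waiting, so we may not need to extend at all.
         */
        targetBlock = GetPageWithFreeSpace(relation, len + saveFreeSpace);
        if (targetBlock != InvalidBlockNumber)
        {
            UnlockRelationForExtension(relation, ExclusiveLock);
            goto loop;          /* retry, using the page found in the FSM */
        }
    }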



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Relation extension scalability

2016-02-02 Thread Robert Haas
On Thu, Jan 28, 2016 at 6:23 AM, Dilip Kumar  wrote:
> [ new patch ]

This patch contains a useless hunk and also code not in PostgreSQL
style.  Get pgindent set up and it will do it correctly for you, or
look at the style of the surrounding code.

What I'm a bit murky about is *why* this should be any better than the
status quo.  I mean, the obvious explanation that pops to mind is that
extending the relation by two pages at a time relieves pressure on the
relation extension lock somehow.  One other possible explanation is
that calling RecordPageWithFreeSpace() allows multiple backends to get
access to that page at the same time, while otherwise each backend
would try to conduct a separate extension.  But in the first case,
what we ought to do is try to make the locking more efficient; and in
the second case, we might want to think about recording the first page
in the free space map too.  I don't think the problem here is that w

Here's a sketch of another approach to this problem.  Get rid of the
relation extension lock.  Instead, have an array of, say, 256 lwlocks.
Each one protects the extension of relations where hash(relfilenode) %
256 maps to that lock.  To extend a relation, grab the corresponding
lwlock, do the work, then release the lwlock.  You might occasionally
have a situation where two relations are both being extended very
quickly and happen to map to the same bucket, but that shouldn't be
much of a problem in practice, and the way we're doing it presently is
worse, not better, since two relation extension locks may very easily
map to the same lock manager partition.  The only problem with this is
that acquiring an LWLock holds off interrupts, and we don't want
interrupts to be locked out across a potentially lengthy I/O.  We
could partially fix that if we call RESUME_INTERRUPTS() after
acquiring the lock and HOLD_INTERRUPTS() just before releasing it, but
there's still the problem that you might block non-interruptibly while
somebody else has the lock.  I don't see an easy solution to that
problem right off-hand, but if something like this performs well we
can probably conjure up some solution to that problem.
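
To make the shape of that concrete, here is a rough sketch, not a worked-out
patch; N_EXTENSION_LOCKS, ExtensionLockArray and RelationExtensionLWLock are
names made up for illustration, the rest are existing primitives:

    #define N_EXTENSION_LOCKS 256

    static LWLockPadded *ExtensionLockArray;    /* lives in shared memory */

    static inline LWLock *
    RelationExtensionLWLock(RelFileNode rnode)
    {
        uint32      h = DatumGetUInt32(hash_any((unsigned char *) &rnode,
                                                sizeof(RelFileNode)));

        return &ExtensionLockArray[h % N_EXTENSION_LOCKS].lock;
    }

    /* in place of LockRelationForExtension()/UnlockRelationForExtension(): */
    LWLockAcquire(RelationExtensionLWLock(relation->rd_node), LW_EXCLUSIVE);
    /* ... find the new block and initialize the page ... */
    LWLockRelease(RelationExtensionLWLock(relation->rd_node));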

I'm not saying that we need to do that exact thing - but I am saying
that I don't think we can proceed with an approach like this without
first understanding why it works and whether there's some other way
that might be better to address the underlying problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-02-02 Thread Andres Freund
On 2016-02-02 10:12:38 -0500, Robert Haas wrote:
> Here's a sketch of another approach to this problem.  Get rid of the
> relation extension lock.  Instead, have an array of, say, 256 lwlocks.
> Each one protects the extension of relations where hash(relfilenode) %
> 256 maps to that lock.  To extend a relation, grab the corresponding
> lwlock, do the work, then release the lwlock.  You might occasionally
> have a situation where two relations are both being extended very
> quickly and happen to map to the same bucket, but that shouldn't be
> much of a problem in practice, and the way we're doing it presently is
> worse, not better, since two relation extension locks may very easily
> map to the same lock manager partition.

I guess you suspect that the performance problems come from the
heavyweight lock overhead? That's not what I *think* I've seen in
profiles, but it's hard to conclusively judge that.

I kinda doubt that really solves the problem, profiles aside,
though. The above wouldn't really get rid of the extension locks, it
just changes the implementation a bit. We'd still do victim buffer
search, and filesystem operations, while holding an exclusive
lock. Batching can solve some of that, but I think primarily we need
more granular locking, or get rid of locks entirely.


> The only problem with this is that acquiring an LWLock holds off
> interrupts, and we don't want interrupts to be locked out across a
> potentially lengthy I/O.  We could partially fix that if we call
> RESUME_INTERRUPTS() after acquiring the lock and HOLD_INTERRUPTS()
> just before releasing it, but there's still the problem that you might
> block non-interruptibly while somebody else has the lock.  I don't see
> an easy solution to that problem right off-hand, but if something like
> this performs well we can probably conjure up some solution to that
> problem.

Hm. I think to get rid of the HOLD_INTERRUPTS() we'd have to to record
what lock we were waiting on, and in which mode, before going into
PGSemaphoreLock(). Then LWLockReleaseAll() could "hand off" the wakeup
to the next waiter in the queue. Without that we'd sometimes end up with
absorbing a wakeup without then releasing the lock, causing everyone to
block on a released lock.

There's probably two major questions around that: Will it have a
performance impact, and will there be any impact on existing callers?

Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-02-02 Thread Robert Haas
On Tue, Feb 2, 2016 at 10:49 AM, Andres Freund  wrote:
> I'm doubtful that anything that does the victim buffer search while
> holding the extension lock will actually scale in a wide range of
> scenarios. The copy scenario here probably isn't too bad because the
> copy ring buffers are in use, and because there are no reads increasing the
> usage count of recent buffers; thus victim buffers are easily found.

I agree that's an avenue we should try to explore.  I haven't had any
time to think much about how it should be done, but it seems like it
ought to be possible somehow.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-02-02 Thread Alvaro Herrera
Andres Freund wrote:

> Could you also measure how this behaves for [...]

While we're proposing benchmark cases -- I remember this being an issue
with toast tables getting very large values of xml which causes multiple
toast pages to be extended for each new value inserted.  If there are
multiple processes inserting these all the time, things get clogged.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-02-02 Thread Andres Freund
On 2016-01-28 16:53:08 +0530, Dilip Kumar wrote:
> test_script:
> 
> ./psql -d postgres -c "truncate table data"
> ./psql -d postgres -c "checkpoint"
> ./pgbench -f copy_script -T 120 -c$ -j$ postgres
> 
> Shared Buffer: 48GB
> Table: Unlogged Table
> ./pgbench -c$ -j$ -f -M Prepared postgres
>
> Clients    Base    patch
> 1          178     180
> 2          337     338
> 4          265     601
> 8          167     805

Could you also measure how this behaves for an INSERT instead of a COPY
workload? Both throughput and latency. It's quite possible that this
causes latency hikes, because suddenly backends will have to wait for
one another to extend by 50 pages. You'll probably have to use -P 1 or
full statement logging to judge that.  I think just having a number of
connections inserting relatively wide rows into one table should do the
trick.

I'm doubtful that anything that does the victim buffer search while
holding the extension lock will actually scale in a wide range of
scenarios. The copy scenario here probably isn't too bad because the
copy ring buffers are in use, and because there are no reads increasing the
usage count of recent buffers; thus victim buffers are easily found.

Thanks,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-01-28 Thread Dilip Kumar
On Thu, Jan 28, 2016 at 4:53 PM, Dilip Kumar  wrote:

> I did not find in regression in normal case.
> Note: I tested it with previous patch extend_num_pages=10 (guc parameter)
> so that we can see any impact on overall system.
>

Just forgot to mention that I have run the pgbench read-write case.

S.F: 300

./pgbench -j $ -c $ -T 1800 -M Prepared postgres

Tested with 1, 8, 16, 32 and 64 threads and did not see any regression with the patch.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Relation extension scalability

2016-01-28 Thread Dilip Kumar
On Mon, Jan 25, 2016 at 11:59 AM, Dilip Kumar  wrote:

1.
>> Patch is not getting compiled.
>>
>> 1>src/backend/access/heap/hio.c(480): error C2065: 'buf' : undeclared
>> identifier
>>
> Oh, my mistake, my preprocessor is ignoring this error and replacing it
> with BLKSIZE
>

FIXED


> 3. I think you don't need to multi-extend a relation if
>> HEAP_INSERT_SKIP_FSM is used, as for that case it anyways try to
>> get a new page by extending a relation.
>>
>
> So i will change this..
>

FIXED

>
> 4. Again why do you need this multi-extend optimization for local
>> relations (those only accessible to current backend)?
>>
>
> I think we can change this while adding the  table level
> "extend_by_blocks" for local table we will not allow this property, so no
> need to change at this place.
> What do you think ?
>

Now I have added a table-level parameter for specifying the number of blocks.
So do you think we still need to block it, given that the user can control it?
Moreover, I think if a user wants to use it for a local table there is no harm
in it; at least by extending in one shot they avoid multiple calls under the
extension lock, though there will be no contention.

What is your opinion?


5. Do we need this for nbtree as well?  One way to check that
>> is by Copying large data in table having index.
>>
>> Ok, i will try this test and update.
>

I tried to load data into a table with an index and analyzed the bottleneck
using perf, and found that btcompare was taking the maximum time; still, I
don't deny that it could get some benefit from multi-extend.

So I tried a quick POC for this, but I realized that even though we extend
multiple pages for the index and add them to the FSM, only the current FSM
page is updated; information about the new free pages is propagated to the
root page only during vacuum. And unlike heap, btree always searches the FSM
from the root, so it will not find the extra added pages.

So I think we can analyze index multi-extend separately and see whether there
are cases where it can give a benefit.

Note: Test with both data and WAL on Magnetic Disk : No significant
>>> improvement visible
>>> -- I think wall write is becoming bottleneck in this case.
>>>
>>>
>> In that case, can you try the same test with un-logged tables?
>>
> Data with un-logged table

Test Init:

./psql -d postgres -c "COPY (select g.i::text FROM generate_series(1,
1) g(i)) TO '/tmp/copybinary' WITH BINARY";
echo COPY data from '/tmp/copybinary' WITH BINARY; > copy_script
./psql -d postgres -c "create unlogged table data (data text)" --> base
./psql -d postgres -c "create unlogged table data (data text)
with(extend_by_blocks=50)" --patch

test_script:

./psql -d postgres -c "truncate table data"
./psql -d postgres -c "checkpoint"
./pgbench -f copy_script -T 120 -c$ -j$ postgres

Shared Buffer: 48GB
Table: Unlogged Table
./pgbench -c$ -j$ -f -M Prepared postgres

Clients    Base    patch
1          178     180
2          337     338
4          265     601
8          167     805

Also, it is good to check the performance of patch with read-write work
>> load to ensure that extending relation in multiple-chunks should not
>> regress such cases.
>>
>
> Ok
>

I did not find any regression in the normal case.
Note: I tested it with the previous patch's extend_num_pages=10 (GUC parameter)
so that we can see any impact on the overall system.


> Currently i have kept extend_num_page as session level parameter but i
>>> think later we can make this as table property.
>>> Any suggestion on this ?
>>>
>>>
>> I think we should have a new storage_parameter at table level
>> extend_by_blocks or something like that instead of GUC. The
>> default value of this parameter should be 1 which means retain
>> current behaviour by default.
>>
> +1
>

Changed it to a table-level storage parameter. I kept the max limit at 100;
any suggestion on this? Should we increase it?

The latest patch is attached.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 86b9ae1..76f9a21 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -268,6 +268,16 @@ static relopt_int intRelOpts[] =
 #endif
 	},
 
+	{
+		{
+			"extend_by_blocks",
+			"Number of blocks to be added to relation in every extend call",
+			RELOPT_KIND_HEAP,
+			AccessExclusiveLock
+		},
+		1, 1, 100
+	},
+
 	/* list terminator */
 	{{NULL}}
 };
@@ -1291,7 +1301,9 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
 		{"autovacuum_analyze_scale_factor", RELOPT_TYPE_REAL,
 		offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, analyze_scale_factor)},
 		{"user_catalog_table", RELOPT_TYPE_BOOL,
-		offsetof(StdRdOptions, user_catalog_table)}
+		offsetof(StdRdOptions, user_catalog_table)},
+		{"extend_by_blocks", RELOPT_TYPE_INT,
+		offsetof(StdRdOptions, extend_by_blocks)}
 	};
 
 	options = 

Re: [HACKERS] Relation extension scalability

2016-01-24 Thread Dilip Kumar
On Sat, Jan 23, 2016 at 4:28 PM, Amit Kapila 
wrote:

> I found one more problem with patch.
>
> ! UnlockReleaseBuffer(buffer);
> ! RecordPageWithFreeSpace(relation, BufferGetBlockNumber(buffer), freespace);
>
> You can't call BufferGetBlockNumber(buffer) after releasing
> the pin on buffer which will be released by
> UnlockReleaseBuffer().  Get the block number before unlocking
> the buffer.
>

Good catch, will fix this also in next version.
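
For reference, the corrected ordering will look roughly like this (capture
the block number while the pin is still held):

    freespace = PageGetHeapFreeSpace(page);
    blockNum = BufferGetBlockNumber(buffer);    /* before releasing the pin */
    MarkBufferDirty(buffer);
    UnlockReleaseBuffer(buffer);
    RecordPageWithFreeSpace(relation, blockNum, freespace);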


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Relation extension scalability

2016-01-24 Thread Dilip Kumar
On Sat, Jan 23, 2016 at 12:19 PM, Amit Kapila 
wrote
>
>
> Few comments about patch:
>
Thanks for reviewing..


> 1.
> Patch is not getting compiled.
>
> 1>src/backend/access/heap/hio.c(480): error C2065: 'buf' : undeclared
> identifier
> 1>src/backend/access/heap/hio.c(480): error C2065: 'buf' : undeclared
> identifier
> 1>src/backend/access/heap/hio.c(480): error C2065: 'buf' : undeclared
> identifier
>
Oh, my mistake, my preprocessor is ignoring this error and replacing it with
BLKSIZE.

I will fix it in the next version of the patch.

> 2.
> ! page = BufferGetPage(buffer);
> ! PageInit(page, BufferGetPageSize(buf), 0);
> !
> ! freespace = PageGetHeapFreeSpace(page);
> !
> ! MarkBufferDirty(buffer);
> ! UnlockReleaseBuffer(buffer);
> !
> ! RecordPageWithFreeSpace(relation, BufferGetBlockNumber(buffer), freespace);
>
> What is the need to mark the page dirty here, won't it automatically
> be marked dirty once the page is used?  I think it is required
> if you wish to WAL-log this action.
>

These pages are not going to be used immediately, and we have done PageInit,
so I think they should be marked dirty before adding them to the FSM, so that
if the buffer gets replaced out it flushes the initialized page contents.


> 3. I think you don't need to multi-extend a relation if
> HEAP_INSERT_SKIP_FSM is used, as for that case it anyways try to
> get a new page by extending a relation.
>

Yes, if HEAP_INSERT_SKIP_FSM is used together with multi-extend then, at least
in the current transaction, it will not take pages from the FSM and will do a
multi-extend every time; the pages will still get used if there are parallel
backends, but it is still not a good idea for the current backend to extend in
multiple chunks every time.

So I will change this.

4. Again why do you need this multi-extend optimization for local
> relations (those only accessible to current backend)?
>

I think we can handle this while adding the table-level "extend_by_blocks":
for local tables we will not allow this property, so there is no need to
change anything at this place.

What do you think?

5. Do we need this for nbtree as well?  One way to check that
> is by Copying large data in table having index.
>
> Ok, i will try this test and update.



> Note: Test with both data and WAL on Magnetic Disk : No significant
>> improvement visible
>> -- I think wall write is becoming bottleneck in this case.
>>
>>
> In that case, can you try the same test with un-logged tables?
>

OK

>
> Also, it is good to check the performance of patch with read-write work
> load to ensure that extending relation in multiple-chunks should not
> regress such cases.
>

Ok


>
> Currently i have kept extend_num_page as session level parameter but i
>> think later we can make this as table property.
>> Any suggestion on this ?
>>
>>
> I think we should have a new storage_parameter at table level
> extend_by_blocks or something like that instead of GUC. The
> default value of this parameter should be 1 which means retain
> current behaviour by default.
>

+1


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Relation extension scalability

2016-01-23 Thread Amit Kapila
On Sat, Jan 23, 2016 at 12:19 PM, Amit Kapila 
wrote:

> On Tue, Jan 12, 2016 at 2:41 PM, Dilip Kumar 
> wrote:
>
>> On Thu, Jan 7, 2016 at 4:53 PM, Andres Freund  wrote:
>>
>>> On 2016-01-07 16:48:53 +0530, Amit Kapila wrote:
>>>
>>> I think it's a worthwhile approach to pursue. But until it actually
>>> fixes the problem of leaving around uninitialized pages I don't think
>>> it's very meaningful to do performance comparisons.
>>>
>>
>> Attached patch solves this issue, I am allocating the buffer for each
>> page and initializing the page, only after that adding to FSM.
>>
>
> Few comments about patch:
>
>
I found one more problem with patch.

! UnlockReleaseBuffer(buffer);
! RecordPageWithFreeSpace(relation, BufferGetBlockNumber(buffer), freespace);

You can't call BufferGetBlockNumber(buffer) after releasing
the pin on buffer which will be released by
UnlockReleaseBuffer().  Get the block number before unlocking
the buffer.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Relation extension scalability

2016-01-22 Thread Amit Kapila
On Tue, Jan 12, 2016 at 2:41 PM, Dilip Kumar  wrote:

> On Thu, Jan 7, 2016 at 4:53 PM, Andres Freund  wrote:
>
>> On 2016-01-07 16:48:53 +0530, Amit Kapila wrote:
>>
>> I think it's a worthwhile approach to pursue. But until it actually
>> fixes the problem of leaving around uninitialized pages I don't think
>> it's very meaningful to do performance comparisons.
>>
>
> Attached patch solves this issue, I am allocating the buffer for each page
> and initializing the page, only after that adding to FSM.
>

Few comments about patch:

1.
Patch is not getting compiled.

1>src/backend/access/heap/hio.c(480): error C2065: 'buf' : undeclared
identifier
1>src/backend/access/heap/hio.c(480): error C2065: 'buf' : undeclared
identifier
1>src/backend/access/heap/hio.c(480): error C2065: 'buf' : undeclared
identifier

2.
! page = BufferGetPage(buffer);
! PageInit(page, BufferGetPageSize(buf), 0);
!
! freespace = PageGetHeapFreeSpace(page);
!
! MarkBufferDirty(buffer);
! UnlockReleaseBuffer(buffer);
!
! RecordPageWithFreeSpace(relation, BufferGetBlockNumber(buffer), freespace);

What is the need to mark the page dirty here, won't it automatically
be marked dirty once the page is used?  I think it is required
if you wish to WAL-log this action.

3. I think you don't need to multi-extend a relation if
HEAP_INSERT_SKIP_FSM is used, as for that case it anyways try to
get a new page by extending a relation.

4. Again why do you need this multi-extend optimization for local
relations (those only accessible to current backend)?

5. Do we need this for nbtree as well?  One way to check that
is by Copying large data in table having index.



> Note: Test with both data and WAL on Magnetic Disk : No significant
> improvement visible
> -- I think wall write is becoming bottleneck in this case.
>
>
In that case, can you try the same test with un-logged tables?

Also, it is good to check the performance of patch with read-write work
load to ensure that extending relation in multiple-chunks should not
regress such cases.



> Currently i have kept extend_num_page as session level parameter but i
> think later we can make this as table property.
> Any suggestion on this ?
>
>
I think we should have a new storage_parameter at table level
extend_by_blocks or something like that instead of GUC. The
default value of this parameter should be 1 which means retain
current behaviour by default.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Relation extension scalability

2016-01-12 Thread Dilip Kumar
On Thu, Jan 7, 2016 at 4:53 PM, Andres Freund  wrote:

> On 2016-01-07 16:48:53 +0530, Amit Kapila wrote:
>
> I think it's a worthwhile approach to pursue. But until it actually
> fixes the problem of leaving around uninitialized pages I don't think
> it's very meaningful to do performance comparisons.
>

Attached patch solves this issue, I am allocating the buffer for each page
and initializing the page, only after that adding to FSM.


> > a. Extend the relation page by page and add it to FSM without
> initializing
> > it.  I think this is what the current patch of Dilip seems to be doing.
> If
> > we
> I think that's pretty much unacceptable, for the non-error path at
> least.
>

Performance results:

Test Case:

./psql -d postgres -c "COPY (select g.i::text FROM generate_series(1,
1) g(i)) TO '/tmp/copybinary' WITH BINARY";

echo COPY data from '/tmp/copybinary' WITH BINARY; > copy_script

./psql -d postgres -c "truncate table data"
./psql -d postgres -c "checkpoint"
./pgbench -f copy_script -T 120 -c$ -j$ postgres

Test Summary:

1. I have measured the performance of base and patch.
2. With the patch there are multiple results, with different values of
"extend_num_pages" (the parameter which says how many extra blocks to extend).

Test with Data on magnetic Disk and WAL on SSD

Shared Buffer :48GB
max_wal_size :10GB
Storage  :  Magnetic Disk
WAL   :  SSD

 tps with different value of extend_num_page



Client   Base 10-Page   20-Page   50-Page

1        105     103       157       129
2        217     219       255       288
4        210     421       494       486
8        166     605       702       701
16       145     484       563       686
32       124     477       480       745

Test with Data and WAL on SSD
---

Shared Buffer :   48GB
Max Wal Size :   10GB
Storage  :   SSD

 tps with different value of extend_num_page



Client   Base 10-Page   20-Page   50-Page   100-Page

1  152  153  155  147  157
2  281  281  292  275  287
4  236  505  502  508  514
8  171  662  687  767  764
16 145  527  639  826  907

Note: Test with both data and WAL on magnetic disk: no significant
improvement visible
-- I think WAL write is becoming the bottleneck in this case.

Currently I have kept extend_num_pages as a session-level parameter, but I
think later we can make this a table property.
Any suggestion on this?

Apart from this approach, I also tried extending the file by multiple blocks
in one extend call, but this approach (extending one by one) performs
better.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
*** a/src/backend/access/heap/hio.c
--- b/src/backend/access/heap/hio.c
***
*** 24,29 
--- 24,30 
  #include "storage/lmgr.h"
  #include "storage/smgr.h"
  
+ int extend_num_pages = 0;
  
  /*
   * RelationPutHeapTuple - place tuple at specified page
***
*** 238,243  RelationGetBufferForTuple(Relation relation, Size len,
--- 239,245 
  	BlockNumber targetBlock,
  otherBlock;
  	bool		needLock;
+ 	int			totalBlocks;
  
  	len = MAXALIGN(len);		/* be conservative */
  
***
*** 449,467  RelationGetBufferForTuple(Relation relation, Size len,
  	 * it worth keeping an accurate file length in shared memory someplace,
  	 * rather than relying on the kernel to do it for us?
  	 */
! 	buffer = ReadBufferBI(relation, P_NEW, bistate);
! 
! 	/*
! 	 * We can be certain that locking the otherBuffer first is OK, since it
! 	 * must have a lower page number.
! 	 */
! 	if (otherBuffer != InvalidBuffer)
! 		LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
! 
! 	/*
! 	 * Now acquire lock on the new page.
! 	 */
! 	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
  
  	/*
  	 * Release the file-extension lock; it's now OK for someone else to extend
--- 451,491 
  	 * it worth keeping an accurate file length in shared memory someplace,
  	 * rather than relying on the kernel to do it for us?
  	 */
! 
! 	totalBlocks = extend_num_pages;
! 
! 	do {
! 
! 
! 		buffer = ReadBufferBI(relation, P_NEW, bistate);
! 
! 		/*
! 		 * We can be certain that locking the otherBuffer first is OK, since it
! 		 * must have a lower page number.
! 		 */
! 		if ((otherBuffer != InvalidBuffer) && !totalBlocks)
! 			LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
! 
! 		/*
! 		 * Now acquire lock on the new page.
! 		

Re: [HACKERS] Relation extension scalability

2016-01-07 Thread Andres Freund
On 2016-01-07 16:48:53 +0530, Amit Kapila wrote:
> What I could understand from above e-mail is that Dilip has tried to
> extend relation multiple-pages-at-a-time and observed that it gives
> comparable or better performance as compare to Andres's idea of
> lock-free extension and it doesn't regress the low-thread count case.

I think it's a worthwhile approach to pursue. But until it actually
fixes the problem of leaving around uninitialized pages I don't think
it's very meaningful to do performance comparisons.

> Now, I think here point to discuss is that there could be multiple-ways
> for extending a relation multiple-pages-at-a-time like:
> 
> a. Extend the relation page by page and add it to FSM without initializing
> it.  I think this is what the current patch of Dilip seems to be doing. If
> we
> want to go via this route, then we need to ensure that whenever we get
> the page from FSM, if it is empty and not initialised, then initialise
> it.

I think that's pretty much unacceptable, for the non-error path at
least.

One very simple, linux specific, approach would be to simply do
fallocate(FALLOC_FL_KEEP_SIZE) to extend the file, that way space is
pre-allocated, but not yet marked as allocated.
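
A minimal sketch of that, assuming Linux and glibc (illustrative only, not
something md.c does today; the function name is made up):

    #define _GNU_SOURCE
    #include <fcntl.h>              /* fallocate(), FALLOC_FL_KEEP_SIZE */

    /*
     * Reserve space for nblocks additional blocks past the current end of
     * the file without changing its apparent size, so lseek(SEEK_END) still
     * reports the old length and no uninitialized pages become visible.
     */
    static int
    reserve_extension_space(int fd, off_t cur_size, int nblocks, size_t blcksz)
    {
        return fallocate(fd, FALLOC_FL_KEEP_SIZE,
                         cur_size, (off_t) nblocks * blcksz);
    }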

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-01-06 Thread Robert Haas
On Thu, Dec 31, 2015 at 6:22 AM, Dilip Kumar  wrote:

> On Fri, Dec 18, 2015 at 10:51 AM, Dilip Kumar 
> wrote:
>
> On Sun, Jul 19 2015 9:37 PM Andres Wrote,
>>
>> > The situation the read() protect us against is that two backends try to
>> > extend to the same block, but after one of them succeeded the buffer is
>> > written out and reused for an independent page. So there is no in-memory
>> > state telling the slower backend that that page has already been used.
>>
>> I was looking into this patch, and done some performance testing..
>>
>> Currently i have done testing my my local machine, later i will perform
>> on big machine once i get access to that.
>>
>> Just wanted to share the current result what i get i my local machine
>> Machine conf (Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz, 8 core and 16GM
>> of RAM).
>>
>> Test Script:
>> ./psql -d postgres -c "COPY (select g.i::text FROM generate_series(1,
>> 1) g(i)) TO '/tmp/copybinarywide' WITH BINARY";
>>
>> ./psql -d postgres -c "truncate table data"
>> ./psql -d postgres -c "checkpoint"
>> ./pgbench -f copy_script -T 120 -c$ -j$ postgres
>>
>
> This time i have done some testing on big machine with* 64 physical cores
> @ 2.13GHz and 50GB of RAM*
>
> There is performance comparison of base, extend without
> RelationExtensionLock patch given by Andres and
> multi-extend patch (this will extend the multiple blocks at a time based
> on a configuration parameter.)
>
>
> *Problem Analysis:*
> 1. With base code when i try to observe the problem using perf and other
> method (gdb), i found that RelationExtensionLock is main bottleneck.
> 2. Then after using RelationExtensionLock Free patch, i observed now
> contention is FileWrite (All backends are trying to extend the file.)
>
>
> *Performance Summary and
> Analysis:*
> 1. In my performance results Multi Extend shown best performance and
> scalability.
> 2. I think by extending in multiple blocks we solves both the
> problem(Extension Lock and Parallel File Write).
> 3. After extending one Block immediately adding to FSM so in most of the
> cases other backend can directly find it without taking extension lock.
>
> Currently the patch is in initial stage, i have done only test performance
> and pass the regress test suit.
>
>
>
> *Open problems -*
> 1. After extending the page we are adding it directly to FSM, so if vacuum
> find this page as new it will give WARNING.
> 2. In RelationGetBufferForTuple, when PageIsNew, we are doing PageInit,
> same need to be consider for index cases.
>
>
>
> *Test Script:-*
> ./psql -d postgres -c "COPY (select g.i::text FROM generate_series(1,
> 1) g(i)) TO '/tmp/copybinarywide' WITH BINARY";
>
> ./psql -d postgres -c "truncate table data"
> ./psql -d postgres -c "checkpoint"
> ./pgbench -f copy_script -T 120 -c$ -j$ postgres
>
> *Performanec Data:*
> --
> *There are Three code base for performance*
> 1. Base Code
>
> 2. Lock Free Patch : patch given in below thread
> *http://www.postgresql.org/message-id/20150719140746.gh25...@awork2.anarazel.de
> *
>
> 3. Multi extend patch attached in the mail.
> *#extend_num_pages : *This this new config parameter to tell how many
> extra page extend in case of normal extend..
> may be it will give more control to user if we make it relation property.
>
> I will work on the patch for this CF, so adding it to CF.
>
>
> *Shared Buffer 48 GB*
>
>
> Clients   Base (TPS)   Lock Free Patch   Multi-extend extend_num_pages=5
> 1         142          138               148
> 2         251          253               280
> 4         237          416               464
> 8         168          491               575
> 16        141          448               404
> 32        122          337               332
>
>
>
> *Shared Buffer 64 MB*
>
>
> Clients   Base (TPS)   Multi-extend extend_num_pages=5
> 1         140          148
> 2         252          266
> 4         229          437
> 8         153          475
> 16        132          364
>
>
I'm not really sure what this email is trying to say.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Relation extension scalability

2015-12-31 Thread Dilip Kumar
On Fri, Dec 18, 2015 at 10:51 AM, Dilip Kumar  wrote:

> On Sun, Jul 19 2015 9:37 PM Andres Wrote,
>
> > The situation the read() protect us against is that two backends try to
> > extend to the same block, but after one of them succeeded the buffer is
> > written out and reused for an independent page. So there is no in-memory
> > state telling the slower backend that that page has already been used.
>
> I was looking into this patch, and done some performance testing..
>
> Currently i have done testing my my local machine, later i will perform on
> big machine once i get access to that.
>
> Just wanted to share the current result what i get i my local machine
> Machine conf (Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz, 8 core and 16GM
> of RAM).
>
> Test Script:
> ./psql -d postgres -c "COPY (select g.i::text FROM generate_series(1,
> 1) g(i)) TO '/tmp/copybinarywide' WITH BINARY";
>
> ./psql -d postgres -c "truncate table data"
> ./psql -d postgres -c "checkpoint"
> ./pgbench -f copy_script -T 120 -c$ -j$ postgres
>

This time I have done some testing on a big machine with 64 physical cores @
2.13GHz and 50GB of RAM.

There is a performance comparison of base, the patch from Andres that extends
without the RelationExtensionLock, and the multi-extend patch (which extends
multiple blocks at a time based on a configuration parameter).


Problem Analysis:
1. With the base code, when I tried to observe the problem using perf and
other methods (gdb), I found that RelationExtensionLock is the main bottleneck.
2. Then, after using the RelationExtensionLock-free patch, I observed that the
contention is now on FileWrite (all backends are trying to extend the file).


Performance Summary and Analysis:
1. In my performance results, multi-extend showed the best performance and
scalability.
2. I think by extending in multiple blocks we solve both problems (extension
lock and parallel file write).
3. After extending one block we immediately add it to the FSM, so in most
cases other backends can directly find it without taking the extension lock.

Currently the patch is at an initial stage; I have only tested performance
and made it pass the regression test suite.



Open problems:
1. After extending a page we are adding it directly to the FSM, so if vacuum
finds this page as new it will give a WARNING.
2. In RelationGetBufferForTuple, when PageIsNew, we are doing PageInit; the
same needs to be considered for the index cases.



Test Script:
./psql -d postgres -c "COPY (select g.i::text FROM generate_series(1,
1) g(i)) TO '/tmp/copybinarywide' WITH BINARY";

./psql -d postgres -c "truncate table data"
./psql -d postgres -c "checkpoint"
./pgbench -f copy_script -T 120 -c$ -j$ postgres

Performance Data:
--
There are three code bases for performance:
1. Base Code

2. Lock Free Patch: patch given in the thread below
http://www.postgresql.org/message-id/20150719140746.gh25...@awork2.anarazel.de

3. Multi-extend patch attached in the mail.
#extend_num_pages: this is a new config parameter to tell how many extra pages
to extend in case of a normal extend; maybe it will give more control to the
user if we make it a relation property.

I will work on the patch for this CF, so adding it to CF.


Shared Buffer 48 GB

Clients   Base (TPS)   Lock Free Patch   Multi-extend extend_num_pages=5
1         142          138               148
2         251          253               280
4         237          416               464
8         168          491               575
16        141          448               404
32        122          337               332



Shared Buffer 64 MB

Clients   Base (TPS)   Multi-extend extend_num_pages=5
1         140          148
2         252          266
4         229          437
8         153          475
16        132          364


Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
*** a/src/backend/access/brin/brin_pageops.c
--- b/src/backend/access/brin/brin_pageops.c
***
*** 771,776  brin_getinsertbuffer(Relation irel, Buffer oldbuf, Size itemsz,
--- 771,781 
  			UnlockRelationForExtension(irel, ExclusiveLock);
  
  		page = BufferGetPage(buf);
+ 		if (PageIsNew(page))
+ 		{
+ 			MarkBufferDirty(buf);
+ 			PageInit(page, BufferGetPageSize(buf), 0);
+ 		}
  
  		/*
  		 * We have a new buffer to insert into.  Check that the new page has
*** a/src/backend/access/heap/hio.c
--- b/src/backend/access/heap/hio.c
***
*** 393,398  RelationGetBufferForTuple(Relation relation, Size len,
--- 393,404 
  		 * we're done.
  		 */
  		page = BufferGetPage(buffer);
+ 		if (PageIsNew(page))
+ 		{
+ 			MarkBufferDirty(buffer);
+ 			PageInit(page, BufferGetPageSize(buffer), 0);
+ 		}
+ 
  		pageFreeSpace = PageGetHeapFreeSpace(page);
  		if (len + saveFreeSpace <= pageFreeSpace)
  		{
*** a/src/backend/storage/buffer/bufmgr.c
--- b/src/backend/storage/buffer/bufmgr.c
***
*** 90,95  int			effective_io_concurrency = 0;
--- 90,96 
   * effective_io_concurrency parameter set.
   */
  int			

Re: [HACKERS] Relation extension scalability

2015-12-17 Thread Dilip Kumar
On Sun, Jul 19 2015 9:37 PM Andres Wrote,

> The situation the read() protect us against is that two backends try to
> extend to the same block, but after one of them succeeded the buffer is
> written out and reused for an independent page. So there is no in-memory
> state telling the slower backend that that page has already been used.

I was looking into this patch, and have done some performance testing.

Currently I have done the testing on my local machine; later I will run it on
a big machine once I get access to that.

Just wanted to share the current results I get on my local machine.
Machine conf: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz, 8 cores and 16GB of
RAM.

Test Script:
./psql -d postgres -c "COPY (select g.i::text FROM generate_series(1,
1) g(i)) TO '/tmp/copybinarywide' WITH BINARY";

./psql -d postgres -c "truncate table data"
./psql -d postgres -c "checkpoint"
./pgbench -f copy_script -T 120 -c$ -j$ postgres

Summary of the results:
1. When data fits into shared buffers the improvement is not visible, but when
it doesn't then some improvement is visible on my local machine (it still does
not seem to scale; maybe we will see different behaviour on a big machine).
That's because in the first case it need not flush any buffers out.

2. As per Tom's analysis, since we are doing an extra read it will reduce
performance at lower client counts where RelationExtensionLock is not the
bottleneck, and the same is visible in the test results.

As discussed earlier, what about keeping the RelationExtensionLock as it is
and just doing the victim buffer search and buffer flushing outside the lock?
That way we can save the extra read. Correct me if I have missed something
here.


Shared Buffer 512 MB
-
Client   Tps Base   Tps Patch
1        145        126
2        211        246
4        248        302
8        225        234

Shared Buffer 5GB
-
Client   Tps Base   Tps Patch
1        165        156
2        237        244
4        294        296
8        253        247


Also observed one problem with the patch:

@@ -433,63 +434,50 @@ RelationGetBufferForTuple(Relation relation, Size len,
+ while(true)
+ {
+ buffer = ExtendRelation(relation, MAIN_FORKNUM, bistate->strategy);

bistate can be NULL in the direct-insert case (as opposed to COPY).


Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


On Sun, Jul 19, 2015 at 9:37 PM, Andres Freund  wrote:

> On 2015-07-19 11:56:47 -0400, Tom Lane wrote:
> > Andres Freund  writes:
> > > On 2015-07-19 11:28:25 -0400, Tom Lane wrote:
> > >> At this point session 1 will go and create page 44, won't it, and you
> > >> just wasted a page.
> >
> > > My local code now recognizes that case and uses the page. We just need
> > > to do an PageIsNew().
> >
> > Er, what?  How can you tell whether an all-zero page was or was not
> > just written by another session?
>
> The check is only done while holding the io lock on the relevant page
> (have to hold that anyway), after reading it in ourselves, just before
> setting BM_VALID. As we only can get to that point when there wasn't any
> other entry for the page in the buffer table, that guarantees that no
> other backend isn't currently expanding into that page. Others might
> wait to read it, but those'll wait behind the IO lock.
>
>
> The situation the read() protect us against is that two backends try to
> extend to the same block, but after one of them succeeded the buffer is
> written out and reused for an independent page. So there is no in-memory
> state telling the slower backend that that page has already been used.
>
> Andres
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] Relation extension scalability

2015-07-19 Thread Andres Freund
Hi,

I, every now and then, spent a bit of time making this more efficient
over the last few weeks.

I had a bit of a problem reproducing the problems I'd seen in
production on physical hardware (I found EC2 to be too variable to
benchmark this), but luckily 2ndQuadrant today allowed me access to
their four socket machine[1] of the AXLE project.  Thanks Simon and
Tomas!

First, some mostly juicy numbers:

My benchmark was a parallel COPY into a single wal logged target
table:
CREATE TABLE data(data text);
The source data has been generated with
narrow:
COPY (select g.i::text FROM generate_series(1, 1) g(i)) TO 
'/tmp/copybinary' WITH BINARY;
wide:
COPY (select repeat(random()::text, 10) FROM generate_series(1, 1) g(i)) TO 
'/tmp/copybinarywide' WITH BINARY;

Between every test I ran a TRUNCATE data; CHECKPOINT;

For each number of clients I ran pgbench for 70 seconds. I'd previously
determined using -P 1 that the numbers are fairly stable. Longer runs
would have been nice, but then I'd not have finished in time.

shared_buffers = 48GB, narrow table contents:
client tps after:  tps before:
1  180.255577  210.125143
2  338.231058  391.875088
4  638.814300  405.243901
8  1126.852233 370.922271
16 1242.363623 498.487008
32 1229.648854 484.477042
48 1223.288397 468.127943
64 1198.007422 438.238119
96 1201.501278 370.556354
128    1198.554929 288.213032
196    1189.603398 193.841993
256    1144.082291 191.293781
512    643.323675  200.782105

shared_buffers = 1GB, narrow table contents:
client tps after:  tps before:
1  191.137410  210.787214
2  351.293017  384.086634
4  649.800991  420.703149
8  1103.770749 355.947915
16 1287.192256 489.050768
32 1226.329585 464.936427
48 1187.266489 443.386440
64 1182.698974 402.251258
96 1208.315983 331.290851
128    1183.469635 269.250601
196    1202.847382 202.788617
256    1177.924515 190.876852
512    572.457773  192.413191

shared_buffers = 48GB, wide table contents:
client tps after:  tps before:
1  59.685215   68.445331
2  102.034688  103.210277
4  179.434065  78.982315
8  222.613727  76.195353
16 232.162484  77.520265
32 231.979136  71.654421
48 231.981216  64.730114
64 230.955979  57.444215
96 228.016910  56.324725
128    227.693947  45.701038
196    227.410386  37.138537
256    224.626948  35.265530
512    105.356439  34.397636

shared_buffers = 1GB, wide table contents:
(ran out of patience)

Note that the peak performance with the patch is significantly better,
but there's currently a noticeable regression in single threaded
performance. That undoubtedly needs to be addressed.


So, to get to the actual meat: My goal was to essentially get rid of an
exclusive lock over relation extension altogether. I think I found a
way to do that that addresses the concerns made in this thread.

The new algorithm basically is:
1) Acquire victim buffer, clean it, and mark it as pinned
2) Get the current size of the relation, save it into blockno
3) Try to insert an entry into the buffer table for blockno
4) If the page is already in the buffer table, increment blockno by 1,
   goto 3)
5) Try to read the page. In most cases it'll not yet exist. But the page
   might concurrently have been written by another backend and removed
   from shared buffers already. If already existing, goto 1)
6) Zero out the page on disk.

I think this does handle the concurrency issues.
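
In pseudo-C the loop looks roughly like this; GetVictimBuffer,
BufTableInsertIfAbsent, PageAlreadyOnDisk and GiveBackVictimBuffer are
placeholders for what the POC does inside bufmgr.c, not real functions:

    for (;;)
    {
        buf = GetVictimBuffer(strategy);            /* 1) pinned and clean */
        blockno = smgrnblocks(smgr, MAIN_FORKNUM);  /* 2) current EOF */

        /* 3) + 4) find the first block number not yet in the buffer table */
        while (!BufTableInsertIfAbsent(smgr, MAIN_FORKNUM, blockno, buf))
            blockno++;

        /* 5) the block might already exist on disk but not in memory */
        if (PageAlreadyOnDisk(smgr, MAIN_FORKNUM, blockno))
        {
            GiveBackVictimBuffer(buf);              /* lost the race, retry */
            continue;
        }

        smgrextend(smgr, MAIN_FORKNUM, blockno, zeroedpage, false);  /* 6) */
        break;
    }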

This patch very clearly is in the POC stage. But I do think the approach
is generally sound.  I'd like to see some comments before deciding
whether to carry on.


Greetings,

Andres Freund

PS: Yes, I know that precision in the benchmark isn't warranted, but I'm
too lazy to truncate them.

[1]
[10:28:11 PM] Tomas Vondra: 4x Intel Xeon E5-4620 Eight Core 2.2GHz
Processor’s generation Sandy Bridge EP
each core handles 2 threads, so 16 threads total
256GB (16x16GB) ECC REG System Validated Memory (1333 MHz)
2x 250GB SATA 2.5” Enterprise Level HDs (RAID 1, ~250GB)
17x 600GB SATA 2.5” Solid State HDs (RAID 0, ~10TB)
LSI MegaRAID 9271­8iCC controller and Cache Vault Kit (1GB cache)
2 x Nvidia Tesla K20 Active GPU Cards (GK110GL)

From fc095897a6f4207d384559a095f80a36cf49648c Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Sun, 29 Mar 2015 20:55:32 +0200
Subject: [PATCH] WIP: Saner heap extension.

---
 src/backend/access/heap/hio.c   |  86 
 src/backend/commands/vacuumlazy.c   |  39 ++--
 src/backend/storage/buffer/bufmgr.c | 377 ++--
 src/backend/storage/smgr/md.c   |  62 ++
 

Re: [HACKERS] Relation extension scalability

2015-07-19 Thread Andres Freund
Hi,

Eeek, the attached patch included a trivial last minute screwup
(dereferencing bistate unconditionally...). Fixed version attached.

Andres
From fc095897a6f4207d384559a095f80a36cf49648c Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Sun, 29 Mar 2015 20:55:32 +0200
Subject: [PATCH] WIP: Saner heap extension.

---
 src/backend/access/heap/hio.c   |  86 
 src/backend/commands/vacuumlazy.c   |  39 ++--
 src/backend/storage/buffer/bufmgr.c | 377 ++--
 src/backend/storage/smgr/md.c   |  62 ++
 src/backend/storage/smgr/smgr.c |  20 +-
 src/include/storage/buf_internals.h |   1 +
 src/include/storage/bufmgr.h|   1 +
 src/include/storage/smgr.h  |   7 +-
 8 files changed, 417 insertions(+), 176 deletions(-)

diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 6db73bf..b47f9fe 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -15,6 +15,8 @@
 
 #include "postgres.h"
 
+#include "miscadmin.h"
+
 #include "access/heapam.h"
 #include "access/hio.h"
 #include "access/htup_details.h"
@@ -237,7 +239,6 @@ RelationGetBufferForTuple(Relation relation, Size len,
 saveFreeSpace;
 	BlockNumber targetBlock,
 otherBlock;
-	bool		needLock;
 
 	len = MAXALIGN(len);		/* be conservative */
 
@@ -433,63 +434,50 @@ RelationGetBufferForTuple(Relation relation, Size len,
 	/*
 	 * Have to extend the relation.
 	 *
-	 * We have to use a lock to ensure no one else is extending the rel at the
-	 * same time, else we will both try to initialize the same new page.  We
-	 * can skip locking for new or temp relations, however, since no one else
-	 * could be accessing them.
+	 * To avoid, as it used to be the case, holding the extension lock during
+	 * victim buffer search for the new buffer, we extend the relation here
+	 * instead of relying on bufmgr.c. We still have to hold the extension
+	 * lock to prevent a race between two backends initializing the same page.
 	 */
-	needLock = !RELATION_IS_LOCAL(relation);
-
-	if (needLock)
-		LockRelationForExtension(relation, ExclusiveLock);
+	while(true)
+	{
+		buffer = ExtendRelation(relation, MAIN_FORKNUM, bistate->strategy);
 
-	/*
-	 * XXX This does an lseek - rather expensive - but at the moment it is the
-	 * only way to accurately determine how many blocks are in a relation.  Is
-	 * it worth keeping an accurate file length in shared memory someplace,
-	 * rather than relying on the kernel to do it for us?
-	 */
-	buffer = ReadBufferBI(relation, P_NEW, bistate);
+		if (otherBuffer != InvalidBuffer)
+			LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
 
-	/*
-	 * We can be certain that locking the otherBuffer first is OK, since it
-	 * must have a lower page number.
-	 */
-	if (otherBuffer != InvalidBuffer)
-		LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
+		/*
+		 * Now acquire lock on the new page.
+		 */
+		LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 
-	/*
-	 * Now acquire lock on the new page.
-	 */
-	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+		page = BufferGetPage(buffer);
 
-	/*
-	 * Release the file-extension lock; it's now OK for someone else to extend
-	 * the relation some more.  Note that we cannot release this lock before
-	 * we have buffer lock on the new page, or we risk a race condition
-	 * against vacuumlazy.c --- see comments therein.
-	 */
-	if (needLock)
-		UnlockRelationForExtension(relation, ExclusiveLock);
+		/*
+		 * While unlikely, it's possible that another backend managed to
+		 * initialize the page and use up the free space till we got the
+		 * exclusive lock. That'd require the page to be vacuumed (to be put
+		 * on the free space list) and then be used; possible but fairly
+		 * unlikely in practice. If it happens and there's not enough space,
+		 * just retry.
+		 */
+		if (PageIsNew(page))
+		{
+			PageInit(page, BLCKSZ, 0);
 
-	/*
-	 * We need to initialize the empty new page.  Double-check that it really
-	 * is empty (this should never happen, but if it does we don't want to
-	 * risk wiping out valid data).
-	 */
-	page = BufferGetPage(buffer);
+			Assert(len <= PageGetHeapFreeSpace(page));
+			break;
+		}
+		else if (len <= PageGetHeapFreeSpace(page))
+			break;
 
-	if (!PageIsNew(page))
-		elog(ERROR, "page %u of relation \"%s\" should be empty but is not",
-			 BufferGetBlockNumber(buffer),
-			 RelationGetRelationName(relation));
+		if (otherBuffer != InvalidBuffer)
+			LockBuffer(otherBuffer, BUFFER_LOCK_UNLOCK);
 
-	PageInit(page, BufferGetPageSize(buffer), 0);
+		LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+		ReleaseBuffer(buffer);
 
-	if (len > PageGetHeapFreeSpace(page))
-	{
-		/* We should not get here given the test at the top */
-		elog(PANIC, "tuple is too big: size %zu", len);
+		CHECK_FOR_INTERRUPTS();
 	}
 
 	/*
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index a01cfb4..896731c 100644
--- a/src/backend/commands/vacuumlazy.c
+++ 
