Re: [HACKERS] Does larger i/o size make sense?

2013-08-27 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us

 Another point here is that you could get some of the hoped-for
 benefit just by increasing BLCKSZ ... but nobody's ever
 demonstrated any compelling benefit from larger BLCKSZ (except on
 specialized workloads, if memory serves).

I think I've seen a handful of reports of performance differences
with different BLCKSZ builds (perhaps not all on community lists). 
My recollection is that some people sifting through data in data
warehouse environments see a performance benefit up to 32KB, but
that tests of GiST index performance with different sizes showed
better performance with smaller sizes down to around 2KB.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Does larger i/o size make sense?

2013-08-27 Thread Josh Berkus
Kevin,

 I think I've seen a handful of reports of performance differences
 with different BLCKSZ builds (perhaps not all on community lists). 
 My recollection is that some people sifting through data in data
 warehouse environments see a performance benefit up to 32KB, but
 that tests of GiST index performance with different sizes showed
 better performance with smaller sizes down to around 2KB.

I believe that Greenplum currently uses 128K.  There's a definite
benefit for the DW use-case.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Does larger i/o size make sense?

2013-08-27 Thread Greg Smith

On 8/27/13 3:54 PM, Josh Berkus wrote:

I believe that Greenplum currently uses 128K.  There's a definite
benefit for the DW use-case.


Since Linux read-ahead can easily give big gains on fast storage, I 
normally set that to at least 4096 sectors = 2048KB.  That's a lot 
bigger even than Greenplum's 128K, and definitely necessary for 
reaching maximum storage speed.
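
For anyone who wants to try that, a minimal sketch of how I set it, 
assuming the database volume is /dev/sdb (substitute your own device; 
the value is in 512-byte sectors):

  # 4096 sectors * 512 bytes = 2048KB of read-ahead
  blockdev --setra 4096 /dev/sdb
  # verify the current value
  blockdev --getra /dev/sdb

Note the setting does not persist across reboots; it has to be 
reapplied from an init script or udev rule.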


I don't think that the block size change alone will necessarily 
duplicate the gains on seq scans that Greenplum gets though.  They've 
done a lot more performance optimization on that part of the read path 
than just the larger block size.


As far as quantifying whether this is worth chasing, the most useful 
thing to do here is to find some fast storage and profile the code with 
different block sizes at a large read-ahead setting.  I wouldn't spend 
a minute trying to come up with a more complicated management scheme 
until the potential gain is measured.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [HACKERS] Does larger i/o size make sense?

2013-08-23 Thread Fabien COELHO



The big-picture problem with work in this area is that no matter how you
do it, any benefit is likely to be both platform- and workload-specific.
So the prospects for getting a patch accepted aren't all that bright.


Indeed.

Would it make sense to have something easier to configure than 
recompiling PostgreSQL and managing a custom executable, say a block 
size that could be configured from initdb and/or postgresql.conf, or 
maybe per-object settings specified at creation time?
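
For reference, the only knob today is at build time.  A sketch of the 
recompile route this would replace (assuming a source build; the block 
size is given in kilobytes and must be a power of 2 between 1 and 32):

  ./configure --with-blocksize=32
  make && make install

Any cluster initialized with that build then uses 32KB pages 
throughout, which is what makes experimenting with this painful today.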


Note that the block size may also affect cache behavior: under purely 
random access, for instance, more recently accessed tuples can be kept 
in memory when the pages are smaller.  So there are reasons beyond I/O 
access times to play with the block size, and an option to do that more 
easily would help.


--
Fabien.




Re: [HACKERS] Does larger i/o size make sense?

2013-08-23 Thread Kohei KaiGai
2013/8/23 Fabien COELHO coe...@cri.ensmp.fr:

 The big-picture problem with work in this area is that no matter how you
 do it, any benefit is likely to be both platform- and workload-specific.
 So the prospects for getting a patch accepted aren't all that bright.


 Indeed.

 Would it make sense to have something easier to configure than
 recompiling PostgreSQL and managing a custom executable, say a block
 size that could be configured from initdb and/or postgresql.conf, or
 maybe per-object settings specified at creation time?

I love the idea of a per-object block size setting according to the
expected workload, perhaps configured by the DBA.  When we have to run
sequential scans on large tables, a larger block size may hurt less than
being interrupted at every 8KB boundary to switch to the next block,
even though random access via index scans favors smaller block sizes.

 Note that the block size may also affect cache behavior: under purely
 random access, for instance, more recently accessed tuples can be kept
 in memory when the pages are smaller.  So there are reasons beyond I/O
 access times to play with the block size, and an option to do that more
 easily would help.

I see. A uniform block size could simplify the implementation; there
would be no need to worry about a scenario where contiguous buffer
allocation pushes out pages that should be kept in memory.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp




Re: [HACKERS] Does larger i/o size make sense?

2013-08-23 Thread Fabien COELHO



Would it make sense to have something easier to configure than
recompiling PostgreSQL and managing a custom executable, say a block
size that could be configured from initdb and/or postgresql.conf, or
maybe per-object settings specified at creation time?


I love the idea of a per-object block size setting according to the
expected workload, perhaps configured by the DBA.


My 0.02€: wait to see whether the idea gets some positive feedback from 
core people before investing any time in it...


The per-object approach would be a lot of work.  A per-initdb (so 
per-cluster) setting (block size, WAL size...) would be much easier to 
implement, but it impacts the storage format.



large tables, a larger block size may hurt less than being interrupted
at every 8KB boundary to switch to the next block, even though random
access via index scans favors smaller block sizes.


Yep, as Tom noted, this is really workload specific.

--
Fabien.


Re: [HACKERS] Does larger i/o size make sense?

2013-08-23 Thread Greg Stark
On Thu, Aug 22, 2013 at 8:53 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:

 An idea that I'd like to investigate is this: PostgreSQL allocates a
 set of contiguous buffers to fit a larger i/o size when a block is
 referenced by a sequential scan, then issues a consolidated i/o request
 on that buffer.  It probably makes sense if we can expect upcoming
 block references to land on neighboring blocks, which is the typical
 sequential read workload.


I think it makes more sense to use scatter-gather i/o or async i/o to
read into regular-sized buffers scattered around memory than to require
the buffers to be contiguous.

As others said, Postgres depends on the OS buffer cache to do readahead.
The scenario where the above becomes interesting is if it's paired with
a move to direct I/O or other ways of skipping the buffer cache.  Double
caching is a huge waste and leads to lots of inefficiencies.

The blocking issue there is that Postgres doesn't understand much about
the underlying hardware storage.  If there were APIs to find out more
about it from the kernel -- how far it is to the end of the RAID chunk,
how much parallelism it has, how congested the i/o channel is, etc. --
then Postgres might be on par with the kernel, able to eliminate the
double-buffering inefficiency, and might even be able to do better if it
understands its own workload better.

If Postgres did that, then it would be necessary to initiate i/o on
multiple buffers in parallel.  That can be done using scatter-gather i/o
such as readv() and writev(), but that would mean blocking on reads of
blocks that might not be needed until later.  Or it could be done using
libaio to initiate i/o and return control as soon as the needed data is
available while other i/o is still pending.
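
To make the scatter-gather half concrete, here is a minimal sketch
(mine, not from any patch) of filling four separate 8KB buffers from
consecutive file data with a single readv() call; the file path is
hypothetical and error handling is trimmed:

#include <sys/uio.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

#define BLCKSZ  8192
#define NBLOCKS 4

int main(void)
{
    char *bufs[NBLOCKS];
    struct iovec iov[NBLOCKS];
    int i;

    /* four standalone 8KB buffers, deliberately not contiguous */
    for (i = 0; i < NBLOCKS; i++)
    {
        bufs[i] = malloc(BLCKSZ);
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = BLCKSZ;
    }

    int fd = open("/path/to/relation.segment", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* one system call reads 32KB of consecutive file data and
     * scatters it across the four buffers */
    ssize_t n = readv(fd, iov, NBLOCKS);
    if (n < 0) { perror("readv"); return 1; }
    printf("read %zd bytes into %d scattered buffers\n", n, NBLOCKS);

    close(fd);
    for (i = 0; i < NBLOCKS; i++)
        free(bufs[i]);
    return 0;
}

The blocking caveat above applies: readv() does not return until all
of the data is in, which is why the async interfaces come up as soon as
some of those blocks may only be needed later.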


-- 
greg


[HACKERS] Does larger i/o size make sense?

2013-08-22 Thread Kohei KaiGai
Hello,

A few days ago, the question in the subject line came up in a
discussion with a colleague.

In general, a larger i/o size per system call gives us more bandwidth
on sequential reads than multiple system calls with a smaller i/o size.
This heuristic is probably well known.

On the other hand, PostgreSQL always reads database files in BLCKSZ
units (usually 8KB) when a referenced block is not in shared buffers;
it doesn't seem to me that this can pull the maximum performance out of
a modern storage system.

I'm not certain whether we have discussed this kind of idea before.
If similar ideas were rejected, I'd like to know the reason why we
stick to a fixed-length i/o size.

An idea that I'd like to investigate is this: PostgreSQL allocates a
set of contiguous buffers to fit a larger i/o size when a block is
referenced by a sequential scan, then issues a consolidated i/o request
on that buffer.  It probably makes sense if we can expect upcoming
block references to land on neighboring blocks, which is the typical
sequential read workload.
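
To illustrate the shape of the consolidated request, a minimal sketch;
the function and its arguments are made up for illustration, not the
real storage manager API, and handling of a short read is omitted:

#include <unistd.h>

#define BLCKSZ    8192
#define IO_BLOCKS 16    /* 16 blocks = one 128KB request */

/*
 * Read IO_BLOCKS consecutive blocks starting at blocknum into a
 * contiguous buffer region with one system call instead of sixteen.
 */
ssize_t
read_block_run(int fd, long blocknum, char *contiguous_bufs)
{
    off_t offset = (off_t) blocknum * BLCKSZ;

    return pread(fd, contiguous_bufs, IO_BLOCKS * BLCKSZ, offset);
}

The hard part is not this call but supplying contiguous_bufs: carving
16 adjacent pages out of shared buffers is exactly the fragmentation
problem mentioned below.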

Of course, we would need to solve some complicated problems, like
preventing fragmentation of shared buffers, or enhancing the storage
manager's internal APIs to accept a larger i/o size.  Even so, it seems
to me this idea is worth investigating.

Any comments please. Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp




Re: [HACKERS] Does larger i/o size make sense?

2013-08-22 Thread Merlin Moncure
On Thu, Aug 22, 2013 at 2:53 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 Hello,

 A few days ago, the question in the subject line came up in a
 discussion with a colleague.

 In general, a larger i/o size per system call gives us more bandwidth
 on sequential reads than multiple system calls with a smaller i/o size.
 This heuristic is probably well known.

 On the other hand, PostgreSQL always reads database files in BLCKSZ
 units (usually 8KB) when a referenced block is not in shared buffers;
 it doesn't seem to me that this can pull the maximum performance out of
 a modern storage system.

 I'm not certain whether we have discussed this kind of idea before.
 If similar ideas were rejected, I'd like to know the reason why we
 stick to a fixed-length i/o size.

 An idea that I'd like to investigate is this: PostgreSQL allocates a
 set of contiguous buffers to fit a larger i/o size when a block is
 referenced by a sequential scan, then issues a consolidated i/o request
 on that buffer.  It probably makes sense if we can expect upcoming
 block references to land on neighboring blocks, which is the typical
 sequential read workload.

 Of course, we would need to solve some complicated problems, like
 preventing fragmentation of shared buffers, or enhancing the storage
 manager's internal APIs to accept a larger i/o size.  Even so, it seems
 to me this idea is worth investigating.

 Any comments please. Thanks,

Isn't this dealt with at least in part by effective_io_concurrency and
o/s readahead?
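
For context, the relevant postgresql.conf setting (a sketch; the value
is just an example, and today it only drives posix_fadvise()
prefetching during bitmap heap scans, while sequential readahead is
left entirely to the OS):

  effective_io_concurrency = 8    # concurrent prefetch requests

So it does not issue larger reads per se; it just keeps more small
reads in flight and lets the OS do the consolidation.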

merlin




Re: [HACKERS] Does larger i/o size make sense?

2013-08-22 Thread Tom Lane
Merlin Moncure mmonc...@gmail.com writes:
 On Thu, Aug 22, 2013 at 2:53 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 An idea that I'd like to investigate is this: PostgreSQL allocates a
 set of contiguous buffers to fit a larger i/o size when a block is
 referenced by a sequential scan, then issues a consolidated i/o request
 on that buffer.

 Isn't this dealt with at least in part by effective_io_concurrency
 and o/s readahead?

I should think so.  It's very difficult to predict future block-access
requirements for anything except a seqscan, and for that, we expect the
OS will detect the access pattern and start reading ahead on its own.

Another point here is that you could get some of the hoped-for benefit
just by increasing BLCKSZ ... but nobody's ever demonstrated any
compelling benefit from larger BLCKSZ (except on specialized workloads,
if memory serves).

The big-picture problem with work in this area is that no matter how you
do it, any benefit is likely to be both platform- and workload-specific.
So the prospects for getting a patch accepted aren't all that bright.

regards, tom lane




Re: [HACKERS] Does larger i/o size make sense?

2013-08-22 Thread Kohei KaiGai
2013/8/23 Tom Lane t...@sss.pgh.pa.us:
 Merlin Moncure mmonc...@gmail.com writes:
 On Thu, Aug 22, 2013 at 2:53 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 An idea that I'd like to investigate is this: PostgreSQL allocates a
 set of contiguous buffers to fit a larger i/o size when a block is
 referenced by a sequential scan, then issues a consolidated i/o request
 on that buffer.

 Isn't this dealt with at least in part by effective_io_concurrency
 and o/s readahead?

 I should think so.  It's very difficult to predict future block-access
 requirements for anything except a seqscan, and for that, we expect the
 OS will detect the access pattern and start reading ahead on its own.

 Another point here is that you could get some of the hoped-for benefit
 just by increasing BLCKSZ ... but nobody's ever demonstrated any
 compelling benefit from larger BLCKSZ (except on specialized workloads,
 if memory serves).

 The big-picture problem with work in this area is that no matter how you
 do it, any benefit is likely to be both platform- and workload-specific.
 So the prospects for getting a patch accepted aren't all that bright.

Hmm. I may have overlooked the effect of readahead at the operating
system level.  Indeed, a sequential scan is exactly the kind of
workload that triggers it easily, so a smaller i/o size at the
application level will be hidden.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp

