Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-03-08 Thread Robert Haas
On Mon, Mar 7, 2016 at 4:32 AM, Kouhei Kaigai  wrote:
>> Why not FileDescriptor(), FileFlags(), FileMode() as separate
>> functions like FilePathName()?
>>
> Here is no deep reason. The attached patch adds three individual
> functions.

This seems unobjectionable to me, so committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-03-07 Thread Kouhei Kaigai




> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
> Sent: Saturday, March 05, 2016 2:42 AM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Jim Nasby; pgsql-hackers@postgresql.org; Amit Langote
> Subject: Re: [HACKERS] Way to check whether a particular block is on the
> shared_buffer?
> 
> On Thu, Mar 3, 2016 at 8:54 PM, Kouhei Kaigai  wrote:
> > I found one other, but tiny, problem to implement SSD-to-GPU direct
> > data transfer feature under the PostgreSQL storage.
> >
> > Extension cannot know the raw file descriptor opened by smgr.
> >
> > I expect an extension issues an ioctl(2) on the special device file
> > on behalf of the special kernel driver, to control the P2P DMA.
> > This ioctl(2) will pack file descriptor of the DMA source and some
> > various information (like base position, range, destination device
> > pointer, ...).
> >
> > However, the raw file descriptor is wrapped in the fd.c, instead of
> > the File handler, thus, not visible to extension. oops...
> >
> > The attached patch provides a way to obtain raw file descriptor (and
> > relevant flags) of a particular File virtual file descriptor on
> > PostgreSQL. (No need to say, extension has to treat the raw descriptor
> > carefully not to give an adverse effect to the storage manager.)
> >
> > How about this tiny enhancement?
> 
> Why not FileDescriptor(), FileFlags(), FileMode() as separate
> functions like FilePathName()?
>
There is no deep reason. The attached patch adds three individual
functions.
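
For readers following the thread, the shape of "three individual
functions" can be sketched outside PostgreSQL as below. This is only an
illustration and is not taken from the attached patch: the ToyVfd struct,
the fixed-size toy_vfd_cache table and toy_register_file() are
hypothetical stand-ins for the private state in fd.c, and the accessor
bodies are assumptions. Only the accessor names follow the ones suggested
in the thread.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>

/* Hypothetical stand-in for the per-file state that fd.c keeps private. */
typedef int File;               /* virtual file descriptor (a table index) */

typedef struct ToyVfd
{
    int     fd;                 /* raw kernel file descriptor */
    int     fileFlags;          /* open(2) flags used for this file */
    mode_t  fileMode;           /* mode used when the file was opened */
} ToyVfd;

#define TOY_MAX_VFDS 8

static ToyVfd toy_vfd_cache[TOY_MAX_VFDS];
static int    toy_nvfds = 0;

/* Record a raw descriptor in the toy table and hand back a virtual one. */
File
toy_register_file(int rawfd, int flags, mode_t mode)
{
    assert(toy_nvfds < TOY_MAX_VFDS);
    toy_vfd_cache[toy_nvfds].fd = rawfd;
    toy_vfd_cache[toy_nvfds].fileFlags = flags;
    toy_vfd_cache[toy_nvfds].fileMode = mode;
    return toy_nvfds++;
}

/* One small read-only accessor per attribute, in the style of FilePathName(). */
int
FileDescriptor(File file)
{
    assert(file >= 0 && file < toy_nvfds);
    return toy_vfd_cache[file].fd;
}

int
FileFlags(File file)
{
    assert(file >= 0 && file < toy_nvfds);
    return toy_vfd_cache[file].fileFlags;
}

mode_t
FileMode(File file)
{
    assert(file >= 0 && file < toy_nvfds);
    return toy_vfd_cache[file].fileMode;
}
```

A caller would do something like toy_register_file(5, O_RDWR, 0600) and
then read the three attributes back through the accessors. The toy table
does not model the fact that fd.c may close and reopen kernel descriptors
behind a File, which is one reason an extension has to handle the raw
descriptor with care.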

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 



pgsql-v9.6-filegetrawdesc.2.patch
Description: pgsql-v9.6-filegetrawdesc.2.patch



Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-03-04 Thread Robert Haas
On Thu, Mar 3, 2016 at 8:54 PM, Kouhei Kaigai  wrote:
> I found one other, but tiny, problem to implement SSD-to-GPU direct
> data transfer feature under the PostgreSQL storage.
>
> Extension cannot know the raw file descriptor opened by smgr.
>
> I expect an extension issues an ioctl(2) on the special device file
> on behalf of the special kernel driver, to control the P2P DMA.
> This ioctl(2) will pack file descriptor of the DMA source and some
> various information (like base position, range, destination device
> pointer, ...).
>
> However, the raw file descriptor is wrapped in the fd.c, instead of
> the File handler, thus, not visible to extension. oops...
>
> The attached patch provides a way to obtain raw file descriptor (and
> relevant flags) of a particular File virtual file descriptor on
> PostgreSQL. (No need to say, extension has to treat the raw descriptor
> carefully not to give an adverse effect to the storage manager.)
>
> How about this tiny enhancement?

Why not FileDescriptor(), FileFlags(), FileMode() as separate
functions like FilePathName()?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-03-03 Thread Kouhei Kaigai
I found one other, but tiny, problem in implementing the SSD-to-GPU
direct data transfer feature under the PostgreSQL storage layer.

An extension cannot know the raw file descriptor opened by smgr.

I expect an extension to issue an ioctl(2) on a special device file
exposed by a special kernel driver, to control the P2P DMA.
This ioctl(2) will pack the file descriptor of the DMA source and some
other information (like base position, range, destination device
pointer, ...).
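
As a rough illustration of what such an ioctl(2) argument might carry,
here is a minimal sketch. Everything in it is hypothetical: the
ToyDmaRequest layout, the toy_dma_request_init() helper and the 8192-byte
block size are assumptions made for the example, not the interface of any
real driver. The actual call would be an ioctl(2) on the driver's device
file with a driver-specific command, which is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BLCKSZ 8192         /* assumed block size for this sketch */

/* Hypothetical argument block handed to the driver via ioctl(2). */
typedef struct ToyDmaRequest
{
    int       source_fd;        /* raw descriptor of the DMA source file */
    uint64_t  file_offset;      /* base position in the file, in bytes */
    uint64_t  length;           /* range to transfer, in bytes */
    uint64_t  dest_device_ptr;  /* destination address on the device */
} ToyDmaRequest;

/*
 * Fill a request for nblocks blocks starting at first_block of one file.
 * Returns false if the inputs cannot describe a valid transfer.
 */
bool
toy_dma_request_init(ToyDmaRequest *req, int source_fd,
                     uint64_t first_block, uint64_t nblocks,
                     uint64_t dest_device_ptr)
{
    assert(req != NULL);

    if (source_fd < 0 || nblocks == 0)
        return false;

    req->source_fd = source_fd;
    req->file_offset = first_block * TOY_BLCKSZ;
    req->length = nblocks * TOY_BLCKSZ;
    req->dest_device_ptr = dest_device_ptr;
    return true;
}
```

The point of the sketch is only that source_fd has to be a raw kernel
descriptor, which is why the extension needs a way to get at it.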

However, the raw file descriptor is wrapped inside fd.c, behind the
File handle, and is thus not visible to extensions. Oops...

The attached patch provides a way to obtain the raw file descriptor (and
relevant flags) of a particular File virtual file descriptor in
PostgreSQL. (Needless to say, an extension has to treat the raw descriptor
carefully so as not to adversely affect the storage manager.)

How about this tiny enhancement?

> > -Original Message-
> > From: pgsql-hackers-ow...@postgresql.org
> > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
> > Sent: Saturday, February 13, 2016 1:46 PM
> > To: Kaigai Kouhei(海外 浩平)
> > Cc: Jim Nasby; pgsql-hackers@postgresql.org; Amit Langote
> > Subject: Re: [HACKERS] Way to check whether a particular block is on the
> > shared_buffer?
> >
> > On Thu, Feb 11, 2016 at 9:05 PM, Kouhei Kaigai  wrote:
> > > Hmm. In my experience, it is often not a productive discussion whether
> > > a feature is niche or commodity. So, let me change the viewpoint.
> > >
> > > We may utilize OS-level locking mechanism here.
> > >
> > > Even though it depends on filesystem implementation under the VFS,
> > > we may use inode->i_mutex lock that shall be acquired during the buffer
> > > copy from user to kernel, at least, on a few major filesystems; ext4,
> > > xfs and btrfs in my research. As well, the modified NVMe SSD driver can
> > > acquire the inode->i_mutex lock during P2P DMA transfer.
> > >
> > > Once we can consider the OS buffer is updated atomically by the lock,
> > > we don't need to worry about corrupted pages, but still needs to pay
> > > attention to the scenario when updated buffer page is moved to GPU.
> > >
> > > In this case, PD_ALL_VISIBLE may give us a hint. GPU side has no MVCC
> > > infrastructure, so I intend to move all-visible pages only.
> > > If someone updates the buffer concurrently, then write out the page
> > > including invisible tuples, PD_ALL_VISIBLE flag shall be cleared because
> > > updated tuples should not be visible to the transaction which issued
> > > P2P DMA.
> > >
> > > Once GPU met a page with !PD_ALL_VISIBLE, it can return an error status
> > > that indicates CPU to retry this page again. In this case, this page is
> > > likely loaded to the shared buffer already, so retry penalty is not so
> > > much.
> > >
> > > I'll try to investigate the implementation in this way.
> > > Please correct me, if I misunderstand something (especially, treatment
> > > of PD_ALL_VISIBLE).
> >
> > I suppose there's no theoretical reason why the buffer couldn't go
> > from all-visible to not-all-visible and back to all-visible again all
> > during the time you are copying it.
> >
> The backend process that is copying the data to GPU has a transaction
> in-progress (= not committed). Is it possible to get the updated buffer
> page back to the all-visible state again?
> I expect that in-progress transactions works as a blocker for backing
> to all-visible. Right?
> 
> > Honestly, I think trying to access buffers without going through
> > shared_buffers is likely to be very hard to make correct and probably
> > a loser.
> >
> No challenge, no outcome. ;-)
> 
> > Copying the data into shared_buffers and then to the GPU is,
> > doubtless, at least somewhat slower.  But I kind of doubt that it's
> > enough slower to make up for all of the problems you're going to have
> > with the approach you've chosen.
> >
> Honestly, I'm still uncertain whether it works well as I expects.
> However, scan workload on the table larger than main memory is
> headache for PG-Strom, so I'd like to try ideas we can implement.
> 
> Thanks,
> --
> NEC Business Creation Division / PG-Strom Project
> KaiGai Kohei 
>



pgsql-v9.6-filegetrawdesc.1.patch
Description: pgsql-v9.6-filegetrawdesc.1.patch



Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-02-13 Thread Robert Haas
On Sat, Feb 13, 2016 at 7:29 AM, Kouhei Kaigai  wrote:
>> I suppose there's no theoretical reason why the buffer couldn't go
>> from all-visible to not-all-visible and back to all-visible again all
>> during the time you are copying it.
>>
> The backend process that is copying the data to GPU has a transaction
> in-progress (= not committed). Is it possible to get the updated buffer
> page back to the all-visible state again?
> I expect that in-progress transactions works as a blocker for backing
> to all-visible. Right?

Yeah, probably.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-02-13 Thread Kouhei Kaigai




> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
> Sent: Saturday, February 13, 2016 1:46 PM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Jim Nasby; pgsql-hackers@postgresql.org; Amit Langote
> Subject: Re: [HACKERS] Way to check whether a particular block is on the
> shared_buffer?
> 
> On Thu, Feb 11, 2016 at 9:05 PM, Kouhei Kaigai  wrote:
> > Hmm. In my experience, it is often not a productive discussion whether
> > a feature is niche or commodity. So, let me change the viewpoint.
> >
> > We may utilize OS-level locking mechanism here.
> >
> > Even though it depends on filesystem implementation under the VFS,
> > we may use inode->i_mutex lock that shall be acquired during the buffer
> > copy from user to kernel, at least, on a few major filesystems; ext4,
> > xfs and btrfs in my research. As well, the modified NVMe SSD driver can
> > acquire the inode->i_mutex lock during P2P DMA transfer.
> >
> > Once we can consider the OS buffer is updated atomically by the lock,
> > we don't need to worry about corrupted pages, but still needs to pay
> > attention to the scenario when updated buffer page is moved to GPU.
> >
> > In this case, PD_ALL_VISIBLE may give us a hint. GPU side has no MVCC
> > infrastructure, so I intend to move all-visible pages only.
> > If someone updates the buffer concurrently, then write out the page
> > including invisible tuples, PD_ALL_VISIBLE flag shall be cleared because
> > updated tuples should not be visible to the transaction which issued
> > P2P DMA.
> >
> > Once GPU met a page with !PD_ALL_VISIBLE, it can return an error status
> > that indicates CPU to retry this page again. In this case, this page is
> > likely loaded to the shared buffer already, so retry penalty is not so
> > much.
> >
> > I'll try to investigate the implementation in this way.
> > Please correct me, if I misunderstand something (especially, treatment
> > of PD_ALL_VISIBLE).
> 
> I suppose there's no theoretical reason why the buffer couldn't go
> from all-visible to not-all-visible and back to all-visible again all
> during the time you are copying it.
>
The backend process that is copying the data to the GPU has a transaction
in progress (= not committed). Is it possible for the updated buffer
page to get back to the all-visible state again?
I expect that in-progress transactions work as a blocker against going
back to all-visible. Right?

> Honestly, I think trying to access buffers without going through
> shared_buffers is likely to be very hard to make correct and probably
> a loser.
>
No challenge, no outcome. ;-)

> Copying the data into shared_buffers and then to the GPU is,
> doubtless, at least somewhat slower.  But I kind of doubt that it's
> enough slower to make up for all of the problems you're going to have
> with the approach you've chosen.
>
Honestly, I'm still uncertain whether it works as well as I expect.
However, a scan workload on a table larger than main memory is a
headache for PG-Strom, so I'd like to try ideas we can implement.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 




Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-02-12 Thread Robert Haas
On Thu, Feb 11, 2016 at 9:05 PM, Kouhei Kaigai  wrote:
> Hmm. In my experience, it is often not a productive discussion whether
> a feature is niche or commodity. So, let me change the viewpoint.
>
> We may utilize OS-level locking mechanism here.
>
> Even though it depends on filesystem implementation under the VFS,
> we may use inode->i_mutex lock that shall be acquired during the buffer
> copy from user to kernel, at least, on a few major filesystems; ext4,
> xfs and btrfs in my research. As well, the modified NVMe SSD driver can
> acquire the inode->i_mutex lock during P2P DMA transfer.
>
> Once we can consider the OS buffer is updated atomically by the lock,
> we don't need to worry about corrupted pages, but still needs to pay
> attention to the scenario when updated buffer page is moved to GPU.
>
> In this case, PD_ALL_VISIBLE may give us a hint. GPU side has no MVCC
> infrastructure, so I intend to move all-visible pages only.
> If someone updates the buffer concurrently, then write out the page
> including invisible tuples, PD_ALL_VISIBLE flag shall be cleared because
> updated tuples should not be visible to the transaction which issued
> P2P DMA.
>
> Once GPU met a page with !PD_ALL_VISIBLE, it can return an error status
> that indicates CPU to retry this page again. In this case, this page is
> likely loaded to the shared buffer already, so retry penalty is not so
> much.
>
> I'll try to investigate the implementation in this way.
> Please correct me, if I misunderstand something (especially, treatment
> of PD_ALL_VISIBLE).

I suppose there's no theoretical reason why the buffer couldn't go
from all-visible to not-all-visible and back to all-visible again all
during the time you are copying it.

Honestly, I think trying to access buffers without going through
shared_buffers is likely to be very hard to make correct and probably
a loser.  Copying the data into shared_buffers and then to the GPU is,
doubtless, at least somewhat slower.  But I kind of doubt that it's
enough slower to make up for all of the problems you're going to have
with the approach you've chosen.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-02-11 Thread Kouhei Kaigai
> On Tue, Feb 9, 2016 at 6:35 PM, Kouhei Kaigai  wrote:
> > Unfortunately, it was not sufficient.
> >
> > Due to the assumption, the buffer page to be suspended does not exist
> > when a backend process issues a series P2P DMA command. (If block would
> > be already loaded to the shared buffer, it don't need to issue P2P DMA,
> > but just use usual memory<->device DMA because RAM is much faster than
> > SSD.)
> > It knows the pair of (rel,fork,block), but no BufferDesc of this block
> > exists. Thus, it cannot acquire locks in BufferDesc structure.
> >
> > Even if the block does not exist at this point, concurrent process may
> > load the same page. BufferDesc of this page shall be assigned at this
> > point, however, here is no chance to lock something in BufferDesc for
> > the process which issues P2P DMA command.
> >
> > It is the reason why I assume the suspend/resume mechanism shall take
> > a pair of (rel,fork,block) as identifier of the target block.
> 
> I see the problem, but I'm not terribly keen on putting in the hooks
> that it would take to let you solve it without hacking core.  It
> sounds like an awfully invasive thing for a pretty niche requirement.
>
Hmm. In my experience, it is often not a productive discussion whether
a feature is niche or commodity. So, let me change the viewpoint.

We may utilize an OS-level locking mechanism here.

Even though it depends on the filesystem implementation under the VFS,
we may use the inode->i_mutex lock, which is acquired during the buffer
copy from user space to kernel, at least on a few major filesystems (ext4,
xfs and btrfs, according to my research). Likewise, the modified NVMe SSD
driver can acquire the inode->i_mutex lock during the P2P DMA transfer.

Once we can assume the OS buffer is updated atomically under the lock,
we don't need to worry about corrupted pages, but we still need to pay
attention to the scenario where an updated buffer page is moved to the GPU.

In this case, PD_ALL_VISIBLE may give us a hint. The GPU side has no MVCC
infrastructure, so I intend to move all-visible pages only.
If someone updates the buffer concurrently and then writes out the page
including invisible tuples, the PD_ALL_VISIBLE flag shall have been
cleared, because the updated tuples should not be visible to the
transaction which issued the P2P DMA.

Once the GPU meets a page with !PD_ALL_VISIBLE, it can return an error
status that tells the CPU to retry this page. In this case, the page is
likely loaded into the shared buffers already, so the retry penalty is
small.
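
The check-and-retry idea can be sketched on the host side as follows.
This is a simplified model, not device code: ToyPageHeader keeps only the
flags word of a page header, the TOY_PD_ALL_VISIBLE value is intended to
mirror the PD_ALL_VISIBLE bit in bufpage.h, and all function and type
names are hypothetical. It also says nothing about whether the flag can
change while the page is being copied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Intended to mirror PD_ALL_VISIBLE from bufpage.h (assumption). */
#define TOY_PD_ALL_VISIBLE 0x0004

/* Simplified page header: only the flags word is modelled here. */
typedef struct ToyPageHeader
{
    uint16_t pd_flags;
} ToyPageHeader;

typedef enum ToyPageStatus
{
    TOY_PAGE_DONE,              /* page was all-visible and was processed */
    TOY_PAGE_RETRY_ON_CPU       /* page must be re-read via shared buffers */
} ToyPageStatus;

/* What the device-side filter would decide for a single page. */
ToyPageStatus
toy_filter_page(const ToyPageHeader *page)
{
    assert(page != NULL);

    if ((page->pd_flags & TOY_PD_ALL_VISIBLE) == 0)
        return TOY_PAGE_RETRY_ON_CPU;
    return TOY_PAGE_DONE;
}

/* Count how many pages of a batch the CPU has to retry. */
size_t
toy_count_retries(const ToyPageHeader *pages, size_t npages)
{
    size_t nretries = 0;

    for (size_t i = 0; i < npages; i++)
    {
        if (toy_filter_page(&pages[i]) == TOY_PAGE_RETRY_ON_CPU)
            nretries++;
    }
    return nretries;
}
```

A batch in which most pages are all-visible would then cost only a few
ordinary reads on the retry path.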

I'll try to investigate an implementation along these lines.
Please correct me if I misunderstand something (especially the treatment
of PD_ALL_VISIBLE).

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 




Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-02-10 Thread Robert Haas
On Tue, Feb 9, 2016 at 6:35 PM, Kouhei Kaigai  wrote:
> Unfortunately, it was not sufficient.
>
> Due to the assumption, the buffer page to be suspended does not exist
> when a backend process issues a series P2P DMA command. (If block would
> be already loaded to the shared buffer, it don't need to issue P2P DMA,
> but just use usual memory<->device DMA because RAM is much faster than
> SSD.)
> It knows the pair of (rel,fork,block), but no BufferDesc of this block
> exists. Thus, it cannot acquire locks in BufferDesc structure.
>
> Even if the block does not exist at this point, concurrent process may
> load the same page. BufferDesc of this page shall be assigned at this
> point, however, here is no chance to lock something in BufferDesc for
> the process which issues P2P DMA command.
>
> It is the reason why I assume the suspend/resume mechanism shall take
> a pair of (rel,fork,block) as identifier of the target block.

I see the problem, but I'm not terribly keen on putting in the hooks
that it would take to let you solve it without hacking core.  It
sounds like an awfully invasive thing for a pretty niche requirement.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-02-09 Thread Kouhei Kaigai
> -Original Message-
> From: Robert Haas [mailto:robertmh...@gmail.com]
> Sent: Wednesday, February 10, 2016 1:58 AM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Jim Nasby; pgsql-hackers@postgresql.org; Amit Langote
> Subject: Re: [HACKERS] Way to check whether a particular block is
> on the shared_buffer?
> 
> On Sun, Feb 7, 2016 at 9:49 PM, Kouhei Kaigai  wrote:
> > On the other hands, it also became clear we have to guarantee OS buffer
> > or storage block must not be updated partially during the P2P DMA.
> > My motivation is a potential utilization of P2P DMA of SSD-to-GPU to
> > filter out unnecessary rows and columns prior to loading to CPU/RAM.
> > It needs to ensure PostgreSQL does not write out buffers to OS buffers
> > to avoid unexpected data corruption.
> >
> > What I want to achieve is suspend of buffer write towards a particular
> > (relnode, forknum, blocknum) pair for a short time, by completion of
> > data processing by GPU (or other external devices).
> > In addition, it is preferable being workable regardless of the choice
> > of storage manager, even if we may have multiple options on top of the
> > pluggable smgr in the future.
> 
> It seems like you just need to take an exclusive content lock on the
> buffer, or maybe a shared content lock would be sufficient.
>
Unfortunately, it was not sufficient.

By assumption, the buffer page to be suspended does not exist when a
backend process issues a series of P2P DMA commands. (If the block were
already loaded into the shared buffers, there would be no need to issue
P2P DMA; the backend would just use the usual memory<->device DMA,
because RAM is much faster than SSD.)
The backend knows the (rel,fork,block) triple, but no BufferDesc for this
block exists. Thus, it cannot acquire locks in the BufferDesc structure.

Even if the block is not in the shared buffers at this point, a concurrent
process may load the same page. A BufferDesc will be assigned to the page
at that point; however, the process which issued the P2P DMA command has
no chance to lock anything in that BufferDesc.

That is the reason why I assume the suspend/resume mechanism shall take
the (rel,fork,block) triple as the identifier of the target block.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 




Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-02-09 Thread Robert Haas
On Sun, Feb 7, 2016 at 9:49 PM, Kouhei Kaigai  wrote:
> On the other hands, it also became clear we have to guarantee OS buffer
> or storage block must not be updated partially during the P2P DMA.
> My motivation is a potential utilization of P2P DMA of SSD-to-GPU to
> filter out unnecessary rows and columns prior to loading to CPU/RAM.
> It needs to ensure PostgreSQL does not write out buffers to OS buffers
> to avoid unexpected data corruption.
>
> What I want to achieve is suspend of buffer write towards a particular
> (relnode, forknum, blocknum) pair for a short time, by completion of
> data processing by GPU (or other external devices).
> In addition, it is preferable being workable regardless of the choice
> of storage manager, even if we may have multiple options on top of the
> pluggable smgr in the future.

It seems like you just need to take an exclusive content lock on the
buffer, or maybe a shared content lock would be sufficient.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-02-07 Thread Kouhei Kaigai
> -Original Message-
> From: Robert Haas [mailto:robertmh...@gmail.com]
> Sent: Monday, February 08, 2016 1:52 AM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Jim Nasby; pgsql-hackers@postgresql.org; Amit Langote
> Subject: Re: [HACKERS] Way to check whether a particular block is
> on the shared_buffer?
> 
> On Thu, Feb 4, 2016 at 11:34 PM, Kouhei Kaigai  wrote:
> > I can agree that smgr hooks shall be primarily designed to make storage
> > systems pluggable, even if we can use this hooks for suspend & resume of
> > write i/o stuff.
> > In addition, "pluggable storage" is a long-standing feature, even though
> > it is not certain whether existing smgr hooks are good starting point.
> > It may be a risk if we implement a grand feature on top of the hooks
> > but out of its primary purpose.
> >
> > So, my preference is a mechanism to hook buffer write to implement this
> > feature. (Or, maybe a built-in write i/o suspend / resume stuff if it
> > has nearly zero cost when no extension activate the feature.)
> > One downside of this approach is larger number of hook points.
> > We have to deploy the hook nearby existing smgrwrite of LocalBufferAlloc
> > and FlushRelationBuffers, in addition to FlushBuffer, at least.
> 
> I don't understand what you're hoping to achieve by introducing
> pluggability at the smgr layer.  I mean, md.c is pretty much good for
> read and writing from anything that looks like a directory of files.
> Another smgr layer would make sense if we wanted to read and write via
> some kind of network protocol, or if we wanted to have some kind of
> storage substrate that did internally to itself some of the tasks for
> which we are currently relying on the filesystem - e.g. if we wanted
> to be able to use a raw device, or perhaps more plausibly if we wanted
> to reduce the number of separate files we need, or provide a substrate
> that can clip an unused extent out of the middle of a relation
> efficiently.  But I don't understand what this has to do with what
> you're trying to do here.  The subject of this thread is about whether
> you can check for the presence of a block in shared_buffers, and as
> discussed upthread, you can.  I don't quite follow how we made the
> jump from there to smgr pluggability.
>
Yes. smgr pluggability is not what I want to investigate in this thread.
It is not the purpose of the discussion, but one potential implementation
idea.

Through the discussion, it became clear that an extension can check
whether a buffer exists for a particular block, using the existing
infrastructure.

On the other hand, it also became clear that we have to guarantee that
the OS buffer or storage block is not partially updated during the P2P
DMA. My motivation is a potential use of SSD-to-GPU P2P DMA to filter out
unnecessary rows and columns prior to loading them into CPU/RAM.
It needs to ensure PostgreSQL does not write out buffers to OS buffers
in the meantime, to avoid unexpected data corruption.

What I want to achieve is to suspend buffer writes towards a particular
(relnode, forknum, blocknum) triple for a short time, until the GPU (or
other external device) completes its data processing.
In addition, it is preferable for this to work regardless of the choice
of storage manager, even if we may have multiple options on top of a
pluggable smgr in the future.

Data processing close to the storage requires that the OS buffer is not
updated concurrently with the P2P DMA. So, I want the feature below.
1. An extension (that controls GPU and P2P DMA) can register a particular
   (relnode, forknum, blocknum) triple as a block suspended for write.
2. Once a particular block gets suspended, smgrwrite (or its caller) shall
   be blocked until the suspended block is unregistered.
3. The extension will unregister the block when the P2P DMA from it has
   completed; suspended concurrent backends shall then resume write i/o.
4. On the other hand, the extension cannot register the block while some
   other concurrent backend is executing smgrwrite on it, to avoid
   potential data corruption.
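
The four rules can be sketched as a small single-process model, shown
below. It is a toy under stated assumptions: the ToyBlockTag and
ToyBlockState types, the fixed-size table and every function name are
hypothetical, there is no shared memory or locking, and a blocked writer
is represented by a false return value rather than by actually sleeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct ToyBlockTag
{
    uint32_t relnode;
    int      forknum;
    uint32_t blocknum;
} ToyBlockTag;

typedef struct ToyBlockState
{
    ToyBlockTag tag;
    bool        in_use;
    int         suspend_count;  /* registrations by P2P DMA issuers */
    int         writers;        /* smgrwrite calls currently in flight */
} ToyBlockState;

#define TOY_MAX_BLOCKS 16
static ToyBlockState toy_blocks[TOY_MAX_BLOCKS];

static bool
toy_tag_equal(ToyBlockTag a, ToyBlockTag b)
{
    return a.relnode == b.relnode && a.forknum == b.forknum &&
           a.blocknum == b.blocknum;
}

/* Find the slot for a tag, claiming a free slot if none exists yet. */
static ToyBlockState *
toy_lookup(ToyBlockTag tag)
{
    ToyBlockState *free_slot = NULL;

    for (int i = 0; i < TOY_MAX_BLOCKS; i++)
    {
        if (toy_blocks[i].in_use && toy_tag_equal(toy_blocks[i].tag, tag))
            return &toy_blocks[i];
        if (!toy_blocks[i].in_use && free_slot == NULL)
            free_slot = &toy_blocks[i];
    }
    assert(free_slot != NULL);  /* the toy table is full */
    free_slot->tag = tag;
    free_slot->in_use = true;
    free_slot->suspend_count = 0;
    free_slot->writers = 0;
    return free_slot;
}

/* Give the slot back once nobody suspends or writes the block. */
static void
toy_release_if_idle(ToyBlockState *st)
{
    if (st->suspend_count == 0 && st->writers == 0)
        st->in_use = false;
}

/* Rules 1 and 4: register the block unless a write is in flight. */
bool
toy_suspend_block(ToyBlockTag tag)
{
    ToyBlockState *st = toy_lookup(tag);

    if (st->writers > 0)
        return false;
    st->suspend_count++;
    return true;
}

/* Rule 3: unregister after the P2P DMA has completed. */
void
toy_resume_block(ToyBlockTag tag)
{
    ToyBlockState *st = toy_lookup(tag);

    assert(st->suspend_count > 0);
    st->suspend_count--;
    toy_release_if_idle(st);
}

/* Rule 2: returns false where a real writer would have to wait. */
bool
toy_begin_write(ToyBlockTag tag)
{
    ToyBlockState *st = toy_lookup(tag);

    if (st->suspend_count > 0)
    {
        return false;
    }
    st->writers++;
    return true;
}

void
toy_end_write(ToyBlockTag tag)
{
    ToyBlockState *st = toy_lookup(tag);

    assert(st->writers > 0);
    st->writers--;
    toy_release_if_idle(st);
}
```

Tracking in-flight writers is what lets a late registration be refused
instead of silently succeeding while a write is already under way.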

One idea was to inject a thin layer on top of the smgr mechanism to
implement the above mechanism.
However, I'm also uncertain whether hooking the entire set of smgr
functions is a straightforward approach to achieve it.

The minimum I want is a facility to get control at the head and tail
of smgrwrite(): to suspend a concurrent write prior to smgr_write, and to
notify the mechanism when a concurrent smgr_write has completed.

It does not need pluggability of smgr, but the entry point shall be
located around the smgr functions.
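
A minimal sketch of "control at the head and tail" is a pair of optional
function pointers around the write call, as below. All names here
(toy_pre_write_hook, toy_post_write_hook, toy_smgrwrite, toy_mdwrite and
the ext_* callbacks) are hypothetical; nothing like this exists in core,
and the trace buffer is only there to make the call order visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void (*toy_write_hook_type) (uint32_t relnode, int forknum,
                                     uint32_t blocknum);

/* Unset by default: without an extension only two NULL checks are paid. */
toy_write_hook_type toy_pre_write_hook = NULL;
toy_write_hook_type toy_post_write_hook = NULL;

/* Records the order of calls so that the sketch can be checked. */
char toy_trace[64] = "";

static void
toy_trace_add(const char *step)
{
    assert(strlen(toy_trace) + strlen(step) < sizeof(toy_trace));
    strcat(toy_trace, step);
}

/* Stand-in for the real storage-manager write (mdwrite in the thread). */
static void
toy_mdwrite(uint32_t relnode, int forknum, uint32_t blocknum)
{
    (void) relnode;
    (void) forknum;
    (void) blocknum;
    toy_trace_add("mdwrite;");
}

/* Write path with an entry point at the head and at the tail. */
void
toy_smgrwrite(uint32_t relnode, int forknum, uint32_t blocknum)
{
    if (toy_pre_write_hook != NULL)
        toy_pre_write_hook(relnode, forknum, blocknum);

    toy_mdwrite(relnode, forknum, blocknum);

    if (toy_post_write_hook != NULL)
        toy_post_write_hook(relnode, forknum, blocknum);
}

/* What an extension might install: wait here while the block is suspended. */
void
ext_before_write(uint32_t relnode, int forknum, uint32_t blocknum)
{
    (void) relnode;
    (void) forknum;
    (void) blocknum;
    toy_trace_add("before;");
}

/* ...and report here that the write has completed. */
void
ext_after_write(uint32_t relnode, int forknum, uint32_t blocknum)
{
    (void) relnode;
    (void) forknum;
    (void) blocknum;
    toy_trace_add("after;");
}
```

An extension would set both pointers at load time; with them unset, the
write path behaves exactly as before.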

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 




Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-02-07 Thread Robert Haas
On Thu, Feb 4, 2016 at 11:34 PM, Kouhei Kaigai  wrote:
> I can agree that smgr hooks shall be primarily designed to make storage
> systems pluggable, even if we can use this hooks for suspend & resume of
> write i/o stuff.
> In addition, "pluggable storage" is a long-standing feature, even though
> it is not certain whether existing smgr hooks are good starting point.
> It may be a risk if we implement a grand feature on top of the hooks
> but out of its primary purpose.
>
> So, my preference is a mechanism to hook buffer write to implement this
> feature. (Or, maybe a built-in write i/o suspend / resume stuff if it
> has nearly zero cost when no extension activate the feature.)
> One downside of this approach is larger number of hook points.
> We have to deploy the hook nearby existing smgrwrite of LocalBufferAlloc
> and FlushRelationBuffers, in addition to FlushBuffer, at least.

I don't understand what you're hoping to achieve by introducing
pluggability at the smgr layer.  I mean, md.c is pretty much good for
read and writing from anything that looks like a directory of files.
Another smgr layer would make sense if we wanted to read and write via
some kind of network protocol, or if we wanted to have some kind of
storage substrate that did internally to itself some of the tasks for
which we are currently relying on the filesystem - e.g. if we wanted
to be able to use a raw device, or perhaps more plausibly if we wanted
to reduce the number of separate files we need, or provide a substrate
that can clip an unused extent out of the middle of a relation
efficiently.  But I don't understand what this has to do with what
you're trying to do here.  The subject of this thread is about whether
you can check for the presence of a block in shared_buffers, and as
discussed upthread, you can.  I don't quite follow how we made the
jump from there to smgr pluggability.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-02-04 Thread Kouhei Kaigai
> -Original Message-
> From: Jim Nasby [mailto:jim.na...@bluetreble.com]
> Sent: Friday, February 05, 2016 9:17 AM
> To: Kaigai Kouhei(海外 浩平); pgsql-hackers@postgresql.org; Robert Haas
> Cc: Amit Langote
> Subject: Re: [HACKERS] Way to check whether a particular block is on the
> shared_buffer?
> 
> On 2/4/16 12:30 AM, Kouhei Kaigai wrote:
> >> 2. A feature to suspend i/o write-out towards a particular blocks
> >>    that are registered by other concurrent backend, unless it is not
> >>    unregistered (usually, at the end of P2P DMA).
> >>    ==> to be discussed.
> 
> I think there's still a race condition here though...
> 
> A
> finds buffer not in shared buffers
> 
> B
> reads buffer in
> modifies buffer
> starts writing buffer to OS
> 
> A
> Makes call to block write, but write is already in process; thinks
> writes are now blocked
> Reads corrupted block
> Much hilarity ensues
> 
> Or maybe you were just glossing over that part for brevity.
> 
> ...
> 
> > I tried to design a draft of enhancement to realize the above i/o write-out
> > suspend/resume, with less invasive way as possible as we can.
> >
> >ASSUMPTION: I intend to implement this feature as a part of extension,
> >because this i/o suspend/resume checks are pure overhead increment
> >for the core features, unless extension which utilizes it.
> >
> > Three functions shall be added:
> >
> > extern int    GetStorageMgrNumbers(void);
> > extern f_smgr GetStorageMgrHandlers(int smgr_which);
> > extern void   SetStorageMgrHandlers(int smgr_which, f_smgr smgr_handlers);
> >
> > As literal, GetStorageMgrNumbers() returns the number of storage manager
> > currently installed. It always return 1 right now.
> > GetStorageMgrHandlers() returns the currently configured f_smgr table to
> > the supplied smgr_which. It allows extensions to know current configuration
> > of the storage manager, even if other extension already modified it.
> > SetStorageMgrHandlers() assigns the supplied 'smgr_handlers', instead of
> > the current one.
> > If extension wants to intermediate 'smgr_write', extension will replace
> > the 'smgr_write' by own function, then call the original function, likely
> > mdwrite, from the alternative function.
> >
> > In this case, call chain shall be:
> >
> >   FlushBuffer, and others...
> >    +-- smgrwrite(...)
> >         +-- (extension's own function)
> >              +-- mdwrite
> 
> ISTR someone (Robert Haas?) complaining that this method of hooks is
> cumbersome to use and can be fragile if multiple hooks are being
> installed. So maybe we don't want to extend its usage...
> 
> I'm also not sure whether this is better done with an smgr hook or a
> hook into shared buffer handling...
>
# Sorry, I overlooked the latter part of your reply.

I agree that smgr hooks should be designed primarily to make storage
systems pluggable, even if we could also use these hooks to suspend and
resume write i/o.
In addition, "pluggable storage" is a long-standing feature request, even
though it is not certain whether the existing smgr hooks are a good
starting point. It may be risky to implement a large feature on top of
the hooks for something outside their primary purpose.

So, my preference is a mechanism that hooks buffer writes to implement
this feature. (Or, maybe built-in write i/o suspend/resume support, if it
has nearly zero cost when no extension activates the feature.)
One downside of this approach is the larger number of hook points.
We have to place the hook near the existing smgrwrite calls in
LocalBufferAlloc and FlushRelationBuffers, in addition to FlushBuffer,
at least.
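
To make the "nearly zero cost when no extension activates the feature"
point concrete, here is a minimal self-contained sketch in plain C. It is
not PostgreSQL code: buffer_write_hook, write_buffer_block, raw_write and
wait_for_p2p_dma are hypothetical names invented for illustration, and
raw_write only stands in for the smgrwrite call made at each write site.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook type; PostgreSQL has no hook with this name. */
typedef void (*buffer_write_hook_type) (unsigned blocknum);

/* NULL means "no extension installed", so the fast path is one pointer test. */
buffer_write_hook_type buffer_write_hook = NULL;

int raw_writes = 0;     /* writes that reached the stand-in for smgrwrite */
int hook_calls = 0;     /* times the extension callback ran */

/* Stand-in for the smgrwrite() call made by FlushBuffer and similar sites. */
static void
raw_write(unsigned blocknum)
{
    (void) blocknum;
    raw_writes++;
}

/* Stand-in for one buffer write site (e.g. FlushBuffer). */
void
write_buffer_block(unsigned blocknum)
{
    if (buffer_write_hook != NULL)
        buffer_write_hook(blocknum);    /* extension may wait for P2P DMA here */
    raw_write(blocknum);
}

/* Example extension callback; a real one would block until the DMA is done. */
void
wait_for_p2p_dma(unsigned blocknum)
{
    (void) blocknum;
    hook_calls++;
}
```

Each real write site (FlushBuffer, LocalBufferAlloc, FlushRelationBuffers)
would need the same two-line test before its smgrwrite call, which is the
"larger number of hook points" downside mentioned above.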

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 





Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-02-04 Thread Kouhei Kaigai
> On 2/4/16 12:30 AM, Kouhei Kaigai wrote:
> >> 2. A feature to suspend i/o write-out towards a particular blocks
> >> >that are registered by other concurrent backend, unless it is not
> >> >unregistered (usually, at the end of P2P DMA).
> >> >==> to be discussed.
> 
> I think there's still a race condition here though...
> 
> A
> finds buffer not in shared buffers
> 
> B
> reads buffer in
> modifies buffer
> starts writing buffer to OS
> 
> A
> Makes call to block write, but write is already in process; thinks
> writes are now blocked
> Reads corrupted block
> Much hilarity ensues
> 
> Or maybe you were just glossing over that part for brevity.
>
Thanks, this part was not clear in my previous description.

When B starts writing the buffer to the OS, the extension catches this
i/o request using a hook around smgrwrite, then the mechanism registers
the block so that P2P DMA requests on it are blocked during B's write
operation. (Of course, it unregisters the block at the end of smgrwrite.)
So, even if A wants to issue P2P DMA concurrently, it cannot register
the block until B's write operation has finished.

In practice, this operation should be a "try lock": B's write operation
implies that the buffer exists in main memory, so A does not need to
wait for B's write operation if A switches the DMA source from SSD to
main memory.
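
The "try lock" behaviour can be sketched as below. This is a
single-process toy model in plain C, assuming a small fixed-size table;
BlockKey, block_try_register, block_unregister and choose_dma_source are
hypothetical names, and a real implementation would keep the table in
shared memory under a lock.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TRACKED 16

/* Hypothetical key identifying one block: (relation, fork, block number). */
typedef struct BlockKey
{
    unsigned    relnode;
    int         forknum;
    unsigned    blocknum;
} BlockKey;

typedef enum { SOURCE_SSD, SOURCE_MAIN_MEMORY } DmaSource;

static BlockKey tracked[MAX_TRACKED];
static bool     in_use[MAX_TRACKED];

static bool
key_equal(const BlockKey *a, const BlockKey *b)
{
    return a->relnode == b->relnode &&
           a->forknum == b->forknum &&
           a->blocknum == b->blocknum;
}

/* Try-lock: returns false at once if the block is already registered. */
bool
block_try_register(BlockKey key)
{
    int free_slot = -1;

    for (int i = 0; i < MAX_TRACKED; i++)
    {
        if (in_use[i] && key_equal(&tracked[i], &key))
            return false;       /* someone else is writing or doing DMA */
        if (!in_use[i] && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return false;           /* table full: treated like a conflict */
    tracked[free_slot] = key;
    in_use[free_slot] = true;
    return true;
}

void
block_unregister(BlockKey key)
{
    for (int i = 0; i < MAX_TRACKED; i++)
        if (in_use[i] && key_equal(&tracked[i], &key))
            in_use[i] = false;
}

/* A's side: never waits; on conflict it reads from main memory instead. */
DmaSource
choose_dma_source(BlockKey key)
{
    return block_try_register(key) ? SOURCE_SSD : SOURCE_MAIN_MEMORY;
}
```

In this model the writer (B) calls block_try_register before its write
and block_unregister after it, and a DMA issuer (A) that gets SOURCE_SSD
must call block_unregister once the transfer completes.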

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 

> ...
> 
> > I tried to design a draft of enhancement to realize the above i/o write-out
> > suspend/resume, with less invasive way as possible as we can.
> >
> >ASSUMPTION: I intend to implement this feature as a part of extension,
> >because this i/o suspend/resume checks are pure overhead increment
> >for the core features, unless extension which utilizes it.
> >
> > Three functions shall be added:
> >
> > extern int    GetStorageMgrNumbers(void);
> > extern f_smgr GetStorageMgrHandlers(int smgr_which);
> > extern void   SetStorageMgrHandlers(int smgr_which, f_smgr smgr_handlers);
> >
> > As literal, GetStorageMgrNumbers() returns the number of storage manager
> > currently installed. It always return 1 right now.
> > GetStorageMgrHandlers() returns the currently configured f_smgr table to
> > the supplied smgr_which. It allows extensions to know current configuration
> > of the storage manager, even if other extension already modified it.
> > SetStorageMgrHandlers() assigns the supplied 'smgr_handlers', instead of
> > the current one.
> > If extension wants to intermediate 'smgr_write', extension will replace
> > the 'smgr_write' by own function, then call the original function, likely
> > mdwrite, from the alternative function.
> >
> > In this case, call chain shall be:
> >
> >   FlushBuffer, and others...
> >    +-- smgrwrite(...)
> >         +-- (extension's own function)
> >              +-- mdwrite
> 
> ISTR someone (Robert Haas?) complaining that this method of hooks is
> cumbersome to use and can be fragile if multiple hooks are being
> installed. So maybe we don't want to extend its usage...
> 
> I'm also not sure whether this is better done with an smgr hook or a
> hook into shared buffer handling...
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
> 
> 




Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-02-04 Thread Jim Nasby
On 2/4/16 12:30 AM, Kouhei Kaigai wrote:
>> 2. A feature to suspend i/o write-out towards a particular blocks
>> >that are registered by other concurrent backend, unless it is not
>> >unregistered (usually, at the end of P2P DMA).
>> >==> to be discussed.

I think there's still a race condition here though...

A
finds buffer not in shared buffers

B
reads buffer in
modifies buffer
starts writing buffer to OS

A
Makes call to block write, but write is already in process; thinks
writes are now blocked
Reads corrupted block
Much hilarity ensues

Or maybe you were just glossing over that part for brevity.

...

> I tried to design a draft of enhancement to realize the above i/o write-out
> suspend/resume, with less invasive way as possible as we can.
> 
>ASSUMPTION: I intend to implement this feature as a part of extension,
>because this i/o suspend/resume checks are pure overhead increment
>for the core features, unless extension which utilizes it.
> 
> Three functions shall be added:
> 
> extern int    GetStorageMgrNumbers(void);
> extern f_smgr GetStorageMgrHandlers(int smgr_which);
> extern void   SetStorageMgrHandlers(int smgr_which, f_smgr smgr_handlers);
> 
> As literal, GetStorageMgrNumbers() returns the number of storage manager
> currently installed. It always return 1 right now.
> GetStorageMgrHandlers() returns the currently configured f_smgr table to
> the supplied smgr_which. It allows extensions to know current configuration
> of the storage manager, even if other extension already modified it.
> SetStorageMgrHandlers() assigns the supplied 'smgr_handlers', instead of
> the current one.
> If extension wants to intermediate 'smgr_write', extension will replace
> the 'smgr_write' by own function, then call the original function, likely
> mdwrite, from the alternative function.
> 
> In this case, call chain shall be:
> 
>   FlushBuffer, and others...
>    +-- smgrwrite(...)
>         +-- (extension's own function)
>              +-- mdwrite

ISTR someone (Robert Haas?) complaining that this method of hooks is
cumbersome to use and can be fragile if multiple hooks are being
installed. So maybe we don't want to extend its usage...

I'm also not sure whether this is better done with an smgr hook or a
hook into shared buffer handling...
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-02-03 Thread Kouhei Kaigai
> > KaiGai-san,
> >
> > On 2016/02/01 10:38, Kouhei Kaigai wrote:
> > > As an aside, background of my motivation is the slide below:
> > > http://www.slideshare.net/kaigai/sqlgpussd-english
> > > (LT slides in JPUG conference last Dec)
> > >
> > > I'm under investigation of SSD-to-GPU direct feature on top of
> > > the custom-scan interface. It intends to load a bunch of data
> > > blocks on NVMe-SSD to GPU RAM using P2P DMA, prior to the data
> > > loading onto CPU/RAM, to preprocess the data to be filtered out.
> > > It only makes sense if the target blocks are not loaded to the
> > > CPU/RAM yet, because SSD device is essentially slower than RAM.
> > > So, I'd like to have a reliable way to check the latest status of
> > > the shared buffer, to know whether a particular block is already
> > > loaded or not.
> >
> > Quite interesting stuff, thanks for sharing!
> >
> > I'm in no way expert on this but could this generally be attacked from the
> > smgr API perspective? Currently, we have only one implementation - md.c
> > (the hard-coded RelationData.smgr_which = 0). If we extended that and
> > provided end-to-end support so that there would be md.c alternatives to
> > storage operations, I guess that would open up opportunities for
> > extensions to specify smgr_which as an argument to ReadBufferExtended(),
> > provided there is already support in place to install md.c alternatives
> > (perhaps in .so). Of course, these are just musings and, perhaps does not
> > really concern the requirements of custom scan methods you have been
> > developing.
> >
> Thanks for your idea. Indeed, smgr hooks are good candidate to implement
> the feature, however, what I need is a thin intermediation layer rather
> than alternative storage engine.
> 
> It becomes clear we need two features here.
> 1. A feature to check whether a particular block is already on the shared
>buffer pool.
>It is available. BufTableLookup() under the BufMappingPartitionLock
>gives us the information we want.
> 
> 2. A feature to suspend i/o write-out towards a particular blocks
>that are registered by other concurrent backend, unless it is not
>unregistered (usually, at the end of P2P DMA).
>==> to be discussed.
> 
> When we call smgrwrite(), like FlushBuffer(), it fetches function pointer
> from the 'smgrsw' array, then calls smgr_write.
> 
>   void
>   smgrwrite(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
>             char *buffer, bool skipFsync)
>   {
>       (*(smgrsw[reln->smgr_which].smgr_write)) (reln, forknum, blocknum,
>                                                 buffer, skipFsync);
>   }
> 
> If extension would overwrite smgrsw[] array, then call the original
> function under the control by extension, it allows to suspend the call
> of the original smgr_write until completion of P2P DMA.
> 
> It may be a minimum invasive way to implement, and portable to any
> further storage layers.
> 
> How about your thought? Even though it is a bit different from your
> original proposition.
>
I tried to draft an enhancement that realizes the above i/o write-out
suspend/resume in the least invasive way possible.

  ASSUMPTION: I intend to implement this feature as part of an extension,
  because these i/o suspend/resume checks are pure overhead for the core
  features unless an extension makes use of them.

Three functions shall be added:

extern int    GetStorageMgrNumbers(void);
extern f_smgr GetStorageMgrHandlers(int smgr_which);
extern void   SetStorageMgrHandlers(int smgr_which, f_smgr smgr_handlers);

As the names suggest, GetStorageMgrNumbers() returns the number of storage
managers currently installed. It always returns 1 right now.
GetStorageMgrHandlers() returns the currently configured f_smgr table for
the supplied smgr_which. It allows extensions to learn the current
configuration of the storage manager, even if another extension has
already modified it.
SetStorageMgrHandlers() installs the supplied 'smgr_handlers' in place of
the current one.
If an extension wants to intercept 'smgr_write', it replaces 'smgr_write'
with its own function, then calls the original function, likely mdwrite,
from that replacement function.

In this case, call chain shall be:

  FlushBuffer, and others...
   +-- smgrwrite(...)
        +-- (extension's own function)
             +-- mdwrite

Once the extension's own function blocks write i/o until a P2P DMA issued
by a concurrent process has completed, we don't need to care about partial
updates of the OS cache or the storage device.
It is not difficult for extensions to implement a feature to track/untrack
a tuple of (relFileNode, forkNum, blockNum), automatic untracking tied to
the resource owner, and a mechanism to block the caller until P2P DMA
completion.

On the other hand, this seems to me a bit more flexible than necessary
(what I want to implement is just a blocker of buffer write i/o). And it
may give people the wrong impression about the pluggable storage feature.
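
For illustration, the get/replace/chain pattern described above can be
modelled in a small standalone C program. This is only a sketch under
simplifying assumptions: f_smgr here is a cut-down stand-in with a single
smgr_write slot and a reduced argument list, mdwrite_stub replaces the
real mdwrite, and ext_install / ext_smgr_write are hypothetical extension
functions. The three Get/Set functions exist only in this proposal.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for f_smgr: only the write slot, reduced arguments. */
typedef struct f_smgr
{
    void        (*smgr_write) (unsigned blocknum, const char *buffer);
} f_smgr;

#define NUM_SMGR 1

int md_write_calls = 0;     /* calls that reached the original handler */
int wrapper_calls = 0;      /* calls that passed through the extension */

static void
mdwrite_stub(unsigned blocknum, const char *buffer)
{
    (void) blocknum;
    (void) buffer;
    md_write_calls++;
}

static f_smgr smgrsw[NUM_SMGR] = {{mdwrite_stub}};

/* The three proposed functions, modelled on the toy table above. */
int
GetStorageMgrNumbers(void)
{
    return NUM_SMGR;
}

f_smgr
GetStorageMgrHandlers(int smgr_which)
{
    assert(smgr_which >= 0 && smgr_which < NUM_SMGR);
    return smgrsw[smgr_which];
}

void
SetStorageMgrHandlers(int smgr_which, f_smgr smgr_handlers)
{
    assert(smgr_which >= 0 && smgr_which < NUM_SMGR);
    smgrsw[smgr_which] = smgr_handlers;
}

/* Extension side: remember the previous handlers and chain to them. */
static f_smgr saved_handlers;

static void
ext_smgr_write(unsigned blocknum, const char *buffer)
{
    wrapper_calls++;
    /* A real extension would wait here while P2P DMA on the block runs. */
    saved_handlers.smgr_write(blocknum, buffer);
}

void
ext_install(void)
{
    f_smgr      handlers = GetStorageMgrHandlers(0);

    saved_handlers = handlers;          /* may already be another wrapper */
    handlers.smgr_write = ext_smgr_write;
    SetStorageMgrHandlers(0, handlers);
}

/* Dispatch through the table, as smgrwrite() does. */
void
smgrwrite(int smgr_which, unsigned blocknum, const char *buffer)
{
    smgrsw[smgr_which].smgr_write(blocknum, buffer);
}
```

Because ext_install saves whatever handlers it finds, a second extension
installed later would chain to this one rather than to mdwrite directly,
which is also where the fragility with multiple hooks can come from.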

Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-02-02 Thread Kouhei Kaigai
> > > On 1/31/16 7:38 PM, Kouhei Kaigai wrote:
> 
> > > To answer your direct question, I'm no expert, but I haven't seen any
> > > functions that do exactly what you want. You'd have to pull relevant
> > > bits from ReadBuffer_*. Or maybe a better method would just be to call
> > > BufTableLookup() without any locks and if you get a result > -1 just
> > > call the relevant ReadBuffer function. Sometimes you'll end up calling
> > > ReadBuffer even though the buffer isn't in shared buffers, but I would
> > > think that would be a rare occurrence.
> > >
> > Thanks, indeed, extension can call BufTableLookup(). PrefetchBuffer()
> > has a good example for this.
> >
> > If it returned a valid buf_id, we have nothing difficult; just call
> > ReadBuffer() to pin the buffer.
> 
> Isn't this what (or very similar to)
> ReadBufferExtended(RBM_ZERO_AND_LOCK) is already doing?
>
That operation actually acquires a buffer page and fills it with zeros,
and a valid buffer page is evicted if there is no free buffer page.
I want to keep the contents of the shared buffers that are already loaded
in main memory. P2P DMA and GPU preprocessing are intended to minimize the
main memory consumed by rows that the scan qualifiers will filter out.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 




Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-02-02 Thread Alvaro Herrera
Kouhei Kaigai wrote:
> > On 1/31/16 7:38 PM, Kouhei Kaigai wrote:

> > To answer your direct question, I'm no expert, but I haven't seen any
> > functions that do exactly what you want. You'd have to pull relevant
> > bits from ReadBuffer_*. Or maybe a better method would just be to call
> > BufTableLookup() without any locks and if you get a result > -1 just
> > call the relevant ReadBuffer function. Sometimes you'll end up calling
> > ReadBuffer even though the buffer isn't in shared buffers, but I would
> > think that would be a rare occurrence.
> >
> Thanks, indeed, extension can call BufTableLookup(). PrefetchBuffer()
> has a good example for this.
> 
> If it returned a valid buf_id, we have nothing difficult; just call
> ReadBuffer() to pin the buffer.

Isn't this what (or very similar to)
ReadBufferExtended(RBM_ZERO_AND_LOCK) is already doing?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-02-01 Thread Kouhei Kaigai
> KaiGai-san,
> 
> On 2016/02/01 10:38, Kouhei Kaigai wrote:
> > As an aside, background of my motivation is the slide below:
> > http://www.slideshare.net/kaigai/sqlgpussd-english
> > (LT slides in JPUG conference last Dec)
> >
> > I'm under investigation of SSD-to-GPU direct feature on top of
> > the custom-scan interface. It intends to load a bunch of data
> > blocks on NVMe-SSD to GPU RAM using P2P DMA, prior to the data
> > loading onto CPU/RAM, to preprocess the data to be filtered out.
> > It only makes sense if the target blocks are not loaded to the
> > CPU/RAM yet, because SSD device is essentially slower than RAM.
> > So, I'd like to have a reliable way to check the latest status of
> > the shared buffer, to know whether a particular block is already
> > loaded or not.
> 
> Quite interesting stuff, thanks for sharing!
> 
> I'm in no way expert on this but could this generally be attacked from the
> smgr API perspective? Currently, we have only one implementation - md.c
> (the hard-coded RelationData.smgr_which = 0). If we extended that and
> provided end-to-end support so that there would be md.c alternatives to
> storage operations, I guess that would open up opportunities for
> extensions to specify smgr_which as an argument to ReadBufferExtended(),
> provided there is already support in place to install md.c alternatives
> (perhaps in .so). Of course, these are just musings and, perhaps does not
> really concern the requirements of custom scan methods you have been
> developing.
>
Thanks for your idea. Indeed, smgr hooks are a good candidate for
implementing the feature; however, what I need is a thin intermediation
layer rather than an alternative storage engine.

It has become clear that we need two features here.
1. A feature to check whether a particular block is already in the shared
   buffer pool.
   This is available. BufTableLookup() under the BufMappingPartitionLock
   gives us the information we want.

2. A feature to suspend i/o write-out of particular blocks that have been
   registered by another concurrent backend, until they are unregistered
   (usually at the end of P2P DMA).
   ==> to be discussed.

When we call smgrwrite(), as FlushBuffer() does, it fetches a function
pointer from the 'smgrsw' array, then calls smgr_write.

  void
  smgrwrite(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
            char *buffer, bool skipFsync)
  {
      (*(smgrsw[reln->smgr_which].smgr_write)) (reln, forknum, blocknum,
                                                buffer, skipFsync);
  }

If an extension overwrote the smgrsw[] array and then called the original
function under its own control, it could suspend the call to the original
smgr_write until completion of P2P DMA.

This may be the least invasive way to implement it, and it is portable to
any future storage layers.

What do you think? It is a bit different from your original proposal,
though.
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 





Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-02-01 Thread Amit Langote

KaiGai-san,

On 2016/02/01 10:38, Kouhei Kaigai wrote:
> As an aside, background of my motivation is the slide below:
> http://www.slideshare.net/kaigai/sqlgpussd-english
> (LT slides in JPUG conference last Dec)
> 
> I'm under investigation of SSD-to-GPU direct feature on top of
> the custom-scan interface. It intends to load a bunch of data
> blocks on NVMe-SSD to GPU RAM using P2P DMA, prior to the data
> loading onto CPU/RAM, to preprocess the data to be filtered out.
> It only makes sense if the target blocks are not loaded to the
> CPU/RAM yet, because SSD device is essentially slower than RAM.
> So, I'd like to have a reliable way to check the latest status of
> the shared buffer, to know whether a particular block is already
> loaded or not.

Quite interesting stuff, thanks for sharing!

I'm in no way expert on this but could this generally be attacked from the
smgr API perspective? Currently, we have only one implementation - md.c
(the hard-coded RelationData.smgr_which = 0). If we extended that and
provided end-to-end support so that there would be md.c alternatives to
storage operations, I guess that would open up opportunities for
extensions to specify smgr_which as an argument to ReadBufferExtended(),
provided there is already support in place to install md.c alternatives
(perhaps in .so). Of course, these are just musings and, perhaps does not
really concern the requirements of custom scan methods you have been
developing.

Thanks,
Amit






Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-02-01 Thread Kouhei Kaigai
> On 1/31/16 7:38 PM, Kouhei Kaigai wrote:
> > I'm under investigation of SSD-to-GPU direct feature on top of
> > the custom-scan interface. It intends to load a bunch of data
> > blocks on NVMe-SSD to GPU RAM using P2P DMA, prior to the data
> > loading onto CPU/RAM, to preprocess the data to be filtered out.
> > It only makes sense if the target blocks are not loaded to the
> > CPU/RAM yet, because SSD device is essentially slower than RAM.
> > So, I'd like to have a reliable way to check the latest status of
> > the shared buffer, to know whether a particular block is already
> > loaded or not.
> 
> That completely ignores the OS cache though... wouldn't that be a major
> issue?
>
Once we can ensure the target block is not cached in shared buffers,
handling the OS page cache is the job of the driver that supports P2P DMA.
When the driver gets a P2P DMA request from PostgreSQL, it checks the OS
page cache status and determines the DMA source: the OS buffer or the
SSD block.

> To answer your direct question, I'm no expert, but I haven't seen any
> functions that do exactly what you want. You'd have to pull relevant
> bits from ReadBuffer_*. Or maybe a better method would just be to call
> BufTableLookup() without any locks and if you get a result > -1 just
> call the relevant ReadBuffer function. Sometimes you'll end up calling
> ReadBuffer even though the buffer isn't in shared buffers, but I would
> think that would be a rare occurrence.
>
Thanks, indeed, an extension can call BufTableLookup(). PrefetchBuffer()
has a good example of this.

If it returns a valid buf_id, there is nothing difficult; just call
ReadBuffer() to pin the buffer.

Otherwise, when BufTableLookup() returns a negative value, it means the
tuple (relation, forknum, blocknum) does not exist in shared buffers.
So, the extension enqueues a P2P DMA request for asynchronous transfer,
and the driver then processes the P2P DMA shortly afterwards.
Concurrent access may always happen. PostgreSQL uses MVCC, so the backend
which issued the P2P DMA does not need to pay attention to new tuples that
didn't exist at executor start time, even if another backend loads and
updates the same buffer just after the above BufTableLookup().

On the other hand, we have to pay attention to whether a fraction of
the buffer page is partially written to the OS buffer or storage. That is
in the scope of the operating system, so we cannot control it.

One idea I can think of is temporary suspension of FlushBuffer() for
a particular tuple of (relation, forknum, blocknum) until the P2P DMA has
completed. Even if a concurrent backend updates the buffer page after the
BufTableLookup(), this prevents the OS cache and storage from getting
dirty during the P2P DMA.
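
The lookup-then-decide flow above can be sketched as a toy model in plain
C. It does not call the real BufTableLookup(); toy_buf_table_insert,
toy_buf_table_lookup and choose_access_path are hypothetical stand-ins
that only mirror the convention of returning a buffer id, or a negative
value when the block is not in the buffer pool.

```c
#include <assert.h>

#define NBUFFERS 4

/* Hypothetical key: (relation, fork number, block number). */
typedef struct BlockKey
{
    unsigned    relnode;
    int         forknum;
    unsigned    blocknum;
} BlockKey;

typedef enum { ACCESS_SHARED_BUFFER, ACCESS_P2P_DMA } AccessPath;

static BlockKey buf_tags[NBUFFERS];
static int      buf_valid[NBUFFERS];

/* Mark buffer slot buf_id as holding the given block. */
void
toy_buf_table_insert(int buf_id, BlockKey key)
{
    assert(buf_id >= 0 && buf_id < NBUFFERS);
    buf_tags[buf_id] = key;
    buf_valid[buf_id] = 1;
}

/* Returns the buffer id holding the block, or -1 if it is not cached. */
int
toy_buf_table_lookup(BlockKey key)
{
    for (int i = 0; i < NBUFFERS; i++)
    {
        if (buf_valid[i] &&
            buf_tags[i].relnode == key.relnode &&
            buf_tags[i].forknum == key.forknum &&
            buf_tags[i].blocknum == key.blocknum)
            return i;
    }
    return -1;
}

/* Cached blocks go through the buffer pool; others are DMA candidates. */
AccessPath
choose_access_path(BlockKey key)
{
    return toy_buf_table_lookup(key) >= 0 ? ACCESS_SHARED_BUFFER
                                          : ACCESS_P2P_DMA;
}
```

The result is only a snapshot: another backend may load the block right
after the lookup, which is why the write-out suspension is discussed
separately.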

What do people think?
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 





Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-02-01 Thread Jim Nasby
On 1/31/16 7:38 PM, Kouhei Kaigai wrote:
> I'm under investigation of SSD-to-GPU direct feature on top of
> the custom-scan interface. It intends to load a bunch of data
> blocks on NVMe-SSD to GPU RAM using P2P DMA, prior to the data
> loading onto CPU/RAM, to preprocess the data to be filtered out.
> It only makes sense if the target blocks are not loaded to the
> CPU/RAM yet, because SSD device is essentially slower than RAM.
> So, I'd like to have a reliable way to check the latest status of
> the shared buffer, to know whether a particular block is already
> loaded or not.

That completely ignores the OS cache though... wouldn't that be a major
issue?

To answer your direct question, I'm no expert, but I haven't seen any
functions that do exactly what you want. You'd have to pull relevant
bits from ReadBuffer_*. Or maybe a better method would just be to call
BufTableLookup() without any locks and if you get a result > -1 just
call the relevant ReadBuffer function. Sometimes you'll end up calling
ReadBuffer even though the buffer isn't in shared buffers, but I would
think that would be a rare occurrence.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

