Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?
On Mon, Mar 7, 2016 at 4:32 AM, Kouhei Kaigai wrote:
>> Why not FileDescriptor(), FileFlags(), FileMode() as separate
>> functions like FilePathName()?
>>
> Here is no deep reason. The attached patch adds three individual
> functions.

This seems unobjectionable to me, so committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?
> -----Original Message-----
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
> Sent: Saturday, March 05, 2016 2:42 AM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Jim Nasby; pgsql-hackers@postgresql.org; Amit Langote
> Subject: Re: [HACKERS] Way to check whether a particular block is on the
> shared_buffer?
>
> On Thu, Mar 3, 2016 at 8:54 PM, Kouhei Kaigai wrote:
> > I found one other, but tiny, problem to implement SSD-to-GPU direct
> > data transfer feature under the PostgreSQL storage.
> >
> > Extension cannot know the raw file descriptor opened by smgr.
> >
> > I expect an extension issues an ioctl(2) on the special device file
> > on behalf of the special kernel driver, to control the P2P DMA.
> > This ioctl(2) will pack file descriptor of the DMA source and some
> > various information (like base position, range, destination device
> > pointer, ...).
> >
> > However, the raw file descriptor is wrapped in the fd.c, instead of
> > the File handler, thus, not visible to extension. oops...
> >
> > The attached patch provides a way to obtain raw file descriptor (and
> > relevant flags) of a particular File virtual file descriptor on
> > PostgreSQL. (No need to say, extension has to treat the raw descriptor
> > carefully not to give an adverse effect to the storage manager.)
> >
> > How about this tiny enhancement?
>
> Why not FileDescriptor(), FileFlags(), FileMode() as separate
> functions like FilePathName()?
>
There is no deep reason. The attached patch adds three individual
functions.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei

Attachment: pgsql-v9.6-filegetrawdesc.2.patch
Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?
On Thu, Mar 3, 2016 at 8:54 PM, Kouhei Kaigai wrote:
> I found one other, but tiny, problem to implement SSD-to-GPU direct
> data transfer feature under the PostgreSQL storage.
>
> Extension cannot know the raw file descriptor opened by smgr.
>
> I expect an extension issues an ioctl(2) on the special device file
> on behalf of the special kernel driver, to control the P2P DMA.
> This ioctl(2) will pack file descriptor of the DMA source and some
> various information (like base position, range, destination device
> pointer, ...).
>
> However, the raw file descriptor is wrapped in the fd.c, instead of
> the File handler, thus, not visible to extension. oops...
>
> The attached patch provides a way to obtain raw file descriptor (and
> relevant flags) of a particular File virtual file descriptor on
> PostgreSQL. (No need to say, extension has to treat the raw descriptor
> carefully not to give an adverse effect to the storage manager.)
>
> How about this tiny enhancement?

Why not FileDescriptor(), FileFlags(), FileMode() as separate
functions like FilePathName()?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
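A minimal sketch of the shape Robert is asking for: one small accessor per attribute, next to FilePathName(). The ToyVfd table and ToyOpenFile() are hypothetical stand-ins for the private table in fd.c, not its real layout, and the accessor names are simply the ones proposed in this message; the names in the attached patch may differ.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>

/* Toy stand-in for fd.c's private per-file state (hypothetical layout). */
typedef int File;               /* virtual file descriptor = table index */

typedef struct ToyVfd
{
    int         fd;             /* raw kernel file descriptor */
    int         fileFlags;      /* flags that were passed to open(2) */
    mode_t      fileMode;       /* mode that was passed to open(2) */
    const char *fileName;       /* path of the file */
} ToyVfd;

#define TOY_MAX_VFDS 8

static ToyVfd toy_vfds[TOY_MAX_VFDS];
static int    toy_nvfds = 0;

/* Hypothetical helper: record an already-opened file in the toy table. */
File
ToyOpenFile(const char *name, int raw_fd, int flags, mode_t mode)
{
    assert(toy_nvfds < TOY_MAX_VFDS);
    toy_vfds[toy_nvfds].fd = raw_fd;
    toy_vfds[toy_nvfds].fileFlags = flags;
    toy_vfds[toy_nvfds].fileMode = mode;
    toy_vfds[toy_nvfds].fileName = name;
    return toy_nvfds++;
}

static const ToyVfd *
toy_lookup(File file)
{
    assert(file >= 0 && file < toy_nvfds);
    return &toy_vfds[file];
}

/* One accessor per attribute, in the style of FilePathName(). */
const char *FilePathName(File file)   { return toy_lookup(file)->fileName; }
int         FileDescriptor(File file) { return toy_lookup(file)->fd; }
int         FileFlags(File file)      { return toy_lookup(file)->fileFlags; }
mode_t      FileMode(File file)       { return toy_lookup(file)->fileMode; }
```

The toy ignores one thing a real accessor has to deal with: fd.c may close and later reopen the kernel descriptor behind a virtual descriptor, so a raw descriptor handed to an extension is only meaningful for as long as fd.c keeps it open.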
Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?
I found one more, though small, problem in implementing the SSD-to-GPU
direct data transfer feature on top of the PostgreSQL storage layer.

An extension cannot obtain the raw file descriptor opened by smgr.

I expect an extension to issue an ioctl(2) on the special device file
provided by the special kernel driver, to control the P2P DMA.
This ioctl(2) will pack the file descriptor of the DMA source together
with other information (such as base position, range, destination device
pointer, ...).

However, the raw file descriptor is kept inside fd.c, behind the File
handle, and is therefore not visible to extensions.

The attached patch provides a way to obtain the raw file descriptor (and
relevant flags) behind a particular File virtual file descriptor in
PostgreSQL. (Needless to say, an extension has to treat the raw descriptor
carefully so as not to adversely affect the storage manager.)

How about this tiny enhancement?

> > -----Original Message-----
> > From: pgsql-hackers-ow...@postgresql.org
> > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
> > Sent: Saturday, February 13, 2016 1:46 PM
> > To: Kaigai Kouhei(海外 浩平)
> > Cc: Jim Nasby; pgsql-hackers@postgresql.org; Amit Langote
> > Subject: Re: [HACKERS] Way to check whether a particular block is on the
> > shared_buffer?
> >
> > On Thu, Feb 11, 2016 at 9:05 PM, Kouhei Kaigai wrote:
> > > Hmm. In my experience, it is often not a productive discussion whether
> > > a feature is niche or commodity. So, let me change the viewpoint.
> > >
> > > We may utilize OS-level locking mechanism here.
> > >
> > > Even though it depends on filesystem implementation under the VFS,
> > > we may use inode->i_mutex lock that shall be acquired during the buffer
> > > copy from user to kernel, at least, on a few major filesystems; ext4,
> > > xfs and btrfs in my research. As well, the modified NVMe SSD driver can
> > > acquire the inode->i_mutex lock during P2P DMA transfer.
> > >
> > > Once we can consider the OS buffer is updated atomically by the lock,
> > > we don't need to worry about corrupted pages, but still needs to pay
> > > attention to the scenario when updated buffer page is moved to GPU.
> > >
> > > In this case, PD_ALL_VISIBLE may give us a hint. GPU side has no MVCC
> > > infrastructure, so I intend to move all-visible pages only.
> > > If someone updates the buffer concurrently, then write out the page
> > > including invisible tuples, PD_ALL_VISIBLE flag shall be cleared because
> > > updated tuples should not be visible to the transaction which issued
> > > P2P DMA.
> > >
> > > Once GPU met a page with !PD_ALL_VISIBLE, it can return an error status
> > > that indicates CPU to retry this page again. In this case, this page is
> > > likely loaded to the shared buffer already, so retry penalty is not so
> > > much.
> > >
> > > I'll try to investigate the implementation in this way.
> > > Please correct me, if I misunderstand something (especially, treatment
> > > of PD_ALL_VISIBLE).
> >
> > I suppose there's no theoretical reason why the buffer couldn't go
> > from all-visible to not-all-visible and back to all-visible again all
> > during the time you are copying it.
> >
> The backend process that is copying the data to GPU has a transaction
> in-progress (= not committed). Is it possible to get the updated buffer
> page back to the all-visible state again?
> I expect that in-progress transactions works as a blocker for backing
> to all-visible. Right?
>
> > Honestly, I think trying to access buffers without going through
> > shared_buffers is likely to be very hard to make correct and probably
> > a loser.
> >
> No challenge, no outcome. ;-)
>
> > Copying the data into shared_buffers and then to the GPU is,
> > doubtless, at least somewhat slower. But I kind of doubt that it's
> > enough slower to make up for all of the problems you're going to have
> > with the approach you've chosen.
> >
> Honestly, I'm still uncertain whether it works well as I expects.
> However, scan workload on the table larger than main memory is
> headache for PG-Strom, so I'd like to try ideas we can implement.
>
> Thanks,
> --
> NEC Business Creation Division / PG-Strom Project
> KaiGai Kohei
>

Attachment: pgsql-v9.6-filegetrawdesc.1.patch
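To make the ioctl(2) argument described in this message concrete, here is a rough sketch of what such a request block might contain. Everything in it is hypothetical: the thread does not show the driver's interface, so the struct name, its fields and toy_build_request() are illustrations only. The sketch assumes the default 8192-byte block size and that first_block is counted from the start of the file that raw_fd refers to.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLCKSZ 8192         /* assumed block size (PostgreSQL default) */

/* Hypothetical argument block for the driver's P2P DMA ioctl. */
typedef struct ToyP2PDmaRequest
{
    int      source_fd;         /* raw descriptor of the DMA source file */
    uint64_t file_offset;       /* base position in the file, in bytes */
    uint64_t length;            /* range to transfer, in bytes */
    uint64_t device_ptr;        /* destination address in device memory */
} ToyP2PDmaRequest;

/* Pack a run of consecutive blocks into one request. */
ToyP2PDmaRequest
toy_build_request(int raw_fd, uint32_t first_block, uint32_t nblocks,
                  uint64_t device_ptr)
{
    ToyP2PDmaRequest req;

    assert(raw_fd >= 0 && nblocks > 0);
    memset(&req, 0, sizeof(req));
    req.source_fd = raw_fd;
    req.file_offset = (uint64_t) first_block * TOY_BLCKSZ;
    req.length = (uint64_t) nblocks * TOY_BLCKSZ;
    req.device_ptr = device_ptr;

    /*
     * An extension would then pass &req to ioctl(2) on the driver's device
     * file. The device path and request code are driver-specific and not
     * given in the thread, so that call is left out here.
     */
    return req;
}
```

The raw_fd argument is exactly the value an extension cannot get today, which is what the attached patch is meant to expose.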
Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?
On Sat, Feb 13, 2016 at 7:29 AM, Kouhei Kaigai wrote:
>> I suppose there's no theoretical reason why the buffer couldn't go
>> from all-visible to not-all-visible and back to all-visible again all
>> during the time you are copying it.
>>
> The backend process that is copying the data to GPU has a transaction
> in-progress (= not committed). Is it possible to get the updated buffer
> page back to the all-visible state again?
> I expect that in-progress transactions works as a blocker for backing
> to all-visible. Right?

Yeah, probably.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?
> -----Original Message-----
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
> Sent: Saturday, February 13, 2016 1:46 PM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Jim Nasby; pgsql-hackers@postgresql.org; Amit Langote
> Subject: Re: [HACKERS] Way to check whether a particular block is on the
> shared_buffer?
>
> On Thu, Feb 11, 2016 at 9:05 PM, Kouhei Kaigai wrote:
> > Hmm. In my experience, it is often not a productive discussion whether
> > a feature is niche or commodity. So, let me change the viewpoint.
> >
> > We may utilize OS-level locking mechanism here.
> >
> > Even though it depends on filesystem implementation under the VFS,
> > we may use inode->i_mutex lock that shall be acquired during the buffer
> > copy from user to kernel, at least, on a few major filesystems; ext4,
> > xfs and btrfs in my research. As well, the modified NVMe SSD driver can
> > acquire the inode->i_mutex lock during P2P DMA transfer.
> >
> > Once we can consider the OS buffer is updated atomically by the lock,
> > we don't need to worry about corrupted pages, but still needs to pay
> > attention to the scenario when updated buffer page is moved to GPU.
> >
> > In this case, PD_ALL_VISIBLE may give us a hint. GPU side has no MVCC
> > infrastructure, so I intend to move all-visible pages only.
> > If someone updates the buffer concurrently, then write out the page
> > including invisible tuples, PD_ALL_VISIBLE flag shall be cleared because
> > updated tuples should not be visible to the transaction which issued
> > P2P DMA.
> >
> > Once GPU met a page with !PD_ALL_VISIBLE, it can return an error status
> > that indicates CPU to retry this page again. In this case, this page is
> > likely loaded to the shared buffer already, so retry penalty is not so
> > much.
> >
> > I'll try to investigate the implementation in this way.
> > Please correct me, if I misunderstand something (especially, treatment
> > of PD_ALL_VISIBLE).
>
> I suppose there's no theoretical reason why the buffer couldn't go
> from all-visible to not-all-visible and back to all-visible again all
> during the time you are copying it.
>
The backend process that is copying the data to the GPU has a transaction
in progress (= not committed). Is it possible for the updated buffer
page to get back to the all-visible state again?
I expect that the in-progress transaction works as a blocker against
going back to all-visible. Right?

> Honestly, I think trying to access buffers without going through
> shared_buffers is likely to be very hard to make correct and probably
> a loser.
>
No challenge, no outcome. ;-)

> Copying the data into shared_buffers and then to the GPU is,
> doubtless, at least somewhat slower. But I kind of doubt that it's
> enough slower to make up for all of the problems you're going to have
> with the approach you've chosen.
>
Honestly, I'm still uncertain whether it will work as well as I expect.
However, a scan workload on a table larger than main memory is a
headache for PG-Strom, so I'd like to try whatever ideas we can implement.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei
Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?
On Thu, Feb 11, 2016 at 9:05 PM, Kouhei Kaigai wrote:
> Hmm. In my experience, it is often not a productive discussion whether
> a feature is niche or commodity. So, let me change the viewpoint.
>
> We may utilize OS-level locking mechanism here.
>
> Even though it depends on filesystem implementation under the VFS,
> we may use inode->i_mutex lock that shall be acquired during the buffer
> copy from user to kernel, at least, on a few major filesystems; ext4,
> xfs and btrfs in my research. As well, the modified NVMe SSD driver can
> acquire the inode->i_mutex lock during P2P DMA transfer.
>
> Once we can consider the OS buffer is updated atomically by the lock,
> we don't need to worry about corrupted pages, but still needs to pay
> attention to the scenario when updated buffer page is moved to GPU.
>
> In this case, PD_ALL_VISIBLE may give us a hint. GPU side has no MVCC
> infrastructure, so I intend to move all-visible pages only.
> If someone updates the buffer concurrently, then write out the page
> including invisible tuples, PD_ALL_VISIBLE flag shall be cleared because
> updated tuples should not be visible to the transaction which issued
> P2P DMA.
>
> Once GPU met a page with !PD_ALL_VISIBLE, it can return an error status
> that indicates CPU to retry this page again. In this case, this page is
> likely loaded to the shared buffer already, so retry penalty is not so
> much.
>
> I'll try to investigate the implementation in this way.
> Please correct me, if I misunderstand something (especially, treatment
> of PD_ALL_VISIBLE).

I suppose there's no theoretical reason why the buffer couldn't go
from all-visible to not-all-visible and back to all-visible again all
during the time you are copying it.

Honestly, I think trying to access buffers without going through
shared_buffers is likely to be very hard to make correct and probably
a loser. Copying the data into shared_buffers and then to the GPU is,
doubtless, at least somewhat slower. But I kind of doubt that it's
enough slower to make up for all of the problems you're going to have
with the approach you've chosen.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?
> On Tue, Feb 9, 2016 at 6:35 PM, Kouhei Kaigai wrote:
> > Unfortunately, it was not sufficient.
> >
> > Due to the assumption, the buffer page to be suspended does not exist
> > when a backend process issues a series P2P DMA command. (If block would
> > be already loaded to the shared buffer, it don't need to issue P2P DMA,
> > but just use usual memory<->device DMA because RAM is much faster than
> > SSD.)
> > It knows the pair of (rel,fork,block), but no BufferDesc of this block
> > exists. Thus, it cannot acquire locks in BufferDesc structure.
> >
> > Even if the block does not exist at this point, concurrent process may
> > load the same page. BufferDesc of this page shall be assigned at this
> > point, however, here is no chance to lock something in BufferDesc for
> > the process which issues P2P DMA command.
> >
> > It is the reason why I assume the suspend/resume mechanism shall take
> > a pair of (rel,fork,block) as identifier of the target block.
>
> I see the problem, but I'm not terribly keen on putting in the hooks
> that it would take to let you solve it without hacking core. It
> sounds like an awfully invasive thing for a pretty niche requirement.
>
Hmm. In my experience, arguing over whether a feature is niche or
commodity is often not productive. So, let me change the viewpoint.

We may be able to use an OS-level locking mechanism here.

Although it depends on the filesystem implementation under the VFS,
we may be able to rely on the inode->i_mutex lock, which is acquired
during the buffer copy from user space to kernel, at least on a few
major filesystems (ext4, xfs and btrfs, according to my research).
Likewise, the modified NVMe SSD driver can acquire the inode->i_mutex
lock during the P2P DMA transfer.

Once we can assume that the OS buffer is updated atomically under this
lock, we no longer need to worry about corrupted pages, but we still
need to pay attention to the scenario where an updated buffer page is
moved to the GPU.

In this case, PD_ALL_VISIBLE may give us a hint. The GPU side has no MVCC
infrastructure, so I intend to move all-visible pages only.
If someone updates the buffer concurrently and then writes out the page
including invisible tuples, the PD_ALL_VISIBLE flag will have been
cleared, because the updated tuples should not be visible to the
transaction which issued the P2P DMA.

Once the GPU meets a page with !PD_ALL_VISIBLE, it can return an error
status that tells the CPU to retry this page. In this case, the page is
likely loaded into shared buffers already, so the retry penalty is not
large.

I'll try to investigate an implementation along these lines.
Please correct me if I misunderstand something (especially the treatment
of PD_ALL_VISIBLE).

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei
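The filter-and-retry idea in this message can be sketched as follows. This is only an illustration under stated assumptions: ToyPageHeader is a hypothetical one-field stand-in for the real page header, the PD_ALL_VISIBLE bit value is the one I recall from bufpage.h and should be checked against the headers, and the function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PD_ALL_VISIBLE 0x0004   /* assumed to match bufpage.h */

/* Hypothetical, reduced page header: only the flags word is modelled. */
typedef struct ToyPageHeader
{
    uint16_t pd_flags;
} ToyPageHeader;

typedef enum ToyPageStatus
{
    TOY_PAGE_OK,                /* all-visible: safe to process on device */
    TOY_PAGE_RETRY              /* not all-visible: hand back to the CPU */
} ToyPageStatus;

/* Device-side check applied to each page that arrived by P2P DMA. */
ToyPageStatus
toy_device_check_page(const ToyPageHeader *page)
{
    assert(page != NULL);
    return (page->pd_flags & PD_ALL_VISIBLE) ? TOY_PAGE_OK : TOY_PAGE_RETRY;
}

/*
 * Host-side pass: collect the block numbers the CPU has to re-read through
 * shared buffers. Returns how many entries were written to retry_blocks,
 * which must have room for npages entries.
 */
size_t
toy_collect_retries(const ToyPageHeader *pages, size_t npages,
                    uint32_t first_block, uint32_t *retry_blocks)
{
    size_t nretry = 0;

    for (size_t i = 0; i < npages; i++)
    {
        if (toy_device_check_page(&pages[i]) == TOY_PAGE_RETRY)
            retry_blocks[nretry++] = first_block + (uint32_t) i;
    }
    return nretry;
}
```

The check only looks at the flag as it appears in the copied page. It says nothing about whether the flag changed while the copy was in flight, which is the open question raised in the replies to this message.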
Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?
On Tue, Feb 9, 2016 at 6:35 PM, Kouhei Kaigai wrote:
> Unfortunately, it was not sufficient.
>
> Due to the assumption, the buffer page to be suspended does not exist
> when a backend process issues a series P2P DMA command. (If block would
> be already loaded to the shared buffer, it don't need to issue P2P DMA,
> but just use usual memory<->device DMA because RAM is much faster than
> SSD.)
> It knows the pair of (rel,fork,block), but no BufferDesc of this block
> exists. Thus, it cannot acquire locks in BufferDesc structure.
>
> Even if the block does not exist at this point, concurrent process may
> load the same page. BufferDesc of this page shall be assigned at this
> point, however, here is no chance to lock something in BufferDesc for
> the process which issues P2P DMA command.
>
> It is the reason why I assume the suspend/resume mechanism shall take
> a pair of (rel,fork,block) as identifier of the target block.

I see the problem, but I'm not terribly keen on putting in the hooks
that it would take to let you solve it without hacking core. It
sounds like an awfully invasive thing for a pretty niche requirement.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?
> -----Original Message-----
> From: Robert Haas [mailto:robertmh...@gmail.com]
> Sent: Wednesday, February 10, 2016 1:58 AM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Jim Nasby; pgsql-hackers@postgresql.org; Amit Langote
> Subject: Re: [HACKERS] Way to check whether a particular block is
> on the shared_buffer?
>
> On Sun, Feb 7, 2016 at 9:49 PM, Kouhei Kaigai wrote:
> > On the other hands, it also became clear we have to guarantee OS buffer
> > or storage block must not be updated partially during the P2P DMA.
> > My motivation is a potential utilization of P2P DMA of SSD-to-GPU to
> > filter out unnecessary rows and columns prior to loading to CPU/RAM.
> > It needs to ensure PostgreSQL does not write out buffers to OS buffers
> > to avoid unexpected data corruption.
> >
> > What I want to achieve is suspend of buffer write towards a particular
> > (relnode, forknum, blocknum) pair for a short time, by completion of
> > data processing by GPU (or other external devices).
> > In addition, it is preferable being workable regardless of the choice
> > of storage manager, even if we may have multiple options on top of the
> > pluggable smgr in the future.
>
> It seems like you just need to take an exclusive content lock on the
> buffer, or maybe a shared content lock would be sufficient.
>
Unfortunately, that is not sufficient.

By assumption, the buffer page to be suspended does not exist
when a backend process issues a series of P2P DMA commands. (If the block
were already loaded into shared buffers, there would be no need to issue
P2P DMA; the backend would just use the usual memory<->device DMA,
because RAM is much faster than SSD.)
The backend knows the (rel,fork,block) triple, but no BufferDesc for this
block exists. Thus, it cannot acquire the locks in the BufferDesc structure.

Even if the block is not in shared buffers at this point, a concurrent
process may load the same page. A BufferDesc is assigned to the page at
that point; however, the process which issued the P2P DMA command has no
chance to lock anything in that BufferDesc.

That is why I assume the suspend/resume mechanism should take
(rel,fork,block) as the identifier of the target block.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei
Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?
On Sun, Feb 7, 2016 at 9:49 PM, Kouhei Kaigai wrote:
> On the other hands, it also became clear we have to guarantee OS buffer
> or storage block must not be updated partially during the P2P DMA.
> My motivation is a potential utilization of P2P DMA of SSD-to-GPU to
> filter out unnecessary rows and columns prior to loading to CPU/RAM.
> It needs to ensure PostgreSQL does not write out buffers to OS buffers
> to avoid unexpected data corruption.
>
> What I want to achieve is suspend of buffer write towards a particular
> (relnode, forknum, blocknum) pair for a short time, by completion of
> data processing by GPU (or other external devices).
> In addition, it is preferable being workable regardless of the choice
> of storage manager, even if we may have multiple options on top of the
> pluggable smgr in the future.

It seems like you just need to take an exclusive content lock on the
buffer, or maybe a shared content lock would be sufficient.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?
> -----Original Message-----
> From: Robert Haas [mailto:robertmh...@gmail.com]
> Sent: Monday, February 08, 2016 1:52 AM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Jim Nasby; pgsql-hackers@postgresql.org; Amit Langote
> Subject: Re: [HACKERS] Way to check whether a particular block is
> on the shared_buffer?
>
> On Thu, Feb 4, 2016 at 11:34 PM, Kouhei Kaigai wrote:
> > I can agree that smgr hooks shall be primarily designed to make storage
> > systems pluggable, even if we can use this hooks for suspend & resume of
> > write i/o stuff.
> > In addition, "pluggable storage" is a long-standing feature, even though
> > it is not certain whether existing smgr hooks are good starting point.
> > It may be a risk if we implement a grand feature on top of the hooks
> > but out of its primary purpose.
> >
> > So, my preference is a mechanism to hook buffer write to implement this
> > feature. (Or, maybe a built-in write i/o suspend / resume stuff if it
> > has nearly zero cost when no extension activate the feature.)
> > One downside of this approach is larger number of hook points.
> > We have to deploy the hook nearby existing smgrwrite of LocalBufferAlloc
> > and FlushRelationBuffers, in addition to FlushBuffer, at least.
>
> I don't understand what you're hoping to achieve by introducing
> pluggability at the smgr layer. I mean, md.c is pretty much good for
> read and writing from anything that looks like a directory of files.
> Another smgr layer would make sense if we wanted to read and write via
> some kind of network protocol, or if we wanted to have some kind of
> storage substrate that did internally to itself some of the tasks for
> which we are currently relying on the filesystem - e.g. if we wanted
> to be able to use a raw device, or perhaps more plausibly if we wanted
> to reduce the number of separate files we need, or provide a substrate
> that can clip an unused extent out of the middle of a relation
> efficiently. But I don't understand what this has to do with what
> you're trying to do here. The subject of this thread is about whether
> you can check for the presence of a block in shared_buffers, and as
> discussed upthread, you can. I don't quite follow how we made the
> jump from there to smgr pluggability.
>
Yes. smgr pluggability is not what I want to investigate in this thread.
It is not the purpose of the discussion, but one potential implementation
idea.

Through the discussion, it became clear that an extension can check
whether a buffer exists for a particular block, using existing
infrastructure.

On the other hand, it also became clear that we have to guarantee the OS
buffer or storage block is not partially updated during the P2P DMA.
My motivation is a potential use of SSD-to-GPU P2P DMA to filter out
unnecessary rows and columns prior to loading them into CPU/RAM.
This requires ensuring that PostgreSQL does not write out buffers to the
OS buffers in the meantime, to avoid unexpected data corruption.

What I want to achieve is to suspend buffer writes towards a particular
(relnode, forknum, blocknum) for a short time, until data processing by
the GPU (or another external device) completes.
In addition, it should preferably work regardless of the choice of
storage manager, even if we have multiple options on top of a pluggable
smgr in the future.

Data processing close to the storage requires that the OS buffer is not
updated concurrently with the P2P DMA. So, I want the following feature.

1. An extension (that controls GPU and P2P DMA) can register a particular
   (relnode, forknum, blocknum) as a block suspended for write.
2. Once a particular block is suspended, smgrwrite (or its caller)
   shall be blocked until the suspended block is unregistered.
3. The extension unregisters the block when the P2P DMA from it has
   completed; the suspended concurrent backend then resumes its write i/o.
4. On the other hand, the extension cannot register the block while some
   other concurrent backend is executing smgrwrite on it, to avoid
   potential data corruption.

One idea was to inject a thin layer on top of the smgr mechanism to
implement the above. However, I'm also uncertain whether injecting into
the entire set of smgr hooks is a straightforward way to achieve it.

The minimum I want is a facility to get control at the head and tail of
smgrwrite(): to suspend the concurrent write prior to smgr_write, and to
notify the mechanism when the concurrent smgr_write has completed.

It does not need smgr pluggability, but the entry points should be
located around the smgr functions.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei
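The four requirements listed in this message can be modelled with a small registry keyed by (relnode, forknum, blocknum). What follows is a single-process toy under loud assumptions: all names are hypothetical, the table is a plain static array rather than shared memory, there is no locking, and a blocked writer is represented by a false return value instead of actually sleeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Identifier of a block that has no BufferDesc (hypothetical type). */
typedef struct ToyBlockTag
{
    uint32_t relnode;
    int32_t  forknum;
    uint32_t blocknum;
} ToyBlockTag;

typedef enum { SLOT_FREE = 0, SLOT_DMA, SLOT_WRITE } ToySlotState;

typedef struct ToySlot
{
    ToyBlockTag  tag;
    ToySlotState state;
} ToySlot;

#define TOY_NSLOTS 16
static ToySlot toy_slots[TOY_NSLOTS];   /* zero-initialized: all free */

/* Find the slot currently claiming this block, if any. */
static ToySlot *
toy_find(const ToyBlockTag *t)
{
    for (int i = 0; i < TOY_NSLOTS; i++)
    {
        ToySlot *s = &toy_slots[i];

        if (s->state != SLOT_FREE && s->tag.relnode == t->relnode &&
            s->tag.forknum == t->forknum && s->tag.blocknum == t->blocknum)
            return s;
    }
    return NULL;
}

/* Claim the block for DMA or for a write unless someone already holds it. */
static bool
toy_claim(const ToyBlockTag *t, ToySlotState want)
{
    if (toy_find(t) != NULL)
        return false;
    for (int i = 0; i < TOY_NSLOTS; i++)
    {
        if (toy_slots[i].state == SLOT_FREE)
        {
            toy_slots[i].tag = *t;
            toy_slots[i].state = want;
            return true;
        }
    }
    return false;               /* table full: treat as "cannot claim" */
}

static void
toy_release(const ToyBlockTag *t, ToySlotState held)
{
    ToySlot *s = toy_find(t);

    assert(s != NULL && s->state == held);
    s->state = SLOT_FREE;
}

/* Steps 1 and 4: register for P2P DMA; fails while a write is running. */
bool toy_dma_try_register(const ToyBlockTag *t) { return toy_claim(t, SLOT_DMA); }
/* Step 3: DMA finished, writers may proceed again. */
void toy_dma_unregister(const ToyBlockTag *t) { toy_release(t, SLOT_DMA); }
/* Step 2: called at the head of a write; false means "must wait". */
bool toy_write_try_begin(const ToyBlockTag *t) { return toy_claim(t, SLOT_WRITE); }
/* Called at the tail of a write. */
void toy_write_end(const ToyBlockTag *t) { toy_release(t, SLOT_WRITE); }
```

Because both sides go through the same claim step, a DMA registration and a write on the same block exclude each other in either order. A real implementation would additionally need the table in shared memory, a lock around claim and release, and a way to put the waiting writer to sleep and wake it up.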
Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?
On Thu, Feb 4, 2016 at 11:34 PM, Kouhei Kaigai wrote:
> I can agree that smgr hooks shall be primarily designed to make storage
> systems pluggable, even if we can use this hooks for suspend & resume of
> write i/o stuff.
> In addition, "pluggable storage" is a long-standing feature, even though
> it is not certain whether existing smgr hooks are good starting point.
> It may be a risk if we implement a grand feature on top of the hooks
> but out of its primary purpose.
>
> So, my preference is a mechanism to hook buffer write to implement this
> feature. (Or, maybe a built-in write i/o suspend / resume stuff if it
> has nearly zero cost when no extension activate the feature.)
> One downside of this approach is larger number of hook points.
> We have to deploy the hook nearby existing smgrwrite of LocalBufferAlloc
> and FlushRelationBuffers, in addition to FlushBuffer, at least.

I don't understand what you're hoping to achieve by introducing
pluggability at the smgr layer. I mean, md.c is pretty much good for
read and writing from anything that looks like a directory of files.
Another smgr layer would make sense if we wanted to read and write via
some kind of network protocol, or if we wanted to have some kind of
storage substrate that did internally to itself some of the tasks for
which we are currently relying on the filesystem - e.g. if we wanted
to be able to use a raw device, or perhaps more plausibly if we wanted
to reduce the number of separate files we need, or provide a substrate
that can clip an unused extent out of the middle of a relation
efficiently. But I don't understand what this has to do with what
you're trying to do here. The subject of this thread is about whether
you can check for the presence of a block in shared_buffers, and as
discussed upthread, you can. I don't quite follow how we made the
jump from there to smgr pluggability.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?
> -----Original Message-----
> From: Jim Nasby [mailto:jim.na...@bluetreble.com]
> Sent: Friday, February 05, 2016 9:17 AM
> To: Kaigai Kouhei(海外 浩平); pgsql-hackers@postgresql.org; Robert Haas
> Cc: Amit Langote
> Subject: Re: [HACKERS] Way to check whether a particular block is on the
> shared_buffer?
>
> On 2/4/16 12:30 AM, Kouhei Kaigai wrote:
> >> 2. A feature to suspend i/o write-out of particular blocks
> >>    that are registered by another concurrent backend, until they
> >>    are unregistered (usually, at the end of P2P DMA).
> >>    ==> to be discussed.
>
> I think there's still a race condition here though...
>
> A
> finds buffer not in shared buffers
>
> B
> reads buffer in
> modifies buffer
> starts writing buffer to OS
>
> A
> Makes call to block write, but write is already in process; thinks
> writes are now blocked
> Reads corrupted block
> Much hilarity ensues
>
> Or maybe you were just glossing over that part for brevity.
>
> ...
>
> > I tried to draft an enhancement to realize the above i/o write-out
> > suspend/resume in as minimally invasive a way as possible.
> >
> >    ASSUMPTION: I intend to implement this feature as part of an
> >    extension, because these i/o suspend/resume checks are pure
> >    overhead for the core features unless an extension utilizes them.
> >
> > Three functions shall be added:
> >
> > extern int    GetStorageMgrNumbers(void);
> > extern f_smgr GetStorageMgrHandlers(int smgr_which);
> > extern void   SetStorageMgrHandlers(int smgr_which, f_smgr smgr_handlers);
> >
> > As the name says, GetStorageMgrNumbers() returns the number of
> > storage managers currently installed. It always returns 1 right now.
> > GetStorageMgrHandlers() returns the currently configured f_smgr
> > table for the supplied smgr_which. It allows extensions to know the
> > current configuration of the storage manager, even if another
> > extension has already modified it.
> > SetStorageMgrHandlers() installs the supplied 'smgr_handlers' in
> > place of the current one.
> > If an extension wants to interpose on 'smgr_write', it replaces
> > 'smgr_write' with its own function, then calls the original
> > function, likely mdwrite, from the replacement.
> >
> > In this case, the call chain shall be:
> >
> >    FlushBuffer, and others...
> >     +-- smgrwrite(...)
> >          +-- (extension's own function)
> >               +-- mdwrite
>
> ISTR someone (Robert Haas?) complaining that this method of hooks is
> cumbersome to use and can be fragile if multiple hooks are being
> installed. So maybe we don't want to extend its usage...
>
> I'm also not sure whether this is better done with an smgr hook or a
> hook into shared buffer handling...
>
# Sorry, I overlooked the latter part of your reply.

I can agree that smgr hooks shall be primarily designed to make storage
systems pluggable, even if we can use these hooks for suspend & resume of
write i/o.
In addition, "pluggable storage" is a long-standing feature, even though
it is not certain whether the existing smgr hooks are a good starting
point. It may be a risk if we implement a grand feature on top of the
hooks but outside their primary purpose.

So, my preference is a mechanism to hook buffer writes to implement this
feature. (Or, maybe a built-in write i/o suspend / resume facility, if it
has nearly zero cost when no extension activates the feature.)
One downside of this approach is the larger number of hook points.
We have to deploy the hook next to the existing smgrwrite calls in
LocalBufferAlloc and FlushRelationBuffers, in addition to FlushBuffer,
at least.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei
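The buffer-write hook preferred above might look roughly like the
following self-contained C sketch. It is only an illustration under
stated assumptions: buffer_write_hook, write_with_hook() and the three
*_sketch() functions are hypothetical names, not existing PostgreSQL
symbols, and the stubs only count calls. It shows the two points made in
the message: an unset hook costs a single NULL test, and every write
site has to route through the hook.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int BlockNumber;   /* simplified stand-in type */

/* Hypothetical hook type and variable; NULL means no extension is active. */
typedef void (*buffer_write_hook_type) (BlockNumber blocknum);
buffer_write_hook_type buffer_write_hook = NULL;

static int raw_writes = 0;  /* writes that reached the storage layer */
static int hook_calls = 0;  /* times the extension hook was invoked */

/* Stand-in for the real smgrwrite() call: only counts invocations. */
static void
raw_write(BlockNumber blocknum)
{
    (void) blocknum;
    raw_writes++;
}

/* Helper shared by every write site: one NULL test when no hook is set. */
static void
write_with_hook(BlockNumber blocknum)
{
    if (buffer_write_hook != NULL)
        buffer_write_hook(blocknum);    /* extension may block here */
    raw_write(blocknum);
}

/* The three write sites named in the message, reduced to their write call. */
void flush_buffer_sketch(BlockNumber b)           { write_with_hook(b); }
void local_buffer_alloc_sketch(BlockNumber b)     { write_with_hook(b); }
void flush_relation_buffers_sketch(BlockNumber b) { write_with_hook(b); }

/* Example extension hook: counts calls instead of waiting for P2P DMA. */
void
counting_hook(BlockNumber blocknum)
{
    (void) blocknum;
    hook_calls++;
}
```

Before an extension assigns buffer_write_hook, a call such as
flush_buffer_sketch(1) only performs the write; after the assignment,
each of the three sites invokes the hook first.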
Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?
> On 2/4/16 12:30 AM, Kouhei Kaigai wrote:
> >> 2. A feature to suspend i/o write-out of particular blocks
> >>    that are registered by another concurrent backend, until they
> >>    are unregistered (usually, at the end of P2P DMA).
> >>    ==> to be discussed.
>
> I think there's still a race condition here though...
>
> A
> finds buffer not in shared buffers
>
> B
> reads buffer in
> modifies buffer
> starts writing buffer to OS
>
> A
> Makes call to block write, but write is already in process; thinks
> writes are now blocked
> Reads corrupted block
> Much hilarity ensues
>
> Or maybe you were just glossing over that part for brevity.
>
Thanks, this part was not clear in my previous description.

At the time when B starts writing the buffer to the OS, the extension
will catch this i/o request using a hook around smgrwrite; the mechanism
then registers the block, so that P2P DMA requests on it are blocked
during B's write operation. (Of course, it unregisters the block at the
end of the smgrwrite.)
So, even if A wants to issue P2P DMA concurrently, it cannot register
the block until B's write operation completes.

In practice, this operation shall be a "try lock", because B's write
operation implies that the buffer exists in main memory, so A does not
need to wait for B's write operation if A switches its DMA source from
SSD to main memory.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei

> ...
>
> > I tried to draft an enhancement to realize the above i/o write-out
> > suspend/resume in as minimally invasive a way as possible.
> >
> >    ASSUMPTION: I intend to implement this feature as part of an
> >    extension, because these i/o suspend/resume checks are pure
> >    overhead for the core features unless an extension utilizes them.
> >
> > Three functions shall be added:
> >
> > extern int    GetStorageMgrNumbers(void);
> > extern f_smgr GetStorageMgrHandlers(int smgr_which);
> > extern void   SetStorageMgrHandlers(int smgr_which, f_smgr smgr_handlers);
> >
> > As the name says, GetStorageMgrNumbers() returns the number of
> > storage managers currently installed. It always returns 1 right now.
> > GetStorageMgrHandlers() returns the currently configured f_smgr
> > table for the supplied smgr_which. It allows extensions to know the
> > current configuration of the storage manager, even if another
> > extension has already modified it.
> > SetStorageMgrHandlers() installs the supplied 'smgr_handlers' in
> > place of the current one.
> > If an extension wants to interpose on 'smgr_write', it replaces
> > 'smgr_write' with its own function, then calls the original
> > function, likely mdwrite, from the replacement.
> >
> > In this case, the call chain shall be:
> >
> >    FlushBuffer, and others...
> >     +-- smgrwrite(...)
> >          +-- (extension's own function)
> >               +-- mdwrite
>
> ISTR someone (Robert Haas?) complaining that this method of hooks is
> cumbersome to use and can be fragile if multiple hooks are being
> installed. So maybe we don't want to extend its usage...
>
> I'm also not sure whether this is better done with an smgr hook or a
> hook into shared buffer handling...
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
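The "try lock" registration described in this reply can be sketched in
self-contained C as below. This is a single-process illustration under
stated assumptions, not the extension's code: BlockTag, the fixed-size
tracked[] table and all function names are hypothetical, and a real
implementation would keep the table in shared memory under a lock. A
registration attempt never waits; when the block is already registered
(for example by a backend that is writing it out), the DMA requester
falls back to main memory as its source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical key: stands in for (relation, forknum, blocknum). */
typedef struct BlockTag
{
    unsigned int rel;
    int          fork;
    unsigned int block;
} BlockTag;

#define MAX_TRACKED 16

typedef struct TrackSlot
{
    bool     in_use;
    BlockTag tag;
} TrackSlot;

/* Toy registry; a real one would live in shared memory under a lock. */
static TrackSlot tracked[MAX_TRACKED];

static bool
same_block(BlockTag a, BlockTag b)
{
    return a.rel == b.rel && a.fork == b.fork && a.block == b.block;
}

/* Try to register a block; fails at once if it is already registered. */
bool
try_register_block(BlockTag tag)
{
    int free_slot = -1;

    for (int i = 0; i < MAX_TRACKED; i++)
    {
        if (tracked[i].in_use)
        {
            if (same_block(tracked[i].tag, tag))
                return false;       /* someone else holds this block */
        }
        else if (free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return false;               /* table full: treat as "not locked" */

    tracked[free_slot].in_use = true;
    tracked[free_slot].tag = tag;
    return true;
}

/* Release a registration, e.g. at the end of the write or of the DMA. */
void
unregister_block(BlockTag tag)
{
    for (int i = 0; i < MAX_TRACKED; i++)
        if (tracked[i].in_use && same_block(tracked[i].tag, tag))
            tracked[i].in_use = false;
}

typedef enum DmaSource { DMA_FROM_SSD, DMA_FROM_MAIN_MEMORY } DmaSource;

/* DMA requester: never waits; falls back to RAM if the block is held. */
DmaSource
choose_dma_source(BlockTag tag)
{
    return try_register_block(tag) ? DMA_FROM_SSD : DMA_FROM_MAIN_MEMORY;
}
```

The opposite case, a writer that finds the block registered by an
in-flight DMA and must hold back its write, is the suspend side of the
proposal and is not modelled here.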
Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?
On 2/4/16 12:30 AM, Kouhei Kaigai wrote:
>> 2. A feature to suspend i/o write-out of particular blocks
>>    that are registered by another concurrent backend, until they
>>    are unregistered (usually, at the end of P2P DMA).
>>    ==> to be discussed.

I think there's still a race condition here though...

A
finds buffer not in shared buffers

B
reads buffer in
modifies buffer
starts writing buffer to OS

A
Makes call to block write, but write is already in process; thinks
writes are now blocked
Reads corrupted block
Much hilarity ensues

Or maybe you were just glossing over that part for brevity.

...

> I tried to draft an enhancement to realize the above i/o write-out
> suspend/resume in as minimally invasive a way as possible.
>
>    ASSUMPTION: I intend to implement this feature as part of an
>    extension, because these i/o suspend/resume checks are pure
>    overhead for the core features unless an extension utilizes them.
>
> Three functions shall be added:
>
> extern int    GetStorageMgrNumbers(void);
> extern f_smgr GetStorageMgrHandlers(int smgr_which);
> extern void   SetStorageMgrHandlers(int smgr_which, f_smgr smgr_handlers);
>
> As the name says, GetStorageMgrNumbers() returns the number of
> storage managers currently installed. It always returns 1 right now.
> GetStorageMgrHandlers() returns the currently configured f_smgr
> table for the supplied smgr_which. It allows extensions to know the
> current configuration of the storage manager, even if another
> extension has already modified it.
> SetStorageMgrHandlers() installs the supplied 'smgr_handlers' in
> place of the current one.
> If an extension wants to interpose on 'smgr_write', it replaces
> 'smgr_write' with its own function, then calls the original
> function, likely mdwrite, from the replacement.
>
> In this case, the call chain shall be:
>
>    FlushBuffer, and others...
>     +-- smgrwrite(...)
>          +-- (extension's own function)
>               +-- mdwrite

ISTR someone (Robert Haas?) complaining that this method of hooks is
cumbersome to use and can be fragile if multiple hooks are being
installed. So maybe we don't want to extend its usage...

I'm also not sure whether this is better done with an smgr hook or a
hook into shared buffer handling...
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?
> > KaiGai-san,
> >
> > On 2016/02/01 10:38, Kouhei Kaigai wrote:
> > > As an aside, background of my motivation is the slide below:
> > > http://www.slideshare.net/kaigai/sqlgpussd-english
> > > (LT slides in JPUG conference last Dec)
> > >
> > > I'm under investigation of SSD-to-GPU direct feature on top of
> > > the custom-scan interface. It intends to load a bunch of data
> > > blocks on NVMe-SSD to GPU RAM using P2P DMA, prior to the data
> > > loading onto CPU/RAM, to preprocess the data to be filtered out.
> > > It only makes sense if the target blocks are not loaded to the
> > > CPU/RAM yet, because SSD device is essentially slower than RAM.
> > > So, I like to have a reliable way to check the latest status of
> > > the shared buffer, to know whether a particular block is already
> > > loaded or not.
> >
> > Quite interesting stuff, thanks for sharing!
> >
> > I'm in no way an expert on this but could this generally be attacked
> > from the smgr API perspective? Currently, we have only one
> > implementation - md.c (the hard-coded RelationData.smgr_which = 0).
> > If we extended that and provided end-to-end support so that there
> > would be md.c alternatives to storage operations, I guess that would
> > open up opportunities for extensions to specify smgr_which as an
> > argument to ReadBufferExtended(), provided there is already support
> > in place to install md.c alternatives (perhaps in .so). Of course,
> > these are just musings and, perhaps do not really concern the
> > requirements of custom scan methods you have been developing.
> >
> Thanks for your idea. Indeed, smgr hooks are a good candidate to
> implement the feature; however, what I need is a thin intermediation
> layer rather than an alternative storage engine.
>
> It becomes clear we need two features here.
> 1. A feature to check whether a particular block is already in the
>    shared buffer pool.
>    It is available. BufTableLookup() under the BufMappingPartitionLock
>    gives us the information we want.
>
> 2. A feature to suspend i/o write-out of particular blocks
>    that are registered by another concurrent backend, until they
>    are unregistered (usually, at the end of P2P DMA).
>    ==> to be discussed.
>
> When we call smgrwrite(), as FlushBuffer() does, it fetches the
> function pointer from the 'smgrsw' array, then calls smgr_write.
>
>   void
>   smgrwrite(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
>             char *buffer, bool skipFsync)
>   {
>       (*(smgrsw[reln->smgr_which].smgr_write)) (reln, forknum, blocknum,
>                                                 buffer, skipFsync);
>   }
>
> If an extension overwrites the smgrsw[] array and then calls the
> original function under its own control, it can suspend the call of
> the original smgr_write until completion of the P2P DMA.
>
> It may be a minimally invasive way to implement this, and portable to
> any future storage layers.
>
> What do you think? Even though it is a bit different from your
> original proposition.
>
I tried to draft an enhancement to realize the above i/o write-out
suspend/resume in as minimally invasive a way as possible.

   ASSUMPTION: I intend to implement this feature as part of an
   extension, because these i/o suspend/resume checks are pure
   overhead for the core features unless an extension utilizes them.

Three functions shall be added:

extern int    GetStorageMgrNumbers(void);
extern f_smgr GetStorageMgrHandlers(int smgr_which);
extern void   SetStorageMgrHandlers(int smgr_which, f_smgr smgr_handlers);

As the name says, GetStorageMgrNumbers() returns the number of
storage managers currently installed. It always returns 1 right now.
GetStorageMgrHandlers() returns the currently configured f_smgr
table for the supplied smgr_which. It allows extensions to know the
current configuration of the storage manager, even if another
extension has already modified it.
SetStorageMgrHandlers() installs the supplied 'smgr_handlers' in
place of the current one.
If an extension wants to interpose on 'smgr_write', it replaces
'smgr_write' with its own function, then calls the original
function, likely mdwrite, from the replacement.

In this case, the call chain shall be:

   FlushBuffer, and others...
    +-- smgrwrite(...)
         +-- (extension's own function)
              +-- mdwrite

Once the extension's own function blocks write i/o until the P2P DMA
issued by a concurrent process has completed, we don't need to care
about partial updates of the OS cache or the storage device.
It is not difficult for extensions to implement a feature to
track/untrack a (relFileNode, forkNum, blockNum) triple, automatic
untracking according to the resource owner, and a mechanism to block
the caller until P2P DMA completion.

On the other hand, its flexibility seems to me a bit more than
necessary (what I want to implement is just a blocker of buffer write
i/o). And it may give people the wrong impression that this is a
pluggable storage feature.
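The call chain above can be illustrated with a minimal, self-contained
C sketch. The three accessor names and signatures follow the proposal in
this message; everything else is an assumption made for illustration:
f_smgr is reduced to a single member, mdwrite_stub() stands in for
mdwrite, smgrwrite_sketch() stands in for smgrwrite, and the extension
merely counts calls where a real one would wait for P2P DMA completion.

```c
#include <assert.h>

/* Simplified stand-ins; the real f_smgr table has many more members. */
typedef unsigned int BlockNumber;

typedef struct f_smgr
{
    void (*smgr_write) (BlockNumber blocknum, const char *buffer);
} f_smgr;

#define NUM_SMGR 1

static int md_calls = 0;    /* calls that reached the "md" layer */
static int ext_calls = 0;   /* calls seen by the extension */

/* Stand-in for mdwrite(): only records that it was called. */
static void
mdwrite_stub(BlockNumber blocknum, const char *buffer)
{
    (void) blocknum;
    (void) buffer;
    md_calls++;
}

static f_smgr smgrsw[NUM_SMGR] = {{mdwrite_stub}};

/* The three accessors, with the signatures proposed in the message. */
int
GetStorageMgrNumbers(void)
{
    return NUM_SMGR;
}

f_smgr
GetStorageMgrHandlers(int smgr_which)
{
    assert(smgr_which >= 0 && smgr_which < NUM_SMGR);
    return smgrsw[smgr_which];
}

void
SetStorageMgrHandlers(int smgr_which, f_smgr smgr_handlers)
{
    assert(smgr_which >= 0 && smgr_which < NUM_SMGR);
    smgrsw[smgr_which] = smgr_handlers;
}

/* Extension side: remember the previous table, then interpose on write. */
static f_smgr prev_handlers;

static void
ext_smgr_write(BlockNumber blocknum, const char *buffer)
{
    ext_calls++;
    /* A real extension would wait here while the block is under P2P DMA. */
    prev_handlers.smgr_write(blocknum, buffer);
}

void
install_write_interposer(void)
{
    f_smgr handlers = GetStorageMgrHandlers(0);

    prev_handlers = handlers;               /* keep the original chain */
    handlers.smgr_write = ext_smgr_write;
    SetStorageMgrHandlers(0, handlers);
}

/* Core side: dispatch through the table, as smgrwrite() does. */
void
smgrwrite_sketch(int smgr_which, BlockNumber blocknum, const char *buffer)
{
    smgrsw[smgr_which].smgr_write(blocknum, buffer);
}
```

After install_write_interposer(), a call to smgrwrite_sketch() passes
through ext_smgr_write() and then mdwrite_stub(). In this sketch,
calling install_write_interposer() twice would make ext_smgr_write()
call itself, so a real design would have to guard against repeated
installation.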
Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?
> > > On 1/31/16 7:38 PM, Kouhei Kaigai wrote:
> > >
> > > To answer your direct question, I'm no expert, but I haven't seen any
> > > functions that do exactly what you want. You'd have to pull relevant
> > > bits from ReadBuffer_*. Or maybe a better method would just be to call
> > > BufTableLookup() without any locks and if you get a result > -1 just
> > > call the relevant ReadBuffer function. Sometimes you'll end up calling
> > > ReadBuffer even though the buffer isn't in shared buffers, but I would
> > > think that would be a rare occurrence.
> > >
> > Thanks, indeed, an extension can call BufTableLookup().
> > PrefetchBuffer() has a good example of this.
> >
> > If it returns a valid buf_id, there is nothing difficult; just call
> > ReadBuffer() to pin the buffer.
>
> Isn't this what (or very similar to)
> ReadBufferExtended(RBM_ZERO_AND_LOCK) is already doing?
>
This operation actually acquires a buffer page and fills it with zeros,
and a valid buffer page is evicted if there is no free buffer page.
I want to keep the contents of the shared buffers that are already
loaded in main memory. P2P DMA and GPU preprocessing are intended to
minimize the main memory consumed by rows that will be filtered out by
scan qualifiers.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei
Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?
Kouhei Kaigai wrote:
> > On 1/31/16 7:38 PM, Kouhei Kaigai wrote:
> >
> > To answer your direct question, I'm no expert, but I haven't seen any
> > functions that do exactly what you want. You'd have to pull relevant
> > bits from ReadBuffer_*. Or maybe a better method would just be to call
> > BufTableLookup() without any locks and if you get a result > -1 just
> > call the relevant ReadBuffer function. Sometimes you'll end up calling
> > ReadBuffer even though the buffer isn't in shared buffers, but I would
> > think that would be a rare occurrence.
> >
> Thanks, indeed, an extension can call BufTableLookup().
> PrefetchBuffer() has a good example of this.
>
> If it returns a valid buf_id, there is nothing difficult; just call
> ReadBuffer() to pin the buffer.

Isn't this what (or very similar to)
ReadBufferExtended(RBM_ZERO_AND_LOCK) is already doing?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?
> KaiGai-san,
>
> On 2016/02/01 10:38, Kouhei Kaigai wrote:
> > As an aside, background of my motivation is the slide below:
> > http://www.slideshare.net/kaigai/sqlgpussd-english
> > (LT slides in JPUG conference last Dec)
> >
> > I'm under investigation of SSD-to-GPU direct feature on top of
> > the custom-scan interface. It intends to load a bunch of data
> > blocks on NVMe-SSD to GPU RAM using P2P DMA, prior to the data
> > loading onto CPU/RAM, to preprocess the data to be filtered out.
> > It only makes sense if the target blocks are not loaded to the
> > CPU/RAM yet, because SSD device is essentially slower than RAM.
> > So, I like to have a reliable way to check the latest status of
> > the shared buffer, to know whether a particular block is already
> > loaded or not.
>
> Quite interesting stuff, thanks for sharing!
>
> I'm in no way an expert on this but could this generally be attacked
> from the smgr API perspective? Currently, we have only one
> implementation - md.c (the hard-coded RelationData.smgr_which = 0).
> If we extended that and provided end-to-end support so that there
> would be md.c alternatives to storage operations, I guess that would
> open up opportunities for extensions to specify smgr_which as an
> argument to ReadBufferExtended(), provided there is already support
> in place to install md.c alternatives (perhaps in .so). Of course,
> these are just musings and, perhaps do not really concern the
> requirements of custom scan methods you have been developing.
>
Thanks for your idea. Indeed, smgr hooks are a good candidate to
implement the feature; however, what I need is a thin intermediation
layer rather than an alternative storage engine.

It becomes clear we need two features here.
1. A feature to check whether a particular block is already in the
   shared buffer pool.
   It is available. BufTableLookup() under the BufMappingPartitionLock
   gives us the information we want.

2. A feature to suspend i/o write-out of particular blocks
   that are registered by another concurrent backend, until they
   are unregistered (usually, at the end of P2P DMA).
   ==> to be discussed.

When we call smgrwrite(), as FlushBuffer() does, it fetches the
function pointer from the 'smgrsw' array, then calls smgr_write.

  void
  smgrwrite(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
            char *buffer, bool skipFsync)
  {
      (*(smgrsw[reln->smgr_which].smgr_write)) (reln, forknum, blocknum,
                                                buffer, skipFsync);
  }

If an extension overwrites the smgrsw[] array and then calls the
original function under its own control, it can suspend the call of
the original smgr_write until completion of the P2P DMA.

It may be a minimally invasive way to implement this, and portable to
any future storage layers.

What do you think? Even though it is a bit different from your
original proposition.
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei
Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?
KaiGai-san,

On 2016/02/01 10:38, Kouhei Kaigai wrote:
> As an aside, background of my motivation is the slide below:
> http://www.slideshare.net/kaigai/sqlgpussd-english
> (LT slides in JPUG conference last Dec)
>
> I'm under investigation of SSD-to-GPU direct feature on top of
> the custom-scan interface. It intends to load a bunch of data
> blocks on NVMe-SSD to GPU RAM using P2P DMA, prior to the data
> loading onto CPU/RAM, to preprocess the data to be filtered out.
> It only makes sense if the target blocks are not loaded to the
> CPU/RAM yet, because SSD device is essentially slower than RAM.
> So, I like to have a reliable way to check the latest status of
> the shared buffer, to know whether a particular block is already
> loaded or not.

Quite interesting stuff, thanks for sharing!

I'm in no way an expert on this but could this generally be attacked
from the smgr API perspective? Currently, we have only one
implementation - md.c (the hard-coded RelationData.smgr_which = 0).
If we extended that and provided end-to-end support so that there
would be md.c alternatives to storage operations, I guess that would
open up opportunities for extensions to specify smgr_which as an
argument to ReadBufferExtended(), provided there is already support
in place to install md.c alternatives (perhaps in .so). Of course,
these are just musings and, perhaps do not really concern the
requirements of custom scan methods you have been developing.

Thanks,
Amit
Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?
> On 1/31/16 7:38 PM, Kouhei Kaigai wrote:
> > I'm under investigation of SSD-to-GPU direct feature on top of
> > the custom-scan interface. It intends to load a bunch of data
> > blocks on NVMe-SSD to GPU RAM using P2P DMA, prior to the data
> > loading onto CPU/RAM, to preprocess the data to be filtered out.
> > It only makes sense if the target blocks are not loaded to the
> > CPU/RAM yet, because SSD device is essentially slower than RAM.
> > So, I like to have a reliable way to check the latest status of
> > the shared buffer, to know whether a particular block is already
> > loaded or not.
>
> That completely ignores the OS cache though... wouldn't that be a major
> issue?
>
Once we can ensure the target block is not cached in the shared buffer,
it is the job of the driver that supports P2P DMA to handle the OS page
cache.
Once the driver gets a P2P DMA request from PostgreSQL, it checks the
OS page cache status and determines the DMA source: either the OS
buffer or the SSD block.

> To answer your direct question, I'm no expert, but I haven't seen any
> functions that do exactly what you want. You'd have to pull relevant
> bits from ReadBuffer_*. Or maybe a better method would just be to call
> BufTableLookup() without any locks and if you get a result > -1 just
> call the relevant ReadBuffer function. Sometimes you'll end up calling
> ReadBuffer even though the buffer isn't in shared buffers, but I would
> think that would be a rare occurrence.
>
Thanks, indeed, an extension can call BufTableLookup().
PrefetchBuffer() has a good example of this.

If it returns a valid buf_id, there is nothing difficult; just call
ReadBuffer() to pin the buffer.

Otherwise, when BufTableLookup() returns a negative value, it means the
(relation, forknum, blocknum) triple does not exist in the shared
buffer. So, the extension enqueues a P2P DMA request for asynchronous
transfer, and the driver processes the P2P DMA shortly afterwards.

Concurrent access may always happen. PostgreSQL uses MVCC, so the
backend which issued the P2P DMA does not need to pay attention to new
tuples that did not exist at executor start time, even if another
backend loads and updates the same buffer just after the above
BufTableLookup().

On the other hand, we have to pay attention to whether a fraction of
the buffer page is partially written to the OS buffer or to storage.
That is in the scope of the operating system, so we cannot control it.

One idea I can think of is temporary suspension of FlushBuffer() for
particular (relation, forknum, blocknum) triples until the P2P DMA has
completed. Even if a concurrent backend updates the buffer page after
the BufTableLookup(), this prevents the OS cache and the storage from
getting dirty during the P2P DMA.

What do people think?
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei
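The lookup-then-decide flow in this reply can be sketched as the small,
self-contained C program fragment below. It is an illustration under
stated assumptions only: toy_buf_table_lookup() is a hypothetical
stand-in that searches a fixed array and is not PostgreSQL's
BufTableLookup(), and BlockTag, ScanAction and plan_block_access() are
likewise invented names. The only behaviour carried over from the
message is that a non-negative result means the block is resident
(pin it with ReadBuffer()), while a negative result means it is not
(enqueue a P2P DMA request).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical key: stands in for (relation, forknum, blocknum). */
typedef struct BlockTag
{
    unsigned int rel;
    int          fork;
    unsigned int block;
} BlockTag;

/* Toy "shared buffer" contents: the blocks considered resident. */
static const BlockTag resident[] = {{1, 0, 10}, {1, 0, 11}};

/* Stand-in lookup: returns a buffer id (>= 0) if resident, else -1. */
int
toy_buf_table_lookup(BlockTag tag)
{
    size_t n = sizeof(resident) / sizeof(resident[0]);

    for (size_t i = 0; i < n; i++)
    {
        if (resident[i].rel == tag.rel &&
            resident[i].fork == tag.fork &&
            resident[i].block == tag.block)
            return (int) i;
    }
    return -1;
}

typedef enum ScanAction
{
    USE_READ_BUFFER,    /* block is resident: pin it via ReadBuffer() */
    ENQUEUE_P2P_DMA     /* block is absent: request SSD-to-GPU transfer */
} ScanAction;

/* Decide how the scan should fetch one block. */
ScanAction
plan_block_access(BlockTag tag)
{
    return toy_buf_table_lookup(tag) >= 0 ? USE_READ_BUFFER
                                          : ENQUEUE_P2P_DMA;
}
```

The result is only a snapshot: as the message notes, another backend
may load the block right after the lookup, which is why the write-out
suspension idea is raised.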
Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?
On 1/31/16 7:38 PM, Kouhei Kaigai wrote:
> I'm under investigation of SSD-to-GPU direct feature on top of
> the custom-scan interface. It intends to load a bunch of data
> blocks on NVMe-SSD to GPU RAM using P2P DMA, prior to the data
> loading onto CPU/RAM, to preprocess the data to be filtered out.
> It only makes sense if the target blocks are not loaded to the
> CPU/RAM yet, because SSD device is essentially slower than RAM.
> So, I like to have a reliable way to check the latest status of
> the shared buffer, to know whether a particular block is already
> loaded or not.

That completely ignores the OS cache though... wouldn't that be a major
issue?

To answer your direct question, I'm no expert, but I haven't seen any
functions that do exactly what you want. You'd have to pull relevant
bits from ReadBuffer_*. Or maybe a better method would just be to call
BufTableLookup() without any locks and if you get a result > -1 just
call the relevant ReadBuffer function. Sometimes you'll end up calling
ReadBuffer even though the buffer isn't in shared buffers, but I would
think that would be a rare occurrence.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com