What enhancements would be necessary to implement a feature similar
to partial seq-scan using the custom-scan interface?
It seems to me that callbacks at the three points below are needed.
Does ForeignScan also need an equivalent enhancement?
The background of my motivation is in the slides below:
(LT slides from the JPUG conference last December)
I'm investigating an SSD-to-GPU direct feature on top of the
custom-scan interface. It intends to load a bunch of data blocks
from NVMe-SSD into GPU RAM using peer-to-peer DMA, before the data
is loaded into CPU RAM. (Probably, only all-visible blocks should
be loaded this way, as in an index-only scan.)
Once the data blocks are in GPU RAM, we can discard there the rows
that would otherwise be filtered out later yet still consume CPU RAM.
The expected major bottleneck is the CPU thread that issues the
peer-to-peer DMA requests to the device, rather than the GPU tasks.
So, making use of parallel execution is a natural idea.
However, a CustomScan node on top of an underlying PartialSeqScan
node is not sufficient, because PartialSeqScan first loads the data
blocks into CPU RAM. At that point P2P DMA no longer makes sense.
The expected "GpuSsdScan" on CustomScan will reference a shared
block index that is incremented by multiple backends, then enqueue
a P2P DMA request (if the block is all-visible) to the device driver.
It then receives only the rows that satisfy the scan qualifiers.
It is almost equivalent to SeqScan, but wants to bypass the heap layer
to use the SSD-to-GPU direct data transfer path.
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kai...@ak.jp.nec.com>