Thanks, Tomas.

On Thu, Feb 22, 2024 at 3:05 AM Tomas Vondra <tomas.von...@enterprisedb.com>
wrote:

> On 2/22/24 02:22, Siddharth Jain wrote:
> > Hi All,
> >
> > I understand the storage layer in databases goes to great lengths to
> > ensure:
> > - a row does not cross a block boundary
> > - read/writes/allocation happen in units of blocks
> > etc. The motivation is that the OS reads and writes pages (blocks),
> > not individual bytes. I am only concerned about SSDs, but I think
> > the principle applies to HDDs as well.
> >
> > But how can we do all this when we are not even guaranteed that the
> > beginning of a file will be aligned with a block boundary? See
> > <https://stackoverflow.com/questions/8018449/is-it-guaranteed-that-the-beginning-of-a-file-is-aligned-with-pagesize-of-file-s>.
> >
> > Further, I don't see any APIs exposing I/O operations in terms of blocks.
> > All file I/O APIs I see expose a file as a randomly accessible contiguous
> > byte buffer. Would it not have been easier if there were APIs that
> > exposed I/O operations in terms of blocks?
> >
> > Can someone explain this to me?
> >
>
> The short answer is that this is well outside our control. We do the
> best we can - split our data files into "our" 8kB pages - and hope that
> the OS / filesystem will do the right thing to map this to blocks at the
> storage level.
>
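
To make the "our 8kB pages" point concrete, here is a minimal sketch of page-oriented I/O layered on the ordinary byte-offset calls. It is not PostgreSQL's actual code: PAGE_SIZE, write_page, read_page and demo are hypothetical names for illustration, and 8192 is assumed as the page size. The only thing the application controls is that every offset is a multiple of the page size; whether that lines up with filesystem or device blocks is left to the layers below.

```python
import os
import tempfile

PAGE_SIZE = 8192  # assumed page size (PostgreSQL's default); illustrative only


def write_page(fd: int, page_no: int, data: bytes) -> None:
    """Write exactly one page at byte offset page_no * PAGE_SIZE."""
    if len(data) != PAGE_SIZE:
        raise ValueError("a page must be exactly PAGE_SIZE bytes")
    written = os.pwrite(fd, data, page_no * PAGE_SIZE)
    if written != PAGE_SIZE:
        raise OSError("short write")


def read_page(fd: int, page_no: int) -> bytes:
    """Read one page; the offset is always a multiple of PAGE_SIZE."""
    data = os.pread(fd, PAGE_SIZE, page_no * PAGE_SIZE)
    if len(data) != PAGE_SIZE:
        raise OSError("short read or page past end of file")
    return data


def demo() -> bytes:
    """Write two pages to a scratch file and read the second one back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "relation")  # hypothetical file name
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            write_page(fd, 0, b"\x00" * PAGE_SIZE)
            write_page(fd, 1, b"\x01" * PAGE_SIZE)
            return read_page(fd, 1)
        finally:
            os.close(fd)
```

Note that the "block API" here is just arithmetic on top of pread/pwrite, which is one way to see why a separate block-level interface would not differ much from what POSIX already offers.
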
> The filesystems do the same thing, to some extent - they align stuff
> with respect to the beginning of the partition, but if the partition
> itself is not properly aligned, that won't really work.
>
> As for the APIs, we work with what we have in POSIX - I don't think
> there are any APIs working with blocks, and it's not clear to me how
> they would fundamentally differ from the APIs we have now. Moreover,
> it's not really clear which "block" would matter. The postgres 8kB
> page? The filesystem page? The storage block/sector size?
>
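
As a rough illustration of the "which block?" question, the sketch below asks the OS for the sizes it is willing to report for a given path and puts them next to the 8kB page. The function name block_sizes and the constant PG_PAGE_SIZE are hypothetical; os.stat and os.statvfs are standard calls on Unix-like systems. The device's own sector or erase-block size is not available through these calls, so it is deliberately left out.

```python
import os

PG_PAGE_SIZE = 8192  # assumed database page size; illustrative only


def block_sizes(path: str) -> dict:
    """Collect the block sizes the OS reports for the filesystem holding path."""
    st = os.stat(path)
    vfs = os.statvfs(path)  # Unix-only
    return {
        "postgres_page": PG_PAGE_SIZE,
        "stat_blksize": st.st_blksize,  # preferred I/O size for this file
        "fs_block": vfs.f_bsize,        # filesystem block size
        "fs_fragment": vfs.f_frsize,    # fundamental filesystem block size
    }


if __name__ == "__main__":
    for name, size in block_sizes(".").items():
        print(f"{name}: {size} bytes")
```

These numbers can all differ on the same machine, which is part of why it is hard to say what a block-based API should align to.
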
> FWIW I think for SSDs this matters way more than for HDDs, because SSDs
> have to erase the space before a rewrite, which makes it much more
> expensive. That's not just about alignment, though, but also about the
> page size (with smaller pages being better).
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
