Re: [PATCH] buffile: ensure start offset is aligned with BLCKSZ

2021-11-29 Thread Sasasu
On 2021/11/29 18:05, Antonin Houska wrote:
> Does this test really pass regression tests? In BufFileRead(), I would
> understand if you did
> 
> + file->pos = offsetInBlock;
> + file->curOffset -= offsetInBlock;
> 
> rather than
> 
> + file->pos += offsetInBlock;
> + file->curOffset -= offsetInBlock;

It pass all regression tests. this patch is compatible with
BufFileSeek().

to generate a correct alignment, we need to make sure
  pos_new + offset_new = pos_old + offset_old 
offset_new = offset_old - offset_old % BLCKSZ
it means
  pos_new = pos_old + offset_old % BLCKSZ
  = pos_old + "offsetInBlock"

with your code, backend will read a wrong buffile at the end of buffile
reading. for example: physical file size = 20 and pos = 10, off = 10,
read start at 20. after the '=' code: pos = 10, off = 0, read start at
10, which is wrong.

> Anyway, BufFileDumpBuffer() does not seem to enforce curOffset to end up at
> block boundary, not to mention BufFileSeek().
> 
> When I was implementing this for our fork [1], I concluded that the encryption
> code path is too specific, so I left the existing code for the unecrypted data
> and added separate functions for the encrypted data.
> 
> One specific thing is that if you encrypt and write n bytes, but only need
> part of it later, you need to read and decrypt exactly those n bytes anyway,
> otherwise the decryption won't work. So I decided not only to keep curOffset
> at BLCKSZ boundary, but also to read / write BLCKSZ bytes at a time. This also
> makes sense if the scope of the initialization vector (IV) is BLCKSZ bytes.
> 
> Another problem is that you might want to store the IV somewhere in between
> the data. In short, the encryption makes the buffered IO rather different and
> the specific code should be kept aside, although the same API is used to
> invoke it.
> 
but I want to make less change on existed code. with this path. the only
code added to critical code path is this:

diff --git a/src/backend/storage/file/buffile.c 
b/src/backend/storage/file/buffile.c
index 3be08eb723..ceae85584b 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -512,6 +512,9 @@ BufFileDumpBuffer(BufFile *file)
/* and the buffer is aligned with BLCKSZ */
Assert(file->curOffset % BLCKSZ == 0);
 
+   /* encrypt before write */
+   TBD_ENC(file->buffer.data + wpos /* buffer */, bytestowrite /* 
size */, file->curOffset /* context to find IV */);
+
thisfile = file->files[file->curFile];
bytestowrite = FileWrite(thisfile,
 
file->buffer.data + wpos,
@@ -582,6 +585,9 @@ BufFileRead(BufFile *file, void *ptr, size_t size)
BufFileLoadBuffer(file);
if (file->nbytes <= 0 || (file->nbytes == file->pos && 
file->nbytes != BLCKSZ))
break;  /* no more data 
available */
+
+   /* decrypt after read */
+   TBD_DEC(file->buffer /* buffer */, file->nbytes /* size 
*/, file->curOffset /* context to find IV */);
}
 
nthistime = file->nbytes - file->pos;

those change will allow TDE to use any encryption algorithm (read offset
and write offset are matched) and implement on-the-fly IV generation.


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH] buffile: ensure start offset is aligned with BLCKSZ

2021-11-29 Thread Antonin Houska
Sasasu  wrote:

> Hi hackers,
> 
> there are a very long discuss about TDE, and we agreed on that if the
> temporary file I/O can be aligned to some fixed size, it will be easier
> to use some kind of encryption algorithm.
> 
> discuss:
> https://www.postgresql.org/message-id/20211025155814.GD20998%40tamriel.snowman.net
> 
> This patch adjust file->curOffset and file->pos before the real IO to
> ensure the start offset is aligned.

Does this test really pass regression tests? In BufFileRead(), I would
understand if you did

+   file->pos = offsetInBlock;
+   file->curOffset -= offsetInBlock;

rather than

+   file->pos += offsetInBlock;
+   file->curOffset -= offsetInBlock;

Anyway, BufFileDumpBuffer() does not seem to enforce curOffset to end up at
block boundary, not to mention BufFileSeek().

When I was implementing this for our fork [1], I concluded that the encryption
code path is too specific, so I left the existing code for the unecrypted data
and added separate functions for the encrypted data.

One specific thing is that if you encrypt and write n bytes, but only need
part of it later, you need to read and decrypt exactly those n bytes anyway,
otherwise the decryption won't work. So I decided not only to keep curOffset
at BLCKSZ boundary, but also to read / write BLCKSZ bytes at a time. This also
makes sense if the scope of the initialization vector (IV) is BLCKSZ bytes.

Another problem is that you might want to store the IV somewhere in between
the data. In short, the encryption makes the buffered IO rather different and
the specific code should be kept aside, although the same API is used to
invoke it.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

[1] https://github.com/cybertec-postgresql/postgres/tree/PG_14_TDE_1_1




[PATCH] buffile: ensure start offset is aligned with BLCKSZ

2021-11-28 Thread Sasasu
Hi hackers,

there are a very long discuss about TDE, and we agreed on that if the
temporary file I/O can be aligned to some fixed size, it will be easier
to use some kind of encryption algorithm.

discuss:
https://www.postgresql.org/message-id/20211025155814.GD20998%40tamriel.snowman.net

This patch adjust file->curOffset and file->pos before the real IO to
ensure the start offset is aligned.From e04c2595711998d2c3eb4546e98a38d340200949 Mon Sep 17 00:00:00 2001
From: Sasasu 
Date: Fri, 26 Nov 2021 00:12:14 +0800
Subject: [PATCH] buffile: ensure start offset is aligned with BLCKSZ.

If we can ensure that all IO is aligned with BLCKSZ, some other work (TDE
recently and maybe AIO in future) will like it.

this commit ensure all IO on buffile is aligned with BLCKSZ by careful
adjusting file->curOffset and file->pos.

Signed-off-by: Sasasu 
---
 src/backend/storage/file/buffile.c | 20 
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index b673150dbe..3be08eb723 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -430,6 +430,9 @@ BufFileLoadBuffer(BufFile *file)
 {
 	File		thisfile;
 
+	/* the offset of the buffile is aligned with BLCKSZ */
+	Assert(file->curOffset % BLCKSZ == 0);
+
 	/*
 	 * Advance to next component file if necessary and possible.
 	 */
@@ -437,7 +440,7 @@ BufFileLoadBuffer(BufFile *file)
 		file->curFile + 1 < file->numFiles)
 	{
 		file->curFile++;
-		file->curOffset = 0L;
+		file->curOffset -= MAX_PHYSICAL_FILESIZE;
 	}
 
 	/*
@@ -506,6 +509,9 @@ BufFileDumpBuffer(BufFile *file)
 		if ((off_t) bytestowrite > availbytes)
 			bytestowrite = (int) availbytes;
 
+		/* and the buffer is aligned with BLCKSZ */
+		Assert(file->curOffset % BLCKSZ == 0);
+
 		thisfile = file->files[file->curFile];
 		bytestowrite = FileWrite(thisfile,
  file->buffer.data + wpos,
@@ -564,11 +570,17 @@ BufFileRead(BufFile *file, void *ptr, size_t size)
 		if (file->pos >= file->nbytes)
 		{
 			/* Try to load more data into buffer. */
-			file->curOffset += file->pos;
-			file->pos = 0;
+			int offsetInBlock = file->curOffset % BLCKSZ;
 			file->nbytes = 0;
+			file->pos += offsetInBlock;
+			file->curOffset -= offsetInBlock;
+
+			/* pos out of current block, move to next block */
+			if (file->pos >= BLCKSZ)
+file->pos -= BLCKSZ, file->curOffset += BLCKSZ;
+
 			BufFileLoadBuffer(file);
-			if (file->nbytes <= 0)
+			if (file->nbytes <= 0 || (file->nbytes == file->pos && file->nbytes != BLCKSZ))
 break;			/* no more data available */
 		}
 
-- 
2.34.1



OpenPGP_signature
Description: OpenPGP digital signature