I want to repeat here what I said in another thread:
I think ultimately we will need three commands to control the keys.
First, there is the cluster_key_command, which we have now. Second, I
think we will need an optional command which returns random bytes ---
this would allow users to get random bytes from a different source than
that used by the server code.
Third, we will probably need a command that returns the data encryption
keys directly, either heap/index or WAL keys, probably based on key
number --- you pass the key number you want, and the command returns the
data key. There would not be a cluster key in this case, but the
command could still prompt the user for perhaps a password to the KMS
server. It could not be used if either of the previous two commands is
used. I assume an HMAC would still be stored in the pg_cryptokeys
directory to check that the right key has been returned.
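As a rough illustration of the HMAC check mentioned above, here is a minimal
Python sketch. The thread does not say what the stored HMAC would be computed
over, so the label, the use of the key number in the message, and both
function names are assumptions made purely for illustration.

```python
import hashlib
import hmac

# Hypothetical fixed label: the thread does not specify what the HMAC covers.
CHECK_LABEL = b"data-key-check"


def key_check_value(data_key: bytes, key_number: int) -> bytes:
    """Return HMAC-SHA256, keyed by the data key, over the label and key number."""
    message = CHECK_LABEL + key_number.to_bytes(4, "big")
    return hmac.new(data_key, message, hashlib.sha256).digest()


def is_expected_key(data_key: bytes, key_number: int, stored: bytes) -> bool:
    """Compare in constant time against the value saved when the key was set up."""
    return hmac.compare_digest(key_check_value(data_key, key_number), stored)
```

With this shape, only the check value would need to be stored on disk, and a
wrong key returned by the command would be detected before it is used.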
I thought we should implement the first command, because it will
probably be the most common and easiest to use, and then see what people
want added.
There is also a fourth option where the command returns multiple keys,
one per line of hex digits. That could be written in shell script, but
it would be fragile and complex. It could be written in Perl, but that
would add a new language requirement for this feature. It could be
written in C, but that would limit its flexibility because changes
would require a recompile, and you would probably need that C file to
call external scripts to tailor input like we do now from the server.
You could actually write a full implementation of what we do on the server
side in client code, but pg_alterckey would not work, since it would not
know the data format, so we would need another cluster key alter tool for that.
It could be written as a C extension, but that would also have C's
limitations. In summary, having the server do most of the complex work
for the default case seems best, and eventually allowing the ability for
the client to do everything seems ideal. I think we need more input
before we go beyond what we do now.
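To make the fourth option concrete, the receiving side of a command that
prints one key per line of hex digits could look roughly like the Python
sketch below. The function name, the default 32-byte key length, and the
strictness of the checks are assumptions for illustration, not anything the
patch defines.

```python
from typing import List


def parse_key_lines(output: str, expected_count: int, key_len: int = 32) -> List[bytes]:
    """Parse command output holding one hex-encoded key per line."""
    # Ignore blank lines and surrounding whitespace such as a trailing newline.
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    if len(lines) != expected_count:
        raise ValueError(f"expected {expected_count} keys, got {len(lines)}")
    keys = []
    for number, line in enumerate(lines, start=1):
        try:
            key = bytes.fromhex(line)
        except ValueError as exc:
            raise ValueError(f"line {number} is not valid hex") from exc
        if len(key) != key_len:
            raise ValueError(f"line {number} has {len(key)} bytes, expected {key_len}")
        keys.append(key)
    return keys
```

Rejecting a wrong key count or key length up front is one way to reduce the
fragility mentioned above.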
As I said in the commit thread, I disagree with this approach because it
pushes towards no design, a partial design, or possibly a bad one.
I think that an API should be carefully thought out, without assumptions
about the underlying cryptography (algorithm, key lengths, modes, how keys
are derived and stored, and so on), and its usefulness should be
demonstrated by actually using it for one implementation, which would be
what is currently being proposed in the patch, with possibly others thrown
in for free.
The implementations should not have to be in any particular language:
Shell, Perl, Python, C should be possible.
After giving it more thought during the day, I think that only one
command and a basic protocol are needed. Maybe something as simple as
/path/to/command --options arguments…
With a basic (text? binary?) protocol on stdin/stdout (?) for the
different functions. What the command actually does (connect to a remote
server, ask for a master password, open some other database, whatever)
should be irrelevant to pg, which would just get and pass a bunch of bytes
to functions, which could use them for keys, secrets, whatever, and be
easily replaceable.
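To illustrate the single-command idea, here is a minimal Python sketch of a
line-based text protocol. Everything in it is hypothetical: the GET and
CREATE verbs, the OK/ERR replies, the hex encoding, and the in-memory store
are only one possible shape, not a proposal for the actual wire format.

```python
import secrets
from typing import Dict, TextIO


def handle_request(line: str, store: Dict[str, bytes]) -> str:
    """Answer one request line; keys are opaque bytes identified only by name."""
    parts = line.split()
    if len(parts) == 2 and parts[0] == "GET":
        key = store.get(parts[1])
        return f"OK {key.hex()}" if key is not None else "ERR unknown-key"
    if len(parts) == 3 and parts[0] == "CREATE" and parts[2].isdecimal():
        name, size = parts[1], int(parts[2])
        if name in store:
            return "ERR exists"
        # The key manager, not the caller, creates the key material.
        store[name] = secrets.token_bytes(size)
        return f"OK {store[name].hex()}"
    return "ERR bad-request"


def serve(requests: TextIO, replies: TextIO, store: Dict[str, bytes]) -> None:
    """Read one request per line and write one reply per line."""
    for line in requests:
        replies.write(handle_request(line, store) + "\n")
        replies.flush()
```

A real command would call something like serve(sys.stdin, sys.stdout, store)
and could get its keys from anywhere (a remote server, a password prompt,
another database) without the caller needing to know.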
The API should NOT make assumptions about the cryptographic design, what
depends on what, where things are stored… ISTM that Pg should only care
about naming keys, holding them when created/retrieved (but not creating
them), basically interacting with the key manager, and passing the stuff to
functions for encryption/decryption seen as black boxes.
I may have suggested something along these lines at the beginning of the
key management thread. Not going this way implies making some assumptions
which may or may not suit other use cases, which makes the design specific
rather than generic. I do not think pg should do that.
--
Fabien.