On Thu, Sep 15, 2011 at 08:34:55AM -0400, Stefan Berger wrote:
> On 09/15/2011 07:17 AM, Stefan Hajnoczi wrote:
> >On Wed, Sep 14, 2011 at 6:05 PM, Stefan Berger
> ><stef...@linux.vnet.ibm.com> wrote:
> >> One property of the blobstore is that it has a certain required size
> >>for accommodating all blobs of the devices that want to store their
> >>blobs in it. The assumption is that the size of these blobs is known
> >>a priori to the writer of the device code and all devices can register
> >>their space requirements with the blobstore during device
> >>initialization. Then gathering all the registered blobs' sizes plus
> >>knowing the overhead of the layout of the data on the disk lets QEMU
> >>calculate the total required (minimum) size that the image has to have
> >>to accommodate all blobs in a particular blobstore.
> >Libraries like tdb or gdbm come to mind. We should be careful not to
> >reinvent cpio/tar or FAT :).
> Sure. As long as these dbs allow overriding open(), close(), read(),
> write() and seek() with bdrv ops we could recycle any of these. Maybe
> we can build something smaller than those...
> >What about live migration? If each VM has a LUN assigned on a SAN
> >then these qcow2 files add a new requirement for a shared file system.
> >
> Well, one can still block-migrate these. The user has to know of course
> whether shared storage is set up or not and pass the appropriate flags
> to libvirt for migration. I know it works (modulo some problems when
> using encrypted QCoW2) since I've been testing with it.
>
> >Perhaps it makes sense to include the blobstore in the VM state data
> >instead? If you take that approach then the blobstore will get
> >snapshotted *into* the existing qcow2 images. Then you don't need a
> >shared file system for migration to work.
> >
> It could be an option. However, if the user has a raw image for the VM
> we still need the NVRAM emulation for the TPM, for example. So we need
> to store the persistent data somewhere, but raw is not prepared for
> that. Even if snapshotting doesn't work at all we need to be able to
> persist devices' data.
>
> >Can you share your design for the actual QEMU API that the TPM code
> >will use to manipulate the blobstore? Is it designed to work in the
> >event loop while QEMU is running, or is it for rare I/O on
> >startup/shutdown?
> >
> Everything is kind of changing now. But here's what I have right now:
>
>     tb->s.tpm_ltpms->nvram = nvram_setup(tpm_ltpms->drive_id, &errcode);
>     if (!tb->s.tpm_ltpms->nvram) {
>         fprintf(stderr, "Could not find nvram.\n");
>         return errcode;
>     }
>
>     nvram_register_blob(tb->s.tpm_ltpms->nvram,
>                         NVRAM_ENTRY_PERMSTATE,
>                         tpmlib_get_prop(TPMPROP_TPM_MAX_NV_SPACE));
>     nvram_register_blob(tb->s.tpm_ltpms->nvram,
>                         NVRAM_ENTRY_SAVESTATE,
>                         tpmlib_get_prop(TPMPROP_TPM_MAX_SAVESTATE_SPACE));
>     nvram_register_blob(tb->s.tpm_ltpms->nvram,
>                         NVRAM_ENTRY_VOLASTATE,
>                         tpmlib_get_prop(TPMPROP_TPM_MAX_VOLATILESTATE_SPACE));
>
>     rc = nvram_start(tpm_ltpms->nvram, fail_on_encrypted_drive);
>
> The above first sets up the NVRAM using the drive's id, i.e., the
> -tpmdev ...,nvram=my-bs parameter. This establishes the NVRAM.
> Subsequently the blobs to be written into the NVRAM are registered.
> nvram_start() then reconciles the registered NVRAM blobs with those
> found on disk, and if everything fits together the result is 'rc = 0'
> and the NVRAM is ready to go. Other devices can then do the same, with
> the same NVRAM or another one. (The NVRAM is what the blobstore is now
> called after renaming.)
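
If I am following the registration flow correctly, a second device would
attach to the same NVRAM along the lines of the sketch below.  This is
just to check my understanding; the NVRAM_ENTRY_FOO entry type and the
4096 byte size are made up by me:

    VNVRAM *nvram;
    int errcode, rc;

    /* look up the NVRAM by the id of its backing drive, e.g. "my-bs" */
    nvram = nvram_setup("my-bs", &errcode);
    if (!nvram) {
        return errcode;
    }

    /* declare the maximum size of this device's blob up front */
    nvram_register_blob(nvram, NVRAM_ENTRY_FOO, 4096);

    /* reconcile the registered blobs with what is on disk; 0 == ready */
    rc = nvram_start(nvram, false);

Is that roughly what registration would look like for devices other than
the TPM?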
>
> Reading from NVRAM in the case of the TPM is a rare event. It happens
> in the context of QEMU's main thread:
>
>     if (nvram_read_data(tpm_ltpms->nvram,
>                         NVRAM_ENTRY_PERMSTATE,
>                         &tpm_ltpms->permanent_state.buffer,
>                         &tpm_ltpms->permanent_state.size,
>                         0, NULL, NULL) ||
>         nvram_read_data(tpm_ltpms->nvram,
>                         NVRAM_ENTRY_SAVESTATE,
>                         &tpm_ltpms->save_state.buffer,
>                         &tpm_ltpms->save_state.size,
>                         0, NULL, NULL))
>     {
>         tpm_ltpms->had_fatal_error = true;
>         return;
>     }
>
> The above reads the data of two blobs synchronously. This happens
> during startup.
>
> Writes depend on what the user does with the TPM. He can trigger lots
> of updates to persistent state by performing certain operations, e.g.,
> persisting keys inside the TPM.
>
>     rc = nvram_write_data(tpm_ltpms->nvram,
>                           what, tsb->buffer, tsb->size,
>                           VNVRAM_ASYNC_F | VNVRAM_WAIT_COMPLETION_F,
>                           NULL, NULL);
>
> The above writes a TPM blob into the NVRAM. This is triggered by the
> TPM thread, which notifies the QEMU main thread to write the blob into
> NVRAM. I do this synchronously at the moment, using the two flags
> rather than the last two parameters for a completion callback: the
> first flag notifies the main thread, the second waits for the
> completion of the request (using a condition internally).
>
> Here are the protos:
>
>     VNVRAM *nvram_setup(const char *drive_id, int *errcode);
>
>     int nvram_start(VNVRAM *, bool fail_on_encrypted_drive);
>
>     int nvram_register_blob(VNVRAM *bs, enum NVRAMEntryType type,
>                             unsigned int maxsize);
>
>     unsigned int nvram_get_totalsize(VNVRAM *bs);
>     unsigned int nvram_get_totalsize_kb(VNVRAM *bs);
>
>     typedef void NVRAMRWFinishCB(void *opaque, int errcode, bool is_write,
>                                  unsigned char **data, unsigned int len);
>
>     int nvram_write_data(VNVRAM *bs, enum NVRAMEntryType type,
>                          const unsigned char *data, unsigned int len,
>                          int flags, NVRAMRWFinishCB cb, void *opaque);
>
> As said, things are changing right now, so this is to give an
> impression...
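
Out of curiosity, I guess the fully asynchronous write path would then
look something like the sketch below, going by the prototypes above?
The completion function and the idea that VNVRAM_ASYNC_F on its own
delivers a callback are assumptions on my part:

    static void tpm_write_done(void *opaque, int errcode, bool is_write,
                               unsigned char **data, unsigned int len)
    {
        /* runs in the main thread once the write has completed */
        if (errcode) {
            /* flag the failure instead of blocking the TPM thread */
        }
    }

    rc = nvram_write_data(tpm_ltpms->nvram,
                          what, tsb->buffer, tsb->size,
                          VNVRAM_ASYNC_F,   /* no waiting for completion */
                          tpm_write_done, NULL);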
Thanks, these details are interesting.  I interpreted the blobstore as a
key-value store, but these examples show it as a stream: no IDs or
offsets are given, the reads are just performed in order and move through
the NVRAM.

If it stays this simple then bdrv_*() is indeed a natural way to do this -
although my migration point remains, since this feature adds a new
requirement for shared storage when it would be pretty easy to put this
stuff in the vm data stream (IIUC the TPM NVRAM is relatively small?).

Stefan
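
P.S. To make the vm data stream idea a bit more concrete, something along
the lines of the sketch below should be enough.  It is a rough sketch only,
using the old-style savevm handlers and showing just the permanent state
blob; the TPMState fields and the TPM_MAX_NV_SPACE limit are guesses on my
part, not your actual structures:

    static void tpm_nvram_save(QEMUFile *f, void *opaque)
    {
        TPMState *s = opaque;                  /* guessed type */

        qemu_put_be32(f, s->permanent_state.size);
        qemu_put_buffer(f, s->permanent_state.buffer,
                        s->permanent_state.size);
        /* ...and similarly for the other blobs */
    }

    static int tpm_nvram_load(QEMUFile *f, void *opaque, int version_id)
    {
        TPMState *s = opaque;
        uint32_t size = qemu_get_be32(f);

        if (size > TPM_MAX_NV_SPACE) {         /* guessed limit */
            return -EINVAL;
        }
        qemu_get_buffer(f, s->permanent_state.buffer, size);
        s->permanent_state.size = size;
        return 0;
    }

    register_savevm(NULL, "tpm-nvram", 0, 1,
                    tpm_nvram_save, tpm_nvram_load, s);

The blobs then travel with the rest of the device state during migration
and savevm/loadvm, so no separate image file or shared file system is
needed for them.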