On Thu, Sep 15, 2011 at 08:34:55AM -0400, Stefan Berger wrote:
> On 09/15/2011 07:17 AM, Stefan Hajnoczi wrote:
> >On Wed, Sep 14, 2011 at 6:05 PM, Stefan Berger
> ><stef...@linux.vnet.ibm.com> wrote:
> >> One property of the blobstore is that it has a certain required size
> >>for accommodating all blobs of the devices that want to store their
> >>blobs in it. The assumption is that the size of these blobs is known
> >>a priori to the writer of the device code and all devices can register
> >>their space requirements with the blobstore during device
> >>initialization. Then gathering all the registered blobs' sizes plus
> >>knowing the overhead of the layout of the data on the disk lets QEMU
> >>calculate the total required (minimum) size that the image has to have
> >>to accommodate all blobs in a particular blobstore.
> >Libraries like tdb or gdbm come to mind. We should be careful not to
> >reinvent cpio/tar or FAT :).
> Sure. As long as these dbs allow overriding open(), close(), read(),
> write() and seek() with bdrv ops we could recycle any of these. Maybe
> we can build something smaller than those...
> >What about live migration? If each VM has a LUN assigned on a SAN
> >then these qcow2 files add a new requirement for a shared file system.
> >
> Well, one can still block-migrate these. The user has to know of course
> whether shared storage is set up or not and pass the appropriate flags
> to libvirt for migration. I know it works (modulo some problems when
> using encrypted QCoW2) since I've been testing with it.
>
> >Perhaps it makes sense to include the blobstore in the VM state data
> >instead? If you take that approach then the blobstore will get
> >snapshotted *into* the existing qcow2 images. Then you don't need a
> >shared file system for migration to work.
> >
> It could be an option. However, if the user has a raw image for the VM
> we still need the NVRAM emulation for the TPM, for example. So we need
> to store the persistent data somewhere, but raw is not prepared for
> that. Even if snapshotting doesn't work at all we need to be able to
> persist devices' data.
>
> >Can you share your design for the actual QEMU API that the TPM code
> >will use to manipulate the blobstore? Is it designed to work in the
> >event loop while QEMU is running, or is it for rare I/O on
> >startup/shutdown?
> >
> Everything is kind of changing now. But here's what I have right now:
>
>     tb->s.tpm_ltpms->nvram = nvram_setup(tpm_ltpms->drive_id, &errcode);
>     if (!tb->s.tpm_ltpms->nvram) {
>         fprintf(stderr, "Could not find nvram.\n");
>         return errcode;
>     }
>
>     nvram_register_blob(tb->s.tpm_ltpms->nvram,
>                         NVRAM_ENTRY_PERMSTATE,
>                         tpmlib_get_prop(TPMPROP_TPM_MAX_NV_SPACE));
>     nvram_register_blob(tb->s.tpm_ltpms->nvram,
>                         NVRAM_ENTRY_SAVESTATE,
>                         tpmlib_get_prop(TPMPROP_TPM_MAX_SAVESTATE_SPACE));
>     nvram_register_blob(tb->s.tpm_ltpms->nvram,
>                         NVRAM_ENTRY_VOLASTATE,
>                         tpmlib_get_prop(TPMPROP_TPM_MAX_VOLATILESTATE_SPACE));
>
>     rc = nvram_start(tpm_ltpms->nvram, fail_on_encrypted_drive);
>
> The above first sets up the NVRAM using the drive's id, i.e., the
> -tpmdev ...,nvram=my-bs parameter. This establishes the NVRAM.
> Subsequently the blobs to be written into the NVRAM are registered.
> nvram_start() then reconciles the registered NVRAM blobs with those
> found on disk, and if everything fits together the result is 'rc = 0'
> and the NVRAM is ready to go. Other devices can then do the same, with
> the same NVRAM or another one. (The NVRAM is what the blobstore is now
> called after renaming.)
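
If I am following the registration flow correctly, a second device would
attach to the same NVRAM along the lines of the sketch below.  This is
just to check my understanding; the NVRAM_ENTRY_FOO entry type and the
4096 byte size are made up by me:

    VNVRAM *nvram;
    int errcode, rc;

    /* look up the NVRAM by the id of its backing drive, e.g. "my-bs" */
    nvram = nvram_setup("my-bs", &errcode);
    if (!nvram) {
        return errcode;
    }

    /* declare the maximum size of this device's blob up front */
    nvram_register_blob(nvram, NVRAM_ENTRY_FOO, 4096);

    /* reconcile the registered blobs with what is on disk; 0 == ready */
    rc = nvram_start(nvram, false);

Is that roughly what registration would look like for devices other than
the TPM?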
>
> Reading from NVRAM in the case of the TPM is a rare event. It happens
> in the context of QEMU's main thread:
>
>     if (nvram_read_data(tpm_ltpms->nvram,
>                         NVRAM_ENTRY_PERMSTATE,
>                         &tpm_ltpms->permanent_state.buffer,
>                         &tpm_ltpms->permanent_state.size,
>                         0, NULL, NULL) ||
>         nvram_read_data(tpm_ltpms->nvram,
>                         NVRAM_ENTRY_SAVESTATE,
>                         &tpm_ltpms->save_state.buffer,
>                         &tpm_ltpms->save_state.size,
>                         0, NULL, NULL))
>     {
>         tpm_ltpms->had_fatal_error = true;
>         return;
>     }
>
> The above reads the data of two blobs synchronously. This happens
> during startup.
>
> Writes depend on what the user does with the TPM. He can trigger lots
> of updates to persistent state by performing certain operations, e.g.,
> persisting keys inside the TPM.
>
>     rc = nvram_write_data(tpm_ltpms->nvram,
>                           what, tsb->buffer, tsb->size,
>                           VNVRAM_ASYNC_F | VNVRAM_WAIT_COMPLETION_F,
>                           NULL, NULL);
>
> The above writes a TPM blob into the NVRAM. This is triggered by the
> TPM thread, which notifies the QEMU main thread to write the blob into
> NVRAM. I do this synchronously at the moment, using the two flags
> rather than the last two parameters for a completion callback: the
> first flag notifies the main thread, the second waits for the
> completion of the request (using a condition internally).
>
> Here are the protos:
>
>     VNVRAM *nvram_setup(const char *drive_id, int *errcode);
>
>     int nvram_start(VNVRAM *, bool fail_on_encrypted_drive);
>
>     int nvram_register_blob(VNVRAM *bs, enum NVRAMEntryType type,
>                             unsigned int maxsize);
>
>     unsigned int nvram_get_totalsize(VNVRAM *bs);
>     unsigned int nvram_get_totalsize_kb(VNVRAM *bs);
>
>     typedef void NVRAMRWFinishCB(void *opaque, int errcode, bool is_write,
>                                  unsigned char **data, unsigned int len);
>
>     int nvram_write_data(VNVRAM *bs, enum NVRAMEntryType type,
>                          const unsigned char *data, unsigned int len,
>                          int flags, NVRAMRWFinishCB cb, void *opaque);
>
> As said, things are changing right now, so this is to give an
> impression...
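
Out of curiosity, I guess the fully asynchronous write path would then
look something like the sketch below, going by the prototypes above?
The completion function and the idea that VNVRAM_ASYNC_F on its own
delivers a callback are assumptions on my part:

    static void tpm_write_done(void *opaque, int errcode, bool is_write,
                               unsigned char **data, unsigned int len)
    {
        /* runs in the main thread once the write has completed */
        if (errcode) {
            /* flag the failure instead of blocking the TPM thread */
        }
    }

    rc = nvram_write_data(tpm_ltpms->nvram,
                          what, tsb->buffer, tsb->size,
                          VNVRAM_ASYNC_F,   /* no waiting for completion */
                          tpm_write_done, NULL);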
Thanks, these details are interesting.  I interpreted the blobstore as a
key-value store, but these examples show it as a stream: no IDs or
offsets are given, the reads are just performed in order and move through
the NVRAM.

If it stays this simple then bdrv_*() is indeed a natural way to do this -
although my migration point remains, since this feature adds a new
requirement for shared storage when it would be pretty easy to put this
stuff in the vm data stream (IIUC the TPM NVRAM is relatively small?).

Stefan
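
P.S. To make the vm data stream idea a bit more concrete, something along
the lines of the sketch below should be enough.  It is a rough sketch only,
using the old-style savevm handlers and showing just the permanent state
blob; the TPMState fields and the TPM_MAX_NV_SPACE limit are guesses on my
part, not your actual structures:

    static void tpm_nvram_save(QEMUFile *f, void *opaque)
    {
        TPMState *s = opaque;                  /* guessed type */

        qemu_put_be32(f, s->permanent_state.size);
        qemu_put_buffer(f, s->permanent_state.buffer,
                        s->permanent_state.size);
        /* ...and similarly for the other blobs */
    }

    static int tpm_nvram_load(QEMUFile *f, void *opaque, int version_id)
    {
        TPMState *s = opaque;
        uint32_t size = qemu_get_be32(f);

        if (size > TPM_MAX_NV_SPACE) {         /* guessed limit */
            return -EINVAL;
        }
        qemu_get_buffer(f, s->permanent_state.buffer, size);
        s->permanent_state.size = size;
        return 0;
    }

    register_savevm(NULL, "tpm-nvram", 0, 1,
                    tpm_nvram_save, tpm_nvram_load, s);

The blobs then travel with the rest of the device state during migration
and savevm/loadvm, so no separate image file or shared file system is
needed for them.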