The documentation is certainly plentiful, but it badly needs a fact and consistency check. In particular, the rampant use of '/var/lib/one/datastores' needs to be excised everywhere and replaced with $DATASTORE_LOCATION, because there is a huge difference between what the front-end does with that path and what the Host uses it for, or so the documentation seems to imply.

If I may summarize my understanding:
It looks like ONE started out with every host (front-end included) having its own local storage in the guise of 'system, id=0', and under that model I/O for any given VM was limited only by the local host's spindle capacity and whatever other VMs were co-located on it.

When moving to a shared 'system' mountpoint, the underlying storage gets hammered because every host and every running guest is using it. The problem could be mitigated somewhat if the source image could be referenced indirectly via symlinks (does that work on VMware VMFS via RDM?), or by using clusters and selectively overriding which datastore is marked as 'system'. The upside is obviously the ability to do warm/hot migration between hosts.
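For concreteness, here is a rough sketch of what "copying" a non-persistent disk could reduce to when the image datastore is already visible on every host. All paths and names below are made-up stand-ins, not the actual TM driver layout:

```shell
# Hypothetical illustration: instead of copying a shared source image into
# the system datastore, drop a symlink so the VM reads the original directly.
SRC=/tmp/demo/images/ubuntu.qcow2        # image in a shared image datastore
VMDIR=/tmp/demo/system/42                # per-VM directory in 'system'

mkdir -p "$(dirname "$SRC")" "$VMDIR"
: > "$SRC"                               # stand-in for the real disk image

ln -s "$SRC" "$VMDIR/disk.0"             # a reference, not a copy
readlink "$VMDIR/disk.0"                 # -> /tmp/demo/images/ubuntu.qcow2
```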

Under the old way, presumably 'system' used "TM_MAD=ssh" and the front-end could (must?) be used as the repository of all non-running disk images. Yet all image operations are supposed to be carried out on the host, so:

Q0: Why was the front-end involved in storing anything under '.../datastores/0'? If it was storing "at rest" disk images because there was no other provider, then they should have been under a datastore with id != 0.
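For reference, an image datastore that stages everything over SSH can be described with a template along these lines. The datastore name and the temp path are made up; TM_MAD and DS_MAD are the standard template attributes:

```shell
# Sketch of an ssh-backed image datastore template (name is hypothetical).
cat > /tmp/ds_ssh.tmpl <<'EOF'
NAME   = "local_images"
DS_MAD = "fs"
TM_MAD = "ssh"
EOF
grep TM_MAD /tmp/ds_ssh.tmpl
# onedatastore create /tmp/ds_ssh.tmpl   # (requires a running oned)
```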

Q1: Under the shared model, is it true that the front-end never needs access to 'system'? The drawing and the text at http://opennebula.org/documentation:rel3.6:system_ds disagree.

BUG: Can we please fix 'onedatastore show <#>' so that "BASE PATH" uses either the literal string '$DATASTORE_LOCATION' or the current value of that variable as 'oned' understands it (see /var/lib/one/config)? Always returning '/var/lib/one/datastores/...' is wrong. Better yet, the value should be auto-generated unless the user has hard-coded it; currently there appears to be no way to force it to an arbitrary value. This is particularly pertinent when dealing with VMware, since VMFS volumes are located at '/vmfs/volumes', and since there is no persistence, any hackery like creating '/var/datastores/<#>' isn't going to survive a reboot.
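A minimal sketch of the behavior I am asking for, using a throwaway config file in /tmp rather than the real oned configuration:

```shell
# Derive BASE PATH from DATASTORE_LOCATION if set, else fall back to the
# hard-coded default. The config file and datastore id here are stand-ins.
CONF=/tmp/oned.conf
printf 'DATASTORE_LOCATION = /vmfs/volumes\n' > "$CONF"   # e.g. a VMware setup

DS_ID=100
DS_LOC=$(sed -n 's/^DATASTORE_LOCATION *= *//p' "$CONF")
BASE_PATH="${DS_LOC:-/var/lib/one/datastores}/$DS_ID"
echo "$BASE_PATH"    # -> /vmfs/volumes/100
```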

Q2: Why are disk images being "copied" (meaning symlinked, I guess) when the datastore type is 'shared' or 'vmware', unless the disk type is 'clone'? Just reference the source image directly wherever it is.

Q3: Can we dispense with the whole notion of 'system' being mandatory, let alone being at a fixed location? There is no reason why the datastore that contains the "at rest" image can't be used while the VM is running, and also hold the volatile and clone images. Of course that doesn't apply if the source is only reachable via SSH, can't withstand the IOPs, or is otherwise unsuitable. I also find the term 'system' misleading; something like 'runtime' would fit better. May I suggest a datastore attribute "ALLOW_RUN=" or "RUNTIME_SAFE=yes|no", with the unspecified behavior being that of 'no', i.e. do the copying?
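To make the proposal concrete, a datastore template could carry the new attribute like this. RUNTIME_SAFE is my suggested attribute, not anything that exists today; NAME, DS_MAD and TM_MAD are the usual ones:

```
NAME         = "big_nfs"
DS_MAD       = "fs"
TM_MAD       = "shared"
RUNTIME_SAFE = "yes"    # proposed: images here may be run in place, no copy
```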

Q4: What happens if there are multiple "SYSTEM=yes" datastores in the context of a cluster (including the special cluster 'none')? Why shouldn't the runtime datastore(s) also be a HOST attribute in addition to a cluster one, a la "SYSTEM_DS = <id> [id ...]"? If not specified, the scheduler would revert to the more general scope and pick one that has sufficient space. It is perfectly reasonable to have different 'system' datastore sets across hosts even in the same cluster; some may have extra disks, broken disks, whatever. Deployment shouldn't break, and I shouldn't have to side-line a host because it isn't strictly identical to its peers.
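Again to be concrete, the host-level override I am proposing might look like this in a host template. SYSTEM_DS as a host attribute does not exist today; the ids are invented for illustration:

```
# proposed host attribute, not current syntax:
SYSTEM_DS = "0 102 103"   # runtime datastores this host may use, in preference order
```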

Q5: Are there plans to have a 'system' datastore of type iSCSI or LVM? It would only make sense if the source were of like type. Then again, a sparse file on a filesystem would work as a block device too, so this is really about supporting BLOCK devices for 'system' use.
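The sparse-file point is easy to demonstrate; exposing the file as a block device would then be a matter of loop devices (losetup needs root, so it is only shown as a comment):

```shell
# A sparse file allocates almost nothing up front but presents a
# full-size backing store to whatever consumes it.
truncate -s 10G /tmp/vol.img

stat -c %s /tmp/vol.img      # apparent size: 10737418240 bytes
du -k /tmp/vol.img           # actual blocks used: ~0
# losetup -f --show /tmp/vol.img   # (root) expose it as /dev/loopN
```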

Q6: When is it safe to override variables like VM_DIR or DS_DIR? Is there an accepted methodology?

--
Cloud Services Architect, Senior System Administrator
InfoRelay Online Systems (www.inforelay.com)
_______________________________________________
Users mailing list
[email protected]
http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
