Overall, the Ceph GUI is great. I actually got Ceph up and running (and working) this time! Syncing ceph.conf through corosync is such an obvious way to simplify things... for small clusters, anyway.

I am seeing some problems, however, and I'm not sure if they're just me, or if I should be opening bugs:


1. I have one node that's up and running just fine, and pvecm claims everything is fine, but I can't migrate VMs to it if they were started somewhere else - migration always fails, claiming the node is dead. Nothing unusual appears in any logfile that I can see... or at least nothing that looks bad to me. I can create a new VM on that node, migrate it (online) to another node and migrate it back (online again), but VMs that were started on another node won't migrate.

2. CPU usage in the "Summary" screen of each VM sometimes reports nonsensical values: right now one VM is supposedly using 126% of 1 vCPU.

3. The Wiki page on setting up CEPH Server doesn't mention that you can do most of the setup from within the GUI. Since I have write access there, I guess I should fix it myself :-).

4. (This isn't really new...) SPICE continues to be a major PITA when running Ubuntu 12.04 LTS as the management client. Hmm, I just found a PPA with virt-viewer packages that work. I should update the Wiki with that info, too.

5. Stopping VMs with HA enabled is now an *extremely* slow process. If I disable HA for a particular VM, I notice that Stop also produces a Shutdown task and takes longer than it used to, but not unreasonably so. I don't understand why Stop isn't instantaneous, though. Typing "stop" into a qm monitor is also slow... the only way I have to stop a VM quickly is to kill the KVM process running it.

6. I'm not sure if this is new, but when a VM is under HA and I stop it manually, it immediately restarts. I don't know if I ever tried that under 3.1 Enterprise... maybe it always worked this way? (See the sketch below this list for my guess at what's going on.)
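
My guess for items 5 and 6 is that rgmanager is involved: once a VM is an HA service, stopping the KVM process directly just looks like a failure the cluster should recover from, and stop/shutdown requests get routed through the resource manager. If I'm reading the wiki right, the HA-aware way to stop and start such a VM on 3.x is to disable its rgmanager service rather than using qm, roughly like this (VMID 100 is just an example, and I haven't actually tested this on the affected node yet):

   # stop an HA-managed VM by disabling its rgmanager service
   clusvcadm -d pvevm:100

   # start it again by re-enabling the service
   clusvcadm -e pvevm:100

   # show current service states
   clustat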


Ceph speeds are barely acceptable (10-20 MB/s) but that's typical of Ceph in my experience so far, even with caching turned on. (Still a bit of a letdown compared to Sheepdog's 300 MB/s burst throughput, though.)
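
In case anyone wants to compare numbers, I believe the raw pool throughput can be measured with rados bench, independent of QEMU; something like this (the pool name "rbd" and 16 threads are just examples):

   # 60-second write benchmark, keeping the objects around for a read test
   rados -p rbd bench 60 write -t 16 --no-cleanup

   # sequential read benchmark against the objects written above
   rados -p rbd bench 60 seq -t 16

   # remove the benchmark objects when done
   rados -p rbd cleanup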

One thing I'm not sure of is OSD placement: if I have two drives per host dedicated to Ceph (and thus two OSDs per host), and my pool "size" is 2, does that mean a single node failure could render some data unreachable? I've adjusted my "size" to 3 just in case, but I don't understand how this works. Sheepdog guarantees that multiple copies of an object won't be stored on the same host for exactly this reason, but I can't tell what Ceph does.
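
From what I've read since writing that, the answer depends on the CRUSH rule rather than just the pool size: if the rule's chooseleaf step uses type "host", replicas go to different hosts; if it uses type "osd", two copies can land on the same box. Something like this should show what a pool is actually doing (pool name is just an example, and I haven't double-checked the exact field names):

   # replica count for the pool
   ceph osd pool get rbd size

   # which CRUSH ruleset the pool uses
   ceph osd pool get rbd crush_ruleset

   # dump the rules and look for "chooseleaf ... type host"
   ceph osd crush rule dump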

Also not sure what's going on with thin provisioning; I guess Ceph and QEMU/KVM don't do thin provisioning at all, in any way, shape or form?
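
My understanding is that RBD images are supposed to be sparse, so maybe it's only the usage reporting that's confusing me. If that's right, something like this should show how much of an image is actually allocated versus its provisioned size (the image name is just an example):

   # provisioned size and image details
   rbd info rbd/vm-100-disk-1

   # sum the extents that actually contain data
   rbd diff rbd/vm-100-disk-1 | awk '{ sum += $2 } END { print sum/1024/1024 " MB used" }'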

--
-Adam Thompson
 [email protected]
