Hi,

Thanks for your report.

On 2010/04/19 3:47, Wido den Hollander wrote:
> Hi,
>
> My sheepdog cluster isn't online, so i gets rebooted a few times a week.
>
> I'm using the cluster for testing Ceph and Sheepdog, and this week i was
> playing more with Ceph then Sheepdog.
>
> Now i just checked my cluster and it seems that my nodes can't find
> eachother anymore.
>

Sorry, it seems that the latest sheepdog in the git tree does not handle
node joins well.

>
> I double check, collie is running on all 5 nodes and the sheepdog
> directory is mounted on all 5.
>
> Please note, this cluster was running fine a few days ago, nothing
> changed in the mount points, corosync configuration or anything else
> regarding sheepdog.
>
> What i did notice is:
>
> r...@osd1:~# shepherd info -t cluster
> there is inconsistency between epochs
>
> Ctime Epoch Nodes
> 10-04-15 17:24:00 4 [192.168.6.215:7000, 192.168.6.215:7000,
> 192.168.6.213:7000, 192.168.6.211:7000, 192.168.6.211:7000,
> 192.168.6.214:7000]
> r...@osd1:~#
>

Let me clarify a few things:

- Did you run `shepherd shutdown' before stopping the collie processes?
- Do the problems occur only when sheepdog is restarted after a reboot?
- Does a clean startup always work without problems?

> Creating a new image also fails..
>
> r...@osd1:~# /usr/local/bin/qemu-img create -f sheepdog johndoe 10G
> Formatting 'johndoe', fmt=sheepdog size=10737418240
> do_sd_create 1143: Invalid error code, johndoe
> qemu-img: Error while formatting
> r...@osd1:~#
>
> I got the cluster running again after clearing all the sheepdog
> directories and do a mkfs again, but this shouldn't happen, a cluster
> should survive several reboots, shouldn't it?
>

Yes, it should. We'll fix this problem as soon as possible.

> After rebooting my machines, the sheepdog cluster was unstable again.
> Same result, nodes couldn't find eachother.
>

Thanks,

Kazutaka Morita
--
sheepdog mailing list
[email protected]
http://lists.wpkg.org/mailman/listinfo/sheepdog
