Re: [Sheepdog] Cluster doesn't come up correctly after reboot

MORITA Kazutaka Sun, 18 Apr 2010 19:36:21 -0700

Hi,

Thanks for your report.


On 2010/04/19 3:47, Wido den Hollander wrote:
> Hi,
> 
> My Sheepdog cluster isn't online, so it gets rebooted a few times a week.
> 
> I'm using the cluster for testing Ceph and Sheepdog, and this week I was
> playing more with Ceph than with Sheepdog.
> 
> Now I just checked my cluster, and it seems that my nodes can't find
> each other anymore.
> 
Sorry, the latest Sheepdog in the git tree doesn't seem to handle node
joins well.

> 
> I double-checked: collie is running on all 5 nodes, and the Sheepdog
> directory is mounted on all 5.
> 
> Please note that this cluster was running fine a few days ago; nothing
> changed in the mount points, the corosync configuration, or anything else
> related to Sheepdog.
> 
> What I did notice is:
> 
> r...@osd1:~# shepherd info -t cluster
> there is inconsistency between epochs
> 
> Ctime              Epoch Nodes
> 10-04-15 17:24:00      4 [192.168.6.215:7000, 192.168.6.215:7000,
> 192.168.6.213:7000, 192.168.6.211:7000, 192.168.6.211:7000,
> 192.168.6.214:7000]
> r...@osd1:~#
> 
Let me clarify a few things. Did you run `shepherd shutdown' before
stopping the collie processes? Do the problems occur only when restarting
Sheepdog? Does a clean startup always work without problems?

> Creating a new image also fails.
> 
> r...@osd1:~# /usr/local/bin/qemu-img create -f sheepdog johndoe 10G
> Formatting 'johndoe', fmt=sheepdog size=10737418240 
> do_sd_create 1143: Invalid error code, johndoe
> qemu-img: Error while formatting
> r...@osd1:~# 
> 
> I got the cluster running again after clearing all the Sheepdog
> directories and running mkfs again, but this shouldn't happen. A cluster
> should survive several reboots, shouldn't it?
> 
Yes, it should. We'll fix this problem as soon as possible.

> After rebooting my machines, the Sheepdog cluster was unstable again.
> Same result: the nodes couldn't find each other.
> 


Thanks,

Kazutaka Morita


-- 
sheepdog mailing list
[email protected]
http://lists.wpkg.org/mailman/listinfo/sheepdog
