Re: [ceph-users] ceph journal failed?
OK, you've given me the answer, thanks a lot.

But I don't know the answers to your questions. Maybe someone else can answer.

------ Original ------
From: "Loris Cuoghi" <l...@stella-telecom.fr>
Date: Tue, Dec 22, 2015 07:31 PM
To: "ceph-users" <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] ceph journal failed?

On 22/12/2015 09:42, yuyang wrote:
> Hello, everyone,
> [snip snap]

Hi

> If the SSD failed or down, can the OSD work?
> Is the osd down or only can be read?

If you don't have a journal anymore, the OSD has already quit, as it
can't continue writing, nor can it assure data consistency, since
writes have probably been interrupted.

The Ceph community's general assumption for a dead journal is a dead
OSD.

But.

http://www.sebastien-han.fr/blog/2014/11/27/ceph-recover-osds-after-ssd-journal-failure/

How does this apply in reality?
Is the solution that Sébastien is proposing viable? In most/all cases?
Will the OSD continue chugging along after this kind of surgery?
Is it necessary/suggested to deep scrub the OSD's placement groups ASAP?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph journal failed?
Hello,

On Wed, 23 Dec 2015 11:46:58 +0800 yuyang wrote:
> ok, You give me the answer, thanks a lot.
>
Assume that a journal SSD failure means the loss of all associated
OSDs. So in your case a single SSD failure will cause the loss of a
whole node's data.

If you have 15 or more of those nodes, your cluster should be able to
handle the resulting I/O storm from recovering 9 OSDs, but with just a
few nodes you will see a severe performance impact and also risk data
loss if other failures occur during recovery.

Lastly, a 1:9 SSD journal to SATA ratio also sounds wrong when it comes
to performance: your SSD would need to be able to handle about 900 MB/s
of sync writes, and that's very expensive territory.

Christian

> But, I don't know the answers to your questions.
>
> Maybe someone else can answer.
>
> ------ Original ------
> From: "Loris Cuoghi" <l...@stella-telecom.fr>
> Date: Tue, Dec 22, 2015 07:31 PM
> To: "ceph-users" <ceph-users@lists.ceph.com>
> Subject: Re: [ceph-users] ceph journal failed?
>
> On 22/12/2015 09:42, yuyang wrote:
> > Hello, everyone,
> > [snip snap]
>
> Hi
>
> > If the SSD failed or down, can the OSD work?
> > Is the osd down or only can be read?
>
> If you don't have a journal anymore, the OSD has already quit, as it
> can't continue writing, nor can it assure data consistency, since
> writes have probably been interrupted.
>
> The Ceph community's general assumption for a dead journal is a dead
> OSD.
>
> But.
>
> http://www.sebastien-han.fr/blog/2014/11/27/ceph-recover-osds-after-ssd-journal-failure/
>
> How does this apply in reality?
> Is the solution that Sébastien is proposing viable? In most/all cases?
> Will the OSD continue chugging along after this kind of surgery?
> Is it necessary/suggested to deep scrub the OSD's placement groups ASAP?
-- 
Christian Balzer           Network/Systems Engineer
ch...@gol.com              Global OnLine Japan/Rakuten Communications
http://www.gol.com/
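Christian's ~900 MB/s figure follows from simple arithmetic: with a filestore journal, every client write is committed to the journal SSD first, so one SSD in front of 9 data disks must absorb their combined sustained write rate. A quick sketch, assuming roughly 100 MB/s of sustained sequential writes per SATA disk (an assumed ballpark, not a measurement):

```shell
# Rough capacity check for a 1:9 journal-SSD-to-SATA layout.
# 100 MB/s per disk is an assumed figure for sustained SATA writes.
PER_DISK_MB_S=100
DISKS_PER_SSD=9

NEEDED=$((PER_DISK_MB_S * DISKS_PER_SSD))
echo "journal SSD must sustain ~${NEEDED} MB/s of sync writes"  # ~900 MB/s
```

Few SATA SSDs of that era sustain anything close to 900 MB/s of sync writes, which is why Christian calls this "expensive territory": you would be looking at PCIe/NVMe class devices, or a lower journal-to-disk ratio.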
[ceph-users] ceph journal failed?
Hello, everyone,

I have a ceph cluster with several nodes; every node has 1 SSD and 9
SATA disks. Every SATA disk is used as an OSD, and in order to improve
IO performance the SSD is used as the journal disk. That is, there are
9 journal files on every SSD.

If the SSD fails or goes down, can the OSDs still work?
Are the OSDs down, or can they only be read?

Thanks.
Re: [ceph-users] ceph journal failed?
On 22/12/2015 09:42, yuyang wrote:
> Hello, everyone,
> [snip snap]

Hi

> If the SSD failed or down, can the OSD work?
> Is the osd down or only can be read?

If you don't have a journal anymore, the OSD has already quit, as it
can't continue writing, nor can it assure data consistency, since
writes have probably been interrupted.

The Ceph community's general assumption for a dead journal is a dead
OSD.

But.

http://www.sebastien-han.fr/blog/2014/11/27/ceph-recover-osds-after-ssd-journal-failure/

How does this apply in reality?
Is the solution that Sébastien is proposing viable? In most/all cases?
Will the OSD continue chugging along after this kind of surgery?
Is it necessary/suggested to deep scrub the OSD's placement groups ASAP?
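For reference, the procedure in Sébastien's post boils down to pointing the intact filestore data disk at a fresh, empty journal on a new device. A hypothetical dry-run sketch (the OSD id, journal partition, and service commands below are all assumptions for illustration; every command is only echoed, never executed, so read the post and adapt the paths before trying this on a real cluster):

```shell
# Dry-run sketch of replacing a failed journal SSD for one filestore OSD.
# All ids and paths are made up; run() echoes instead of executing.
OSD_ID=3                          # assumed id of the affected OSD
NEW_JOURNAL=/dev/sdj1             # assumed partition on the replacement SSD

run() { echo "+ $*"; }

run ceph osd set noout                      # keep CRUSH from rebalancing
run service ceph stop osd.$OSD_ID           # the OSD has likely died already
run ln -sf "$NEW_JOURNAL" /var/lib/ceph/osd/ceph-$OSD_ID/journal
run ceph-osd -i $OSD_ID --mkjournal         # create a fresh, empty journal
run service ceph start osd.$OSD_ID
run ceph osd unset noout
```

If the old SSD is still readable, flushing the journal first (`ceph-osd -i <id> --flush-journal`) before recreating it is the safer path; with the SSD truly dead, the empty journal means any in-flight writes are lost, which is exactly why Loris's question about deep scrubbing the OSD's placement groups afterwards is a reasonable one.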