Re: [ceph-users] Client forward compatibility
Hi Greg,

On 24 Nov 2014, at 22:01, Gregory Farnum <g...@gregs42.com> wrote:
> On Thu, Nov 20, 2014 at 9:08 AM, Dan van der Ster <daniel.vanders...@cern.ch> wrote:
>> Hi all,
>> What is the compatibility/incompatibility of dumpling clients talking to firefly and giant clusters?
>
> We sadly don't have a good matrix about this yet, but in general you should assume that anything which changed the way the data is physically placed on the cluster will prevent them from communicating; if you don't enable those features then they should remain compatible.

It would be good to have such a compat matrix, as I was confused, probably others are confused, and if I'm not wrong, even you are confused ... see below.

>> In particular I know that tunables=firefly will prevent dumpling clients from talking to a firefly cluster, but how about the existence or not of erasure pools?
>
> As you mention, updating the tunables will prevent old clients from accessing them (although that shouldn't be the case in future now that they're all set by the crush map for later interpretation). Erasure pools are a special case (precisely because people had issues with them) and you should be able to communicate with a cluster that has EC pools while using old clients.

That's what we'd hoped, but alas we get the same error mentioned here: http://tracker.ceph.com/issues/8178

In our case (0.67.11 clients talking to the latest firefly gitbuilder build) we get:

  protocol feature mismatch, my 407 < peer 417 missing 10

By adding an EC pool, we lose connectivity from dumpling clients even to the replicated pools. The good news is that when we remove the EC pool, the 10 feature bit is removed, so dumpling clients can connect again. But nevertheless it leaves open the possibility of accidentally breaking the users' access. So this means we should upgrade all clients (quite a few qemu-kvm processes) to the firefly librbd before upgrading the cluster to firefly, to be 100% safe.

But, no:

>> Can a dumpling client talk to a Firefly/Giant erasure pool if the tunables are still dumpling?
>
> Definitely not. EC pools use a slightly different CRUSH algorithm than the old clients could, among many other things.

Oops, duh! I'd convinced myself that it might be possible, since the EC work is coordinated by the primary OSD anyway. But I forgot that things like failover to other OSDs could never work without client knowledge of EC rules.

Cheers, Dan
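For reference, the tunables discussed above can be inspected and, if needed, reverted from the CLI. A rough sketch (command names as of firefly-era releases; verify against your version, and note that reverting tunables triggers data movement):

  # show the crush tunables currently in effect
  ceph osd crush show-tunables

  # revert to legacy tunables so older (e.g. dumpling) clients can connect
  ceph osd crush tunables legacy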
Re: [ceph-users] ceph-announce list
Great news! Thanks for your support! We'll be waiting to subscribe to it when it's ready.

Best,
---
JuanFra Rodriguez Cardoso

2014-11-24 21:55 GMT+01:00 Gregory Farnum <g...@gregs42.com>:
> On Fri, Nov 21, 2014 at 12:34 AM, JuanFra Rodriguez Cardoso <juanfra.rodriguez.card...@gmail.com> wrote:
>> Hi all:
>> As it was asked weeks ago.. what is the way the ceph community uses to stay tuned on new features and bug fixes?
>
> I asked Sage about this today and he said he'd set one up. Seems like a good idea; just not something we've ever thought about before. :)
> -Greg
[ceph-users] What is the state of filestore sloppy CRC?
Hello,

As far as I can tell, Ceph does not make any guarantee that reads from an object return what was actually written to it. In other words, it does not check data integrity (except by doing a deep-scrub once every few days). Considering that BTRFS is not production-ready and not many people use Ceph on top of ZFS, the only option to get some sort of integrity guarantee is to enable the "filestore sloppy crc" option. Unfortunately the docs aren't too clear about this matter, and filestore sloppy crc is not even documented, which is weird considering it has been merged since Emperor.

Getting back to my actual question: what is the state of filestore sloppy crc? Does someone actually use it in production? Are there any considerations one should make before enabling it? Is it safe to enable it on an existing cluster?

--
Tomasz Kuzemko
tom...@kuzemko.net
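For anyone wanting to experiment, enabling the option amounts to a ceph.conf entry along these lines (a sketch only; check the option names against your build, and note the caveats raised later in this thread):

  [osd]
  # compute and verify per-block CRCs on filestore writes/reads
  filestore sloppy crc = true
  # granularity of the CRC records, in bytes (64 KiB shown here)
  filestore sloppy crc block size = 65536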
Re: [ceph-users] Ceph fs has error: no valid command found; 10 closest matches: fsid
Huynh Dac Nguyen writes:
> Hi Chris,
> I see. I'm running version 0.80.7. How do we know which part of the documentation applies to our version? As you see, we have only one ceph document here, which makes it confusing. Could you show me the document for ceph version 0.80.7?

Try ceph.com/docs/firefly

[..]
--
Abhishek
Re: [ceph-users] Dependency issues in fresh ceph/CentOS 7 install
Hi Travis,

Can I have a developer account or tester account in order to submit issues by myself?

Thanks,
Massimiliano Cuttini

On 18/11/2014 23:03, Travis Rhoden wrote:
> I've captured this at http://tracker.ceph.com/issues/10133
>
> On Tue, Nov 18, 2014 at 4:48 PM, Travis Rhoden <trho...@gmail.com> wrote:
>> Hi Massimiliano,
>> I just recreated this bug myself. Ceph-deploy is supposed to install EPEL automatically on the platforms that need it. I just confirmed that it is not doing so, and will be opening up a bug in the Ceph tracker. I'll paste it here when I do so you can follow it. Thanks for the report!
>> - Travis
>
> On Tue, Nov 18, 2014 at 4:41 PM, Massimiliano Cuttini <m...@phoenixweb.it> wrote:
>> I solved it by installing the EPEL repo for yum. I think that somebody should write down in the documentation that EPEL is mandatory.
>>
>> On 18/11/2014 14:29, Massimiliano Cuttini wrote:
>>> Dear all,
>>> I try to install ceph but I get errors:
>>>
>>>   #ceph-deploy install node1
>>>   [...]
>>>   [ceph_deploy.install][DEBUG ] Installing stable version firefly on cluster ceph hosts node1
>>>   [ceph_deploy.install][DEBUG ] Detecting platform for host node1 ...
>>>   [...]
>>>   [node1][DEBUG ] ---> Package libXxf86vm.x86_64 0:1.1.3-2.1.el7 set to be installed
>>>   [node1][DEBUG ] ---> Package mesa-libgbm.x86_64 0:9.2.5-6.20131218.el7_0 set to be installed
>>>   [node1][DEBUG ] ---> Package mesa-libglapi.x86_64 0:9.2.5-6.20131218.el7_0 set to be installed
>>>   [node1][DEBUG ] --> Finished dependency resolution
>>>   [node1][WARNIN] Error: Package: ceph-common-0.80.7-0.el7.centos.x86_64 (Ceph)
>>>   [node1][WARNIN] Requires: libtcmalloc.so.4()(64bit)
>>>   [node1][WARNIN] Error: Package: ceph-0.80.7-0.el7.centos.x86_64 (Ceph)
>>>   [node1][DEBUG ] You could try using --skip-broken to work around the problem
>>>   [node1][WARNIN] Requires: libleveldb.so.1()(64bit)
>>>   [node1][WARNIN] Error: Package: ceph-0.80.7-0.el7.centos.x86_64 (Ceph)
>>>   [node1][WARNIN] Requires: libtcmalloc.so.4()(64bit)
>>>   [node1][DEBUG ] You could try running: rpm -Va --nofiles --nodigest
>>>   [node1][ERROR ] RuntimeError: command returned non-zero exit status: 1
>>>   [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: yum -y install ceph
>>>
>>> (yum output above translated from Italian.)
>>>
>>> I installed the GIANT version, not FIREFLY, on the admin node. Is it a typo in the config file, or is it truly trying to install FIREFLY instead of GIANT?
>>>
>>> About the error, I see that it's related to wrong python default libraries. It seems that Ceph requires libraries not available in the current distro:
>>>   [node1][WARNIN] Requires: libtcmalloc.so.4()(64bit)
>>>   [node1][WARNIN] Requires: libleveldb.so.1()(64bit)
>>>   [node1][WARNIN] Requires: libtcmalloc.so.4()(64bit)
>>> This seems strange. Can you fix this?
>>>
>>> Thanks,
>>> Massimiliano Cuttini
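For anyone hitting the same dependency errors, installing EPEL by hand before running ceph-deploy is typically one command (standard EL7 commands shown; adjust for your distribution):

  # CentOS 7 (epel-release lives in the extras repo)
  yum -y install epel-release

  # RHEL 7 and other EL7 variants without an epel-release package
  rpm -Uvh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm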
[ceph-users] private network - VLAN vs separate switch
Hi,

For a large network (say 100 servers and 2500 disks), are there any strong advantages to using a separate switch and physical network instead of a VLAN? Also, how difficult would it be to switch from a VLAN to separate switches later?

-Sreenath
Re: [ceph-users] What is the state of filestore sloppy CRC?
According to the XFS docs, setting crc=1 will only enable CRC validation of XFS metadata (i.e. mtime, xattrs, etc.). Still, nothing guarantees the integrity of the actual data.

2014-11-25 11:05 GMT+01:00 Denis Kaganovich <maha...@bspu.unibel.by>:
> How about the XFS journal crc (mkfs-stage crc=1)? Has anybody tried it?
>
> Tomasz Kuzemko wrote on 2014-11-25 13:01:
>> Hello, as far as I can tell, Ceph does not make any guarantee that reads from an object return what was actually written to it. [...]

--
Tomasz Kuzemko
tom...@kuzemko.net
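For context, the mkfs option being discussed looks like this (metadata CRCs only, per the above; syntax per xfsprogs 3.2+, and /dev/sdX1 is a placeholder device):

  # format with XFS v5 metadata checksums enabled
  mkfs.xfs -m crc=1 /dev/sdX1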
Re: [ceph-users] Virtual machines using RBD remount read-only on OSD slow requests
Hi Alexandre,

Thanks for your suggestion. I also considered using errors=continue, in line with the nobarrier idea, but I was afraid it might lead to silent corruption on errors not caused by slow requests on OSDs. I was hoping for a solution that would specifically allow slowness of the block device while still reacting to other errors, but I may have to accept the lesser evil or keep investigating.

Cheers,
Paulo

On Mon, 2014-11-24 at 20:14 +0100, Alexandre DERUMIER wrote:
> Hi,
> try to mount your filesystems with the errors=continue option.
>
> From the mount(8) man page:
>   errors={continue|remount-ro|panic}
>     Define the behaviour when an error is encountered. (Either ignore errors and just mark the filesystem erroneous and continue, or remount the filesystem read-only, or panic and halt the system.) The default is set in the filesystem superblock, and can be changed using tune2fs(8).
>
> ----- Original message -----
> From: Paulo Almeida <palme...@igc.gulbenkian.pt>
> To: ceph-users@lists.ceph.com
> Sent: Monday, 24 November 2014 17:06:40
> Subject: [ceph-users] Virtual machines using RBD remount read-only on OSD slow requests
>
> Hi,
>
> I have a Ceph cluster with 4 disk servers, 14 OSDs and a replica size of 3. A number of KVM virtual machines are using RBD as their only storage device. Whenever some OSDs (always on a single server) have slow requests, caused, I believe, by flaky hardware or, on one occasion, by a S.M.A.R.T. command that crashed the system disk of one of the disk servers, most virtual machines remount their disk read-only and need to be rebooted.
>
> One of the virtual machines still has Debian 6 installed, and it never crashes. It also has an ext3 filesystem, contrary to some other machines, which have ext4. ext3 does crash in systems with Debian 7, but those have different mount flags, such as barrier and data=ordered. I suspect (but haven't tested) that using nobarrier may solve the problem, but that doesn't seem to be an ideal solution. Most of those machines have Debian 7 or Ubuntu 12.04, but two of them have Ubuntu 14.04 (and thus a more recent kernel) and they also remount read-only.
>
> I searched the mailing list and found a couple of relevant messages. One person seemed to have the same problem[1], but someone else replied that it didn't happen in his case ("I've had multiple VMs hang for hours at a time when I broke a Ceph cluster and after fixing it the VMs would start working again"). The other message[2] is not very informative.
>
> Are other people experiencing this problem? Is there a file system or kernel version that is recommended for KVM guests that would prevent it? Or does this problem indicate that something else is wrong and should be fixed? I did configure all machines to use cache=writeback, but never investigated whether that makes a difference or even whether it is actually working.
>
> Thanks,
> Paulo Almeida
> Instituto Gulbenkian de Ciência, Oeiras, Portugal
>
> [1] http://thread.gmane.org/gmane.comp.file-systems.ceph.user/8011
> [2] http://thread.gmane.org/gmane.comp.file-systems.ceph.user/1742
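For reference, the suggested option can be applied either at mount time or persistently via /etc/fstab; the device and mount point below are placeholders:

  # one-off, on a running guest
  mount -o remount,errors=continue /

  # persistent, as an /etc/fstab entry (adjust device and filesystem)
  /dev/vda1  /  ext4  defaults,errors=continue  0  1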
[ceph-users] Create OSD on ZFS Mount (firefly)
Testing ceph on top of ZFS (zfsonlinux), kernel driver.

- Have created a ZFS mount: /var/lib/ceph/osd/ceph-0
- Followed the instructions at: http://ceph.com/docs/firefly/rados/operations/add-or-rm-osds/

It is failing on step 4, "Initialize the OSD data directory":

  ceph-osd -i 0 --mkfs --mkkey
  2014-11-25 22:12:26.563666 7ff12b466780 -1 filestore(/var/lib/ceph/osd/ceph-0) mkjournal error creating journal on /var/lib/ceph/osd/ceph-0/journal: (22) Invalid argument
  2014-11-25 22:12:26.563691 7ff12b466780 -1 OSD::mkfs: ObjectStore::mkfs failed with error -22
  2014-11-25 22:12:26.563765 7ff12b466780 -1 ** ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-0: (22) Invalid argument

Is this supported?

thanks,
--
Lindsay
Re: [ceph-users] Dependency issues in fresh ceph/CentOS 7 install
Massimiliano,

We have a documentation update coming shortly. RHEL 7 doesn't have yum-priorities, but you can use rpmfind to get it.

Regards,
John

On Tue, Nov 25, 2014 at 3:02 AM, Massimiliano Cuttini <m...@phoenixweb.it> wrote:
> Hi Travis,
> Can I have a developer account or tester account in order to submit issues by myself?
> [...]

--
John Wilkins
Red Hat
jowil...@redhat.com
(415) 425-9599
http://redhat.com
Re: [ceph-users] Dependency issues in fresh ceph/CentOS 7 install
Hi Massimiliano,

On Tue, Nov 25, 2014 at 6:02 AM, Massimiliano Cuttini <m...@phoenixweb.it> wrote:
> Hi Travis,
> Can I have a developer account or tester account in order to submit issues by myself?

Registration for the Ceph tracker is open; anyone can sign up for an account to report issues. If you visit http://tracker.ceph.com, in the top right-hand corner is a link for "Register".

Hope that helps!

- Travis

> [...]
Re: [ceph-users] What is the state of filestore sloppy CRC?
On Tue, 25 Nov 2014, Tomasz Kuzemko wrote:
> [...]
> Getting back to my actual question: what is the state of filestore sloppy crc? Does someone actually use it in production? Are there any considerations one should make before enabling it? Is it safe to enable it on an existing cluster?

We enable it in our automated QA, but do not know of anyone using it in production and have not recommended it for that. It is not intended to be particularly fast, and we didn't thoroughly analyze the xattr size implications on the file systems people may run on. Also note that it simply fails (crashes) the OSD when it detects an error and has no integration with scrub, which makes it not particularly friendly.

Note that I am working on a related patch set that will keep a persistent checksum of the entire object and will interact directly with deep scrub. It will not be as fine-grained, but it is intended for production use and will cover the bulk of data that sits unmodified at rest for extended periods.

sage
Re: [ceph-users] What is the state of filestore sloppy CRC?
On Tue, Nov 25, 2014 at 07:10:26AM -0800, Sage Weil wrote:
> [...]
> We enable it in our automated QA, but do not know of anyone using it in production and have not recommended it for that. It is not intended to be particularly fast, and we didn't thoroughly analyze the xattr size implications on the file systems people may run on. Also note that it simply fails (crashes) the OSD when it detects an error and has no integration with scrub, which makes it not particularly friendly.

We have run some initial tests of sloppy crc on our dev cluster and the performance hit was in fact negligible (on SSD). We noticed the crashing behavior on bad CRC as well, but I would still prefer the OSD to crash rather than serve corrupted data to the client. So far we only had to modify the upstart script to stop respawning the OSD after a few crashes, so we can detect the CRC error and let clients fail over to another OSD.

About the xattr size limitations: as I understand it, when using omap no such limitations apply? Besides, considering the default settings of a 64k CRC block and a 4M object size, only 64 additional metadata entries per object would be required for the CRCs.

> Note that I am working on a related patch set that will keep a persistent checksum of the entire object [...]

When is this feature planned for release? Will it be included as a point release to Giant, or should we expect it in Hammer?

--
Tomasz Kuzemko
tomasz.kuze...@ovh.net
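For anyone curious, capping upstart respawns as described is a one-line stanza; a sketch for an Ubuntu/upstart host (the job file path and limit values here are illustrative, not the stock packaging):

  # e.g. in /etc/init/ceph-osd.conf
  respawn
  # give up after 3 respawns within 30 minutes
  respawn limit 3 1800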
[ceph-users] Tip of the week: don't use Intel 530 SSD's for journals
If you are like me, you have the journals for your OSDs with rotating media stored separately on an SSD. If you are even more like me, you happen to use Intel 530 SSDs in some of your hosts. If so, please do check your S.M.A.R.T. statistics regularly, because these SSDs really can't cope with Ceph.

Check out the media-wear graphs for the two Intel 530s in my cluster. As soon as those declining lines get down to 30% or so, the drives need to be replaced. That means less than half a year between purchase and end-of-life :(

Tip of the week: keep an eye on those statistics, don't let a failing SSD surprise you.

Erik.
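For those wanting to script the check, the wear indicator can be read with smartctl (attribute names vary by vendor and model, so treat the grep pattern as an example):

  # dump SMART attributes and pick out the wear indicator
  smartctl -A /dev/sda | grep -E 'Media_Wearout_Indicator|Wear_Leveling_Count'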
Re: [ceph-users] Tip of the week: don't use Intel 530 SSD's for journals
On 11/25/2014 09:41 AM, Erik Logtenberg wrote:
> [...]
> Tip of the week: keep an eye on those statistics, don't let a failing SSD surprise you.

This is really good advice, and it's not just the Intel 530s. Most consumer grade SSDs have pretty low write endurance. If you mostly are doing reads from your cluster you may be OK, but if you have even moderately high write workloads and you care about avoiding OSD downtime (which in a production cluster is pretty important though not usually 100% critical), get high write endurance SSDs.

Mark
Re: [ceph-users] ceph-announce list
Should be all set now. I neglected to push the update yesterday, but it's there now.

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com || http://community.redhat.com
@scuttlemonkey || @ceph

On Tue, Nov 25, 2014 at 10:01 AM, Brian Rak <b...@gameservers.com> wrote:
> Hmm, that doesn't seem to be linked to from http://ceph.com/resources/mailing-list-irc/
>
> On 11/25/2014 4:08 AM, JuanFra Rodriguez Cardoso wrote:
>> Sorry.. as mentioned above, it's now open to sign up: http://lists.ceph.com/listinfo.cgi/ceph-announce-ceph.com
>> Thanks a lot!
>> ---
>> JuanFra Rodriguez Cardoso
>> [...]
Re: [ceph-users] Tip of the week: don't use Intel 530 SSD's for journals
Hi,

> If you are like me, you have the journals for your OSD's with rotating media stored separately on an SSD. If you are even more like me, you happen to use Intel 530 SSD's in some of your hosts. If so, please do check your S.M.A.R.T. statistics regularly, because these SSD's really can't cope with Ceph.

We are using the check_smart_attributes (1,2) Nagios check to handle the performance counters/thresholds for the different HDD/SSD models in our Ceph cluster.

regards
Danny

(1) http://git.thomas-krenn.com/check_smart_attributes.git/
(2) http://www.thomas-krenn.com/de/wiki/SMART_Attributes_Monitoring_Plugin
Re: [ceph-users] Tip of the week: don't use Intel 530 SSD's for journals
My cluster is actually very fast without SSD drives. Thanks for the advice!

Michael Kuriger
mk7...@yp.com
818-649-7235
MikeKuriger (IM)

On 11/25/14, 7:49 AM, "Mark Nelson" <mark.nel...@inktank.com> wrote:
> On 11/25/2014 09:41 AM, Erik Logtenberg wrote:
>> [...]
> This is really good advice, and it's not just the Intel 530s. Most consumer grade SSDs have pretty low write endurance. [...]
Re: [ceph-users] What is the state of filestore sloppy CRC?
Sloppy crc uses fs xattrs directly; omap won't help.
-Sam

On Tue, Nov 25, 2014 at 7:39 AM, Tomasz Kuzemko <tomasz.kuze...@ovh.net> wrote:
> [...]
> About the xattr size limitations: as I understand it, when using omap no such limitations apply?
> [...]
Re: [ceph-users] Tip of the week: don't use Intel 530 SSD's for journals
I have suffered power losses in every data center I've been in. I have lost SSDs because of it (Intel 320 Series). The worst time, I lost both SSDs in a RAID1. That was a bad day. I'm using the Intel DC S3700 now, so I don't have a repeat.

My cluster is small enough that losing a journal SSD would be a major headache. I'm manually monitoring wear level. So far all of my journals are still at 100% lifetime. I do have some of the Intel 320s that are down to 45% lifetime remaining (those Intel 320s are in less critical roles). One of these days I'll get around to automating it.

Speed wise, my small cluster was fast enough without SSDs, until I started to expand. I'm only using RadosGW, and I only care about latency in the human timeframe. A second or two of latency is annoying, but not a big deal. I went from 3 nodes to 5, and the expansion was extremely painful. I admit that I inflicted a lot of pain on myself: I expanded too fast (add all the OSDs at the same time? Sure, why not), and I was using the default configs. Things got better after I lowered the backfill priority and count, and learned to add one or two disks at a time. Still, customers noticed the increase in latency when I was adding OSDs.

Now that the journals are on SSDs, customers don't notice the maintenance anymore. RadosGW latency goes from ~50ms to ~80ms, not ~50ms to 2000ms.

On Tue, Nov 25, 2014 at 9:12 AM, Michael Kuriger <mk7...@yp.com> wrote:
> My cluster is actually very fast without SSD drives. Thanks for the advice!
> [...]
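For reference, the backfill/recovery throttles being described are ceph.conf options along these lines (values are illustrative; defaults differ across releases, and they can also be changed at runtime with "ceph tell osd.* injectargs"):

  [osd]
  # limit concurrent backfills per OSD
  osd max backfills = 1
  # deprioritize recovery traffic relative to client I/O
  osd recovery op priority = 1
  osd recovery max active = 1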
Re: [ceph-users] private network - VLAN vs separate switch
It's mostly about bandwidth. With VLANs, the public and cluster networks are going to be sharing the inter-switch links. For a cluster that size, I don't see much advantage to the VLANs. You'll save a few ports by having the inter-switch links shared, at the expense of contention on those links.

If you're trying to save ports, I'd go with a single network. Adding a cluster network later is relatively straightforward. Just monitor the bandwidth on the inter-switch links, and plan to expand when you saturate those links.

That said, I am using VLANs, but my cluster is much smaller. I only have 5 nodes and a single switch. I'm planning to transition to a dedicated cluster switch when I need the extra ports. I don't anticipate the transition being difficult. I'll continue to use the same VLAN on the dedicated switch, just to make the migration less complicated.

On Tue, Nov 25, 2014 at 3:11 AM, Sreenath BH <bhsreen...@gmail.com> wrote:
> Hi
> For a large network (say 100 servers and 2500 disks), are there any strong advantages to using a separate switch and physical network instead of a VLAN?
> [...]
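For context, the public/cluster split itself is just two ceph.conf settings regardless of whether the segregation is physical or VLAN-based; the subnets below are placeholders:

  [global]
  # client-facing traffic
  public network  = 192.168.1.0/24
  # replication/backfill traffic between OSDs
  cluster network = 192.168.2.0/24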
Re: [ceph-users] private network - VLAN vs separate switch
Hi,

In my humble opinion, if you have enough money, separate switches are always the better choice.

Regards, I

2014-11-25 20:47 GMT+01:00 Craig Lewis <cle...@centraldesktop.com>:
> It's mostly about bandwidth. With VLANs, the public and cluster networks are going to be sharing the inter-switch links. [...]

--
Iban Cabrillo Bartolome
Instituto de Fisica de Cantabria (IFCA)
Santander, Spain
Tel: +34942200969
PGP PUBLIC KEY: http://pgp.mit.edu/pks/lookup?op=get&search=0xD9DF0B3D6C8C08AC
Bertrand Russell: "The trouble with the world is that the stupid are cocksure and the intelligent are full of doubt"
Re: [ceph-users] private network - VLAN vs separate switch
> For a large network (say 100 servers and 2500 disks), are there any strong advantages to using a separate switch and physical network instead of a VLAN?

Physical isolation will ensure that congestion on one does not affect the other. On the flip side, asymmetric network failures tend to be more difficult to troubleshoot, e.g. a backend failure with a functional front end. That said, in a pinch you can switch to using the front-end network for both until you can repair the backend.

> Also, how difficult would it be to switch from a VLAN to using separate switches later?

It should be relatively straightforward. Simply configure the VLANs/subnets on the new physical switches and move the links over one by one. Once all the links are moved over, you can remove the VLANs and subnets that are now on the new kit from the original hardware.

--
Kyle
[ceph-users] evaluating Ceph
Hi All,

I am evaluating Ceph for one of our product requirements. I have gone through the website, http://ceph.com/docs/master/start/

I am using Ubuntu 14.04 LTS and am done with most of the steps. Finally, I am stuck on creating a filesystem. From the website:

> The ceph fs new command was introduced in Ceph 0.84. Prior to this release, no manual steps are required to create a filesystem, and pools named data and metadata exist by default.

Even though I did update from ceph-deploy and directly from ceph:

  ceph -v
  ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)

I am not able to upgrade/update to 0.84. Every time, it says you are already at the latest version, so I am not able to deploy the filesystem.

Any help is highly appreciated.

Thanks
Raj
---
Shashiraj Jeripotula (Raj)
DMTS
Systems Engineering, Internet Software and Technology Group | Verizon Corporate Technology
Re: [ceph-users] evaluating Ceph
Hi,

Use "ceph mds newfs {metaid} {dataid}" instead.

JC

On Nov 25, 2014, at 12:27, Jeripotula, Shashiraj <shashiraj.jeripot...@verizon.com> wrote:
> Hi All,
> I am evaluating Ceph for one of our product requirements. [...]
> The ceph fs new command was introduced in Ceph 0.84. [...] I am not able to upgrade/update to 0.84. [...]
Re: [ceph-users] evaluating Ceph
Hi JC,

I tried:

  sysenguser@blade3:~$ ceph mds newfs cephfs_metadata cephfs_data
  Invalid command:  cephfs_metadata doesn't represent an int
  mds newfs <int[0-]> <int[0-]> {--yes-i-really-mean-it} :  make new filesystom using pools <metadata> and <data>
  Error EINVAL: invalid command

Here is the original command that I used:

  sysenguser@blade3:~$ ceph fs new cephfs cephfs_metadata cephfs_data
  no valid command found; 10 closest matches:
  fsid
  Error EINVAL: invalid command

Here is my version:

  sysenguser@blade3:~$ ceph --version
  ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)

Please advise.

Thanks
Raj

From: Jean-Charles LOPEZ [mailto:jc.lo...@inktank.com]
Sent: Tuesday, November 25, 2014 12:52 PM
> Use "ceph mds newfs {metaid} {dataid}" instead.
> [...]
Re: [ceph-users] Tip of the week: don't use Intel 530 SSD's for journals
Thanks for the advice! I've checked a couple of my Intel 520s which I use for the osd journals; I have been using them for almost 2 years now. I do not have a great deal of load though, only about 60 VMs or so with general usage.

Disk 1:
  ID# ATTRIBUTE_NAME           FLAG    VALUE WORST THRESH TYPE    UPDATED WHEN_FAILED RAW_VALUE
  233 Media_Wearout_Indicator  0x0032  096   096   000    Old_age Always  -           0
  225 Host_Writes_32MiB        0x0032  100   100   000    Old_age Always  -           5754781

Disk 2:
  ID# ATTRIBUTE_NAME           FLAG    VALUE WORST THRESH TYPE    UPDATED WHEN_FAILED RAW_VALUE
  233 Media_Wearout_Indicator  0x0032  095   095   000    Old_age Always  -           0
  225 Host_Writes_32MiB        0x0032  100   100   000    Old_age Always  -           5697133

So, from what I can see, I still have 95 and 96 percent left on the disks and they have done around 190 terabytes, which seems like a lot for a consumer grade disk. Or maybe I am reading the data wrongly?

Thanks
Andrei

----- Original Message -----
From: "Michael Kuriger" <mk7...@yp.com>
Sent: Tuesday, 25 November, 2014 5:12:20 PM
Subject: Re: [ceph-users] Tip of the week: don't use Intel 530 SSD's for journals
> My cluster is actually very fast without SSD drives. Thanks for the advice!
> [...]
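As a sanity check on that figure: the raw value of attribute 225 counts 32 MiB units, so the total written works out to roughly what Andrei reads off the counter:

  5754781 units x 32 MiB = 5754781 x 33554432 bytes
                         ~ 1.93e14 bytes ~ 176 TiB ~ 193 TB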
Re: [ceph-users] Tip of the week: don't use Intel 530 SSD's for journals
FWIW, I've got Intel 520s in one of our test nodes at Inktank that has a fair amount of data thrown at it, and we haven't lost a drive in 2 years. Having said that, I'd use higher write endurance drives in production, especially with how much cheaper they are getting these days.

Mark

On 11/25/2014 03:25 PM, Andrei Mikhailovsky wrote:
> Thanks for the advice! I've checked a couple of my Intel 520s which I use for the osd journals [...]
> So, from what I can see, I still have 95 and 96 percent left on the disks and they have done around 190 terabytes, which seems like a lot for a consumer grade disk. Or maybe I am reading the data wrongly?
> [...]
Re: [ceph-users] evaluating Ceph
The two numbers (ints) are meant to be the IDs of the pools you have created for data and metadata. Assuming you have already created the pools, run "ceph osd lspools" and use the numbers from there to create the FS.

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jeripotula, Shashiraj
Sent: 25 November 2014 22:30
> Hi All,
> Can anyone help here? I am pretty stuck on creating the filesystem.
> Thanks in advance.
> Raj
> [...]
Re: [ceph-users] evaluating Ceph
It looks to me like you need to supply it the *ids* of the pools, not their names. So do: $ ceph osd dump # (or lspools) and note down the ids of the pools you want to use. Suppose I have cephfs_data 10 and cephfs_metadata 12: $ ceph mds newfs 10 12 --yes-i-really-mean-it

On 26/11/14 11:30, Jeripotula, Shashiraj wrote: sysenguser@blade3:~$ *ceph mds newfs cephfs_metadata cephfs_data* Invalid command: cephfs_metadata doesn't represent an int mds newfs int[0-] int[0-] {--yes-i-really-mean-it} : make new filesystom using pools metadata and data

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] evaluating Ceph
Thanks Nick and Mark, I was able to run it with the ids and --yes-i-really-mean-it: sysenguser@blade3:~$ ceph mds newfs 3 4 --yes-i-really-mean-it new fs with metadata pool 3 and data pool 4 Regards Raj

-Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Mark Kirkwood Sent: Tuesday, November 25, 2014 2:57 PM To: ceph-users@lists.ceph.com Subject: Re: [ceph-users] evaluating Ceph

...and get the order right, sorry: $ ceph mds newfs 12 10 --yes-i-really-mean-it (metadata pool then data one)

On 26/11/14 11:47, Mark Kirkwood wrote: note down the ids of the pools you want to use (suppose I have cephfs_data 10 and cephfs_metadata 12): $ ceph mds newfs 10 12 --yes-i-really-mean-it

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
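To recap the sequence this thread converged on for pre-0.84 (firefly-era) releases - a sketch only, with illustrative pool names, pg counts, and ids:

  $ ceph osd pool create cephfs_metadata 64       # pool names and pg_num are illustrative
  $ ceph osd pool create cephfs_data 128
  $ ceph osd lspools                              # note the numeric ids, e.g. 3 and 4
  $ ceph mds newfs 3 4 --yes-i-really-mean-it     # metadata pool id first, then data pool id

On 0.84 and later the same step is spelled ceph fs new <fsname> <metadata pool> <data pool>, which takes pool names rather than ids.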
Re: [ceph-users] Create OSD on ZFS Mount (firefly)
There was a good thread on the mailing list a little while ago with several recommendations; maybe some of them will help. Found it: https://www.mail-archive.com/ceph-users@lists.ceph.com/msg14154.html

On Tue, Nov 25, 2014 at 4:16 AM, Lindsay Mathieson lindsay.mathie...@gmail.com wrote: Testing ceph on top of ZFS (zfsonlinux), kernel driver. - Have created ZFS mount: /var/lib/ceph/osd/ceph-0 - followed the instructions at: http://ceph.com/docs/firefly/rados/operations/add-or-rm-osds/ It is failing on step 4, "Initialize the OSD data directory": ceph-osd -i 0 --mkfs --mkkey 2014-11-25 22:12:26.563666 7ff12b466780 -1 filestore(/var/lib/ceph/osd/ceph-0) mkjournal error creating journal on /var/lib/ceph/osd/ceph-0/journal: (22) Invalid argument 2014-11-25 22:12:26.563691 7ff12b466780 -1 OSD::mkfs: ObjectStore::mkfs failed with error -22 2014-11-25 22:12:26.563765 7ff12b466780 -1 ** ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-0: (22) Invalid argument Is this supported? thanks, -- Lindsay

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Create OSD on ZFS Mount (firefly)
Thanks Craig, I had a good read of it - from what I read, the standard ceph packages should work with ZFS, just not make use of its extra features (writeparallel support), whose performance was not all that good anyway. I did note the "set xattr to sa" comment, which gave me a different error :)

ceph-osd -i 0 --mkfs --mkkey 2014-11-26 10:51:33.559288 7fd10544c780 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway 2014-11-26 10:51:33.559355 7fd10544c780 -1 journal check: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected c064615f-d692-4eb0-9211-a26dcb186478, invalid (someone else's?) journal 2014-11-26 10:51:33.559405 7fd10544c780 -1 filestore(/var/lib/ceph/osd/ceph-0) mkjournal error creating journal on /var/lib/ceph/osd/ceph-0/journal: (22) Invalid argument 2014-11-26 10:51:33.559430 7fd10544c780 -1 OSD::mkfs: ObjectStore::mkfs failed with error -22 2014-11-26 10:51:33.559505 7fd10544c780 -1 ** ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-0: (22) Invalid argument

nb. The command actually succeeds if I unmount zfs and use the underlying ext4 filesystem. This is on a proxmox (debian wheezy) box: ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3) zfs: 0.6.3-1~wheezy kernel: 3.10 thanks,

On 26 November 2014 at 09:43, Craig Lewis cle...@centraldesktop.com wrote: There was a good thread on the mailing list a little while ago with several recommendations; maybe some of them will help. Found it: https://www.mail-archive.com/ceph-users@lists.ceph.com/msg14154.html

-- Lindsay ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Create OSD on ZFS Mount (firefly)
I've found the cause of the problem - ceph was attempting to create the journal with direct I/O, which ZFS doesn't support. I worked around it by disabling journal dio in ceph.conf:

[osd]
journal dio = false

Dunno if this is a good idea or not, or whether there is a better way of doing it :)

On 26 November 2014 at 09:43, Craig Lewis cle...@centraldesktop.com wrote: There was a good thread on the mailing list a little while ago with several recommendations; maybe some of them will help. Found it: https://www.mail-archive.com/ceph-users@lists.ceph.com/msg14154.html

-- Lindsay ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
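Pulling the thread's findings together, a minimal sketch of running a firefly filestore OSD on ZFS - the dataset name is illustrative, and note that disabling direct I/O for the journal weakens the usual write-ordering guarantees, so treat this as an experimental setup rather than a blessed one:

  # ZFS side: store xattrs as system attributes so ceph's metadata fits
  $ zfs set xattr=sa tank/ceph-osd0          # "tank/ceph-osd0" is a made-up dataset name

  # ceph.conf: the ZFS POSIX layer lacks O_DIRECT, so journal creation with
  # direct I/O fails with EINVAL, as seen in the logs above
  [osd]
  journal dio = false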
[ceph-users] OSDs down and out of cluster
Hello, We are running a 6-node Ceph cluster, version 0.80.7, on CentOS 7. The OSDs on one node are not getting marked up and in. I have started/restarted the OSDs a couple of times with no luck. All the OSDs have the following messages: 2014-11-25 08:36:04.150120 7f9ff676f700 0 -- 192.168.60.2:6804/43692 >> 192.168.60.4:6803/420005544 pipe(0x3720780 sd=61 :45043 s=1 pgs=0 cs=0 l=0 c=0x171f0160).connect claims to be 192.168.60.4:6803/4810 not 192.168.60.4:6803/420005544 - wrong node! 2014-11-25 08:36:04.150162 7f9ff676f700 0 -- 192.168.60.2:6804/43692 >> 192.168.60.4:6803/420005544 pipe(0x3720780 sd=61 :45043 s=1 pgs=0 cs=0 l=0 c=0x171f0160).fault with nothing to send, going to standby 2014-11-25 08:36:04.152692 7f9ff666e700 0 -- 192.168.60.2:6804/43692 >> 192.168.60.5:6840/231014105 pipe(0x3720c80 sd=63 :0 s=1 pgs=0 cs=0 l=0 c=0x171f0580).fault with nothing to send, going to standby 2014-11-25 08:36:04.156785 7f9ff656d700 0 -- 192.168.60.2:6804/43692 >> 192.168.60.3:6800/229013836 pipe(0x3721180 sd=65 :46104 s=1 pgs=0 cs=0 l=0 c=0x171f09a0).connect claims to be 192.168.60.3:6800/4073 not 192.168.60.3:6800/229013836 - wrong node! 2014-11-25 08:36:04.156842 7f9ff656d700 0 -- 192.168.60.2:6804/43692 >> 192.168.60.3:6800/229013836 pipe(0x3721180 sd=65 :46104 s=1 pgs=0 cs=0 l=0 c=0x171f09a0).fault with nothing to send, going to standby I have tried starting one OSD on this node with debug osd = 20 and am seeing the following messages repeatedly in the log file: 2014-11-26 13:31:44.787520 7fdaa3f52700 10 osd.7 pg_epoch: 11521 pg[10.118( v 11165'3650 (347'649,11165'3650] local-les=11509 n=985 ec=342 les/c 11509/11509 11508/11508/11508) [7,6,41] r=0 lpr=11509 crt=11165'3647 lcod 0'0 mlcod 0'0 peering] handle_activate_map 2014-11-26 13:31:44.787531 7fdaa3f52700 7 osd.7 pg_epoch: 11521 pg[10.118( v 11165'3650 (347'649,11165'3650] local-les=11509 n=985 ec=342 les/c 11509/11509 11508/11508/11508) [7,6,41] r=0 lpr=11509 crt=11165'3647 lcod 0'0 mlcod 0'0 peering] state<Started/Primary>: handle ActMap primary 2014-11-26 13:31:44.787539 7fdaa3f52700 15 osd.7 pg_epoch: 11521 pg[10.118( v 11165'3650 (347'649,11165'3650] local-les=11509 n=985 ec=342 les/c 11509/11509 11508/11508/11508) [7,6,41] r=0 lpr=11509 crt=11165'3647 lcod 0'0 mlcod 0'0 peering] publish_stats_to_osd 11521:21834 2014-11-26 13:31:44.787547 7fdaa3f52700 10 osd.7 pg_epoch: 11521 pg[10.118( v 11165'3650 (347'649,11165'3650] local-les=11509 n=985 ec=342 les/c 11509/11509 11508/11508/11508) [7,6,41] r=0 lpr=11509 crt=11165'3647 lcod 0'0 mlcod 0'0 peering] take_waiters 2014-11-26 13:31:44.787553 7fdaa3751700 20 osd.7 12403 get_map 11523 - loading and decoding 0x29d5800 2014-11-26 13:31:44.787553 7fdaa3f52700 20 osd.7 pg_epoch: 11521 pg[10.118( v 11165'3650 (347'649,11165'3650] local-les=11509 n=985 ec=342 les/c 11509/11509 11508/11508/11508) [7,6,41] r=0 lpr=11509 crt=11165'3647 lcod 0'0 mlcod 0'0 peering] handle_activate_map: Not dirtying info: last_persisted is 11509 while current is 11521 2014-11-26 13:31:44.787559 7fdaa3f52700 10 osd.7 12403 advance_pg advanced by max 200 past min epoch 11521 ... 
will requeue pg[10.118( v 11165'3650 (347'649,11165'3650] local-les=11509 n=985 ec=342 les/c 11509/11509 11508/11508/11508) [7,6,41] r=0 lpr=11509 crt=11165'3647 lcod 0'0 mlcod 0'0 peering] 2014-11-26 13:31:44.787565 7fdaa3f52700 10 osd.7 pg_epoch: 11521 pg[10.118( v 11165'3650 (347'649,11165'3650] local-les=11509 n=985 ec=342 les/c 11509/11509 11508/11508/11508) [7,6,41] r=0 lpr=11509 crt=11165'3647 lcod 0'0 mlcod 0'0 peering] null 2014-11-26 13:31:44.787573 7fdaa3f52700 10 log is not dirty 2014-11-26 13:31:44.787574 7fdaa3751700 20 osd.7 12403 get_map 11524 - loading and decoding 0x29d5800 2014-11-26 13:31:44.787597 7fdaa3751700 20 osd.7 12403 get_map 11525 - loading and decoding 0x29d5800 2014-11-26 13:31:44.787621 7fdaa3751700 20 osd.7 12403 get_map 11526 - loading and decoding 0x29d5800 2014-11-26 13:31:44.787644 7fdaa3751700 20 osd.7 12403 get_map 11527 - loading and decoding 0x29d5800 2014-11-26 13:31:44.787667 7fdaa3751700 20 osd.7 12403 get_map 11528 - loading and decoding 0x29d5800 2014-11-26 13:31:44.787690 7fdaa3751700 20 osd.7 12403 get_map 11529 - loading and decoding 0x29d5800 2014-11-26 13:31:44.787714 7fdaa3751700 20 osd.7 12403 get_map 11530 - loading and decoding 0x29d5800 2014-11-26 13:31:44.787737 7fdaa3751700 20 osd.7 12403 get_map 11531 - loading and decoding 0x29d5800 2014-11-26 13:31:44.787760 7fdaa3751700 20 osd.7 12403 get_map 11532 - loading and decoding 0x29d5800 2014-11-26 13:31:44.787783 7fdaa3751700 20 osd.7 12403 get_map 11533 - loading and decoding 0x29d5800 2014-11-26 13:31:44.787806 7fdaa3751700 20 osd.7 12403 get_map 11534 - loading and decoding 0x29d5800 2014-11-26 13:31:44.787829 7fdaa3751700 20 osd.7 12403 get_map 11535 - loading and decoding 0x29d5800 2014-11-26 13:31:44.787853 7fdaa3751700 20 osd.7 12403 get_map 11536 - loading and decoding 0x29d5800 2014-11-26 13:31:44.787876 7fdaa3751700 20 osd.7 12403 get_map 11537
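For anyone hitting the same symptom: the "wrong node!" messages mean the OSD is dialling a peer at an address/nonce recorded in an older osdmap epoch (note the mismatched nonces above, e.g. /4810 vs /420005544, on the same ip:port). A hedged first round of checks - these are generic commands, not a guaranteed fix:

  $ ceph -s                      # overall health and monitor quorum
  $ ceph osd tree                # which OSDs the cluster currently considers up/in
  $ ceph osd dump | grep osd.    # the address:port/nonce the current osdmap records per OSD

If the osdmap addresses disagree with what the stuck daemons log at startup, the usual suspects are stale maps after a partition on the cluster network, or clock skew between the nodes; restarting the affected OSDs once connectivity and time are verified generally lets them peer again.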
[ceph-users] Question about mount the same rbd in different machine
Hi all, I created an RBD named foo, then mapped and mounted it on two different machines. When I touch a file on machine A, machine B cannot see the new file, and machine B can even touch a file with the same name! I want to know whether the RBD is the same on machines A and B, or whether they are effectively two separate RBDs. Any ideas will be appreciated! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] OCFS2 on RBD
Hi all, I have a requirement for a highly available, high-performance storage environment to serve data for webheads (lots of reads on small files, limited writes). Having looked at all available options, OCFS2 on top of RBD appears to be the only solution that meets all my needs. I have used OCFS2 on top of an iSCSI SAN in the past with great success. Also, Ceph is sexy. I would have liked to try CephFS, but the lack of redundant/HA metadata servers means it doesn't tick my boxes right now. I built a Ceph system (three nodes with three 1TB OSDs). The nodes are KVM virtual machines with storage and networking delivered via InfiniBand at the hypervisor/host level. The VMs use VirtIO to access storage and networking. I mounted an RBD on some different nodes, ran up OCFS2 on top of that, copied some files onto the mount, and everything worked just peachy up to this point. I then proceeded to rsync some files (about 7 files, worth about 15GB) across from a remote system which isn't very close or fast, and things broke about halfway through. At some point things started hanging, and the machine used for rsyncing (an OCFS2 node) spontaneously rebooted. This is probably because when OCFS2 doesn't like something, it will panic and reboot (this can of course be configured). The actual node that rebooted didn't produce any useful logs, but this showed up on a different OCFS2 node (node 90 is the node that was doing the rsyncing and rebooting): [Mon Nov 24 04:00:26 2014] libceph: loaded (mon/osd proto 15/24) [Mon Nov 24 04:00:26 2014] rbd: loaded rbd (rados block device) [Mon Nov 24 04:00:26 2014] libceph: client4305 fsid b157f972-771b-47aa-826a-3bc97cf8d853 [Mon Nov 24 04:00:26 2014] libceph: mon0 192.168.5.31:6789 session established [Mon Nov 24 04:00:26 2014] rbd1: unknown partition table [Mon Nov 24 04:00:26 2014] rbd: rbd1: added with size 0x19 [Mon Nov 24 04:15:27 2014] libceph: osd2 192.168.5.31:6800 socket closed (con state OPEN) [Mon Nov 24 04:25:30 2014] OCFS2 Node Manager 1.5.0 [Mon Nov 24 04:25:30 2014] OCFS2 DLM 1.5.0 [Mon Nov 24 04:25:30 2014] ocfs2: Registered cluster interface o2cb [Mon Nov 24 04:25:30 2014] OCFS2 DLMFS 1.5.0 [Mon Nov 24 04:25:30 2014] OCFS2 User DLM kernel interface loaded [Mon Nov 24 04:30:27 2014] libceph: osd2 192.168.5.31:6800 socket closed (con state OPEN) [Mon Nov 24 04:39:49 2014] OCFS2 1.5.0 [Mon Nov 24 04:39:49 2014] o2dlm: Joining domain A04D7A74D30641B595E50B8C0CAAA157 ( 21 ) 1 nodes [Mon Nov 24 04:39:49 2014] JBD2: Ignoring recovery information on journal [Mon Nov 24 04:39:49 2014] ocfs2: Mounting device (251,0) on (node 21, slot 0) with ordered data mode. 
[Mon Nov 24 04:39:55 2014] o2net: Accepted connection from node phptst01 (num 80) at 192.168.5.80: [Mon Nov 24 04:39:57 2014] o2net: Accepted connection from node wwwtst01 (num 90) at 192.168.5.90: [Mon Nov 24 04:40:02 2014] o2dlm: Node 80 joins domain A04D7A74D30641B595E50B8C0CAAA157 ( 21 80 ) 2 nodes [Mon Nov 24 04:40:04 2014] o2dlm: Node 90 joins domain A04D7A74D30641B595E50B8C0CAAA157 ( 21 80 90 ) 3 nodes [Mon Nov 24 04:44:58 2014] o2dlm: Node 90 leaves domain A04D7A74D30641B595E50B8C0CAAA157 ( 21 80 ) 2 nodes [Mon Nov 24 04:45:00 2014] o2net: Connection to node wwwtst01 (num 90) at 192.168.5.90: shutdown, state 8 [Mon Nov 24 04:45:00 2014] o2net: No longer connected to node wwwtst01 (num 90) at 192.168.5.90: [Mon Nov 24 04:45:30 2014] o2net: Accepted connection from node wwwtst01 (num 90) at 192.168.5.90: [Mon Nov 24 04:45:34 2014] o2dlm: Node 90 joins domain A04D7A74D30641B595E50B8C0CAAA157 ( 21 80 90 ) 3 nodes [Mon Nov 24 04:54:49 2014] libceph: osd2 192.168.5.31:6800 socket closed (con state OPEN) [Mon Nov 24 05:09:49 2014] libceph: osd2 192.168.5.31:6800 socket closed (con state OPEN) [Mon Nov 24 05:24:50 2014] libceph: osd2 192.168.5.31:6800 socket closed (con state OPEN) [Mon Nov 24 05:39:50 2014] libceph: osd2 192.168.5.31:6800 socket closed (con state OPEN) [Mon Nov 24 05:54:50 2014] libceph: osd2 192.168.5.31:6800 socket closed (con state OPEN) [Mon Nov 24 06:09:51 2014] libceph: osd2 192.168.5.31:6800 socket closed (con state OPEN) [Mon Nov 24 06:24:51 2014] libceph: osd2 192.168.5.31:6800 socket closed (con state OPEN) [Mon Nov 24 06:39:51 2014] libceph: osd2 192.168.5.31:6800 socket closed (con state OPEN) [Mon Nov 24 06:54:51 2014] libceph: osd2 192.168.5.31:6800 socket closed (con state OPEN) [Mon Nov 24 07:09:52 2014] libceph: osd2 192.168.5.31:6800 socket closed (con state OPEN) [Mon Nov 24 07:24:52 2014] libceph: osd2 192.168.5.31:6800 socket closed (con state OPEN) [Mon Nov 24 07:39:52 2014] libceph: osd2 192.168.5.31:6800 socket closed (con state OPEN) [Mon Nov 24 07:54:52 2014] libceph: osd2 192.168.5.31:6800 socket closed (con state OPEN) [Mon Nov 24 08:09:53 2014] libceph: osd2 192.168.5.31:6800 socket closed (con state OPEN) [Mon Nov 24 08:24:53 2014] libceph: osd2 192.168.5.31:6800 socket closed (con state OPEN) [Mon Nov 24 08:39:53 2014] libceph: osd2
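A side note on the "panic and reboot" behaviour mentioned above: o2cb fences a node by resetting it when its disk-heartbeat or network-idle timers expire, and those timers can be relaxed for higher-latency storage such as RBD. A sketch of the knobs involved - the values below are illustrative, not recommendations:

  # /etc/sysconfig/o2cb (RHEL) or /etc/default/o2cb (Debian)
  O2CB_HEARTBEAT_THRESHOLD=61    # node fences after (threshold - 1) * 2s without a disk heartbeat
  O2CB_IDLE_TIMEOUT_MS=60000     # network idle time before a peer is declared dead
  O2CB_KEEPALIVE_DELAY_MS=4000   # interval between keepalive packets
  O2CB_RECONNECT_DELAY_MS=4000   # delay between reconnect attempts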
Re: [ceph-users] Question about mount the same rbd in different machine
You can't write from two nodes mounted to the same RBD at the same time without a cluster-aware file system.

-Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of mail list Sent: Tuesday, November 25, 2014 7:30 PM To: ceph-us...@ceph.com Subject: [ceph-users] Question about mount the same rbd in different machine

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Question about mount the same rbd in different machine
Hi, But I have touched the same file on the two machines under the same RBD with no error. Will it cause some problem, or is it just not suggested but still possible?

On Nov 26, 2014, at 12:08, Michael Kuriger mk7...@yp.com wrote: You can't write from two nodes mounted to the same RBD at the same time without a cluster-aware file system.

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Question about mount the same rbd in different machine
Each server mounting the RBD device thinks it's the only server writing to it. They are not aware of the other server and will therefore overwrite and corrupt the filesystem as soon as each server writes a file.

-Original Message- From: mail list [mailto:louis.hust...@gmail.com] Sent: Tuesday, November 25, 2014 8:11 PM To: Michael Kuriger Cc: ceph-us...@ceph.com Subject: Re: [ceph-users] Question about mount the same rbd in different machine

Hi, But I have touched the same file on the two machines under the same RBD with no error. Will it cause some problem, or is it just not suggested but still possible?

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Question about mount the same rbd in different machine
Hi Michael, I wrote the same file with different content and there was no hint of an overwrite, so when will the corruption appear?

On Nov 26, 2014, at 12:23, Michael Kuriger mk7...@yp.com wrote: Each server mounting the RBD device thinks it's the only server writing to it. They are not aware of the other server and will therefore overwrite and corrupt the filesystem as soon as each server writes a file.

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Question about mount the same rbd in different machine
I cannot go into detail about how or where your particular system is writing files. All I can reiterate is that RBD images can only be mounted on one host at a time, unless you're using a cluster-aware file system. Hope that helps! -Mike

-Original Message- From: mail list [mailto:louis.hust...@gmail.com] Sent: Tuesday, November 25, 2014 8:27 PM To: Michael Kuriger Cc: ceph-us...@ceph.com Subject: Re: [ceph-users] Question about mount the same rbd in different machine

Hi Michael, I wrote the same file with different content and there was no hint of an overwrite, so when will the corruption appear?

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Question about mount the same rbd in different machine
Thanks very much, Mike!

On Nov 26, 2014, at 12:42, Michael Kuriger mk7...@yp.com wrote: I cannot go into detail about how or where your particular system is writing files. All I can reiterate is that RBD images can only be mounted on one host at a time, unless you're using a cluster-aware file system. Hope that helps! -Mike

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
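To make the failure mode in this thread concrete, here is a sketch of the unsafe pattern versus one safe alternative. The device and mount paths are made up, and the OCFS2 half assumes the o2cb cluster stack (/etc/ocfs2/cluster.conf) is already configured on both hosts, as in the OCFS2-on-RBD thread earlier:

  # UNSAFE: ext4 and friends are single-writer filesystems. Each host caches
  # its own view of the block device, so mounting the same image read-write
  # on two hosts corrupts it as soon as both flush metadata.
  hostA$ rbd map foo && mount /dev/rbd0 /mnt
  hostB$ rbd map foo && mount /dev/rbd0 /mnt

  # SAFER: format the image with a cluster-aware filesystem such as OCFS2,
  # so the nodes coordinate writes through a distributed lock manager.
  hostA$ mkfs.ocfs2 -N 4 /dev/rbd0          # -N 4: allow up to four node slots
  hostA$ mount -t ocfs2 /dev/rbd0 /mnt
  hostB$ mount -t ocfs2 /dev/rbd0 /mnt

The silence the original poster observed is expected: nothing detects the conflict at touch time, because each kernel is updating its own cached copy of the filesystem metadata; the damage only becomes visible once those caches are written back and one host reads blocks the other has changed.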