[ceph-users] Restricting access of a users to only objects of a specific bucket
Added appropriate subject. On Fri, Aug 5, 2016 at 10:23 AM, Parveen Sharma wrote: > Have a cluster and I want a radosGW user to have access to the objects of a bucket > only, like /*, but the user should not be able to create a new bucket > or remove this bucket > > > > - > Parveen Kumar Sharma > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
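As a follow-up sketch for later readers: radosgw gained S3 bucket-policy support in releases after this thread (Luminous and later). With it, the restriction asked about above can be expressed roughly as below. The bucket name, user ARN, and the s3cmd step are illustrative assumptions, not taken from this thread:

```shell
# Hypothetical names: bucket "mybucket", RGW user "parveen".
# Allow object read/write and listing, but deny bucket create/delete.
cat > policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"AWS": ["arn:aws:iam:::user/parveen"]},
      "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
      "Resource": ["arn:aws:s3:::mybucket", "arn:aws:s3:::mybucket/*"]
    },
    {
      "Effect": "Deny",
      "Principal": {"AWS": ["arn:aws:iam:::user/parveen"]},
      "Action": ["s3:DeleteBucket", "s3:CreateBucket"],
      "Resource": ["arn:aws:s3:::mybucket"]
    }
  ]
}
EOF
# Then, with the bucket owner's credentials:
# s3cmd setpolicy policy.json s3://mybucket
```

On Jewel itself (the release current at the time of this thread), the usual workaround was S3 ACLs granting a second user READ/WRITE on the bucket contents without ownership.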
[ceph-users] (no subject)
Have a cluster and I want a radosGW user to have access to the objects of a bucket only, like /*, but the user should not be able to create a new bucket or remove this bucket - Parveen Kumar Sharma
[ceph-users] Advice on migrating from legacy tunables to Jewel tunables.
Dear cephers... I am looking for some advice on migrating from legacy tunables to Jewel tunables. What would be the best strategy? 1) A step-by-step approach? - starting with the transition from bobtail to firefly (and, in this particular step, by setting chooseleaf_vary_r=5 and then decreasing it slowly to 1?) - then from firefly to hammer - then from hammer to jewel 2) Or going directly to jewel tunables? Any advice on how to minimize the data movement? TIA Goncalo -- Goncalo Borges Research Computing ARC Centre of Excellence for Particle Physics at the Terascale School of Physics A28 | University of Sydney, NSW 2006 T: +61 2 93511937
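The step-by-step route can be dry-run offline with crushtool before committing anything to the cluster. This is a hedged sketch: the tunable options are per the crushtool manpage, file names are examples, and --compare only exists in newer crushtool builds:

```shell
# Grab the current CRUSH map:
ceph osd getcrushmap -o crushmap.current

# Build a candidate map with an intermediate tunable value:
crushtool -i crushmap.current --set-chooseleaf-vary-r 5 -o crushmap.step

# If your crushtool build supports it, estimate the remapping first:
crushtool -i crushmap.step --compare crushmap.current

# Inject the candidate map, wait for rebalancing to settle,
# then repeat with 4, 3, 2, 1:
ceph osd setcrushmap -i crushmap.step

# Once at chooseleaf_vary_r=1 and HEALTH_OK, switch profiles outright:
ceph osd crush tunables hammer
ceph osd crush tunables jewel
```

The direct jump (`ceph osd crush tunables jewel` in one go) moves more data at once but only once; the stepped route spreads the movement over several smaller rebalances.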
Re: [ceph-users] [Scst-devel] Thin Provisioning and Ceph RBD's
On Wed, Aug 3, 2016 at 10:54 AM, Alex Gorbachevwrote: > On Wed, Aug 3, 2016 at 9:59 AM, Alex Gorbachev > wrote: >> On Tue, Aug 2, 2016 at 10:49 PM, Vladislav Bolkhovitin wrote: >>> Alex Gorbachev wrote on 08/02/2016 07:56 AM: On Tue, Aug 2, 2016 at 9:56 AM, Ilya Dryomov wrote: > On Tue, Aug 2, 2016 at 3:49 PM, Alex Gorbachev > wrote: >> On Mon, Aug 1, 2016 at 11:03 PM, Vladislav Bolkhovitin >> wrote: >>> Alex Gorbachev wrote on 08/01/2016 04:05 PM: Hi Ilya, On Mon, Aug 1, 2016 at 3:07 PM, Ilya Dryomov wrote: > On Mon, Aug 1, 2016 at 7:55 PM, Alex Gorbachev > wrote: >> RBD illustration showing RBD ignoring discard until a certain >> threshold - why is that? This behavior is unfortunately incompatible >> with ESXi discard (UNMAP) behavior. >> >> Is there a way to lower the discard sensitivity on RBD devices? >> >> >> root@e1:/var/log# blkdiscard -o 0 -l 4096000 /dev/rbd28 >> root@e1:/var/log# rbd diff spin1/testdis|awk '{ SUM += $2 } END { >> print SUM/1024 " KB" }' >> 819200 KB >> >> root@e1:/var/log# blkdiscard -o 0 -l 4096 /dev/rbd28 >> root@e1:/var/log# rbd diff spin1/testdis|awk '{ SUM += $2 } END { >> print SUM/1024 " KB" }' >> 782336 KB > > Think about it in terms of underlying RADOS objects (4M by default). > There are three cases: > > discard range | command > - > whole object| delete > object's tail | truncate > object's head | zero > > Obviously, only delete and truncate free up space. In all of your > examples, except the last one, you are attempting to discard the head > of the (first) object. > > You can free up as little as a sector, as long as it's the tail: > > OffsetLength Type > 0 4194304 data > > # blkdiscard -o $(((4 << 20) - 512)) -l 512 /dev/rbd28 > > OffsetLength Type > 0 4193792 data Looks like ESXi is sending in each discard/unmap with the fixed granularity of 8192 sectors, which is passed verbatim by SCST. 
There is a slight reduction in size via the rbd diff method, but now I understand that an actual truncate only takes effect when the discard happens to clip the tail of an image. So far, looking at https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2057513 ...the only variable we can control is the count of 8192-sector chunks and not their size. Which means that most of the ESXi discard commands will be disregarded by Ceph. Vlad, is 8192 sectors coming from ESXi, as in the debug: Aug 1 19:01:36 e1 kernel: [168220.570332] Discarding (start_sector 1342099456, nr_sects 8192) >>> >>> Yes, correct. However, to make sure that VMware is not (erroneously) >>> enforced to do this, you need to perform one more check. >>> >>> 1. Run cat /sys/block/rbd28/queue/discard*. Ceph should report here >>> correct granularity and alignment (4M, I guess?) >> >> This seems to reflect the granularity (4194304), which matches the >> 8192 sectors (8192 x 512 = 4194304). However, there is no alignment >> value. >> >> Can discard_alignment be specified with RBD? > > It's exported as a read-only sysfs attribute, just like > discard_granularity: > > # cat /sys/block/rbd0/discard_alignment > 4194304 Ah thanks Ilya, it is indeed there. Vlad, your email says to look for discard_alignment in /sys/block//queue, but for RBD it's in /sys/block/ - could this be the source of the issue? >>> >>> No. As you can see below, the alignment is reported correctly. So, this must >>> be a VMware >>> issue, because it is ignoring the alignment parameter. You can try to align >>> your VMware >>> partition on a 4M boundary; it might help. >> >> Is this not a mismatch: >> >> - From sg_inq: Unmap granularity alignment: 8192 >> >> - From "cat /sys/block/rbd0/discard_alignment": 4194304 >> >> I am compiling the latest SCST trunk now. > > Scratch that, please, I just did a test that shows correct calculation > of 4MB in sectors.
> > - On iSCSI client node: > > dd if=/dev/urandom of=/dev/sdf bs=1M count=800 > blkdiscard -o 0 -l 4194304 /dev/sdf > > - On iSCSI server node: > > Aug 3 10:50:57 e1 kernel: [ 893.444538] [1381]: > vdisk_unmap_range:3832:Discarding (start_sector
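To make Ilya's three cases concrete, here is a small arithmetic sketch. The device path and sizes follow the examples in the thread; the blkdiscard invocations are shown commented out since they require the mapped RBD device:

```shell
# RBD stripes an image over RADOS objects, 4 MiB each by default.
OBJ_BYTES=$((4 << 20))
echo "object size: $OBJ_BYTES"            # 4194304

# Case 1: discard covering a whole object  -> object deleted, space freed.
# Case 2: discard reaching an object's tail -> object truncated, space freed.
TAIL_OFF=$((OBJ_BYTES - 512))
echo "tail discard offset: $TAIL_OFF"     # 4193792, as in the example above
# blkdiscard -o "$TAIL_OFF" -l 512 /dev/rbd28

# Case 3: discard at an object's head or middle -> only zeroed, no space
# freed, which is why ESXi's fixed 8192-sector UNMAP chunks (4 MiB in size
# but not object-aligned) were largely ignored in this thread.
# blkdiscard -o 0 -l 4096 /dev/rbd28
```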
[ceph-users] Re: Bad performance when two fio write to the same image
Hi Jason Thanks for your information -----Original Message----- From: Jason Dillaman [mailto:jdill...@redhat.com] Sent: August 4, 2016 19:49 To: Alexandre DERUMIER Cc: Zhiyuan Wang ; ceph-users Subject: Re: [ceph-users] Bad performance when two fio write to the same image With exclusive-lock, only a single client can have write access to the image at a time. Therefore, if you are using multiple fio processes against the same image, they will be passing the lock back and forth between each other and you can expect bad performance. If you have a use-case where you really need to share the same image between multiple concurrent clients, you will need to disable the exclusive-lock feature (this can be done with the RBD cli on existing images or by passing "--image-shared" when creating new images). On Thu, Aug 4, 2016 at 5:52 AM, Alexandre DERUMIER wrote: > Hi, > > I think this is because of the exclusive-lock feature enabled by default > since jewel on rbd images > > > - Original Message - > From: "Zhiyuan Wang" > To: "ceph-users" > Sent: Thursday, August 4, 2016 11:37:04 > Subject: [ceph-users] Bad performance when two fio write to the same > image > > > > Hi Guys > > I am testing the performance of Jewel (10.2.2) with FIO, but found that the > performance would drop dramatically when two processes write to the same image. > > My environment: > > 1. Server: > > One mon and four OSDs running on the same server. > > Intel P3700 400GB SSD which has 4 partitions, each for one osd > journal (journal size is 10GB) > > Intel P3700 400GB SSD which has 4 partitions, each formatted to XFS > for one osd's data (each data partition is 90GB) > > 10Gb network > > CPU: Intel(R) Xeon(R) CPU E5-2660 (it is not the bottleneck) > > Memory: 256GB (it is not the bottleneck) > > 2. Client > > 10Gb network > > CPU: Intel(R) Xeon(R) CPU E5-2660 (it is not the bottleneck) > > Memory: 256GB (it is not the bottleneck) > > 3.
Ceph > > Default configuration expect use async messager (have tried simple > messager, got nearly the same result) > > 10GB image with 256 pg num > > Test Case > > 1. One Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd; > randwrite > > The performance is nearly 60MB/s and IOPS is nearly 15K > > Four osd are nearly the same busy > > 2. Two Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd; > randwrite (write to the same image) > > The performance is nearly 4MB/s each, and IOPS is nearly 1.5K each > Terrible > > And I found that only one osd is busy, the other three are much more > idle on CPU > > And I also run FIO on two clients, the same result > > 3. Two Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd > randwrite (one to image1, one to image2) > > The performance is nearly 35MB/s each and IOPS is nearly 8.5K each > Reasonable > > Four osd are nearly the same busy > > > > > > Could someone help to explain the reason of TEST 2 > > > > Thanks > > > Email Disclaimer & Confidentiality Notice > > This message is confidential and intended solely for the use of the recipient > to whom they are addressed. If you are not the intended recipient you should > not deliver, distribute or copy this e-mail. Please notify the sender > immediately by e-mail and delete this e-mail from your system. Copyright © > 2016 by Istuary Innovation Labs, Inc. All rights reserved. > > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Jason Email Disclaimer & Confidentiality Notice This message is confidential and intended solely for the use of the recipient to whom they are addressed. If you are not the intended recipient you should not deliver, distribute or copy this e-mail. Please notify the sender immediately by e-mail and delete this e-mail from your system. 
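Jason's suggestion can be sketched with the rbd CLI. Pool and image names below are examples; on releases where object-map, fast-diff, or journaling are enabled, those must be disabled before exclusive-lock, since they depend on it:

```shell
# Example names; adjust pool/image to your environment.
# Disable lock-dependent features first (only if they are enabled):
rbd feature disable rbd/testimg object-map fast-diff journaling
# Then drop exclusive-lock itself:
rbd feature disable rbd/testimg exclusive-lock

# Or create a new image with lock-dependent features off from the start
# (size in MB here):
rbd create --size 10240 --image-shared rbd/sharedimg
rbd info rbd/sharedimg   # "features:" should no longer list exclusive-lock
```

With exclusive-lock off, multiple clients (or fio processes) can write to the image concurrently without ping-ponging the lock, at the cost of losing the features that require it.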
Re: [ceph-users] question about ceph-deploy osd create
Ok i will try without creating by myself Never the less thanks a lot Christian for your patience, i will try more clever questions when i'm ready for them Le 5 août 2016 02:44, "Christian Balzer"a écrit : Hello, On Fri, 5 Aug 2016 02:41:47 +0200 Guillaume Comte wrote: > Maybe you are mispelling, but in the docs they dont use white space but : > this is quite misleading if it works > I'm quoting/showing "ceph-disk", which is called by ceph-deploy, which indeed uses a ":". Christian > Le 5 août 2016 02:30, "Christian Balzer" a écrit : > > > > > Hello, > > > > On Fri, 5 Aug 2016 02:11:31 +0200 Guillaume Comte wrote: > > > > > I am reading half your answer > > > > > > Do you mean that ceph will create by itself the partitions for the > > journal? > > > > > Yes, "man ceph-disk". > > > > > If so its cool and weird... > > > > > It can be very weird indeed. > > If sdc is your data (OSD) disk and sdb your journal device then: > > > > "ceph-disk prepare /dev/sdc /dev/sdb1" > > will not work, but: > > > > "ceph-disk prepare /dev/sdc /dev/sdb" > > will and create a journal partition on sdb. > > However you have no control over numbering or positioning this way. > > > > Christian > > > > > Le 5 août 2016 02:01, "Christian Balzer" a écrit : > > > > > > > > > > > Hello, > > > > > > > > you need to work on your google skills. ^_- > > > > > > > > I wrote about his just yesterday and if you search for "ceph-deploy > > wrong > > > > permission" the second link is the issue description: > > > > http://tracker.ceph.com/issues/13833 > > > > > > > > So I assume your journal partitions are either pre-made or non-GPT. 
> > > > > > > > Christian > > > > > > > > On Thu, 4 Aug 2016 15:34:44 +0200 Guillaume Comte wrote: > > > > > > > > > Hi All, > > > > > > > > > > With ceph jewel, > > > > > > > > > > I'm pretty stuck with > > > > > > > > > > > > > > > ceph-deploy osd create {node-name}:{disk}[:{path/to/journal}] > > > > > > > > > > Because when i specify a journal path like this: > > > > > ceph-deploy osd prepare ceph-osd1:sdd:sdf7 > > > > > And then: > > > > > ceph-deploy osd activate ceph-osd1:sdd:sdf7 > > > > > I end up with "wrong permission" on the osd when activating, > > complaining > > > > > about "tmp" directory where the files are owned by root, and it > > seems it > > > > > tryes to do stuff as ceph user. > > > > > > > > > > It works when i don't specify a separate journal > > > > > > > > > > Any idea of what i'm doing wrong ? > > > > > > > > > > thks > > > > > > > > > > > > -- > > > > Christian BalzerNetwork/Systems Engineer > > > > ch...@gol.com Global OnLine Japan/Rakuten Communications > > > > http://www.gol.com/ > > > > > > > > > > -- > > Christian BalzerNetwork/Systems Engineer > > ch...@gol.com Global OnLine Japan/Rakuten Communications > > http://www.gol.com/ > > -- Christian BalzerNetwork/Systems Engineer ch...@gol.com Global OnLine Japan/Rakuten Communications http://www.gol.com/ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] question about ceph-deploy osd create
Hello, On Fri, 5 Aug 2016 02:41:47 +0200 Guillaume Comte wrote: > Maybe you are mispelling, but in the docs they dont use white space but : > this is quite misleading if it works > I'm quoting/showing "ceph-disk", which is called by ceph-deploy, which indeed uses a ":". Christian > Le 5 août 2016 02:30, "Christian Balzer"a écrit : > > > > > Hello, > > > > On Fri, 5 Aug 2016 02:11:31 +0200 Guillaume Comte wrote: > > > > > I am reading half your answer > > > > > > Do you mean that ceph will create by itself the partitions for the > > journal? > > > > > Yes, "man ceph-disk". > > > > > If so its cool and weird... > > > > > It can be very weird indeed. > > If sdc is your data (OSD) disk and sdb your journal device then: > > > > "ceph-disk prepare /dev/sdc /dev/sdb1" > > will not work, but: > > > > "ceph-disk prepare /dev/sdc /dev/sdb" > > will and create a journal partition on sdb. > > However you have no control over numbering or positioning this way. > > > > Christian > > > > > Le 5 août 2016 02:01, "Christian Balzer" a écrit : > > > > > > > > > > > Hello, > > > > > > > > you need to work on your google skills. ^_- > > > > > > > > I wrote about his just yesterday and if you search for "ceph-deploy > > wrong > > > > permission" the second link is the issue description: > > > > http://tracker.ceph.com/issues/13833 > > > > > > > > So I assume your journal partitions are either pre-made or non-GPT. 
> > > > > > > > Christian > > > > > > > > On Thu, 4 Aug 2016 15:34:44 +0200 Guillaume Comte wrote: > > > > > > > > > Hi All, > > > > > > > > > > With ceph jewel, > > > > > > > > > > I'm pretty stuck with > > > > > > > > > > > > > > > ceph-deploy osd create {node-name}:{disk}[:{path/to/journal}] > > > > > > > > > > Because when i specify a journal path like this: > > > > > ceph-deploy osd prepare ceph-osd1:sdd:sdf7 > > > > > And then: > > > > > ceph-deploy osd activate ceph-osd1:sdd:sdf7 > > > > > I end up with "wrong permission" on the osd when activating, > > complaining > > > > > about "tmp" directory where the files are owned by root, and it > > seems it > > > > > tryes to do stuff as ceph user. > > > > > > > > > > It works when i don't specify a separate journal > > > > > > > > > > Any idea of what i'm doing wrong ? > > > > > > > > > > thks > > > > > > > > > > > > -- > > > > Christian BalzerNetwork/Systems Engineer > > > > ch...@gol.com Global OnLine Japan/Rakuten Communications > > > > http://www.gol.com/ > > > > > > > > > > -- > > Christian BalzerNetwork/Systems Engineer > > ch...@gol.com Global OnLine Japan/Rakuten Communications > > http://www.gol.com/ > > -- Christian BalzerNetwork/Systems Engineer ch...@gol.com Global OnLine Japan/Rakuten Communications http://www.gol.com/ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] question about ceph-deploy osd create
Maybe you are mispelling, but in the docs they dont use white space but : this is quite misleading if it works Le 5 août 2016 02:30, "Christian Balzer"a écrit : > > Hello, > > On Fri, 5 Aug 2016 02:11:31 +0200 Guillaume Comte wrote: > > > I am reading half your answer > > > > Do you mean that ceph will create by itself the partitions for the > journal? > > > Yes, "man ceph-disk". > > > If so its cool and weird... > > > It can be very weird indeed. > If sdc is your data (OSD) disk and sdb your journal device then: > > "ceph-disk prepare /dev/sdc /dev/sdb1" > will not work, but: > > "ceph-disk prepare /dev/sdc /dev/sdb" > will and create a journal partition on sdb. > However you have no control over numbering or positioning this way. > > Christian > > > Le 5 août 2016 02:01, "Christian Balzer" a écrit : > > > > > > > > Hello, > > > > > > you need to work on your google skills. ^_- > > > > > > I wrote about his just yesterday and if you search for "ceph-deploy > wrong > > > permission" the second link is the issue description: > > > http://tracker.ceph.com/issues/13833 > > > > > > So I assume your journal partitions are either pre-made or non-GPT. > > > > > > Christian > > > > > > On Thu, 4 Aug 2016 15:34:44 +0200 Guillaume Comte wrote: > > > > > > > Hi All, > > > > > > > > With ceph jewel, > > > > > > > > I'm pretty stuck with > > > > > > > > > > > > ceph-deploy osd create {node-name}:{disk}[:{path/to/journal}] > > > > > > > > Because when i specify a journal path like this: > > > > ceph-deploy osd prepare ceph-osd1:sdd:sdf7 > > > > And then: > > > > ceph-deploy osd activate ceph-osd1:sdd:sdf7 > > > > I end up with "wrong permission" on the osd when activating, > complaining > > > > about "tmp" directory where the files are owned by root, and it > seems it > > > > tryes to do stuff as ceph user. > > > > > > > > It works when i don't specify a separate journal > > > > > > > > Any idea of what i'm doing wrong ? 
> > > > > > > > thks > > > > > > > > > -- > > > Christian BalzerNetwork/Systems Engineer > > > ch...@gol.com Global OnLine Japan/Rakuten Communications > > > http://www.gol.com/ > > > > > > -- > Christian BalzerNetwork/Systems Engineer > ch...@gol.com Global OnLine Japan/Rakuten Communications > http://www.gol.com/ > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] question about ceph-deploy osd create
Hello, On Fri, 5 Aug 2016 02:11:31 +0200 Guillaume Comte wrote: > I am reading half your answer > > Do you mean that ceph will create by itself the partitions for the journal? > Yes, "man ceph-disk". > If so its cool and weird... > It can be very weird indeed. If sdc is your data (OSD) disk and sdb your journal device then: "ceph-disk prepare /dev/sdc /dev/sdb1" will not work, but: "ceph-disk prepare /dev/sdc /dev/sdb" will, and creates a journal partition on sdb. However, you have no control over numbering or positioning this way. Christian > On 5 August 2016 02:01, "Christian Balzer" wrote: > > > > > Hello, > > > > you need to work on your google skills. ^_- > > > > I wrote about this just yesterday, and if you search for "ceph-deploy wrong > > permission" the second link is the issue description: > > http://tracker.ceph.com/issues/13833 > > > > So I assume your journal partitions are either pre-made or non-GPT. > > > > Christian > > > > On Thu, 4 Aug 2016 15:34:44 +0200 Guillaume Comte wrote: > > > > > Hi All, > > > > > > With ceph jewel, > > > > > > I'm pretty stuck with > > > > > > > > > ceph-deploy osd create {node-name}:{disk}[:{path/to/journal}] > > > > > > Because when I specify a journal path like this: > > > ceph-deploy osd prepare ceph-osd1:sdd:sdf7 > > > And then: > > > ceph-deploy osd activate ceph-osd1:sdd:sdf7 > > > I end up with "wrong permission" on the osd when activating, complaining > > > about a "tmp" directory where the files are owned by root, and it seems it > > > tries to do stuff as the ceph user. > > > > > > It works when I don't specify a separate journal > > > > > > Any idea of what I'm doing wrong ?
> > > > > > thks > > > > > > -- > > Christian BalzerNetwork/Systems Engineer > > ch...@gol.com Global OnLine Japan/Rakuten Communications > > http://www.gol.com/ > > -- Christian BalzerNetwork/Systems Engineer ch...@gol.com Global OnLine Japan/Rakuten Communications http://www.gol.com/ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
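The behaviour Christian describes can be summarised as a short command sketch. Device names are examples; the typecode below is, to the best of my knowledge, the standard Ceph journal partition type GUID that udev/ceph-disk look for, so treat it as an assumption to verify against your release:

```shell
# Works: ceph-disk is given the whole journal disk and creates
# a correctly-typed GPT journal partition on sdb itself:
ceph-disk prepare /dev/sdc /dev/sdb

# Fails with "wrong permission" (tracker #13833): a pre-made partition
# without the Ceph journal type GUID is not chowned/recognised by udev:
# ceph-disk prepare /dev/sdc /dev/sdb1

# To keep control over partition numbering/placement, pre-create the
# partition with the Ceph journal type GUID, then point ceph-disk at it:
sgdisk --new=1:0:+10G \
       --typecode=1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
ceph-disk prepare /dev/sdc /dev/sdb1
```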
Re: [ceph-users] question about ceph-deploy osd create
I am reading half your answer Do you mean that ceph will create by itself the partitions for the journal? If so it's cool and weird... On 5 August 2016 02:01, "Christian Balzer" wrote: > > Hello, > > you need to work on your google skills. ^_- > > I wrote about this just yesterday, and if you search for "ceph-deploy wrong > permission" the second link is the issue description: > http://tracker.ceph.com/issues/13833 > > So I assume your journal partitions are either pre-made or non-GPT. > > Christian > > On Thu, 4 Aug 2016 15:34:44 +0200 Guillaume Comte wrote: > > > Hi All, > > > > With ceph jewel, > > > > I'm pretty stuck with > > > > > > ceph-deploy osd create {node-name}:{disk}[:{path/to/journal}] > > > > Because when I specify a journal path like this: > > ceph-deploy osd prepare ceph-osd1:sdd:sdf7 > > And then: > > ceph-deploy osd activate ceph-osd1:sdd:sdf7 > > I end up with "wrong permission" on the osd when activating, complaining > > about a "tmp" directory where the files are owned by root, and it seems it > > tries to do stuff as the ceph user. > > > > It works when I don't specify a separate journal > > > > Any idea of what I'm doing wrong ? > > > > thks > > > -- > Christian Balzer Network/Systems Engineer > ch...@gol.com Global OnLine Japan/Rakuten Communications > http://www.gol.com/
[ceph-users] Fixing NTFS index in snapshot for new and existing clones
Hello! I would like some guidance about how to proceed with a problem inside of a snap which is used to clone images. My sincere apologies if what I am asking isn't possible. I have a snapshot which is used to create clones for guest virtual machines. It is a raw object with an NTFS OS contained within it. My understanding is that when you clone the snap, all children become bound to the parent snap via layering. We had a system problem from which I was able to recover almost fully. I could go into details, but I figure if I do, the advice will be to upgrade past dumpling (I can see you shaking your head :D). It is in the very short term plan to upgrade. I just want to be sure my cluster is totally as clean as I can make it before I do it. Recently, new clones and old clones started having a problem with the drive inside of Windows. It seems to be an NTFS index issue, which I can fix. (I've exported and verified the fix.) So I only have 4 pretty simple questions: 1) Would it be right to assume that if I fix the snapshot NTFS problem, that would 'cascade' to all cloned VMs? If not, I'm assuming I have to repair all clones individually (which I can script). 2) Am I off-base if I think the problem is in the snapshot? Could it be in the source image all along? 3) If there is no relationship with this snap or master image, then am I correct to assume that this is an individual problem on each of these guests? Or is there a source I should look at? 4) Would upgrading to at least firefly resolve this issue? I've run many checks on the cluster and the data seems fully accessible and correct. No inconsistent pages; everything exports, snaps, and can be moved. I also have the gdb debugger attached to watch for things which may arise in this version of ceph. I'll be upgrading once I find the answer to this. I have also attempted to ensure the parent/child relationship is intact at HEAD by rolling back to the snap as mentioned on this mailing list in January. Many thanks for your time!
-- John Holder Trapp Technology Developer, Linux, & Mail Operations Complacency kills innovation, but ambition kills complacency. Office: 602-443-9145 x2017 On Call Cell: 480-548-3902 Skype: z_jholder Alt-Email: jhol...@brinkster.net
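For questions 1-3 above, it may help later readers to note that a protected RBD snapshot is a read-only, point-in-time view: clones copy-on-write against it, so a repair made inside one clone never propagates to siblings, and the snapshot itself cannot be edited in place. The layering relationships can be inspected with the rbd CLI; pool, image, and snapshot names below are hypothetical:

```shell
# Hypothetical names: pool "rbd", parent image "golden", snapshot "base".
rbd info rbd/golden                 # image format and features
rbd snap ls rbd/golden              # snapshots available for cloning
rbd children rbd/golden@base        # every clone layered on this snapshot
rbd info rbd/some-clone | grep parent   # verify a clone's parent link
```

If the corruption is in the snapshot, the usual path is to repair a copy of the source image, take a fresh snapshot, protect it, and clone new guests from that; existing clones would each need the in-guest fix.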
Re: [ceph-users] question about ceph-deploy osd create
Yeah you are right. From what I understand, using the ceph user is a good idea. But the fact is that it doesn't work. So I circumvented that by configuring ceph-deploy to use root. Was that the main goal? I don't think so. Thanks for your answer On 5 August 2016 02:01, "Christian Balzer" wrote: > > Hello, > > you need to work on your google skills. ^_- > > I wrote about this just yesterday, and if you search for "ceph-deploy wrong > permission" the second link is the issue description: > http://tracker.ceph.com/issues/13833 > > So I assume your journal partitions are either pre-made or non-GPT. > > Christian > > On Thu, 4 Aug 2016 15:34:44 +0200 Guillaume Comte wrote: > > > Hi All, > > > > With ceph jewel, > > > > I'm pretty stuck with > > > > > > ceph-deploy osd create {node-name}:{disk}[:{path/to/journal}] > > > > Because when I specify a journal path like this: > > ceph-deploy osd prepare ceph-osd1:sdd:sdf7 > > And then: > > ceph-deploy osd activate ceph-osd1:sdd:sdf7 > > I end up with "wrong permission" on the osd when activating, complaining > > about a "tmp" directory where the files are owned by root, and it seems it > > tries to do stuff as the ceph user. > > > > It works when I don't specify a separate journal > > > > Any idea of what I'm doing wrong ? > > > > thks > > > -- > Christian Balzer Network/Systems Engineer > ch...@gol.com Global OnLine Japan/Rakuten Communications > http://www.gol.com/
Re: [ceph-users] question about ceph-deploy osd create
Hello, you need to work on your google skills. ^_- I wrote about this just yesterday, and if you search for "ceph-deploy wrong permission" the second link is the issue description: http://tracker.ceph.com/issues/13833 So I assume your journal partitions are either pre-made or non-GPT. Christian On Thu, 4 Aug 2016 15:34:44 +0200 Guillaume Comte wrote: > Hi All, > > With ceph jewel, > > I'm pretty stuck with > > > ceph-deploy osd create {node-name}:{disk}[:{path/to/journal}] > > Because when I specify a journal path like this: > ceph-deploy osd prepare ceph-osd1:sdd:sdf7 > And then: > ceph-deploy osd activate ceph-osd1:sdd:sdf7 > I end up with "wrong permission" on the osd when activating, complaining > about a "tmp" directory where the files are owned by root, and it seems it > tries to do stuff as the ceph user. > > It works when I don't specify a separate journal > > Any idea of what I'm doing wrong ? > > thks -- Christian Balzer Network/Systems Engineer ch...@gol.com Global OnLine Japan/Rakuten Communications http://www.gol.com/
Re: [ceph-users] Bad performance when two fio write to the same image
If you search through the archives, there have been a couple of other people that have run into this as well with Jewel. With the librbd engine, you are much better off using iodepth and/or multiple fio processes vs numjobs. Even pre-jewel, there were gotchas that might not be immediately apparent. If you, for instance, increase numjobs and do sequential reads, after the first job reads some data, it gets cached on the OSD, and then all subsequent jobs will re-read the same cached data unless you explicitly change the offsets. IE it was probably never a good idea to use numjobs, but now it's really apparent that it's not a good idea. :) Mark On 08/04/2016 03:48 PM, Warren Wang - ISD wrote: Wow, thanks. I think that's the tidbit of info I needed to explain why increasing numjobs doesn't (anymore) scale performance as expected. Warren Wang On 8/4/16, 7:49 AM, "ceph-users on behalf of Jason Dillaman" wrote: With exclusive-lock, only a single client can have write access to the image at a time. Therefore, if you are using multiple fio processes against the same image, they will be passing the lock back and forth between each other and you can expect bad performance. If you have a use-case where you really need to share the same image between multiple concurrent clients, you will need to disable the exclusive-lock feature (this can be done with the RBD cli on existing images or by passing "--image-shared" when creating new images). On Thu, Aug 4, 2016 at 5:52 AM, Alexandre DERUMIER wrote: Hi, I think this is because of the exclusive-lock feature enabled by default since jewel on rbd images - Original Message - From: "Zhiyuan Wang" To: "ceph-users" Sent: Thursday, August 4, 2016 11:37:04 Subject: [ceph-users] Bad performance when two fio write to the same image Hi Guys I am testing the performance of Jewel (10.2.2) with FIO, but found that the performance would drop dramatically when two processes write to the same image. My environment: 1. Server: One mon and four OSDs running on the same server.
Intel P3700 400GB SSD which have 4 partitions, and each for one osd journal (journal size is 10GB) Inter P3700 400GB SSD which have 4 partitions, and each format to XFS for one osd data (each data is 90GB) 10GB network CPU: Intel(R) Xeon(R) CPU E5-2660 (it is not the bottleneck) Memory: 256GB (it is not the bottleneck) 2. Client 10GB network CPU: Intel(R) Xeon(R) CPU E5-2660 (it is not the bottleneck) Memory: 256GB (it is not the bottleneck) 3. Ceph Default configuration expect use async messager (have tried simple messager, got nearly the same result) 10GB image with 256 pg num Test Case 1. One Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd; randwrite The performance is nearly 60MB/s and IOPS is nearly 15K Four osd are nearly the same busy 2. Two Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd; randwrite (write to the same image) The performance is nearly 4MB/s each, and IOPS is nearly 1.5K each Terrible And I found that only one osd is busy, the other three are much more idle on CPU And I also run FIO on two clients, the same result 3. Two Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd randwrite (one to image1, one to image2) The performance is nearly 35MB/s each and IOPS is nearly 8.5K each Reasonable Four osd are nearly the same busy Could someone help to explain the reason of TEST 2 Thanks Email Disclaimer & Confidentiality Notice This message is confidential and intended solely for the use of the recipient to whom they are addressed. If you are not the intended recipient you should not deliver, distribute or copy this e-mail. Please notify the sender immediately by e-mail and delete this e-mail from your system. Copyright © 2016 by Istuary Innovation Labs, Inc. All rights reserved. 
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Jason ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com This email and any files transmitted with it are confidential and intended solely for the individual or entity to whom they are addressed. If you have received this email in error destroy it immediately. *** Walmart Confidential *** ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
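Mark's advice (one fio process with a deep queue, numjobs=1) can be sketched as a job file. The option names follow fio's rbd ioengine documentation; pool, image, and client names are examples:

```shell
# Generate a fio job that drives one librbd client hard via iodepth
# instead of spawning multiple lock-contending jobs via numjobs:
cat > rbd-randwrite.fio <<'EOF'
[global]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=testimg
direct=1
bs=4k
rw=randwrite
iodepth=256
numjobs=1

[single-writer]
EOF
# Then run it against the cluster:
# fio rbd-randwrite.fio
```

For multi-client scaling tests, point additional fio processes at separate images (as in test case 3 of the thread) rather than sharing one image.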
Re: [ceph-users] Bad performance when two fio write to the same image
Wow, thanks. I think that's the tidbit of info I needed to explain why increasing numjobs doesn't (anymore) scale performance as expected. Warren Wang On 8/4/16, 7:49 AM, "ceph-users on behalf of Jason Dillaman" wrote: >With exclusive-lock, only a single client can have write access to the >image at a time. Therefore, if you are using multiple fio processes >against the same image, they will be passing the lock back and forth >between each other and you can expect bad performance. > >If you have a use-case where you really need to share the same image >between multiple concurrent clients, you will need to disable the >exclusive-lock feature (this can be done with the RBD cli on existing >images or by passing "--image-shared" when creating new images). > >On Thu, Aug 4, 2016 at 5:52 AM, Alexandre DERUMIER >wrote: >> Hi, >> >> I think this is because of the exclusive-lock feature enabled by default >>since jewel on rbd images >> >> >> - Original Message - >> From: "Zhiyuan Wang" >> To: "ceph-users" >> Sent: Thursday, August 4, 2016 11:37:04 >> Subject: [ceph-users] Bad performance when two fio write to the same image >> >> >> >> Hi Guys >> >> I am testing the performance of Jewel (10.2.2) with FIO, but found the >>performance would drop dramatically when two processes write to the same >>image. >> >> My environment: >> >> 1. Server: >> >> One mon and four OSDs running on the same server. >> >> Intel P3700 400GB SSD which has 4 partitions, and each for one osd >>journal (journal size is 10GB) >> >> Intel P3700 400GB SSD which has 4 partitions, and each formatted to XFS >>for one osd data (each data is 90GB) >> >> 10Gb network >> >> CPU: Intel(R) Xeon(R) CPU E5-2660 (it is not the bottleneck) >> >> Memory: 256GB (it is not the bottleneck) >> >> 2. Client >> >> 10Gb network >> >> CPU: Intel(R) Xeon(R) CPU E5-2660 (it is not the bottleneck) >> >> Memory: 256GB (it is not the bottleneck) >> >> 3.
Ceph >> >> Default configuration except using the async messenger (have tried the simple >>messenger, got nearly the same result) >> >> 10GB image with 256 pg num >> >> Test Case >> >> 1. One Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd; >>randwrite >> >> The performance is nearly 60MB/s and IOPS is nearly 15K >> >> Four osd are nearly the same busy >> >> 2. Two Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd; >>randwrite (write to the same image) >> >> The performance is nearly 4MB/s each, and IOPS is nearly 1.5K each >>Terrible >> >> And I found that only one osd is busy, the other three are much more >>idle on CPU >> >> And I also run FIO on two clients, the same result >> >> 3. Two Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd >>randwrite (one to image1, one to image2) >> >> The performance is nearly 35MB/s each and IOPS is nearly 8.5K each >>Reasonable >> >> Four osd are nearly the same busy >> >> Could someone help to explain the reason of TEST 2 >> >> Thanks >> >> Email Disclaimer & Confidentiality Notice >> >> This message is confidential and intended solely for the use of the >>recipient to whom they are addressed. If you are not the intended >>recipient you should not deliver, distribute or copy this e-mail. Please >>notify the sender immediately by e-mail and delete this e-mail from your >>system. Copyright © 2016 by Istuary Innovation Labs, Inc. All rights >>reserved. >> >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > >-- >Jason >___ >ceph-users mailing list >ceph-users@lists.ceph.com >http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] [Troubleshooting] I have a watcher I can't get rid of...
Thank you, Jason. While I can't find the culprit for the watcher (the watcher never expired, and survived a reboot. udev, maybe?), blacklisting the host did allow me to remove the device. Much appreciated, -kc > On Aug 4, 2016, at 4:50 AM, Jason Dillamanwrote: > > If the client is no longer running the watch should expire within 30 > seconds. If you are still experiencing this issue, you can blacklist > the mystery client via "ceph osd blacklist add". > > On Wed, Aug 3, 2016 at 6:06 PM, K.C. Wong wrote: >> I'm having a hard time removing an RBD that I no longer need. >> >> # rbd rm / >> 2016-08-03 15:00:01.085784 7ff9dfc997c0 -1 librbd: image has watchers - not >> removing >> Removing image: 0% complete...failed. >> rbd: error: image still has watchers >> This means the image is still open or the client using it crashed. Try again >> after closing/unmapping it or waiting 30s for the crashed client to timeout. >> >> So, I use `rbd status` to identify the watcher: >> >> # rbd status / >> Watchers: >>watcher=:0/705293879 client.1076985 cookie=1 >> >> I log onto that host, and did >> >> # rbd showmapped >> >> which returns nothing >> >> I don't use snapshot and I don't use cloning, so, there shouldn't >> be any image sharing. I ended up rebooting that host and the >> watcher is still around, and my problem persist: I can't remove >> the RBD. >> >> At this point, I'm all out of ideas on how to troubleshoot this >> problem. I'm running infernalis: >> >> # ceph --version >> ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd) >> >> in my set up, on CentOS 7.2 hosts >> >> # uname -r >> 3.10.0-327.22.2.el7.x86_64 >> >> I appreciate any assistance, >> >> -kc >> >> K.C. Wong >> kcw...@verseon.com >> 4096R/B8995EDE E527 CBE8 023E 79EA 8BBB 5C77 23A6 92E9 B899 5EDE >> hkps://hkps.pool.sks-keyservers.net >> >> >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> > > > > -- > Jason K.C. 
Wong kcw...@verseon.com 4096R/B8995EDE E527 CBE8 023E 79EA 8BBB 5C77 23A6 92E9 B899 5EDE hkps://hkps.pool.sks-keyservers.net ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] fast-diff map is always invalid
Can you run "rbd info vm-208-disk-2@initial.20160729-220225"? You most likely need to rebuild the object map for that specific snapshot via "rbd object-map rebuild vm-208-disk-2@initial.20160729-220225". On Sat, Jul 30, 2016 at 7:17 AM, Christoph Adomeit wrote: > Hi there, > > I upgraded my cluster to jewel recently, built object maps for every image and > recreated all snapshots to use the fast-diff feature for backups. > > Unfortunately i am still getting the following error message on rbd du: > > root@host:/backups/ceph# rbd du vm-208-disk-2 > warning: fast-diff map is invalid for vm-208-disk-2@initial.20160729-220225. > operation may be slow. > > What might be wrong? > > root@1host:/backups/ceph# rbd --version > ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374) > > root@host:/backups/ceph# rbd info vm-208-disk-2 > rbd image 'vm-208-disk-2': > size 275 GB in 70400 objects > order 22 (4096 kB objects) > block_name_prefix: rbd_data.35ea4ac2ae8944a > format: 2 > features: layering, exclusive-lock, object-map, fast-diff > flags: > > Thanks > Christoph > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Jason ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
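For readers hitting the same warning: the fix Jason describes can be sketched as the short sequence below. The image and snapshot names come from this thread; substitute your own, and note this must run against a live cluster with the Jewel (or later) rbd CLI.

```shell
# Inspect the snapshot; an invalid map shows up under "flags"
# (e.g. "object map invalid, fast diff invalid").
rbd info vm-208-disk-2@initial.20160729-220225

# Rebuild the object map for that specific snapshot.
rbd object-map rebuild vm-208-disk-2@initial.20160729-220225

# Re-check: the fast-diff warning should be gone and du fast again.
rbd du vm-208-disk-2
```

If several snapshots are affected, the rebuild has to be run once per image@snapshot, since each snapshot carries its own object map.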
[ceph-users] rbd-mirror questions
Hello, I am thinking about setting up a second Ceph cluster in the near future, and I was wondering about the current status of rbd-mirror. 1) Is it production ready at this point? 2) Can it be used when you have a cluster with existing data, in order to replicate onto a new cluster? 3) We have some rather large rbd images at this point...several in the 90TB range...would there be any concern using rbd-mirror given the size of our images? Thanks, Shain -- NPR | Shain Miley | Manager of Infrastructure, Digital Media | smi...@npr.org | 202.513.3649 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] openATTIC 2.0.13 beta has been released
Hi all, FYI, a few days ago, we released openATTIC 2.0.13 beta. On the Ceph management side, we've made some progress with the cluster and pool monitoring backend, which lays the foundation for the dashboard that will display graphs generated from this data. We also added some more RBD management functionality to the Web UI. For more details, please see the release announcement here: https://blog.openattic.org/posts/openattic-2.0.13-beta-has-been-released/ We're still in the early stages of development of the Ceph management and monitoring functionality, so we're very eager on receiving feedback and comments. Thanks! Lenz ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] question about ceph-deploy osd create
Hi All, With ceph jewel, I'm pretty stuck with ceph-deploy osd create {node-name}:{disk}[:{path/to/journal}] Because when I specify a journal path like this: ceph-deploy osd prepare ceph-osd1:sdd:sdf7 And then: ceph-deploy osd activate ceph-osd1:sdd:sdf7 I end up with "wrong permission" on the osd when activating, complaining about the "tmp" directory where the files are owned by root, and it seems it tries to do stuff as the ceph user. It works when I don't specify a separate journal. Any idea of what I'm doing wrong? thks -- *Guillaume Comte* 06 25 85 02 02 | guillaume.co...@blade-group.com | 90 avenue des Ternes, 75 017 Paris ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
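A workaround often suggested for this symptom on Jewel (where the daemons run as the "ceph" user rather than root) is to make sure the journal partition carries the Ceph journal partition-type GUID, so the udev rules chown the device to ceph:ceph at boot; a one-off chown also works until the next reboot. A sketch, assuming the journal is partition 7 of /dev/sdf as in the thread (verify the device names on your own node first):

```shell
# One-off fix (lost on reboot): give the ceph user the journal device.
chown ceph:ceph /dev/sdf7

# Persistent fix: tag the partition with the Ceph journal type GUID
# so Ceph's udev rules set the ownership automatically on boot.
sgdisk --typecode=7:45B0969E-9B03-4F30-B4C6-B4B80CEFF106 /dev/sdf
partprobe /dev/sdf
```

After that, re-running "ceph-deploy osd activate ceph-osd1:sdd:sdf7" should no longer hit the permission error.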
Re: [ceph-users] Ubuntu 14.04 Striping / RBD / Single Thread Performance
If you are attempting to use RBD "fancy" striping (e.g. stripe unit != object size and stripe count != 1) with krbd, the answer is that it is still unsupported. On Wed, Aug 3, 2016 at 8:41 AM, w...@globe.de wrote: > Hi List, > i am using Ceph Infernalis and Ubuntu 14.04 Kernel 3.13. > 18 Data Server / 3 MON / 3 RBD Clients > > I want to use RBD on the Client with image format 2 and Striping. > Is it supported? > > I want to create rbd with: > rbd create testrbd -s 2T --image-format=2 --image-feature=striping > --image-feature=exclusive-lock --stripe-unit 65536B --stripe-count 8 > > Do I get better single-thread performance with a higher stripe count? > If not: Should i use Ubuntu 16.04 with Kernel 4.4? Is it supported with that kernel? > > The manpage says: > > http://manpages.ubuntu.com/manpages/wily/man8/rbd.8.html > > PARAMETERS > >--image-format format > Specifies which object layout to use. The default is 1. > > · format 2 - Use the second rbd format, which is supported > by > librbd and kernel since version 3.11 (except for > striping). > This adds support for cloning and is more easily extensible > to > allow more features in the future. > > > Regards > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > -- Jason ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
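As a side note on what "fancy" striping actually does: the layout can be illustrated with a small calculation. This sketch is not Ceph code, just the RADOS-style striping algorithm applied to the parameters from the rbd create command in this thread (stripe unit 64 KiB, stripe count 8, 4 MiB objects); it maps an image byte offset to an object number and in-object offset.

```python
def map_offset(x, object_size=4 << 20, stripe_unit=64 << 10, stripe_count=8):
    """Map an image byte offset to (object_number, offset_in_object)
    under RADOS-style striping: stripe units are laid out round-robin
    across a set of stripe_count objects (an "object set")."""
    su = x // stripe_unit                       # global stripe-unit index
    stripe_no = su // stripe_count              # which row across the set
    obj_in_set = su % stripe_count              # which object within the set
    sus_per_object = object_size // stripe_unit # units one object can hold
    object_set = stripe_no // sus_per_object    # which group of objects
    objectno = object_set * stripe_count + obj_in_set
    off = (stripe_no % sus_per_object) * stripe_unit + x % stripe_unit
    return objectno, off

# Consecutive 64 KiB units land on different objects, so one sequential
# client stream fans out over 8 objects instead of hammering 1:
print(map_offset(0))          # (0, 0)
print(map_offset(64 << 10))   # (1, 0)
print(map_offset(512 << 10))  # (0, 65536) -- wrapped back to object 0
```

This fan-out is why a higher stripe count can help single-threaded throughput with librbd; with krbd the point is moot as long as fancy striping stays unsupported there.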
Re: [ceph-users] [Troubleshooting] I have a watcher I can't get rid of...
If the client is no longer running the watch should expire within 30 seconds. If you are still experiencing this issue, you can blacklist the mystery client via "ceph osd blacklist add". On Wed, Aug 3, 2016 at 6:06 PM, K.C. Wongwrote: > I'm having a hard time removing an RBD that I no longer need. > > # rbd rm / > 2016-08-03 15:00:01.085784 7ff9dfc997c0 -1 librbd: image has watchers - not > removing > Removing image: 0% complete...failed. > rbd: error: image still has watchers > This means the image is still open or the client using it crashed. Try again > after closing/unmapping it or waiting 30s for the crashed client to timeout. > > So, I use `rbd status` to identify the watcher: > > # rbd status / > Watchers: > watcher=:0/705293879 client.1076985 cookie=1 > > I log onto that host, and did > > # rbd showmapped > > which returns nothing > > I don't use snapshot and I don't use cloning, so, there shouldn't > be any image sharing. I ended up rebooting that host and the > watcher is still around, and my problem persist: I can't remove > the RBD. > > At this point, I'm all out of ideas on how to troubleshoot this > problem. I'm running infernalis: > > # ceph --version > ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd) > > in my set up, on CentOS 7.2 hosts > > # uname -r > 3.10.0-327.22.2.el7.x86_64 > > I appreciate any assistance, > > -kc > > K.C. Wong > kcw...@verseon.com > 4096R/B8995EDE E527 CBE8 023E 79EA 8BBB 5C77 23A6 92E9 B899 5EDE > hkps://hkps.pool.sks-keyservers.net > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > -- Jason ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
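The workflow Jason outlines can be sketched as the following commands. The pool/image name and the address:nonce are placeholders (10.0.0.5:0/705293879 is illustrative, built from the nonce reported in this thread); use whatever "rbd status" prints on your cluster.

```shell
# Find the stuck watcher (reports watcher=addr:nonce and client id).
rbd status spin1/stuckimage

# Blacklist that address so the OSDs drop its watch.
ceph osd blacklist add 10.0.0.5:0/705293879

# The removal should now succeed.
rbd rm spin1/stuckimage

# Optionally clear the blacklist entry afterwards.
ceph osd blacklist rm 10.0.0.5:0/705293879
```

Blacklisting is cluster-wide for that client address, so on a shared host make sure nothing else is legitimately using that client session first.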
Re: [ceph-users] Bad performance when two fio write to the same image
With exclusive-lock, only a single client can have write access to the image at a time. Therefore, if you are using multiple fio processes against the same image, they will be passing the lock back and forth between each other and you can expect bad performance. If you have a use-case where you really need to share the same image between multiple concurrent clients, you will need to disable the exclusive-lock feature (this can be done with the RBD cli on existing images or by passing "--image-shared" when creating new images). On Thu, Aug 4, 2016 at 5:52 AM, Alexandre DERUMIER wrote: > Hi, > > I think this is because of the exclusive-lock feature enabled by default since > Jewel on rbd images > > > - Mail original - > De: "Zhiyuan Wang" > À: "ceph-users" > Envoyé: Jeudi 4 Août 2016 11:37:04 > Objet: [ceph-users] Bad performance when two fio write to the same image > > > > Hi Guys > > I am testing the performance of Jewel (10.2.2) with FIO, but found the > performance would drop dramatically when two processes write to the same image. > > My environment: > > 1. Server: > > One mon and four OSDs running on the same server. > > Intel P3700 400GB SSD which has 4 partitions, each for one osd journal > (journal size is 10GB) > > Intel P3700 400GB SSD which has 4 partitions, each formatted as XFS for one > osd data (each data is 90GB) > > 10GB network > > CPU: Intel(R) Xeon(R) CPU E5-2660 (it is not the bottleneck) > > Memory: 256GB (it is not the bottleneck) > > 2. Client > > 10GB network > > CPU: Intel(R) Xeon(R) CPU E5-2660 (it is not the bottleneck) > > Memory: 256GB (it is not the bottleneck) > > 3. Ceph > > Default configuration except using the async messenger (have tried the simple messenger, > got nearly the same result) > > 10GB image with 256 pg num > > Test Case > > 1. One Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd; randwrite > > The performance is nearly 60MB/s and IOPS is nearly 15K > > Four osd are nearly the same busy > > 2.
Two Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd; randwrite > (write to the same image) > > The performance is nearly 4MB/s each, and IOPS is nearly 1.5K each Terrible > > And I found that only one osd is busy, the other three are much more idle on > CPU > > And I also run FIO on two clients, the same result > > 3. Two Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd randwrite > (one to image1, one to image2) > > The performance is nearly 35MB/s each and IOPS is nearly 8.5K each Reasonable > > Four osd are nearly the same busy > > > > > > Could someone help to explain the reason of TEST 2 > > > > Thanks > > > Email Disclaimer & Confidentiality Notice > > This message is confidential and intended solely for the use of the recipient > to whom they are addressed. If you are not the intended recipient you should > not deliver, distribute or copy this e-mail. Please notify the sender > immediately by e-mail and delete this e-mail from your system. Copyright © > 2016 by Istuary Innovation Labs, Inc. All rights reserved. > > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Jason ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
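The two options Jason mentions can be sketched as follows (pool and image names are placeholders; on existing images the features that depend on exclusive-lock, such as object-map and fast-diff, have to be disabled first if they are enabled):

```shell
# Existing image: drop exclusive-lock so concurrent writers don't
# ping-pong the lock. Disable dependent features first if present.
rbd feature disable rbd/testimage object-map fast-diff
rbd feature disable rbd/testimage exclusive-lock

# New image: create it without exclusive-lock in the first place.
rbd create --size 10G --image-shared rbd/testimage2
```

Note the trade-off: without exclusive-lock there is no single writer arbitrating the image, so this is only safe when the workload itself coordinates concurrent access (e.g. a clustered filesystem or deliberately shared fio test).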
Re: [ceph-users] Small Ceph cluster
I will read your cache thread, thnx. Now we have the following setup in mind: X10SRH-CLN4F E5-2620v4 32GB/64GB RAM 6x 2TB (start with 4 drives) 1x S3710 200GB for Journaling In the future adding 2 SSD's for caching, or is it an option to use a P3700 400GB (or two) for journaling and caching ? Kind regards, Tom On Mon, Aug 1, 2016 at 2:46 PM, Christian Balzerwrote: > > Hello, > > On Mon, 1 Aug 2016 14:34:43 +0200 Tom T wrote: > > > Hi Christian, > > > > Thnx for your reply. > > > > Case: > > CSE-825TQC-600LPB > > > > I made a typo with the CPU, it's a E3-1240 v5 3.5Ghz. > > So a E5-2620 v4 is recommended when i want to add SSD for caching ? > > > It's more a question of core count (and speed per core), an E5-1650 v3 for > example would do the trick (in all situations) with 4 HDD OSDs with SSD > journal and 1-2 SSD OSDs. > > > With a caching tier, is the data on the caching tier a copy from data on > > the normal tier ? > You want to re-read the respective documentation and the various threads > on this ML about cache tiering, including my > "Cache tier operation clarifications" one. > > In a reasonably busy cluster a cache pool will be very different from the > base pool and some hot data may never reach the base pool, ever. > > Meaning that your cache pool needs to be just as reliable as everything > else. > > > Is a caching tier with one SSD recommended or should i always have two > SSD > > in replicated mode ? > > > See above. > > Christian > > > > Kind regards, > > Tom > > > > > > > > On Mon, Aug 1, 2016 at 2:00 PM, Christian Balzer wrote: > > > > > > > > Hello, > > > > > > On Mon, 1 Aug 2016 11:09:00 +0200 Tom T wrote: > > > > > > > Hi Ceph users > > > > > > > > We are planning to setup a small ceph cluster, starting with 3 nodes > for > > > > VM's. > > > > I have some question about CPU and caching > > > > > > > > We would like to start with the following config: > > > > > > > > > > > > Supermicro X11SSI-LN4F > > > In which case? 
> > > > > > > Intel E3-1246 v3 3.5Ghz > > > A bit dated, but fast enough. > > > > > > > 32GB RAM > > > While enough for 4 OSDs, don't skimp on RAM if you can afford, reads > will > > > thank you for it. > > > > > > > S3500 80GB M.2 for OS > > > If you're short on money, maybe use a 535 (or 2!) for that purpose. > > > > > > > AOC-S3008L-L8e (LSI SAS3008) > > > > 4x 2TB ST2000NM0034 SAS12Gb > > > > > > I fail to see the need/point for 7.2k RPM HDDs with a mere 128MB of > cache > > > hanging of a 12Gb/s bus, but maybe that's just me. > > > > > > > 1x Intel 200GB S3710 for journal (via onboard SATA) > > > Good enough. > > > > > > > 4x 1Gb for networking > > > > > > > Unless all your clients also are limited to GbE and you have no budget > to > > > change that, don't. > > > > > > For VM's latency will be one of your biggest nemesis (nemesii?), use > > > faster (lower latency) networking. > > > > > > > Questions: > > > > Is the CPU enough ? > > > See above. > > > > > > > I would like to run the monitoring deamon on the same host, would > this > > > be a > > > > problem ? > > > > > > > Just within the normal usage needs, more RAM in that case anyway. > > > > > > > Optionally i would like to add an extra SSD for caching > > > Not really recommended with that server and not particular helpful with > > > that network. > > > A single SSD of any caliber will/can eat one of your CPU cores by > itself > > > and then ask for seconds. > > > > > > > Does write-back caching also optimize the reads ? > > > Yes, subject to "correct" configuration of course. > > > > > > > Do I need two SSD's per node > > > > > > > From a performance point of view, not so much. > > > Your network can't even saturate one 200GB DC S3710. > > > > > > From a redundancy point of view you might be better off with more > nodes. 
> > > > > > Christian > > > > > > > > > > > Kind regards, > > > > Tom > > > > > > > > > -- > > > Christian Balzer Network/Systems Engineer > > > ch...@gol.com Global OnLine Japan/Rakuten Communications > > > http://www.gol.com/ > > > > > > -- > Christian Balzer Network/Systems Engineer > ch...@gol.com Global OnLine Japan/Rakuten Communications > http://www.gol.com/ > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Bad performance when two fio write to the same image
Hi, I think this is because of the exclusive-lock feature enabled by default since Jewel on rbd images - Mail original - De: "Zhiyuan Wang" À: "ceph-users" Envoyé: Jeudi 4 Août 2016 11:37:04 Objet: [ceph-users] Bad performance when two fio write to the same image Hi Guys I am testing the performance of Jewel (10.2.2) with FIO, but found the performance would drop dramatically when two processes write to the same image. My environment: 1. Server: One mon and four OSDs running on the same server. Intel P3700 400GB SSD which has 4 partitions, each for one osd journal (journal size is 10GB) Intel P3700 400GB SSD which has 4 partitions, each formatted as XFS for one osd data (each data is 90GB) 10GB network CPU: Intel(R) Xeon(R) CPU E5-2660 (it is not the bottleneck) Memory: 256GB (it is not the bottleneck) 2. Client 10GB network CPU: Intel(R) Xeon(R) CPU E5-2660 (it is not the bottleneck) Memory: 256GB (it is not the bottleneck) 3. Ceph Default configuration except using the async messenger (have tried the simple messenger, got nearly the same result) 10GB image with 256 pg num Test Case 1. One Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd; randwrite The performance is nearly 60MB/s and IOPS is nearly 15K Four osd are nearly the same busy 2. Two Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd; randwrite (write to the same image) The performance is nearly 4MB/s each, and IOPS is nearly 1.5K each Terrible And I found that only one osd is busy, the other three are much more idle on CPU And I also run FIO on two clients, the same result 3. Two Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd randwrite (one to image1, one to image2) The performance is nearly 35MB/s each and IOPS is nearly 8.5K each Reasonable Four osd are nearly the same busy Could someone help to explain the reason of TEST 2 Thanks Email Disclaimer & Confidentiality Notice This message is confidential and intended solely for the use of the recipient to whom they are addressed.
If you are not the intended recipient you should not deliver, distribute or copy this e-mail. Please notify the sender immediately by e-mail and delete this e-mail from your system. Copyright © 2016 by Istuary Innovation Labs, Inc. All rights reserved. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Bad performance when two fio write to the same image
Hi Guys I am testing the performance of Jewel (10.2.2) with FIO, but found the performance would drop dramatically when two processes write to the same image. My environment: 1. Server: One mon and four OSDs running on the same server. Intel P3700 400GB SSD which has 4 partitions, each for one osd journal (journal size is 10GB) Intel P3700 400GB SSD which has 4 partitions, each formatted as XFS for one osd data (each data is 90GB) 10GB network CPU: Intel(R) Xeon(R) CPU E5-2660 (it is not the bottleneck) Memory: 256GB (it is not the bottleneck) 2. Client 10GB network CPU: Intel(R) Xeon(R) CPU E5-2660 (it is not the bottleneck) Memory: 256GB (it is not the bottleneck) 3. Ceph Default configuration except using the async messenger (have tried the simple messenger, got nearly the same result) 10GB image with 256 pg num Test Case 1. One Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd; randwrite The performance is nearly 60MB/s and IOPS is nearly 15K Four osd are nearly the same busy 2. Two Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd; randwrite (write to the same image) The performance is nearly 4MB/s each, and IOPS is nearly 1.5K each Terrible And I found that only one osd is busy, the other three are much more idle on CPU And I also run FIO on two clients, the same result 3. Two Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd randwrite (one to image1, one to image2) The performance is nearly 35MB/s each and IOPS is nearly 8.5K each Reasonable Four osd are nearly the same busy Could someone help to explain the reason of TEST 2 Thanks Email Disclaimer & Confidentiality Notice This message is confidential and intended solely for the use of the recipient to whom they are addressed. If you are not the intended recipient you should not deliver, distribute or copy this e-mail. Please notify the sender immediately by e-mail and delete this e-mail from your system. Copyright (c) 2016 by Istuary Innovation Labs, Inc. All rights reserved.
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
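For anyone wanting to reproduce test case 2, a fio jobfile along the lines of the sketch below should come close (pool, image, and client names are placeholders; fio must be built with rbd support). Each job opens its own librbd client on the same image, which is what triggers the lock ping-pong discussed later in the thread:

```ini
; fio jobfile sketch for "two writers, same image" (test case 2).
[global]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=testimage
rw=randwrite
bs=4k
iodepth=256
direct=1
runtime=60
time_based

[writer1]
[writer2]
```

Pointing writer2 at a second image (a separate `rbdname=` in its section) reproduces test case 3 instead.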
[ceph-users] ceph and SMI-S
Hi all, I was asked whether Ceph supports the Storage Management Initiative Specification (SMI-S). This is in the context of monitoring our Ceph clusters/environments. I've tried looking and found no references to it being supported. But does it? thanks, ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Upgrading a "conservative" [tm] cluster from Hammer to Jewel, a nightmare in the making
Hello, This is going to have some level of ranting, bear with me as the points are all valid and poignant. Backstory: I currently run 3 Ceph clusters all on Debian Jessie but with SysV init, as they all predate any systemd supporting Ceph packages. - A crappy test one running Hammer, manually deployed (OSDs mounted via fstab, a mix of Ext4 and XFS), MBR/DOS partitions. - Our main production cluster, also with fstab mounted OSDs, all Ext4. Extra bonus points for OSDs being entire, unpartitioned disk and journal holding SSDs being MBR/DOS partitioned. - Non critical production cluster installed with ceph-deploy and GPT OSDs. The later obviously is going to be the least problematic, though I'm sure there will be enough entertainment. Now we just finally got our real staging/testing cluster that actually resembles the production ones so I was going to give it a few spins before installing something that equals the production cluster. First try was Hammer using ceph-deploy. Complete fail, due to lack of systemd unit files/targets: --- [ceph-01][INFO ] Running command: systemctl enable ceph.target [ceph-01][WARNIN] Failed to execute operation: No such file or directory --- Allrite, lets try with Jewel. This blew up in my face when trying to use previously created partitions (GPT, mind ya), as documented here: --- http://tracker.ceph.com/issues/13833 --- Incidentally ceph-deploy once again is trying to be too helpful, when I gave it a "/dev/sda4" as journal target with the wrong GUID but a chown'ed dev file it created things and linked to the partition as stated. With partitions that have the correct GUID it will make a smarty-pants link to /dev/disk/by-partuuid/ (good intention!), even when given a "/dev/disk/by-id/" input. Oh well. And ceph-deploy of course still activates new OSDs half of the time, the other half it actually needs the activate step, thanks to udev I'm sure. 
Now I'd like to repeat the question in the issue above: at what point did GPT partitions (and udev magic) become mandatory? And if it is indeed mandatory, where are the painless and data-safe transition tools? I'll re-create the main production cluster (that is hammer, sysv-init, fstab mounted OSDs) on the staging system next and see what blows up and how violently when trying a Jewel upgrade. My guess is that it won't be systemd (as Jewel actually has the targets now), but the inability to deal with a manually deployed environment like mine. Expect news about that next week at the latest. Christian -- Christian Balzer Network/Systems Engineer ch...@gol.com Global OnLine Japan/Rakuten Communications http://www.gol.com/ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com