[ceph-users] rbd image-meta

2015-07-23 Thread Maged Mokhtar
Hello, I am trying to use rbd image-meta set. I get an error from rbd that this command is not recognized, yet it is documented in the rbd documentation: http://ceph.com/docs/next/man/8/rbd/ I am using the Hammer release deployed using ceph_deploy on Ubuntu 14.04. Is image-meta set supported in rbd in

[ceph-users] Number of PGs: fix from start or change as we grow ?

2016-08-03 Thread Maged Mokhtar
Hello, I would like to build a small cluster with 20 disks to start but in the future would like to gradually increase it to maybe 200 disks. Is it better to fix the number of PGs in the pool from the beginning or is it better to start with a small number and then gradually change the number of

[ceph-users] watch timeout on failure

2017-01-21 Thread Maged Mokhtar
Hi, If a host with a kernel mapped rbd image dies, it still keeps a watch on the rbd image header for a timeout that seems to be determined by ms_tcp_read_timeout (default 15 minutes) rather than osd_client_watch_timeout, whereas according to the docs: "If the client loses its connection to the

Re: [ceph-users] watch timeout on failure

2017-01-21 Thread Maged Mokhtar
Thanks for the clarification.. > On Sat, Jan 21, 2017 at 1:18 PM, Maged Mokhtar <mmokh...@petasan.org> > wrote: >> Hi, >> >> If a host with a kernel mapped rbd image dies, it still keeps a watch on >> the rbd image header for a timeout that seems to be

[ceph-users] help with crush rule

2017-02-18 Thread Maged Mokhtar
Hi, I have a need to support a small cluster with 3 hosts and 3 replicas, such that in normal operation each replica is placed on a separate host, but in case one host dies its replicas could be stored on separate osds on the 2 live hosts. I was hoping to write a rule that in case it

Re: [ceph-users] help with crush rule

2017-02-27 Thread Maged Mokhtar
Thank you for the clarification. Apologies for my late reply /maged From: Brian Andrus Sent: Wednesday, February 22, 2017 2:23 AM To: Maged Mokhtar Cc: ceph-users Subject: Re: [ceph-users] help with crush rule I don't think a CRUSH rule exception is currently possible, but it makes sense

Re: [ceph-users] new Open Source Ceph based iSCSI SAN project

2016-10-18 Thread Maged Mokhtar
Thank you Mike for this update. I sent you and Dave the relevant changes we found for hyper-v. Cheers /maged -- From: "Mike Christie" <mchri...@redhat.com> Sent: Monday, October 17, 2016 9:40 PM To: "Maged Mokhtar" <

Re: [ceph-users] Ceph Blog Articles

2016-11-12 Thread Maged Mokhtar
>> -Original Message- >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of >> Maged Mokhtar >> Sent: 11 November 2016 21:48 >> To: ceph-users@lists.ceph.com >> Subject: Re: [ceph-users] Ceph Blog Articles >> >> &

Re: [ceph-users] Ceph Blog Articles

2016-11-14 Thread Maged Mokhtar
-- From: "Nick Fisk" <n...@fisk.me.uk> Sent: Monday, November 14, 2016 11:41 AM To: "'Maged Mokhtar'" <mmokh...@petasan.org>; <ceph-users@lists.ceph.com> Subject: RE: [ceph-users] Ceph Blog Articles Hi Maged, I would i

Re: [ceph-users] Ceph Blog Articles

2016-11-11 Thread Maged Mokhtar
Nice article on write latency. If I understand correctly, this latency is measured while there is no overflow of the journal caused by long sustained writes, else you will start hitting the HDD latency. Also, is the queue depth you use 1? Will be interested to see your article on hardware. /Maged

[ceph-users] new Open Source Ceph based iSCSI SAN project

2016-10-16 Thread Maged Mokhtar
Hello, I am happy to announce PetaSAN, an open source scale-out SAN that uses Ceph storage and LIO iSCSI Target. visit us at: www.petasan.org your feedback will be much appreciated. maged mokhtar ___ ceph-users mailing list ceph-users

Re: [ceph-users] new Open Source Ceph based iSCSI SAN project

2016-10-17 Thread Maged Mokhtar
ive.de Address: IP Interactive UG ( haftungsbeschraenkt ) Zum Sonnenberg 1-3 63571 Gelnhausen HRB 93402 at the Hanau district court Managing director: Oliver Dzombic Tax no.: 35 236 3622 1 VAT ID: DE274086107 On 16.10.2016 at 18:57, Maged Mokhtar wrote: Hello, I am happy to announce PetaSAN, an

Re: [ceph-users] new Open Source Ceph based iSCSI SAN project

2016-10-17 Thread Maged Mokhtar
Oliver Dzombic Tax no.: 35 236 3622 1 VAT ID: DE274086107 On 17.10.2016 at 13:37, Maged Mokhtar wrote: Hi Oliver, if you are referring to clustering reservations through VAAI. We are using upstream code from SUSE Enterprise Storage which adds clustered support for VAAI (compare and write, wri

Re: [ceph-users] new Open Source Ceph based iSCSI SAN project

2016-10-17 Thread Maged Mokhtar
Thank you very much, David, and thank you for the correction. -- From: "David Disseldorp" <dd...@suse.de> Sent: Monday, October 17, 2016 5:24 PM To: "Maged Mokhtar" <mmokh...@petasan.org> Cc: <ceph-users@lists.c

Re: [ceph-users] new Open Source Ceph based iSCSI SAN project

2016-10-17 Thread Maged Mokhtar
ay, October 17, 2016 4:21 PM To: <ceph-users@lists.ceph.com> Subject: Re: [ceph-users] new Open Source Ceph based iSCSI SAN project On 2016-10-17T13:37:29, Maged Mokhtar <mmokh...@petasan.org> wrote: Hi Maged, glad to see our patches caught your attention. You're aware that th

Re: [ceph-users] Estimate Max IOPS of Cluster

2017-01-04 Thread Maged Mokhtar
Max IOPS depends on the hardware type/configuration for disks/cpu/network. For disks, the theoretical IOPS limit is: read = physical disk IOPS x number of disks; write (with journal on same disk) = physical disk IOPS x number of disks / number of replicas / 3. In practice real benchmarks will vary
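
A rough sketch of the arithmetic above, using assumed example numbers (150 IOPS per HDD, 20 disks, 3 replicas — illustration only, not measurements):

  # Theoretical IOPS ceiling for a cluster, following the formulas above.
  def theoretical_iops(disk_iops, num_disks, replicas):
      read_iops = disk_iops * num_disks
      # write with the journal colocated on the same disk: divide by replicas and by 3
      write_iops = disk_iops * num_disks / replicas / 3
      return read_iops, write_iops

  r, w = theoretical_iops(disk_iops=150, num_disks=20, replicas=3)
  print(f"theoretical read ~{r:.0f} IOPS, write ~{w:.0f} IOPS")  # real benchmarks will be lower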

Re: [ceph-users] Analysing ceph performance with SSD journal, 10gbe NIC and 2 replicas -Hammer release

2017-01-07 Thread Maged Mokhtar
Adding more nodes is best if you have an unlimited budget :) You should add more osds per node until you start hitting cpu or network bottlenecks. Use a perf tool like atop/sysstat to know when this happens. Original message From: kevin parrikar

Re: [ceph-users] Analysing ceph performance with SSD journal, 10gbe NIC and 2 replicas -Hammer release

2017-01-08 Thread Maged Mokhtar
Why would you still be using journals when running OSDs fully on SSDs? When using a journal the data is first written to the journal, and then that same data is (later on) written again to disk. This is based on the assumption that the time to write the journal is only a fraction of the time it costs to
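
A minimal sketch of the double-write effect being discussed, assuming a 500 MB/s device purely for illustration:

  # Filestore journal colocated on the same device: every client write hits the
  # device twice (journal + data), so usable write bandwidth is roughly halved.
  device_bw_mb = 500.0                      # assumed device write bandwidth
  colocated_journal_bw = device_bw_mb / 2   # journal write + data write
  separate_journal_bw = device_bw_mb        # data device sees only one copy
  print(colocated_journal_bw, separate_journal_bw)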

Re: [ceph-users] Estimate Max IOPS of Cluster

2017-01-04 Thread Maged Mokhtar
if you are asking about what tools to use: http://tracker.ceph.com/projects/ceph/wiki/Benchmark_Ceph_Cluster_Performance You should run many concurrent processes on different clients From: Maged Mokhtar Sent: Wednesday, January 04, 2017 6:45 PM To: John Petrini ; ceph-users Subject: Re

Re: [ceph-users] Analysing ceph performance with SSD journal, 10gbe NIC and 2 replicas -Hammer release

2017-01-06 Thread Maged Mokhtar
The numbers are very low. I would first benchmark the system without the vm client using an rbd 4k test such as: rbd bench-write image01 --pool=rbd --io-threads=32 --io-size 4096 --io-pattern rand --rbd_cache=false Original message From: kevin parrikar

Re: [ceph-users] rbd iscsi gateway question

2017-04-06 Thread Maged Mokhtar
We were in beta till early Feb, so we are relatively young. If there are issues/bugs, we'd certainly be interested to know through our forum. Note that with us you can always use the cli and bypass the UI, and it will be straight Ceph/LIO commands if you wish. From: Brady Deetz Sent: Thursday,

Re: [ceph-users] rbd iscsi gateway question

2017-04-06 Thread Maged Mokhtar
The io hang (it is actually a pause, not a hang) is done by Ceph only in case of a simultaneous failure of 2 hosts or 2 osds on separate hosts. A single host/osd being out will not cause this. In the PetaSAN project www.petasan.org we use LIO/krbd. We have done a lot of tests on VMWare, in case of io

Re: [ceph-users] Book & questions

2017-08-13 Thread Maged Mokhtar
I would recommend getting all 3 books; they are all very good. I particularly like Nick's book: it has a lot of hands-on issues and is quite recent. /Maged On 2017-08-13 09:43, Sinan Polat wrote: > Hi all, > > I am quite new with Ceph Storage. Currently we have a Ceph environment > running,

Re: [ceph-users] RBD journaling benchmarks

2017-07-13 Thread Maged Mokhtar
-- From: "Jason Dillaman" <jdill...@redhat.com> Sent: Thursday, July 13, 2017 4:45 AM To: "Maged Mokhtar" <mmokh...@petasan.org> Cc: "Mohamad Gebai" <mge...@suse.com>; "ceph-users"

Re: [ceph-users] RBD journaling benchmarks

2017-07-10 Thread Maged Mokhtar
On 2017-07-10 20:06, Mohamad Gebai wrote: > On 07/10/2017 01:51 PM, Jason Dillaman wrote: On Mon, Jul 10, 2017 at 1:39 > PM, Maged Mokhtar <mmokh...@petasan.org> wrote: These are significant > differences, to the point where it may not make sense > to use rbd journaling

Re: [ceph-users] RBD journaling benchmarks

2017-07-10 Thread Maged Mokhtar
On 2017-07-10 18:14, Mohamad Gebai wrote: > Resending as my first try seems to have disappeared. > > Hi, > > We ran some benchmarks to assess the overhead caused by enabling > client-side RBD journaling in Luminous. The tests consists of: > - Create an image with journaling enabled

[ceph-users] krbd journal support

2017-07-06 Thread Maged Mokhtar
Hi all, Are there any plans to support the rbd journal feature in kernel krbd? Cheers /Maged ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Kernel mounted RBD's hanging

2017-06-30 Thread Maged Mokhtar
On 2017-06-29 16:30, Nick Fisk wrote: > Hi All, > > Putting out a call for help to see if anyone can shed some light on this. > > Configuration: > Ceph cluster presenting RBD's->XFS->NFS->ESXi > Running 10.2.7 on the OSD's and 4.11 kernel on the NFS gateways in a > pacemaker cluster > Both

Re: [ceph-users] How to force "rbd unmap"

2017-07-05 Thread Maged Mokhtar
On 2017-07-05 20:42, Ilya Dryomov wrote: > On Wed, Jul 5, 2017 at 8:32 PM, David Turner wrote: > >> I had this problem occasionally in a cluster where we were regularly mapping >> RBDs with KRBD. Something else we saw was that after this happened for >> un-mapping RBDs,

Re: [ceph-users] New cluster - configuration tips and reccomendation - NVMe

2017-07-05 Thread Maged Mokhtar
On 2017-07-05 23:22, David Clarke wrote: > On 07/05/2017 08:54 PM, Massimiliano Cuttini wrote: > >> Dear all, >> >> luminous is coming and sooner we should be allowed to avoid double writing. >> This means use 100% of the speed of SSD and NVMe. >> Cluster made all of SSD and NVMe will not be

Re: [ceph-users] Iscsi configuration

2017-08-09 Thread Maged Mokhtar
Hi Sam, Pacemaker will take care of HA failover but you will need to propagate the PR data yourself. If you are interested in a solution that works out of the box with Windows, have a look at PetaSAN www.petasan.org It works well with MS hyper-v/storage spaces/Scale Out File Server. Cheers

Re: [ceph-users] VMware + Ceph using NFS sync/async ?

2017-08-19 Thread Maged Mokhtar
Hi Nick, Your note on PG locking is interesting, but I would be surprised if its effect is that bad. I would think that in your example the 2 ms is a total latency; the lock will probably be applied to a small portion of that, so the concurrent operations are not serialized for the entire time, but
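
A back-of-the-envelope model of that point, assuming only that the PG lock covers some fraction of the total 2 ms write latency (the fractions are made up):

  # If the PG lock is held for a fraction f of the write latency, the per-PG
  # serialization ceiling is 1 / (f * latency) rather than 1 / latency.
  latency_s = 0.002                 # 2 ms total write latency, from the thread
  for f in (1.0, 0.25, 0.05):       # assumed lock-held fractions
      print(f"lock held {f:.0%} of the time: ~{1 / (f * latency_s):.0f} serialized IOPS per PG")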

Re: [ceph-users] Small-cluster performance issues

2017-08-22 Thread Maged Mokhtar
It is likely your 2 spinning disks cannot keep up with the load. Things are likely to improve if you double your OSDs, hooking them up to your existing SSD journal. Technically it would be nice to run a load/performance tool (either atop/collectl/sysstat) and measure how busy your resources are,

[ceph-users] Config parameters for system tuning

2017-06-20 Thread Maged Mokhtar
recommendation or could we use more threads than cores? Cheers Maged Mokhtar PetaSAN ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Config parameters for system tuning

2017-06-22 Thread Maged Mokhtar
: filestore_queue_max_delay_multiple filestore_queue_high_delay_multiple filestore_queue_low_threshhold filestore_queue_high_threshhold again it will be good to update the docs: http://docs.ceph.com/docs/master/rados/configuration/filestore-config-ref/ I guess all eyes are on Bluestore now :) Maged Mokhtar

Re: [ceph-users] Squeezing Performance of CEPH

2017-06-22 Thread Maged Mokhtar
cluster. Cheers, Maged Mokhtar PetaSAN On 2017-06-22 19:19, Massimiliano Cuttini wrote: > Hi everybody, > > I want to squeeze all the performance of CEPH (we are using jewel 10.2.7). > We are testing a testing environment with 2 nodes having the same > configuration: &g

Re: [ceph-users] Ceph random read IOPS

2017-06-26 Thread Maged Mokhtar
you get from the > QD=1 test. To achieve lower latency you need faster cpu freq. Yes it is > expensive and as you said you need lower latency switches and so on but you > just have to pay more to achieve this. > > /Maged > > On Sun, Jun 25, 2017 at 4:53 AM, Willem Jan

Re: [ceph-users] Ceph random read IOPS

2017-06-24 Thread Maged Mokhtar
My understanding was this test is targeting latency more than IOPS. This is probably why it was run using QD=1. It also makes sense that cpu freq will be more important than cores. On 2017-06-24 12:52, Willem Jan Withagen wrote: > On 24-6-2017 05:30, Christian Wuerdig wrote: > >> The

Re: [ceph-users] VMware + CEPH Integration

2017-06-15 Thread Maged Mokhtar
Hi, Please check the PetaSAN project www.petasan.org We provide clustered iSCSI using LIO/Ceph rbd and Consul for HA. Works well with VMWare. /Maged From: Osama Hasebou Sent: Thursday, June 15, 2017 12:29 PM To: ceph-users Subject: [ceph-users] VMware + CEPH Integration Hi Everyone,

Re: [ceph-users] trying to understanding crush more deeply

2017-09-22 Thread Maged Mokhtar
Per section 3.4.4, the default bucket type straw computes the hash of (PG number, replica number, bucket id) for all buckets using the Jenkins integer hashing function, then multiplies this by the bucket weight (for OSD disks a weight of 1 is for 1 TB; for higher levels it is the sum of the contained
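
A simplified sketch of that selection step; it uses SHA-1 in place of the Jenkins hash, so it is illustrative only and not CRUSH-compatible:

  import hashlib

  def straw_choose(buckets, pg_num, replica_num):
      # buckets: dict of item id -> weight (1.0 per TB for OSD disks).
      # Each item gets a pseudo-random draw scaled by its weight; the largest draw wins.
      def draw(item, weight):
          h = hashlib.sha1(f"{pg_num}-{replica_num}-{item}".encode()).digest()
          return (int.from_bytes(h[:4], "big") / 0xFFFFFFFF) * weight
      return max(buckets, key=lambda item: draw(item, buckets[item]))

  print(straw_choose({"osd.0": 1.0, "osd.1": 2.0, "osd.2": 1.0}, pg_num=12, replica_num=0))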

Re: [ceph-users] Bluestore OSD_DATA, WAL & DB

2017-09-21 Thread Maged Mokhtar
On 2017-09-21 07:56, Lazuardi Nasution wrote: > Hi, > > I'm still looking for the answer of these questions. Maybe someone can share > their thought on these. Any comment will be helpful too. > > Best regards, > > On Sat, Sep 16, 2017 at 1:39 AM, Lazuardi Nasution

Re: [ceph-users] trying to understanding crush more deeply

2017-09-22 Thread Maged Mokhtar
lain something > about this ? Apologize for my dummy. And thank you very much . : ) > > On Fri, Sep 22, 2017 at 3:50 PM, Maged Mokhtar <mmokh...@petasan.org> wrote: > >> Per section 3.4.4 The default bucket type straw computes the hash of (PG >> number, replica number, b

Re: [ceph-users] Bluestore disk colocation using NVRAM, SSD and SATA

2017-09-21 Thread Maged Mokhtar
/lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com My guess is for wal: you are dealing with a 2 step io operation so in case

Re: [ceph-users] Ceph iSCSI login failed due to authorization failure

2017-10-14 Thread Maged Mokhtar
On 2017-10-14 17:50, Kashif Mumtaz wrote: > Hello Dear, > > I am trying to configure the Ceph iscsi gateway on Ceph Luminous. As per > below > > Ceph iSCSI Gateway -- Ceph Documentation [1] > > [1] > > CEPH ISCSI GATEWAY — CEPH DOCUMENTATION > > Ceph iscsi gateways are configured

Re: [ceph-users] osd max scrubs not honored?

2017-10-15 Thread Maged Mokhtar
On 2017-10-14 05:02, J David wrote: > Thanks all for input on this. > > It's taken a couple of weeks, but based on the feedback from the list, > we've got our version of a scrub-one-at-a-time cron script running and > confirmed that it's working properly. > > Unfortunately, this hasn't really

Re: [ceph-users] osd max scrubs not honored?

2017-10-15 Thread Maged Mokhtar
Correction: I limit it to 128K: echo 128 > /sys/block/sdX/queue/read_ahead_kb On 2017-10-15 13:14, Maged Mokhtar wrote: > On 2017-10-14 05:02, J David wrote: > >> Thanks all for input on this. >> >> It's taken a couple of weeks, but based on the feedback from

Re: [ceph-users] Bareos and libradosstriper works only for 4M sripe_unit size

2017-10-17 Thread Maged Mokhtar
>> Would it be 4 objects of 24M and 4 objects of 250KB? Or will the last 4 objects be artificially padded (with 0's) to meet the stripe_unit? It will be 4 objects of 24M + 1M stored on the 5th object. If you write 104M: 4 objects of 24M + 8M stored on the 5th object. If you write 105M: 4

Re: [ceph-users] Ceph-ISCSI

2017-10-17 Thread Maged Mokhtar
The issue with active/active is the following condition: the client initiator sends a write operation to gateway server A; server A does not respond within the client timeout; the client initiator re-sends the failed write operation to gateway server B; the client initiator then sends another write operation to gateway server

Re: [ceph-users] Ceph-ISCSI

2017-10-12 Thread Maged Mokhtar
On 2017-10-11 14:57, Jason Dillaman wrote: > On Wed, Oct 11, 2017 at 6:38 AM, Jorge Pinilla López > wrote: > >> As far as I am able to understand there are 2 ways of setting iscsi for ceph >> >> 1- using kernel (lrbd) only able on SUSE, CentOS, fedora... > > The

Re: [ceph-users] Ceph-ISCSI

2017-10-12 Thread Maged Mokhtar
On 2017-10-12 11:32, David Disseldorp wrote: > On Wed, 11 Oct 2017 14:03:59 -0400, Jason Dillaman wrote: > > On Wed, Oct 11, 2017 at 1:10 PM, Samuel Soulard > wrote: Hmmm, If you failover the identity of the > LIO configuration including PGRs > (I believe they are

Re: [ceph-users] Power outages!!! help!

2017-08-29 Thread Maged Mokhtar
One of the things to watch out for in small clusters is that OSDs can get full rather unexpectedly in recovery/backfill cases: in your case you have 2 OSD nodes with 5 disks each. Since you have a replica of 2, each PG will have 1 copy on each host, so if an OSD fails, all its PGs will have to be
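
A rough illustration of how quickly the surviving OSDs can fill up in that layout (the 70% starting utilization is an assumed figure):

  # 2 hosts x 5 OSDs, size=2: each PG has one copy per host, so when an OSD fails
  # its PGs must be recreated on the 4 remaining OSDs of the same host.
  osds_per_host = 5
  start_util = 0.70                             # assumed utilization before the failure
  extra = start_util / (osds_per_host - 1)      # failed OSD's data spread over 4 peers
  print(f"peers go from {start_util:.0%} to roughly {start_util + extra:.0%} full")
  # ~88%, already above the default 85% backfill-full ratio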

Re: [ceph-users] RBD features(kernel client) with kernel version

2017-09-26 Thread Maged Mokhtar
On 2017-09-25 14:29, Ilya Dryomov wrote: > On Sat, Sep 23, 2017 at 12:07 AM, Muminul Islam Russell > wrote: > >> Hi Ilya, >> >> Hope you are doing great. >> Sorry for bugging you. I did not find enough resources for my question. I >> would be really helped if you could

Re: [ceph-users] osd create returns duplicate ID's

2017-09-29 Thread Maged Mokhtar
On 2017-09-29 10:44, Adrian Saul wrote: > Do you mean that after you delete and remove the crush and auth entries for > the OSD, when you go to create another OSD later it will re-use the previous > OSD ID that you have destroyed in the past? > > Because I have seen that behaviour as well -

Re: [ceph-users] osd create returns duplicate ID's

2017-09-29 Thread Maged Mokhtar
On 2017-09-29 11:31, Maged Mokhtar wrote: > On 2017-09-29 10:44, Adrian Saul wrote: > > Do you mean that after you delete and remove the crush and auth entries for > the OSD, when you go to create another OSD later it will re-use the previous > OSD ID that you have destro

Re: [ceph-users] rados_read versus rados_aio_read performance

2017-10-01 Thread Maged Mokhtar
On 2017-10-01 16:47, Alexander Kushnirenko wrote: > Hi, Gregory! > > Thanks for the comment. I compiled simple program to play with write speed > measurements (from librados examples). Underline "write" functions are: > rados_write(io, "hw", read_res, 1048576, i*1048576); >

Re: [ceph-users] Get rbd performance stats

2017-09-29 Thread Maged Mokhtar
On 2017-09-29 17:13, Matthew Stroud wrote: > Is there a way I could get a performance stats for rbd images? I'm looking > for iops and throughput. > > This issue we are dealing with is that there was a sudden jump in throughput > and I want to be able to find out with rbd volume might be

Re: [ceph-users] Power outages!!! help!

2017-08-28 Thread Maged Mokhtar
I would suggest either adding 1 new disk on each of the 2 machines or increasing the osd_backfill_full_ratio to something like 90 or 92 from the default 85. /Maged On 2017-08-28 08:01, hjcho616 wrote: > Hello! > > I've been using ceph for long time mostly for network CephFS storage, even >

Re: [ceph-users] How to increase the size of requests written to a ceph image

2017-10-18 Thread Maged Mokhtar
First a general comment: local RAID will be faster than Ceph for a single threaded (queue depth=1) io operation test. A single thread Ceph client will see at best the same disk speed for reads, and for writes 4-6 times slower than a single disk. Not to mention the latency of local disks will be much better.
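
To make the queue-depth-1 comparison concrete, here is the latency-to-IOPS relationship with assumed example latencies (the numbers are not benchmark results):

  # At queue depth 1 the client issues one io at a time, so IOPS = 1 / latency.
  for name, latency_ms in (("local RAID write", 0.1), ("Ceph replicated write", 2.0)):
      print(f"{name}: {latency_ms} ms -> ~{1000 / latency_ms:.0f} IOPS at QD=1")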

Re: [ceph-users] How to increase the size of requests written to a ceph image

2017-10-18 Thread Maged Mokhtar
0 > Total writes made: 31032 > Write size: 4096 > Object size:4096 > Bandwidth (MB/sec): 3.93282 > Stddev Bandwidth: 3.66265 > Max bandwidth (MB/sec): 13.668 > Min bandwidth (MB/sec): 0 > Average IOPS: 1006 > Stddev

Re: [ceph-users] How to increase the size of requests written to a ceph image

2017-10-18 Thread Maged Mokhtar
de.servers.com/ssd-performance-2017-c4307a92dea [2] > > On Wed, Oct 18, 2017 at 11:53 AM, Maged Mokhtar <mmokh...@petasan.org> wrote: > > Check out the following link: some SSDs perform bad in Ceph due to sync > writes to journal > > https://www.sebastien-han.fr

Re: [ceph-users] Backup VM (Base image + snapshot)

2017-10-20 Thread Maged Mokhtar
Hi all, Can export-diff work effectively without the fast-diff rbd feature, as it is not supported in kernel rbd? Maged On 2017-10-19 23:18, Oscar Segarra wrote: > Hi Richard, > > Thanks a lot for sharing your experience... I have made deeper investigation > and it looks export-diff is

Re: [ceph-users] How to increase the size of requests written to a ceph image

2017-10-18 Thread Maged Mokhtar
in which in the dd infile can > be read? > And I assume the best test should be run with no other load. > > How does one run the rados bench "as stress"? > > -RG > > On Wed, Oct 18, 2017 at 1:33 PM, Maged Mokhtar <mmokh...@petasan.org> wrote:

Re: [ceph-users] Bluestore performance 50% of filestore

2017-11-15 Thread Maged Mokhtar
On 2017-11-14 21:54, Milanov, Radoslav Nikiforov wrote: > Hi > > We have 3 node, 27 OSDs cluster running Luminous 12.2.1 > > In filestore configuration there are 3 SSDs used for journals of 9 OSDs on > each hosts (1 SSD has 3 journal paritions for 3 OSDs). > > I've converted filestore to

Re: [ceph-users] ceph all-nvme mysql performance tuning

2017-11-27 Thread Maged Mokhtar
On 2017-11-27 15:02, German Anders wrote: > Hi All, > > I've a performance question, we recently install a brand new Ceph cluster > with all-nvme disks, using ceph version 12.2.0 with bluestore configured. The > back-end of the cluster is using a bond IPoIB (active/passive) , and for the >

Re: [ceph-users] ceph-disk is now deprecated

2017-11-28 Thread Maged Mokhtar
I tend to agree with Wido. Many of us still rely on ceph-disk and hope to see it live a little longer. Maged On 2017-11-28 13:54, Alfredo Deza wrote: > On Tue, Nov 28, 2017 at 3:12 AM, Wido den Hollander wrote: > Op 27 november 2017 om 14:36 schreef Alfredo Deza

Re: [ceph-users] ceph all-nvme mysql performance tuning

2017-11-29 Thread Maged Mokhtar
yeah, we are using the same nvme disk with an additional >> partition to use as journal/wal. We double check the c-state and it was >> not configure to use c1, so we change that on all the osd nodes and mon >> nodes and we're going to make some new tests, and see how it goes. I'll >

Re: [ceph-users] Performance, and how much wiggle room there is with tunables

2017-11-10 Thread Maged Mokhtar
Hi Mark, It will be interesting to know the impact of replication. I guess it will decrease by a higher factor than the replica count. I assume you mean the 30K IOPS per OSD is what the client sees; if so the OSD raw disk itself will be doing more IOPS. Is this correct, and if so what is the

Re: [ceph-users] Performance, and how much wiggle room there is with tunables

2017-11-10 Thread Maged Mokhtar
rados benchmark is a client application that simulates client io to stress the cluster. This applies whether you run the test from an external client or from a cluster server that will act as a client. For fast clusters the client will saturate (cpu/net) before the cluster does. To get accurate

Re: [ceph-users] Bluestore OSD_DATA, WAL & DB

2017-11-03 Thread Maged Mokhtar
On 2017-11-03 15:59, Wido den Hollander wrote: > Op 3 november 2017 om 14:43 schreef Mark Nelson : > > On 11/03/2017 08:25 AM, Wido den Hollander wrote: > Op 3 november 2017 om 13:33 schreef Mark Nelson : > > On 11/03/2017 02:44 AM, Wido den Hollander

Re: [ceph-users] How to increase the size of requests written to a ceph image

2017-12-08 Thread Maged Mokhtar
4M block sizes you will only need 22.5 iops On 2017-12-08 09:59, Maged Mokhtar wrote: > Hi Russell, > > It is probably due to the difference in block sizes used in the test vs your > cluster load. You have a latency problem which is limiting your max write > iops to around

Re: [ceph-users] How to increase the size of requests written to a ceph image

2017-12-08 Thread Maged Mokhtar
ur actual writes are low, as most of our Ceph Cluster based images are > low-write, high-memory. So a 20GB/day life/write capacity is a non-issue for > us. Only write speed is the concern. Our write-intensive images are locked on > non-ceph disks. > What are others using for SSD drives in thei

[ceph-users] Single disk per OSD ?

2017-12-01 Thread Maged Mokhtar
Hi all, I believe most existing setups use 1 disk per OSD. Is this going to be the most common setup in the future? With the move to lvm, will this prefer the use of multiple disks per OSD? On the other side I also see nvme vendors recommending multiple OSDs (2, 4) per disk as disks are

Re: [ceph-users] How to increase the size of requests written to a ceph image

2017-10-25 Thread Maged Mokhtar
rver, this was the statistics for about 8 of 9 >>>>> disks, with the 9th disk not far behind the others. >>>>> >>>>> I cannot believe all 9 disks are bad >>>>> They are the same disks as the newest 1st server, Crucial_CT960M500SSD1,

Re: [ceph-users] OSDs wrongly marked down

2017-12-20 Thread Maged Mokhtar
It could also be that your hardware is under powered for the io you have. Try to check your resource load during peak workload, together with recovery and scrubbing going on at the same time. On 2017-12-20 17:03, David Turner wrote: > When I have OSDs wrongly marked down it's usually to do with the >

Re: [ceph-users] Issues with RBD when rebooting

2018-05-25 Thread Maged Mokhtar
On 2018-05-25 12:11, Josef Zelenka wrote: > Hi, we are running a jewel cluster (54OSDs, six nodes, ubuntu 16.04) that > serves as a backend for openstack(newton) VMs. Today we had to reboot one of > the nodes(replicated pool, x2) and some of our VMs oopsed with issues with > their FS(mainly

Re: [ceph-users] How to use libradostriper to improve I/O bandwidth?

2018-06-12 Thread Maged Mokhtar
On 2018-06-12 01:01, Jialin Liu wrote: > Hello Ceph Community, > > I used libradosstriper api to test the striping feature, it doesn't seem to > improve the performance at all, can anyone advise what's wrong with my > settings: > > The rados object store testbed at my center has > osd:

Re: [ceph-users] CephFS+NFS For VMWare

2018-07-02 Thread Maged Mokhtar
Hi Nick, With iSCSI we reach over 150 MB/s vmotion for a single vm, 1 GB/s for 7-8 vm migrations. Since these are 64KB block sizes, latency/iops is a large factor; you need either controllers with write back cache or all flash. HDDs without write cache will suffer even with external wal/db on

Re: [ceph-users] How to increase the size of requests written to a ceph image

2017-10-26 Thread Maged Mokhtar
and a busy% of below 90% during rados 4k test. Maged On 2017-10-26 16:44, Russell Glaue wrote: > On Wed, Oct 25, 2017 at 7:09 PM, Maged Mokhtar <mmokh...@petasan.org> wrote: > >> It depends on what stage you are in: >> in production, probably the best thing is to

Re: [ceph-users] How to increase the size of requests written to a ceph image

2017-10-27 Thread Maged Mokhtar
ng but small toy-test clusters. > > On Fri, Oct 27, 2017 at 3:44 AM, Russell Glaue <rgl...@cait.org> wrote: > On Wed, Oct 25, 2017 at 7:09 PM, Maged Mokhtar <mmokh...@petasan.org> wrote: > It depends on what stage you are in: > in production, probably the best thing is

Re: [ceph-users] What is the should be the expected latency of 10Gbit network connections

2018-01-22 Thread Maged Mokhtar
On 2018-01-22 08:39, Wido den Hollander wrote: > On 01/20/2018 02:02 PM, Marc Roos wrote: > >> If I test my connections with sockperf via a 1Gbit switch I get around >> 25usec, when I test the 10Gbit connection via the switch I have around >> 12usec is that normal? Or should there be a

Re: [ceph-users] What is the should be the expected latency of 10Gbit network connections

2018-01-22 Thread Maged Mokhtar
ved, 0% packet loss, time 2363ms > rtt min/avg/max/mdev = 0.014/0.015/0.322/0.006 ms, ipg/ewma 0.023/0.016 ms > > On 22 January 2018 at 22:37, Nick Fisk <n...@fisk.me.uk> wrote: > >> Anyone with 25G ethernet willing to do the test? Would love to see what the >> laten

Re: [ceph-users] How ceph client read data from ceph cluster

2018-01-26 Thread Maged Mokhtar
On 2018-01-26 09:09, shadow_lin wrote: > Hi List, > I read a old article about how ceph client read from ceph cluster.It said the > client only read from the primary osd. Since ceph cluster in replicate mode > have serveral copys of data only read from one copy seems waste the > performance

Re: [ceph-users] How ceph client read data from ceph cluster

2018-01-26 Thread Maged Mokhtar
ph(12.2.2) the client only read from the primary > osd(one copy), is that true? > > 2018-01-27 > - > > lin.yunfan > --------- > > From: Maged Mokhtar <mmokh...@petasan.org> > Sent: 2018-01-26 20:27 > Subject: Re: [ceph-users] Ho

Re: [ceph-users] troubleshooting ceph performance

2018-01-30 Thread Maged Mokhtar
On 2018-01-31 08:14, Manuel Sopena Ballesteros wrote: > Dear Ceph community, > > I have a very small ceph cluster for testing with this configuration: > > · 2x compute nodes each with: > > · dual port of 25 nic > > · 2x socket (56 cores with hyperthreading) > > ·

Re: [ceph-users] How to clean data of osd with ssd journal(wal, db if it is bluestore) ?

2018-02-01 Thread Maged Mokhtar
- > > lin.yunfan > ----- > > From: Maged Mokhtar <mmokh...@petasan.org> > Sent: 2018-02-01 14:22 > Subject: Re: [ceph-users] How to clean data of osd with ssd journal(wal, db if it > is bluestore) ? > To: "David Turner"<drakonst...@gmail.com

Re: [ceph-users] Ceph - incorrect output of ceph osd tree

2018-01-31 Thread Maged Mokhtar
Try setting: mon_osd_min_down_reporters = 1 On 2018-01-31 20:46, Steven Vacaroaia wrote: > Hi, > > Why is ceph osd tree reports that osd.4 is up when the server on which osd.4 > is running is actually down ?? > > Any help will be appreciated > > [root@osd01 ~]# ping -c 2 osd02 > PING

Re: [ceph-users] How to clean data of osd with ssd journal(wal, db if it is bluestore) ?

2018-01-31 Thread Maged Mokhtar
I would recommend, as Wido did, using the dd command. The block db device holds the metadata/allocation of objects stored in the data block; not cleaning this is asking for problems, and besides it does not take any time. In our testing, building a new cluster on top of an older installation, we did see many cases where

Re: [ceph-users] Newbie question: stretch ceph cluster

2018-02-14 Thread Maged Mokhtar
Hi, You need to set the min_size to 2 in the crush rule. The exact location and replication flow when a client writes data depends on the object name and the number of pgs. The crush rule determines which osds will serve a pg; the first is the primary osd for that pg. The client computes the pg from the
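
A toy sketch of that placement flow; CRC32 and Python's hash stand in for Ceph's rjenkins hash and stable_mod, so the mapping is illustrative rather than the real one:

  import zlib

  def object_to_pg(obj_name, pg_num):
      # client side: hash the object name, then reduce it modulo the PG count
      return zlib.crc32(obj_name.encode()) % pg_num

  def pg_to_osds(pg, osds_per_host):
      # stand-in for the crush rule: pick one OSD per host, the first is the primary
      return [min(osds, key=lambda o: hash((pg, o))) for osds in osds_per_host.values()]

  pg = object_to_pg("rbd_data.1234.0000000000000001", pg_num=128)
  print(pg, pg_to_osds(pg, {"host1": ["osd.0", "osd.1"], "host2": ["osd.2", "osd.3"]}))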

Re: [ceph-users] Ceph luminous performance - how to calculate expected results

2018-02-14 Thread Maged Mokhtar
On 2018-02-14 20:14, Steven Vacaroaia wrote: > Hi, > > It is very useful to "set up expectations" from a performance perspective > > I have a cluster using 3 DELL R620 with 64 GB RAM and 10 GB cluster network > > I've seen numerous posts and articles about the topic mentioning the >

Re: [ceph-users] WAL/DB size

2018-09-07 Thread Maged Mokhtar
On 2018-09-07 14:36, Alfredo Deza wrote: > On Fri, Sep 7, 2018 at 8:27 AM, Muhammad Junaid > wrote: > >> Hi there >> >> Asking the questions as a newbie. May be asked a number of times before by >> many but sorry, it is not clear yet to me. >> >> 1. The WAL device is just like journaling

Re: [ceph-users] advice with erasure coding

2018-09-07 Thread Maged Mokhtar
On 2018-09-07 13:52, Janne Johansson wrote: > Den fre 7 sep. 2018 kl 13:44 skrev Maged Mokhtar : > >> Good day Cephers, >> >> I want to get some guidance on erasure coding, the docs do state the >> different plugins and settings but to really understand th

[ceph-users] advice with erasure coding

2018-09-07 Thread Maged Mokhtar
Good day Cephers, I want to get some guidance on erasure coding. The docs do state the different plugins and settings, but to really understand them all and their use cases is not easy: -Are the majority of implementations using jerasure and just configuring k and m? -For jerasure: when/if
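
For the k/m question, a quick worked example of what the two numbers mean (standard erasure-coding arithmetic, not taken from the thread):

  # A profile with k data chunks and m coding chunks survives the loss of up to m
  # chunks (OSDs or hosts, per the crush failure domain) and stores (k + m) / k
  # bytes of raw data per byte of client data.
  for k, m in ((2, 1), (4, 2), (8, 3)):
      print(f"k={k} m={m}: tolerates {m} failures, overhead {(k + m) / k:.2f}x")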

Re: [ceph-users] Performance tuning for SAN SSD config

2018-07-06 Thread Maged Mokhtar
On 2018-06-29 18:30, Matthew Stroud wrote: > We back some of our ceph clusters with SAN SSD disk, particularly VSP G/F and > Purestorage. I'm curious what are some settings we should look into modifying > to take advantage of our SAN arrays. We had to manually set the class for the > luns to

Re: [ceph-users] Performance tuning for SAN SSD config

2018-07-06 Thread Maged Mokhtar
grow as you need, you > generally dont need hba/fc enclosed disks but nothing stopping you > from using your existing system. Also you generally dont need any raid > mirroring configurations in the backend since ceph will handle the > redundancy for you. scale out systems have more

Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock

2018-03-12 Thread Maged Mokhtar
On 2018-03-12 14:23, David Disseldorp wrote: > On Fri, 09 Mar 2018 11:23:02 +0200, Maged Mokhtar wrote: > >> 2) I understand that before switching the path, the initiator will send a >> TMF ABORT; can we pass this down to the same abort_request() function >> in

Re: [ceph-users] Fwd: [ceph bad performance], can't find a bottleneck

2018-03-12 Thread Maged Mokhtar
Hi, Try increasing the queue depth from the default 128 to 1024: rbd map image-XX -o queue_depth=1024 Also, if you run multiple rbd images/fio tests, do you get higher combined performance? Maged On 2018-03-12 17:16, Sergey Kotov wrote: > Dear moderator, i subscribed to ceph list today,

Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock

2018-03-12 Thread Maged Mokhtar
On 2018-03-12 21:00, Ilya Dryomov wrote: > On Mon, Mar 12, 2018 at 7:41 PM, Maged Mokhtar <mmokh...@petasan.org> wrote: > >> On 2018-03-12 14:23, David Disseldorp wrote: >> >> On Fri, 09 Mar 2018 11:23:02 +0200, Maged Mokhtar wrote: >> >> 2)

Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock

2018-03-10 Thread Maged Mokhtar
-- From: "Jason Dillaman" Sent: Sunday, March 11, 2018 1:46 AM To: "shadow_lin" Cc: "Lazuardi Nasution" ; "Ceph Users" Subject: Re: [ceph-users] iSCSI

Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock

2018-03-09 Thread Maged Mokhtar
Hi Mike, > For the easy case, the SCSI command is sent directly to krbd and so if > osd_request_timeout is less than M seconds then the command will be > failed in time and we would not hit the problem above. > If something happens in the target stack like the SCSI command gets > stuck/queued
