Re: [ceph-users] Poor IOPS performance with Ceph

2015-09-09 Thread Daleep Bais
Hi Shinobu, I have 1 X 1TB HDD on each node. The network bandwidth between nodes is 1Gbps. Thanks for the info. I will also try to go through discussion mails related to performance. Thanks. Daleep Singh Bais On Wed, Sep 9, 2015 at 2:09 PM, Shinobu Kinjo wrote: > How

Re: [ceph-users] Question on cephfs recovery tools

2015-09-09 Thread Shinobu
Did you unmount the filesystem using "umount -l"? Shinobu On Wed, Sep 9, 2015 at 4:31 PM, Goncalo Borges wrote: > Dear Ceph / CephFS gurus... > > Bear with me a bit while I give you some context. Questions will > appear at the end. > > 1) I am currently running

[ceph-users] Poor IOPS performance with Ceph

2015-09-09 Thread Daleep Bais
Hi, I have made a test Ceph cluster of 6 OSDs and 3 MONs. I am testing the read/write performance of the test cluster, and the read IOPS is poor. When I test each HDD individually, I get good performance, whereas when I test through the Ceph cluster, it is poor. Between nodes, using iperf,
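A quick way to sanity-check the link between two nodes with iperf (the server address below is a placeholder; iperf must be installed on both nodes) is roughly:

    # on the first node (server)
    iperf -s
    # on the second node (client): 30 seconds, 4 parallel streams
    iperf -c 192.168.1.10 -t 30 -P 4

On a 1 Gbps link this should report somewhere near 940 Mbit/s; anything much lower points at the network rather than Ceph.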

Re: [ceph-users] Poor IOPS performance with Ceph

2015-09-09 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Daleep Bais > Sent: 09 September 2015 09:18 > To: Ceph-User > Subject: [ceph-users] Poor IOPS performance with Ceph > > Hi, > > I have made a test ceph cluster of 6

Re: [ceph-users] Question on cephfs recovery tools

2015-09-09 Thread Shinobu Kinjo
Anyhow this page would help you: http://ceph.com/docs/master/cephfs/disaster-recovery/ Shinobu - Original Message - From: "Shinobu Kinjo" To: "Goncalo Borges" Cc: "ceph-users" Sent: Wednesday, September 9,

Re: [ceph-users] Poor IOPS performance with Ceph

2015-09-09 Thread Shinobu Kinjo
Are you also using that HDD for storing journal data? Or are you using an SSD for that purpose? Shinobu - Original Message - From: "Daleep Bais" To: "Shinobu Kinjo" Cc: "Ceph-User" Sent: Wednesday, September 9, 2015 5:59:33

[ceph-users] radula - radosgw(s3) cli tool

2015-09-09 Thread Andrew Bibby (lists)
Hey cephers, Just wanted to briefly announce the release of a radosgw CLI tool that solves some of our team's minor annoyances. Called radula, a nod to the patron animal, this utility acts a lot like s3cmd with some tweaks to meet the expectations of our researchers.

Re: [ceph-users] Question on cephfs recovery tools

2015-09-09 Thread Shinobu Kinjo
Did you try to identify what kind of processes were accessing the filesystem, using fuser or lsof, and then kill them? If not, you should do that first. Shinobu - Original Message - From: "Goncalo Borges" To: ski...@redhat.com Sent: Wednesday, September 9, 2015
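A rough sketch of that procedure, assuming the CephFS mount point is /mnt/cephfs (note that lsof itself can hang on an unresponsive mount, so fuser is often the safer choice):

    fuser -vm /mnt/cephfs        # list processes using the mount
    fuser -km /mnt/cephfs        # kill them (SIGKILL)
    umount /mnt/cephfs           # try a normal unmount first
    umount -l /mnt/cephfs        # fall back to a lazy unmount if it still blocks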

Re: [ceph-users] Poor IOPS performance with Ceph

2015-09-09 Thread Nick Fisk
It looks like you are using the kernel RBD client, i.e. you ran "rbd map ". In that case the librbd settings in ceph.conf won't have any effect, as they only apply if you are using fio with the librbd engine. There are several things you may have to do to improve kernel client
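For comparison, a hedged example of driving librbd directly from fio (pool, image and client names are placeholders; fio must be built with rbd support):

    fio --name=rbd-read --ioengine=rbd --clientname=admin --pool=rbd \
        --rbdname=testimg --rw=randread --bs=4k --iodepth=32 \
        --direct=1 --runtime=60 --time_based

Run this way, the librbd options in ceph.conf (rbd cache and friends) actually take effect, which they do not for the kernel client.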

Re: [ceph-users] Poor IOPS performance with Ceph

2015-09-09 Thread Shinobu Kinjo
How many disks does each OSD node have? How about the networking layer? There are several factors that can make your cluster much stronger. You may need to take a look at other discussions on this mailing list; there has been a bunch of discussion about performance. Shinobu - Original Message

Re: [ceph-users] ensuring write activity is finished

2015-09-09 Thread Jan Schermer
I never played much with rados bench, but it doesn't seem to have settings for, for example, synchronous vs. asynchronous workloads, so it probably just benchmarks the OSD throughput and the ability to write to the journal (in write mode) unless you let it run for a longer time. So when you stop rados bench
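For reference, a typical rados bench run looks something like the following (pool name, durations and queue depth are arbitrary); note that it exercises the OSDs directly and bypasses RBD/CephFS entirely:

    rados -p testpool bench 300 write -b 4096 -t 16 --no-cleanup   # 4k writes, 16 in flight
    rados -p testpool bench 300 seq -t 16                          # sequential reads of those objects
    rados -p testpool cleanup                                      # remove the benchmark objects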

[ceph-users] Question on cephfs recovery tools

2015-09-09 Thread Goncalo Borges
Dear Ceph / CephFS gurus... Bear with me a bit while I give you some context. Questions will appear at the end. 1) I am currently running ceph 9.0.3 and I have installed it to test the cephfs recovery tools. 2) I've created a situation where I've deliberately lost some
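The recovery tools being tested here are roughly the ones from the disaster-recovery documentation; this is only a sketch of the documented sequence (always export the journal first), not a recipe:

    cephfs-journal-tool journal export backup.bin         # take a backup before touching anything
    cephfs-journal-tool event recover_dentries summary    # replay dentries from the journal into the metadata pool
    cephfs-journal-tool journal reset                      # truncate a damaged journal
    cephfs-table-tool all reset session                    # clear stale client sessions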

Re: [ceph-users] maximum object size

2015-09-09 Thread HEWLETT, Paul (Paul)
By setting a parameter osd_max_write_size to 2047… This normally defaults to 90. Setting it to 2048 exposes a bug in Ceph where signed overflow occurs... Part of the problem is my expectations. Ilya pointed out that one can use libradosstriper to stripe a large object over many OSDs. I expected
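For anyone wanting to reproduce this, the parameter can be changed at runtime roughly as follows (the value is in MB; per the report above, values of 2048 and higher trip the overflow, so stay below that):

    ceph tell osd.* injectargs '--osd_max_write_size 1024'
    ceph daemon osd.0 config get osd_max_write_size    # verify on one OSD (run on that OSD's host)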

Re: [ceph-users] Poor IOPS performance with Ceph

2015-09-09 Thread Daleep Bais
Hi Nick, I don't have a separate SSD / HDD for the journal. I am using a 10 GB partition on the same HDD for journaling. They are rotating HDDs, not SSDs. I am using the command below to run the test: fio --name=test --filename=test --bs=4k --size=4G --readwrite=read / write I did some kernel tuning

Re: [ceph-users] Ceph cluster NO read / write performance :: Ops are blocked

2015-09-09 Thread Jan Schermer
Just to recapitulate - the nodes are doing "nothing" when it drops to zero? Not flushing something to drives (iostat)? Not cleaning pagecache (kswapd and similar)? Not out of any type of memory (slab, min_free_kbytes)? No network link errors, no bad checksums (those are hard to spot, though)?
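A minimal set of commands for checking those points on each node while the stall is happening (sar requires the sysstat package):

    iostat -x 1        # per-device utilization and await; are the journal/OSD disks busy?
    vmstat 1           # swap activity, run queue, reclaim pressure
    free -m            # free memory and cache
    sar -n EDEV 1      # per-interface error/drop counters
    dmesg | tail -50   # controller resets, link flaps, OOM killer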

Re: [ceph-users] jemalloc and transparent hugepage

2015-09-09 Thread Jan Schermer
This is great, thank you! Jan > On 09 Sep 2015, at 12:37, HEWLETT, Paul (Paul) > wrote: > > Hi Jan > > If I can suggest that you look at: > > http://engineering.linkedin.com/performance/optimizing-linux-memory-managem >

Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant

2015-09-09 Thread Jan Schermer
Sorry if I wasn't clear. Going from 2GB to 8GB is not normal, although some slight bloating is expected. In your case it just got much worse than usual for reasons yet unknown. Jan > On 09 Sep 2015, at 12:40, Mariusz Gronczewski > wrote: > > > well I was

Re: [ceph-users] Poor IOPS performance with Ceph

2015-09-09 Thread Daleep Bais
That same HDD is also used for the journal, on a separate 10 GB partition. Thanks. Daleep Singh Bais On Wed, Sep 9, 2015 at 2:37 PM, Shinobu Kinjo wrote: > Are you also using that HDD for storing journal data? > Or are you using an SSD for that purpose? > > Shinobu > > -

Re: [ceph-users] Poor IOPS performance with Ceph

2015-09-09 Thread Shinobu Kinjo
These may be of more help for performance analysis: http://ceph.com/docs/master/start/hardware-recommendations/ http://www.sebastien-han.fr/blog/2013/10/03/quick-analysis-of-the-ceph-io-layer/ Shinobu - Original Message - From: "Shinobu Kinjo" To:

[ceph-users] RBD with iSCSI

2015-09-09 Thread Daleep Bais
Hi, I am following the steps from http://www.sebastien-han.fr/blog/2014/07/07/start-with-the-rbd-support-for-tgt/ to create an RBD pool and share it to another initiator. I am not able to get rbd in the backstore
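As a rough sketch of what that post sets up (target name, pool and image are placeholders; tgt must have been built with the rbd backing store, which you can check first):

    tgtadm --lld iscsi --mode system --op show | grep -i rbd   # is the rbd bs-type available at all?

    # /etc/tgt/conf.d/ceph.conf
    <target iqn.2015-09.com.example:rbd>
        driver iscsi
        bs-type rbd
        backing-store rbd/iscsi-image      # pool/image
        initiator-address ALL
    </target>

If "rbd" does not show up in the first command, the packaged tgt was built without CEPH_RBD support, which is the usual reason the backstore is missing.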

Re: [ceph-users] jemalloc and transparent hugepage

2015-09-09 Thread Jan Schermer
I looked at THP before. It comes enabled on RHEL6 and on our KVM hosts it merges a lot (~300GB hugepages on a 400GB KVM footprint). I am probably going to disable it and see if it introduces any problems for me - the most important gain here is better processor memory lookup table (cache)

Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant

2015-09-09 Thread Jan Schermer
The memory gets used for additional PGs on the OSD. If you were to "swap" PGs between two OSDs, you'll get memory wasted on both of them because tcmalloc doesn't release it.* It usually gets stable after few days even during backfills, so it does get reused if needed. If for some reason your

[ceph-users] EC pool design

2015-09-09 Thread Luis Periquito
I'm in the process of adding more resources to an existing cluster. I'll have 38 hosts, with 2 HDD each, for an EC pool. I plan on adding a cache pool in front of it (is it worth it? S3 data, mostly writes and objects are usually 200kB upwards to several MB/GB...); all of the hosts are on the
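For context, the Hammer-era commands for an EC pool with a writeback cache tier look roughly like this (profile, pool names, PG counts and the size limit are placeholders to adapt):

    ceph osd erasure-code-profile set ec-profile k=4 m=2 ruleset-failure-domain=host
    ceph osd pool create ecpool 2048 2048 erasure ec-profile
    ceph osd pool create cachepool 512 512
    ceph osd tier add ecpool cachepool
    ceph osd tier cache-mode cachepool writeback
    ceph osd tier set-overlay ecpool cachepool
    ceph osd pool set cachepool hit_set_type bloom
    ceph osd pool set cachepool target_max_bytes 1099511627776   # 1 TB cache limit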

Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant

2015-09-09 Thread Mariusz Gronczewski
well I was going by http://ceph.com/docs/master/start/hardware-recommendations/ and planning for 2GB per OSD, so that was a surprise. Maybe there should be a warning somewhere? On Wed, 9 Sep 2015 12:21:15 +0200, Jan Schermer wrote: > The memory gets used for additional PGs

Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant

2015-09-09 Thread Mariusz Gronczewski
On Tue, 08 Sep 2015 16:14:15 -0500, Chad William Seys wrote: > Does 'ceph tell osd.* heap release' help with OSD RAM usage? > > From > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-August/003932.html > > Chad. it did help now, but cluster is in clean state
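For reference, the tcmalloc heap commands mentioned in that thread (running heap stats before and after makes the effect visible):

    ceph tell osd.* heap stats      # bytes in use vs. bytes held by tcmalloc
    ceph tell osd.* heap release    # return freed pages to the OS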

Re: [ceph-users] Poor IOPS performance with Ceph

2015-09-09 Thread Jan Schermer
For the record, --direct=1 (or any O_DIRECT IO anywhere) is by itself not guaranteed to be unbuffered and synchronous. You need to add --direct=1 --sync=1 --fsync=1 to make sure you are actually flushing the data somewhere. (This puts additional OPS in the queue, though.) In the case of RBD this is
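Putting that together, a write test that really forces the data out might look like the following (filename and size are placeholders; expect much lower, but more honest, numbers):

    fio --name=sync-write --filename=/mnt/test/fio.dat --size=4G \
        --rw=randwrite --bs=4k --iodepth=1 \
        --direct=1 --sync=1 --fsync=1 --runtime=60 --time_based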

Re: [ceph-users] Ceph Tuning + KV backend

2015-09-09 Thread Jan Schermer
You actually can't know what the network contention is like - you see virtual NICs, but those are overprovisioned on the physical hosts, and the backbone between AWS racks/datacenters are overprovisioned as well (likely). The same goes for CPU and RAM - depending on your kernel and how AWS is

Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant

2015-09-09 Thread Mariusz Gronczewski
sadly I don't have any from when it was taking excess amounts of memory during the rebuild. But I will remember to do that next time, thanks On Tue, 8 Sep 2015 18:28:48 -0400 (EDT), Shinobu Kinjo wrote: > Have you ever? > >

Re: [ceph-users] jemalloc and transparent hugepage

2015-09-09 Thread HEWLETT, Paul (Paul)
Hi Jan If I can suggest that you look at: http://engineering.linkedin.com/performance/optimizing-linux-memory-management-low-latency-high-throughput-databases where LinkedIn ended up disabling some of the new kernel features to prevent memory thrashing. Search for Transparent Huge Pages..
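For reference, checking and disabling THP at runtime (to persist it you would add the same writes to an init script or the kernel command line):

    cat /sys/kernel/mm/transparent_hugepage/enabled      # shows e.g. [always] madvise never
    echo never > /sys/kernel/mm/transparent_hugepage/enabled
    echo never > /sys/kernel/mm/transparent_hugepage/defrag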

Re: [ceph-users] Question on cephfs recovery tools

2015-09-09 Thread goncalo
Hi Shinobu I did check that page, but I do not think that, in its current state, it helps much. If you look at my email, I did try the operations documented there, but nothing substantial really happened. The tools do not produce any output, so I am not sure what they did, if they did

Re: [ceph-users] RAM usage only very slowly decreases after cluster recovery

2015-09-09 Thread Mark Nelson
On 08/28/2015 10:55 AM, Somnath Roy wrote: Yeah, that means tcmalloc is probably caching those as I suspected. There is some discussion going on on that front, but unfortunately we concluded to keep tcmalloc as the default, and anybody who needs performance should move to jemalloc. One of the

Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant

2015-09-09 Thread Jan Schermer
You can sort of simulate it: * E.g. if you do something silly like "ceph osd crush reweight osd.1 1" you will see the RSS of osd.28 skyrocket. Reweighting it back down will not release the memory until you do "heap release". But this is expected, methinks. Jan > On 09 Sep

Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant

2015-09-09 Thread Chad William Seys
> Going from 2GB to 8GB is not normal, although some slight bloating is > expected. If I recall correctly, Mariusz's cluster had a period of flapping OSDs? I experienced a similar situation using hammer. My OSDs went from 10GB in RAM in a Healthy state to 24GB RAM + 10GB swap in a

Re: [ceph-users] CephFS and caching

2015-09-09 Thread Kyle Hutson
We are using Hammer - latest released version. How do I check if it's getting promoted into the cache? We're using the latest ceph kernel client. Where do I poke at readahead settings there? On Tue, Sep 8, 2015 at 8:29 AM, Gregory Farnum wrote: > On Thu, Sep 3, 2015 at

Re: [ceph-users] CephFS and caching

2015-09-09 Thread Gregory Farnum
On Wed, Sep 9, 2015 at 3:27 PM, Kyle Hutson wrote: > We are using Hammer - latest released version. How do I check if it's > getting promoted into the cache? Umm...that's a good question. You can run rados ls on the cache pool, but that's not exactly scalable; you can turn up
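A couple of quick, if crude, ways to see whether objects are landing in the cache pool (pool name is a placeholder; rados ls can be slow on large pools):

    rados -p cachepool ls | wc -l    # object count in the cache tier
    rados df                         # per-pool object and byte counts
    ceph df detail                   # per-pool usage, including dirty objects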

Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant

2015-09-09 Thread Chad William Seys
On Tuesday, September 08, 2015 18:28:48 Shinobu Kinjo wrote: > Have you ever? > > http://ceph.com/docs/master/rados/troubleshooting/memory-profiling/ No. But the command 'ceph tell osd.* heap release' did cause my OSDs to consume the "normal" amount of RAM. ("normal" in this case means the

Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant

2015-09-09 Thread Mark Nelson
Yes, under no circumstances is it really ok for an OSD to consume 8GB of RSS! :) It'd be really swell if we could replicate that kind of memory growth in-house on demand. Mark On 09/09/2015 05:56 AM, Jan Schermer wrote: Sorry if I wasn't clear. Going from 2GB to 8GB is not normal, although

Re: [ceph-users] Question on cephfs recovery tools

2015-09-09 Thread Shinobu Kinjo
Hi Goncalo, >> a./ Under a situation as the one describe above, how can we safely >> terminate cephfs in the clients? I have had situations where >> umount simply hangs and there is no real way to unblock the >> situation unless I reboot the client. If we have hundreds of >>

Re: [ceph-users] RAM usage only very slowly decreases after cluster recovery

2015-09-09 Thread Chad William Seys
Thanks Somnath! I found a bug in the tracker to follow: http://tracker.ceph.com/issues/12681 Chad.

Re: [ceph-users] CephFS and caching

2015-09-09 Thread Kyle Hutson
On Wed, Sep 9, 2015 at 9:34 AM, Gregory Farnum wrote: > On Wed, Sep 9, 2015 at 3:27 PM, Kyle Hutson wrote: > > We are using Hammer - latest released version. How do I check if it's > > getting promoted into the cache? > > Umm...that's a good question. You

Re: [ceph-users] CephFS and caching

2015-09-09 Thread Gregory Farnum
On Wed, Sep 9, 2015 at 4:26 PM, Kyle Hutson wrote: > > > On Wed, Sep 9, 2015 at 9:34 AM, Gregory Farnum wrote: >> >> On Wed, Sep 9, 2015 at 3:27 PM, Kyle Hutson wrote: >> > We are using Hammer - latest released version. How do I check

Re: [ceph-users] purpose of different default pools created by radosgw instance

2015-09-09 Thread Ben Hines
The Ceph docs in general could use a lot of improvement, IMO. There are many, many settings listed, but one must dive into the mailing list to learn which ones are worth tweaking (And often, even *what they do*!) -Ben On Wed, Sep 9, 2015 at 3:51 PM, Mark Kirkwood

Re: [ceph-users] purpose of different default pools created by radosgw instance

2015-09-09 Thread Shinobu Kinjo
That's a good point actually. Probably saves our life -; Shinobu - Original Message - From: "Ben Hines" To: "Mark Kirkwood" Cc: "ceph-users" Sent: Thursday, September 10, 2015 8:23:26 AM Subject: Re:

Re: [ceph-users] purpose of different default pools created by radosgw instance

2015-09-09 Thread Mark Kirkwood
On 16/09/14 17:10, pragya jain wrote: > Hi all! > > As document says, ceph has some default pools for radosgw instance. These > pools are: > * .rgw.root > * .rgw.control > * .rgw.gc > * .rgw.buckets > * .rgw.buckets.index > * .log > * .intent-log >

Re: [ceph-users] Question on cephfs recovery tools

2015-09-09 Thread Goncalo Borges
Hey Shinobu Thanks for the replies. a./ Under a situation as the one describe above, how can we safely terminate cephfs in the clients? I have had situations where umount simply hangs and there is no real way to unblock the situation unless I reboot the client. If we have

Re: [ceph-users] purpose of different default pools created by radosgw instance

2015-09-09 Thread Mark Kirkwood
On 10/09/15 11:27, Shinobu Kinjo wrote: > That's good point actually. > Probably saves our life -; > > Shinobu > > - Original Message - > From: "Ben Hines" > To: "Mark Kirkwood" > Cc: "ceph-users" > Sent:

Re: [ceph-users] Ceph cluster NO read / write performance :: Ops are blocked

2015-09-09 Thread Vickey Singh
Hey Lincoln On Tue, Sep 8, 2015 at 7:26 PM, Lincoln Bryant wrote: > For whatever it’s worth, my problem has returned and is very similar to > yours. Still trying to figure out what’s going on over here. > > Performance is nice for a few seconds, then goes to 0. This is

[ceph-users] Ceph/Radosgw v0.94 Content-Type versus Content-type

2015-09-09 Thread Chang, Fangzhe (Fangzhe)
I noticed that the S3 Java SDK getContentType() no longer works with Ceph/Radosgw v0.94 (Hammer). It seems that the S3 SDK expects the metadata header “Content-Type” whereas Ceph responds with “Content-type”. Does anyone know how to request that this issue be fixed? Fangzhe
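An easy way to see exactly which header casing radosgw returns, independent of the SDK (endpoint, bucket and object are placeholders):

    curl -sI http://rgw.example.com/mybucket/myobject | grep -i '^content-type'

HTTP header names are case-insensitive per the RFCs, so a client-side workaround is to match the header case-insensitively until the gateway is fixed.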

Re: [ceph-users] Ceph cluster NO read / write performance :: Ops are blocked

2015-09-09 Thread Bill Sanders
We were experiencing something similar in our setup (rados bench does some work, then comes to a screeching halt). No pattern to which OSD's were causing the problem, though. Sounds like similar hardware (This was on Dell R720xd, and yeah, that controller is suuuper frustrating). For us,

Re: [ceph-users] rebalancing taking very long time

2015-09-09 Thread Vickey Singh
Agreed with Alphe, Ceph Hammer (0.94.2) sucks when it comes to recovery and rebalancing. Here is my Ceph Hammer cluster, which has been like this for more than 30 hours. You might be thinking about that one OSD which is down and not in. It's intentional, I want to remove that OSD. I want the cluster
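For what it's worth, the usual removal sequence for an OSD that is already down looks roughly like this (osd.23 is a placeholder; wait for backfill to finish after the "out" step before removing it):

    ceph osd out 23
    # ... wait for the cluster to finish rebalancing ...
    ceph osd crush remove osd.23
    ceph auth del osd.23
    ceph osd rm 23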

Re: [ceph-users] Ceph cluster NO read / write performance :: Ops are blocked

2015-09-09 Thread Vickey Singh
Hello Jan On Wed, Sep 9, 2015 at 11:59 AM, Jan Schermer wrote: > Just to recapitulate - the nodes are doing "nothing" when it drops to > zero? Not flushing something to drives (iostat)? Not cleaning pagecache > (kswapd and similar)? Not out of any type of memory (slab, >

Re: [ceph-users] Ceph cluster NO read / write performance :: Ops are blocked

2015-09-09 Thread Lincoln Bryant
Hi Jan, I’ll take a look at all of those things and report back (hopefully :)) I did try setting all of my OSDs to writethrough instead of writeback on the controller, which was significantly more consistent in performance (from 1100MB/s down to 300MB/s, but still occasionally dropping to

Re: [ceph-users] Ceph/Radosgw v0.94 Content-Type versus Content-type

2015-09-09 Thread Robin H. Johnson
On Wed, Sep 09, 2015 at 05:28:26PM +, Chang, Fangzhe (Fangzhe) wrote: > I noticed that S3 Java SDK for getContentType() no longer works in > Ceph/Radosgw v0.94 (Hammer). It seems that S3 SDK expects the metadata > “Content-Type” whereas ceph responds with “Content-type”. > Does anyone know

Re: [ceph-users] rebalancing taking very long time

2015-09-09 Thread Sage Weil
On Wed, 9 Sep 2015, Vickey Singh wrote: > Agreed with Alphe, Ceph Hammer (0.94.2) sucks when it comes to recovery and > rebalancing. > > Here is my Ceph Hammer cluster, which has been like this for more than 30 hours. > > You might be thinking about that one OSD which is down and not in. It's >

[ceph-users] backfilling on a single OSD and caching controllers

2015-09-09 Thread Lionel Bouton
Hi, just a tip I validated on our hardware. I'm currently converting an OSD from xfs with the journal on the same platter to btrfs with the journal on SSD. To avoid any unwanted movement, I reused the same OSD number, weight and placement: so Ceph is simply backfilling all PGs previously stored on the