Re: [ceph-users] CephFS: 'ls -alR' performance terrible unless Linux cache flushed

2015-06-16 Thread negillen negillen
Thanks everyone, update: I tried running on node A: # vmtouch -ev /storage/ # sync; sync The problem persisted; one minute was still needed to 'ls -Ral' the dir (from node B). After that I ran on node A: # echo 2 > /proc/sys/vm/drop_caches and everything suddenly became fast on node B. ls, du, tar, all
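For anyone hitting the same symptom, the workaround described above boils down to the standard kernel cache-drop interface; it only masks the client-cache behaviour discussed later in the thread, it is not a CephFS fix:

  # sync                                # flush dirty data first
  # echo 2 > /proc/sys/vm/drop_caches   # free reclaimable dentries and inodes (slab)
  # echo 3 > /proc/sys/vm/drop_caches   # also drop the page cache, if needed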

Re: [ceph-users] CephFS: 'ls -alR' performance terrible unless Linux cache flushed

2015-06-16 Thread negillen negillen
It is not only ls; even du or tar are extremely slow. Example with tar (from node B): # time tar c /storage/test10/installed-tests/pts/pgbench-1.5.1/ > /dev/null real 1m45.291s user 0m0.023s sys 0m0.143s While on the node that originally wrote the dir (node A): # time tar c

Re: [ceph-users] CephFS: 'ls -alR' performance terrible unless Linux cache flushed

2015-06-16 Thread John Spray
On 16/06/2015 12:11, negillen negillen wrote: This is quite a problem because we have several applications that need to access a large number of files, and when we set them to work on CephFS latency skyrockets. The question of what 'access' means here is key, especially whether you need the

[ceph-users] Re: Slightly OT question - LSI SAS 2308 / 9207-8i performance

2015-06-16 Thread Межов Игорь Александрович
Hi! As far as I know: C60X = SAS2 = 3Gbps, LSI 2308 = 6Gbps, onboard SATA3 = 6Gbps (usually only 2 ports), onboard SATA2 = 3Gbps (4-6 ports). We use Intel S2600 motherboards and R2224GZ4 platforms in our Hammer evaluation instance. C60X connected to a 4-drive 2.5" bay: 2 small SAS drives for OS. 2xS3700

Re: [ceph-users] Ceph OSD with OCFS2

2015-06-16 Thread Somnath Roy
Okay… I think the extra layers you have will add some delay, but 1m is probably high (I never tested Ceph on HDD though). We can probably minimize it by optimizing the cluster setup. Please monitor your backend cluster or even the rbd nodes to see if anything is a bottleneck there. Also, check if

Re: [ceph-users] Fwd: Too many PGs

2015-06-16 Thread Marek Dohojda
This worked like a champ, thank you kindly! On Jun 15, 2015, at 2:16 PM, Somnath Roy somnath@sandisk.com wrote: If you want to suppress the warning, do this in the conf file.. mon_pg_warn_max_per_osd = 0 or mon_pg_warn_max_per_osd = something big Thanks Regards
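For completeness, a minimal sketch of the suppression Somnath describes (the value 0 disables the check, a large value merely raises the threshold; this hides the warning rather than reducing the PG count):

  [mon]
      mon_pg_warn_max_per_osd = 0

The same setting can probably be injected at runtime as well, assuming the mons accept it without a restart:

  # ceph tell mon.* injectargs '--mon_pg_warn_max_per_osd 0'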

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-16 Thread Alexandre DERUMIER
I forgot to ask, is this with the patched version of tcmalloc that theoretically fixes the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES issue? Yes, the patched version of tcmalloc, but also the latest version from gperftools git. (I'm talking about qemu here, not osds). I have tried to increase

Re: [ceph-users] CephFS: 'ls -alR' performance terrible unless Linux cache flushed

2015-06-16 Thread negillen negillen
Fixed! At least it looks fixed. It seems that after migrating every node (both servers and clients) from kernel 3.10.80-1 to 4.0.4-1 the issue disappeared. Now I get decent speeds both for reading files and for getting stats from every node. Thanks everyone! On Tue, Jun 16, 2015 at 1:00 PM,

[ceph-users] xattrs vs. omap with radosgw

2015-06-16 Thread GuangYang
Hi Cephers, While looking at disk utilization on an OSD, I noticed the disk was constantly busy with a large number of small writes. Further investigation showed that, as radosgw uses xattrs to store metadata (e.g. etag, content-type, etc.), the xattrs spill from local (in-inode) storage out to extents, which

Re: [ceph-users] xattrs vs. omap with radosgw

2015-06-16 Thread Somnath Roy
Guang, Try to play around with the following conf attributes, especially filestore_max_inline_xattr_size and filestore_max_inline_xattrs // Use omap for xattrs for attrs over // filestore_max_inline_xattr_size or OPTION(filestore_max_inline_xattr_size, OPT_U32, 0) // Override
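A hedged example of how those options might look in ceph.conf (the values are illustrative only, not recommendations, and the OSDs would need a restart for filestore options to take effect):

  [osd]
      # keep xattrs up to 64 KB inline in the filesystem before spilling to omap
      filestore_max_inline_xattr_size = 65536
      # allow up to 10 inline xattrs per object before spilling to omap
      filestore_max_inline_xattrs = 10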

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-16 Thread Alexandre DERUMIER
Hi, some news about qemu with tcmalloc vs jemalloc. I'm testing with multiple disks (with iothreads) in 1 qemu guest, and while tcmalloc is a little faster than jemalloc, I have hit the tcmalloc::ThreadCache::ReleaseToCentralCache bug a lot of times. Increasing

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-16 Thread Mark Nelson
Hi Alexandre, Excellent find! Have you also informed the QEMU developers of your discovery? Mark On 06/16/2015 11:38 AM, Alexandre DERUMIER wrote: Hi, some news about qemu with tcmalloc vs jemalloc. I'm testing with multiple disks (with iothreads) in 1 qemu guest. And if tcmalloc is a

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-16 Thread Mark Nelson
I forgot to ask, is this with the patched version of tcmalloc that theoretically fixes the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES issue? Mark On 06/16/2015 11:46 AM, Mark Nelson wrote: Hi Alexandre, Excellent find! Have you also informed the QEMU developers of your discovery? Mark On

Re: [ceph-users] xattrs vs. omap with radosgw

2015-06-16 Thread Sage Weil
On Wed, 17 Jun 2015, Zhou, Yuan wrote: FWIW, there was some discussion in OpenStack Swift and their performance tests showed 255 is not the best in recent XFS. They decided to use a large xattr boundary size (65535). https://gist.github.com/smerritt/5e7e650abaa20599ff34 If I read this

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-16 Thread Alexandre DERUMIER
Hi, I finally fixed it with tcmalloc by setting TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=268435456 and LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4 for qemu. I got almost the same result as jemalloc in this case, maybe a little bit faster. Here are the iops results for 1 qemu VM with an iothread per disk
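Reconstructed from the snippet above, the launch environment presumably looks something like this (the qemu binary name and the iothread flag are illustrative, only the env var and the preload path come from the mail):

  # export TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=268435456
  # LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4 qemu-system-x86_64 ... -object iothread,id=iothread0 ...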

Re: [ceph-users] xattrs vs. omap with radosgw

2015-06-16 Thread GuangYang
After back-porting Sage's patch to Giant, with radosgw the xattrs can be stored inline. I haven't run extensive testing yet; I will update once I have some performance data to share. Thanks, Guang Date: Tue, 16 Jun 2015 15:51:44 -0500 From: mnel...@redhat.com To: yguan...@outlook.com;

Re: [ceph-users] xattrs vs. omap with radosgw

2015-06-16 Thread GuangYang
Hi Yuan, Thanks for sharing the link, it is interesting to read. My understanding of the test results is that with a fixed size of xattrs, using a smaller stripe size will incur larger read latency, which kind of makes sense since there are more k-v pairs, and with the size, it needs to get

[ceph-users] ceph osd out triggered the pg recovery process, but at the end, why are pgs whose last replica is on the out osd kept as active+degraded?

2015-06-16 Thread Cory
Hi ceph experts, I did some tests on my ceph cluster recently with the following steps: 1. at the beginning, all pgs are active+clean; 2. stop an osd. I observed a lot of pgs become degraded. 3. ceph osd out. 4. then I observed ceph doing the recovery process. The ceph cluster is configured as three
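For reference, a minimal sketch of the test sequence described above (osd.5 is a hypothetical id and the init command depends on the distribution):

  # ceph -s                      # all pgs active+clean at the start
  # service ceph stop osd.5      # step 2: stop an osd; pgs go degraded
  # ceph osd out 5               # step 3: mark it out; recovery starts
  # ceph health detail           # step 4: watch which pgs stay active+degraded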

Re: [ceph-users] CephFS client issue

2015-06-16 Thread Christian Balzer
Hello, On Tue, 16 Jun 2015 07:21:54 + Matteo Dacrema wrote: Hi, I've shut off the node without taking any precautions, to simulate a real case. A normal shutdown (as opposed to simulating a crash by pulling cables) should not result in any delays due to Ceph timeouts. The

Re: [ceph-users] [Fwd: adding a a monitor wil result in cephx: verify_reply couldn't decrypt with error: error decoding block for decryption]

2015-06-16 Thread Makkelie, R (ITCDCC) - KLM
Anyone else who can help me with this? -----Original Message----- From: Irek Fasikhov malm...@gmail.com To: Makkelie, R (ITCDCC) - KLM ramon.makke...@klm.com Cc:

Re: [ceph-users] CephFS client issue

2015-06-16 Thread Matteo Dacrema
Hi, I've shut off the node without taking any precautions, to simulate a real case. The osd_pool_default_min_size is 2. Regards, Matteo From: Christian Balzer ch...@gol.com Sent: Tuesday, 16 June 2015 01:44 To: ceph-users Cc: Matteo Dacrema Subject: Re:

Re: [ceph-users] CephFS client issue

2015-06-16 Thread Matteo Dacrema
Hello, you're right. I misunderstood the meaning of the two configuration params: size and min_size. Now it works correctly. Thanks, Matteo From: Christian Balzer ch...@gol.com Sent: Tuesday, 16 June 2015 09:42 To: ceph-users Cc: Matteo Dacrema

Re: [ceph-users] CephFS client issue

2015-06-16 Thread John Spray
That's expected behaviour. If RADOS can't make your writes safe by replicating them (because no other OSD is available) then clients will pause their writes. See the min_size setting on a pool. John On 16/06/2015 00:11, Matteo Dacrema wrote: With 3.16.3 kernel it seems to be stable but
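To check and adjust the setting John mentions (pool name is a placeholder; lowering min_size trades data safety for availability, so treat this as a sketch rather than a recommendation):

  # ceph osd pool get <pool> size        # number of replicas
  # ceph osd pool get <pool> min_size    # replicas required before I/O is accepted
  # ceph osd pool set <pool> min_size 1  # allow writes with a single surviving replica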

Re: [ceph-users] Ceph OSD with OCFS2

2015-06-16 Thread gjprabu
Somnath, yes, we are cloning the repository into the ceph client shared directory. Please refer to the timing analysis. Ceph Client Shared Directory: [s...@cephclient2.csez.zohocorpin.com ~/ceph/]$ time git clone https://github.com/elastic/elasticsearch.git Cloning into 'elasticsearch'...
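One way to separate network cost from filesystem cost in a comparison like this is to clone from a local copy instead of GitHub (a sketch; the mount paths are illustrative):

  # git clone https://github.com/elastic/elasticsearch.git /tmp/elasticsearch    # one-time local copy
  # time git clone /tmp/elasticsearch /mnt/cephfs/elasticsearch                  # CephFS target, no network in the timing
  # time git clone /tmp/elasticsearch /tmp/elasticsearch-local                   # local-disk baseline for comparison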

Re: [ceph-users] CephFS: delayed objects deletion ?

2015-06-16 Thread Gregory Farnum
On Tue, Jun 16, 2015 at 11:38 AM, Florent B flor...@coppint.com wrote: I still have this problem on Hammer. My CephFS directory contains 46MB of data, but the pool (configured with layout, not default one) is 6.59GB... How to debug this ? On Mon, Mar 16, 2015 at 4:14 PM, John Spray

[ceph-users] A cache tier issue with rate only at 20MB/s when data move from cold pool to hot pool

2015-06-16 Thread liukai
Hi all, A cache tier: 2 hot nodes with 8 SSD OSDs, and 2 cold nodes with 24 SATA OSDs. The public network rate is 1Mb/s and the cluster network rate is 1000Mb/s. We use the fuse client to access the files. The issue is: when the files are in the hot pool, the copy rate is very fast.

Re: [ceph-users] removed_snaps in ceph osd dump?

2015-06-16 Thread Gregory Farnum
On Tue, Jun 16, 2015 at 11:53 AM, Jan Schermer j...@schermer.cz wrote: Well, I see mons dropping out when deleting a large amount of snapshots, and it eats a _lot_ of CPU to delete them Well, you're getting past my expertise on the subject, but deleting snapshots can sometimes be expensive,

Re: [ceph-users] removed_snaps in ceph osd dump?

2015-06-16 Thread Jan Schermer
On 16 Jun 2015, at 12:59, Gregory Farnum g...@gregs42.com wrote: On Tue, Jun 16, 2015 at 11:53 AM, Jan Schermer j...@schermer.cz wrote: Well, I see mons dropping out when deleting a large amount of snapshots, and it eats a _lot_ of CPU to delete them Well, you're getting past my

Re: [ceph-users] removed_snaps in ceph osd dump?

2015-06-16 Thread Jan Schermer
Ping :-) Looks like nobody is bothered by this, can I assume it is normal, doesn’t hurt anything and will grow to millions in time? Jan On 15 Jun 2015, at 10:32, Jan Schermer j...@schermer.cz wrote: Hi, I have ~1800 removed_snaps listed in the output of “ceph osd dump”. Is that
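For anyone wanting to check their own cluster, the set Jan refers to shows up per pool in the osd dump; if I read the interval_set format right, each entry is written as start~length, so consecutive deletions collapse into one range:

  # ceph osd dump | grep removed_snaps
  # each pool line lists deleted snap id ranges, e.g. [1~5,8~2] = ids 1-5 and 8-9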

[ceph-users] qemu (or librbd in general) - very high load on client side

2015-06-16 Thread Nikola Ciprich
Hello dear ceph developers and users, I've spent some time tuning and measuring our ceph cluster performance, and noticed quite strange thing.. I've been using fio (using both rbd engine on hosts and direct block (aio) engine inside qemu-kvm guests (qemu connected to ceph storage using rbd))

Re: [ceph-users] CephFS: 'ls -alR' performance terrible unless Linux cache flushed

2015-06-16 Thread negillen negillen
Thank you very much for your reply! Is there anything I can do to go around that? e.g. setting access caps to be released after a short while? Or is there a command to manually release access caps (so that I could run it in cron)? This is quite a problem because we have several applications that

Re: [ceph-users] removed_snaps in ceph osd dump?

2015-06-16 Thread Gregory Farnum
On Tue, Jun 16, 2015 at 12:03 PM, Jan Schermer j...@schermer.cz wrote: On 16 Jun 2015, at 12:59, Gregory Farnum g...@gregs42.com wrote: On Tue, Jun 16, 2015 at 11:53 AM, Jan Schermer j...@schermer.cz wrote: Well, I see mons dropping out when deleting a large amount of snapshots, and it eats

Re: [ceph-users] removed_snaps in ceph osd dump?

2015-06-16 Thread Gregory Farnum
On Tue, Jun 16, 2015 at 3:30 AM, Jan Schermer j...@schermer.cz wrote: Thanks for the answer. So it doesn’t hurt performance if it grows to ridiculous size - e.g. no lookup table overhead, stat()ing additional files etc.? Nope, definitely nothing like that. If it gets sufficiently fragmented

[ceph-users] RBD image can ignore the pool limit

2015-06-16 Thread Vickie ch
Hello Cephers, I have a question about pool quotas: does the pool quota apply to RBD? My cluster is Hammer 0.94.1 with 1 mon and 3 OSD servers; each OSD server has 3 disks. My question is: when I set a pool quota of 1G on pool rbd, I can still create a 3G image 'abc'. After I mount and
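A sketch of the behaviour being asked about (sizes and image name follow the example above). RBD images are thin-provisioned, so creating a 3G image on a 1G-quota pool succeeds; the quota is only enforced against bytes actually written, and enforcement kicks in with some delay:

  # ceph osd pool set-quota rbd max_bytes 1073741824   # 1G quota on the pool
  # rbd create rbd/abc --size 3072                     # --size is in MB: a 3G image, no data written yet
  # rbd map rbd/abc                                    # writes beyond the quota will eventually stall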

Re: [ceph-users] removed_snaps in ceph osd dump?

2015-06-16 Thread Gregory Farnum
Every time you delete a snapshot it goes in removed_snaps. The set of removed snaps is stored as an interval set, so it uses up two integers in the OSDMap for each range. There are some patterns of usage that work out badly for this, but generally if you're creating snapshots as time goes forward

Re: [ceph-users] CephFS: 'ls -alR' performance terrible unless Linux cache flushed

2015-06-16 Thread Gregory Farnum
On Mon, Jun 15, 2015 at 11:34 AM, negillen negillen negil...@gmail.com wrote: Hello everyone, something very strange is driving me crazy with CephFS (kernel driver). I copy a large directory on the CephFS from one node. If I try to perform a 'time ls -alR' on that directory it gets executed

Re: [ceph-users] removed_snaps in ceph osd dump?

2015-06-16 Thread Jan Schermer
Thanks for the answer. So it doesn’t hurt performance if it grows to ridiculous size - e.g. no lookup table overhead, stat()ing additional files etc.? Jan On 16 Jun 2015, at 11:51, Gregory Farnum g...@gregs42.com wrote: Every time you delete a snapshot it goes in removed_snaps. The set of

Re: [ceph-users] CephFS: 'ls -alR' performance terrible unless Linux cache flushed

2015-06-16 Thread Gregory Farnum
On Tue, Jun 16, 2015 at 12:11 PM, negillen negillen negil...@gmail.com wrote: Thank you very much for your reply! Is there anything I can do to go around that? e.g. setting access caps to be released after a short while? Or is there a command to manually release access caps (so that I could

Re: [ceph-users] A cache tier issue with rate only at 20MB/s when data move from cold pool to hot pool

2015-06-16 Thread Kenneth Waegeman
Hi! We also see this at our site: when we cat a large file from cephfs to /dev/null, we get about 10MB/s data transfer. I also do not see a system resource bottleneck. Our cluster consists of 14 servers with 16 disks each, together forming an EC coded pool. We also have 2 SSDs per server for

Re: [ceph-users] CephFS: delayed objects deletion ?

2015-06-16 Thread Gregory Farnum
On Tue, Jun 16, 2015 at 11:55 AM, Florent B flor...@coppint.com wrote: On 06/16/2015 12:47 PM, Gregory Farnum wrote: On Tue, Jun 16, 2015 at 11:38 AM, Florent B flor...@coppint.com wrote: I still have this problem on Hammer. My CephFS directory contains 46MB of data, but the pool (configured

Re: [ceph-users] CephFS: 'ls -alR' performance terrible unless Linux cache flushed

2015-06-16 Thread negillen negillen
Thanks again, even 'du' performance is terrible on node B (testing on a directory taken from Phoronix): # time du -hs /storage/test9/installed-tests/pts/pgbench-1.5.1/ 73M /storage/test9/installed-tests/pts/pgbench-1.5.1/ real 0m21.044s user 0m0.010s sys 0m0.067s Reading the

Re: [ceph-users] CephFS: 'ls -alR' performance terrible unless Linux cache flushed

2015-06-16 Thread Adam Boyhan
You could use vmtouch to drop the specific directory from cache. It has the ability to evict directories/files from memory. From: negillen negillen negil...@gmail.com To: Gregory Farnum g...@gregs42.com Cc: ceph-users ceph-us...@ceph.com Sent: Tuesday, June 16, 2015 7:37:23 AM Subject: Re:
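A short sketch of the vmtouch suggestion (path matches the earlier du example; vmtouch must be installed on the node holding the cached pages):

  # vmtouch -v /storage/test9/installed-tests/pts/pgbench-1.5.1/   # show how much of the tree is resident in memory
  # vmtouch -e /storage/test9/installed-tests/pts/pgbench-1.5.1/   # evict those pages from the page cache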

Re: [ceph-users] CephFS: 'ls -alR' performance terrible unless Linux cache flushed

2015-06-16 Thread Jan Schermer
Have you tried just running “sync;sync” on the originating node? Does that achieve the same thing or not? (I guess it could/should). Jan On 16 Jun 2015, at 13:37, negillen negillen negil...@gmail.com wrote: Thanks again, even 'du' performance is terrible on node B (testing on a