Re: [ceph-users] Hammer reduce recovery impact

2015-09-11 Thread GuangYang
If we are talking about requests being blocked for 60+ seconds, those tunings might not help (they help a lot with average latency during recovery/backfill). It would be interesting to see the logs for those blocked requests on the OSD side (they are logged at level 0); the pattern to search for might be "slow
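
A quick way to pull those entries, assuming default log locations on the OSD hosts (a minimal sketch, not the only approach):

    grep -i "slow request" /var/log/ceph/ceph-osd.*.log | tail -n 50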

Re: [ceph-users] Hammer reduce recovery impact

2015-09-11 Thread Paweł Sadowski
On 09/10/2015 10:56 PM, Robert LeBlanc wrote: > Things I've tried: > > * Lowered nr_requests on the spindles from 1000 to 100. This reduced > the max latency sometimes up to 3000 ms down to a max of 500-700 ms. > it has also reduced the huge swings in latency, but has also reduced > throughput
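
For reference, nr_requests is a per-device block-layer queue depth and can be changed at runtime via sysfs; a minimal sketch, with /dev/sdb as a hypothetical spindle:

    echo 100 > /sys/block/sdb/queue/nr_requests
    cat /sys/block/sdb/queue/nr_requests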

Re: [ceph-users] bad perf for librbd vs krbd using FIO

2015-09-11 Thread Bill Sanders
Is there a thread on the mailing list (or LKML?) with some background about tcp_low_latency and TCP_NODELAY? Bill On Fri, Sep 11, 2015 at 2:30 AM, Jan Schermer wrote: > Can you try > > echo 1 > /proc/sys/net/ipv4/tcp_low_latency > > And see if it improves things? I remember

Re: [ceph-users] bad perf for librbd vs krbd using FIO

2015-09-11 Thread Somnath Roy
Check this.. http://www.spinics.net/lists/ceph-users/msg16294.html http://tracker.ceph.com/issues/9344 Thanks & Regards Somnath From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Bill Sanders Sent: Friday, September 11, 2015 11:17 AM To: Jan Schermer Cc: Rafael Lopez;

[ceph-users] 5Tb useful space based on Erasure Coded Pool

2015-09-11 Thread Mike
Hello Cephers! I have an interesting task from our client. The client has 3000+ video cams (monitoring streets, porches, entrances, etc.), and we need to store data from these cams for 30 days. Each cam generates 1.3 TB of data over 30 days; the total bandwidth is 14 Gbit/s. In total we need (1.3 TB × 3000) ~4 PB+
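
The raw-capacity math, as a rough sketch (the erasure-code profile below is an illustrative assumption, not from the original post):

    3000 cams × 1.3 TB              ≈ 3,900 TB ≈ 3.9 PB usable
    with EC k=8, m=3 (example)      raw ≈ usable × (k+m)/k ≈ 3.9 PB × 11/8 ≈ 5.4 PB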

Re: [ceph-users] 9 PGs stay incomplete

2015-09-11 Thread Brad Hubbard
- Original Message - > From: "Wido den Hollander" > To: "ceph-users" > Sent: Friday, 11 September, 2015 6:46:11 AM > Subject: [ceph-users] 9 PGs stay incomplete > > Hi, > > I'm running into an issue with Ceph 0.94.2/3 where after doing a recovery >

Re: [ceph-users] RBD with iSCSI

2015-09-11 Thread Nick Fisk
It’s a long shot, but check if librados is installed. From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Daleep Bais Sent: 11 September 2015 10:18 To: Jake Young ; p...@daystrom.com Cc: Ceph-User Subject: Re: [ceph-users] RBD with
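
A quick check, assuming a Debian/Ubuntu host like the ones in this thread (a sketch, not the only way):

    dpkg -l | grep librados
    ldconfig -p | grep librados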

Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant

2015-09-11 Thread Mariusz Gronczewski
Well, if you plan for OSDs to use 2 GB per daemon and suddenly they eat 4x as much RAM, you might get the cluster into an unrecoverable state if you can't just increase the amount of RAM at will. I managed to recover it because I had only 4 OSDs per machine, but I can't imagine what would happen on a 36-OSD

Re: [ceph-users] bad perf for librbd vs krbd using FIO

2015-09-11 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Somnath Roy > Sent: 11 September 2015 06:23 > To: Rafael Lopez > Cc: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] bad perf for librbd vs krbd using FIO >

Re: [ceph-users] higher read iop/s for single thread

2015-09-11 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Mark Nelson > Sent: 10 September 2015 16:20 > To: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] higher read iop/s for single thread > > I'm not sure you will be able to get there with

Re: [ceph-users] RBD with iSCSI

2015-09-11 Thread Daleep Bais
Hi Jake, hello Paul, I was able to mount the iSCSI target from another initiator. However, after installing tgt and tgt-rbd, my rbd was not working. I am getting the error message: *root@ceph-node1:~# rbd ls test1* *rbd: symbol lookup error: rbd: undefined symbol: _ZTIN8librados9WatchCtx* I am using
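
An undefined librados symbol usually suggests the rbd binary is resolving against a mismatched librados build; a minimal sketch of checking which library actually gets picked up (assuming standard package paths):

    ldd $(which rbd) | grep librados
    dpkg -l | grep -E 'librados|librbd'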

Re: [ceph-users] bad perf for librbd vs krbd using FIO

2015-09-11 Thread Jan Schermer
Can you try echo 1 > /proc/sys/net/ipv4/tcp_low_latency and see if it improves things? I remember there being an option to disable Nagle completely, but it's gone apparently. Jan > On 11 Sep 2015, at 10:43, Nick Fisk wrote: > > > > > >> -Original Message- >>
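
The same toggle can be set via sysctl and made persistent; a small sketch, assuming the net.ipv4.tcp_low_latency knob exists on the running kernel:

    sysctl -w net.ipv4.tcp_low_latency=1
    echo "net.ipv4.tcp_low_latency = 1" >> /etc/sysctl.conf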

Re: [ceph-users] RadosGW not working after upgrade to Hammer

2015-09-11 Thread James Page
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Hi Arnoud On 26/05/15 16:53, Arnoud de Jonge wrote: > Hi, [...] > > 2015-05-26 17:43:37.352569 7f0fce0ff840 0 ceph version 0.94.1 > (e4bfad3a3c51054df7e537a724c8d0bf9be972ff), process radosgw, pid > 4259 2015-05-26 17:43:37.435921 7f0f8a4f2700 0

Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant

2015-09-11 Thread Mariusz Gronczewski
On Wed, 09 Sep 2015 08:59:53 -0500, Chad William Seys wrote: > > > Going from 2GB to 8GB is not normal, although some slight bloating is > > expected. > > If I recall correctly, Mariusz's cluster had a period of flapping OSDs? The NIC was dropping packets under traffic, which

Re: [ceph-users] maximum object size

2015-09-11 Thread Ilya Dryomov
On Wed, Sep 9, 2015 at 11:22 AM, HEWLETT, Paul (Paul) wrote: > By setting a parameter osd_max_write_size to 2047… > This normally defaults to 90 > > Setting to 2048 exposes a bug in Ceph where signed overflow occurs... > > Part of the problem is my expectations.
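
For context, osd_max_write_size is expressed in MB and defaults to 90; a minimal sketch of checking and pinning it (the osd.0 target and the ceph.conf snippet are illustrative):

    ceph daemon osd.0 config get osd_max_write_size

    # ceph.conf
    [osd]
    osd max write size = 90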

Re: [ceph-users] Ceph cluster NO read / write performance :: Ops are blocked

2015-09-11 Thread Shinobu Kinjo
If you really want to improve the performance of a *distributed* filesystem like Ceph, Lustre, or GPFS, you must look at the networking stack of the Linux kernel: L5: socket, L4: TCP, L3: IP, L2: queuing. In this discussion, the problem could be in L2, i.e. queuing in the descriptor ring. We may have to take a closer
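
Drops at the queuing layer are visible from userspace; a small sketch, assuming eth0 is the cluster-facing interface:

    tc -s qdisc show dev eth0
    ethtool -S eth0 | grep -i drop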

Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant

2015-09-11 Thread Chad William Seys
> note that I only did it after most of the PGs were recovered My guess / hope is that heap free would also help during the recovery process. Recovery causing failures does not seem like the best outcome. :) C.
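
The heap can be handed back through the admin socket; a minimal sketch, assuming the OSDs are built against tcmalloc (the osd.0 / osd.* targets are illustrative):

    ceph tell osd.0 heap stats
    ceph tell osd.* heap release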

Re: [ceph-users] ceph-fuse auto down

2015-09-11 Thread Shinobu Kinjo
There should be some complaints in /var/log/messages. Can you attach it? Shinobu - Original Message - From: "谷枫" To: "ceph-users" Sent: Saturday, September 12, 2015 1:30:49 PM Subject: [ceph-users] ceph-fuse auto down Hi,all My cephfs

Re: [ceph-users] ceph-fuse auto down

2015-09-11 Thread Shinobu Kinjo
Ah, you are using Ubuntu, sorry for that. How about /var/log/dmesg? I believe you can attach a file rather than paste it; pasting a bunch of logs would not be good for me. And when did you notice that cephfs was hung? Shinobu - Original Message - From: "谷枫" To: "Shinobu

Re: [ceph-users] CephFS and caching

2015-09-11 Thread Ilya Dryomov
On Wed, Sep 9, 2015 at 5:34 PM, Gregory Farnum wrote: > On Wed, Sep 9, 2015 at 3:27 PM, Kyle Hutson wrote: >> We are using Hammer - latest released version. How do I check if it's >> getting promoted into the cache? > > Umm...that's a good question. You
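
One rough way to see whether objects are being promoted is to watch the object count on the cache pool grow as the workload reads; a sketch, with "cache-pool" as a hypothetical cache-tier pool name:

    ceph df detail
    rados -p cache-pool ls | wc -l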

[ceph-users] ceph-fuse auto down

2015-09-11 Thread 谷枫
Hi all, my CephFS cluster is deployed on three nodes with Ceph Hammer 0.94.3 on Ubuntu 14.04; the kernel version is 3.19.0. I mount the cephfs with ceph-fuse on 9 clients, but some of them (the ceph-fuse process) go down by themselves sometimes and I can't find the reason; it seems like there are no other logs to be found

Re: [ceph-users] Question on cephfs recovery tools

2015-09-11 Thread Shinobu Kinjo
> In your procedure, the umount problems have nothing to do with > corruption. It's (sometimes) hanging because the MDS is offline. If How did you notice that the MDS was offline? Is it just because the Ceph client could not unmount the filesystem, or something else? I would like to see the logs on the MDS and OSDs.

Re: [ceph-users] ceph-fuse auto down

2015-09-11 Thread 谷枫
Hi Shinobu, there is no /var/log/messages on my system, but I checked /var/log/syslog and no useful messages were found. I discovered /var/crash/_usr_bin_ceph-fuse.0.crash by grepping for "fuse" on the system. Below is the message in it: ProcStatus: Name: ceph-fuse State: D (disk sleep) Tgid:

Re: [ceph-users] higher read iop/s for single thread

2015-09-11 Thread Gregory Farnum
On Fri, Sep 11, 2015 at 9:52 AM, Nick Fisk wrote: >> -Original Message- >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of >> Mark Nelson >> Sent: 10 September 2015 16:20 >> To: ceph-users@lists.ceph.com >> Subject: Re: [ceph-users] higher read

Re: [ceph-users] 9 PGs stay incomplete

2015-09-11 Thread Gregory Farnum
On Thu, Sep 10, 2015 at 9:46 PM, Wido den Hollander wrote: > Hi, > > I'm running into an issue with Ceph 0.94.2/3 where after doing a recovery > test 9 PGs stay incomplete: > > osdmap e78770: 2294 osds: 2294 up, 2294 in > pgmap v1972391: 51840 pgs, 7 pools, 220 TB data, 185 Mobjects

Re: [ceph-users] Hi all Very new to ceph

2015-09-11 Thread John Spray
On Fri, Sep 11, 2015 at 11:57 AM, M.Tarkeshwar Rao wrote: > Hi all, > > We have a product which is written in C++ on Red Hat. > > In production our customers use our product with Veritas cluster file > system for HA and as shared storage (EMC). > > Initially this product

Re: [ceph-users] Hi all Very new to ceph

2015-09-11 Thread Nick Fisk
Hi Tarkeshwar, CephFS is not currently considered ready for production use, mainly due to there being no fsck tool. There are people using it, so YMMV. However, if this app is written in-house, is there any chance you could change it to write objects directly into the RADOS layer? The RADOS
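
The object interface is easy to try from the command line before committing to librados; a minimal sketch (the pool and object names are made up for illustration):

    ceph osd pool create mypool 128
    rados -p mypool put report-001 ./report-001.dat
    rados -p mypool ls
    rados -p mypool get report-001 /tmp/report-001.dat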

Re: [ceph-users] Ceph cluster NO read / write performance :: Ops are blocked

2015-09-11 Thread Shinobu Kinjo
Dropwatch.stp would help us see who dropped packets, and where they were dropped. To investigate the networking further, I always check /sys/class/net/<interface>/statistics/*; the tc command is also quite useful. Have we already checked whether there is any bo (blocks written out) using vmstat? Using vmstat
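
A couple of the checks mentioned above, sketched for a hypothetical interface eth0:

    cat /sys/class/net/eth0/statistics/rx_dropped \
        /sys/class/net/eth0/statistics/tx_dropped
    vmstat 1 10    # watch the bi/bo and wa columns for 10 seconds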

Re: [ceph-users] 9 PGs stay incomplete

2015-09-11 Thread Wido den Hollander
On 11-09-15 12:22, Gregory Farnum wrote: > On Thu, Sep 10, 2015 at 9:46 PM, Wido den Hollander wrote: >> Hi, >> >> I'm running into an issue with Ceph 0.94.2/3 where after doing a recovery >> test 9 PGs stay incomplete: >> >> osdmap e78770: 2294 osds: 2294 up, 2294 in >> pgmap