Re: [ceph-users] can't get cluster to become healthy. "stale+undersized+degraded+peered"

2015-09-17 Thread Goncalo Borges
Hello Stefan... Those 64 PGs refer to the default rbd pool that is created automatically. Can you please give us the output of # ceph osd pool ls detail # ceph pg dump_stuck The degraded/stale status means that the PGs cannot be replicated according to your policies. My guess is that you
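
For reference, a minimal sketch of the diagnostics being requested here; the pool name "rbd" is the default on a fresh cluster, and output will vary:

    # list pools with size, min_size, crush ruleset and pg_num details
    ceph osd pool ls detail
    # show PGs stuck inactive, unclean or stale
    ceph pg dump_stuck
    # overall cluster health and PG state summary
    ceph -s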

[ceph-users] ceph-fuse failed with mount connection time out

2015-09-17 Thread Fulin Sun
Hi, experts. While running the command ceph-fuse /home/ceph/cephfs I got the following error: ceph-fuse[28460]: starting ceph client 2015-09-17 16:03:33.385602 7fabf999b780 -1 init, newargv = 0x2c730c0 newargc=11 ceph-fuse[28460]: ceph mount failed with (110) Connection timed out
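
A mount like this times out when the client cannot reach the monitors or when no active MDS exists (as turned out to be the case later in this thread). A minimal check, assuming default paths; the monitor address is illustrative:

    # verify an MDS is up and active; "0/0/0 up" means none exists
    ceph mds stat
    # retry the mount with an explicit monitor address, in the foreground with debug output
    ceph-fuse -m 192.168.0.10:6789 /home/ceph/cephfs -d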

Re: [ceph-users] can't get cluster to become healthy. "stale+undersized+degraded+peered"

2015-09-17 Thread Stefan Eriksson
Hi, here is the info. I have added "ceph osd pool set rbd pg_num 128" but that locks up as well, it seems. Here are the details you're after: [cephcluster@ceph01-adm01 ceph-deploy]$ ceph osd pool ls detail pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128
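
For context, pg_num and pgp_num are raised separately; a sketch of the usual sequence (128 is simply the value used in this thread):

    # raise the placement group count for the pool
    ceph osd pool set rbd pg_num 128
    # then raise pgp_num so the new PGs are actually used for placement
    ceph osd pool set rbd pgp_num 128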

[ceph-users] leveldb compaction error

2015-09-17 Thread Selcuk TUNC
Hello, we have noticed that leveldb compaction on mount causes a segmentation fault in the hammer release (0.94). It seems related to this pull request (github.com/ceph/ceph/pull/4372). Are you planning to backport this fix to the next hammer release? -- st

[ceph-users] Important security notice regarding release signing key

2015-09-17 Thread Sage Weil
Last week, Red Hat investigated an intrusion on the sites of both the Ceph community project (ceph.com) and Inktank (download.inktank.com), which were hosted on a computer system outside of Red Hat infrastructure. Ceph.com provided Ceph community
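
The full notice asked users to move to download.ceph.com and replace the release key. A hedged sketch of the key swap; the key ID and URL are as given in the announcement, so verify against the original before use:

    # RPM-based distros: import the new release key
    sudo rpm --import 'https://download.ceph.com/keys/release.asc'
    # Debian/Ubuntu: drop the old key (ID per the announcement) and add the new one
    sudo apt-key del 17ED316D
    wget -q -O- 'https://download.ceph.com/keys/release.asc' | sudo apt-key add -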

Re: [ceph-users] Ceph cluster NO read / write performance :: Ops are blocked

2015-09-17 Thread Lincoln Bryant
Hello again, Well, I disabled offloads on the NIC -- didn’t work for me. I also tried setting net.ipv4.tcp_moderate_rcvbuf = 0 as suggested elsewhere in the thread to no avail. Today I was watching iostat on an OSD box ('iostat -xm 5') when the cluster got into “slow” state: Device:
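
For anyone following along, a sketch of the tuning attempts described above; the interface name eth0 is illustrative:

    # disable common NIC offloads (generic receive, generic/TCP segmentation)
    ethtool -K eth0 gro off gso off tso off
    # disable TCP receive buffer auto-tuning, as suggested in the thread
    sysctl -w net.ipv4.tcp_moderate_rcvbuf=0
    # watch per-device utilization and wait times at 5-second intervals
    iostat -xm 5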

Re: [ceph-users] benefit of using stripingv2

2015-09-17 Thread Corin Langosch
Hi Greg, On 17.09.2015 at 16:42, Gregory Farnum wrote: > Briefly, if you do a lot of small direct IOs (for instance, a database > journal) then striping lets you send each sequential write to a > separate object. This means they don't pile up behind each other > grabbing write locks and can
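
For readers who want to try this, a sketch of creating an image with explicit striping; the pool and image names are illustrative, and stripe-unit times stripe-count should not exceed the object size:

    # format-2 image, 16 GB: 64 KB stripe unit spread across 8 objects
    rbd create mypool/myimage --size 16384 --image-format 2 \
        --stripe-unit 65536 --stripe-count 8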

Re: [ceph-users] can't get cluster to become healthy. "stale+undersized+degraded+peered"

2015-09-17 Thread Robert LeBlanc
What are your iptables rules? - Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Thu, Sep 17, 2015 at 1:01 AM, Stefan Eriksson wrote: > hi here is the info, I have added "ceph osd pool set rbd
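
For background, the ports Ceph needs reachable; a sketch of permissive iptables rules (restrict source addresses to your cluster network in practice):

    # monitors listen on 6789/tcp
    iptables -A INPUT -p tcp --dport 6789 -j ACCEPT
    # OSDs use a port range, 6800-7300 by default
    iptables -A INPUT -p tcp --dport 6800:7300 -j ACCEPT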

Re: [ceph-users] benefit of using stripingv2

2015-09-17 Thread Ilya Dryomov
On Thu, Sep 17, 2015 at 6:01 PM, Gregory Farnum wrote: > On Thu, Sep 17, 2015 at 7:55 AM, Corin Langosch > wrote: >> Hi Greg, >> >> On 17.09.2015 at 16:42, Gregory Farnum wrote: >>> Briefly, if you do a lot of small direct IOs (for instance, a

Re: [ceph-users] can't get cluster to become healthy. "stale+undersized+degraded+peered"

2015-09-17 Thread Vasu Kulkarni
This happens if you didn't have the right ceph configuration when you deployed your cluster using ceph-deploy; those 64 PGs are from the default config. Since this is a fresh installation you can delete all default pools, check the cluster state for no objects and a clean state, and set up ceph.conf based on
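
A sketch of the reset being suggested; this is destructive and only sensible on a fresh cluster with no data:

    # delete the default rbd pool (name is given twice, plus a safety flag)
    ceph osd pool delete rbd rbd --yes-i-really-really-mean-it
    # recreate it with a pg_num sized to the OSD count
    ceph osd pool create rbd 128
    # confirm no objects remain and the cluster reports a clean state
    ceph -s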

[ceph-users] help! Ceph Manual Deployment

2015-09-17 Thread wikison
Is there any detailed manual deployment document? I downloaded the source and built Ceph, then installed Ceph on 7 computers. I used three as monitors and four as OSDs. I followed the official document on ceph.com, but it didn't work and seemed to be outdated. Could anybody help me? --

Re: [ceph-users] ceph-fuse failed with mount connection time out

2015-09-17 Thread Gregory Farnum
On Thu, Sep 17, 2015 at 1:15 AM, Fulin Sun wrote: > Hi, experts > > While doing the command > ceph-fuse /home/ceph/cephfs > > I got the following error : > > ceph-fuse[28460]: starting ceph client > 2015-09-17 16:03:33.385602 7fabf999b780 -1 init, newargv = 0x2c730c0

Re: [ceph-users] ceph osd won't boot, resource shortage?

2015-09-17 Thread Peter Sabaini
On 16.09.15 16:41, Peter Sabaini wrote: > Hi all, > > I'm having trouble adding OSDs to a storage node; I've got > about 28 OSDs running, but adding more fails. So, it seems the requisite knob was sysctl fs.aio-max-nr. By default, this was set to 64K
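
A sketch of the fix described; the target value is illustrative and should be sized to the number of OSDs per host:

    # check the current async-IO limit (65536 was the default here)
    sysctl fs.aio-max-nr
    # raise it at runtime
    sysctl -w fs.aio-max-nr=1048576
    # persist the change across reboots
    echo 'fs.aio-max-nr = 1048576' >> /etc/sysctl.conf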

Re: [ceph-users] benefit of using stripingv2

2015-09-17 Thread Gregory Farnum
On Wed, Sep 16, 2015 at 11:56 AM, Corin Langosch wrote: > Hi guys, > > afaik rbd always splits the image into chunks of size 2^order (2^22 = 4MB by > default). What's the benefit of specifying > the feature flag "STRIPINGV2"? I couldn't find any documentation about it

Re: [ceph-users] benefit of using stripingv2

2015-09-17 Thread Gregory Farnum
On Thu, Sep 17, 2015 at 7:55 AM, Corin Langosch wrote: > Hi Greg, > > On 17.09.2015 at 16:42, Gregory Farnum wrote: >> Briefly, if you do a lot of small direct IOs (for instance, a database >> journal) then striping lets you send each sequential write to a >> separate

Re: [ceph-users] leveldb compaction error

2015-09-17 Thread Gregory Farnum
On Thu, Sep 17, 2015 at 12:41 AM, Selcuk TUNC wrote: > hello, > > we have noticed leveldb compaction on mount causes a segmentation fault in > hammer release(0.94). > It seems related to this pull request (github.com/ceph/ceph/pull/4372). Are > you planning to backport >

Re: [ceph-users] Ceph cluster NO read / write performance :: Ops are blocked

2015-09-17 Thread Lincoln Bryant
Hi Nick, Thanks for responding. Yes, I am. —Lincoln > On Sep 17, 2015, at 11:53 AM, Nick Fisk wrote: > > You are getting a fair amount of reads on the disks whilst doing these > writes. You're not using cache tiering are you? > >> -Original Message- >> From:

Re: [ceph-users] Ceph cluster NO read / write performance :: Ops are blocked

2015-09-17 Thread Lincoln Bryant
Just a small update — the blocked ops did disappear after doubling the target_max_bytes. We’ll see if it sticks! I’ve thought I’d solved this blocked-ops problem about 10 times now :) Assuming this is the issue, is there any workaround for this problem (or is it working as intended)? (Should
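
For reference, a sketch of the cache-tier sizing knobs discussed here; the pool name "cache" and all values are illustrative:

    # raise the flush/evict target (bytes; 10 TB here)
    ceph osd pool set cache target_max_bytes 10995116277760
    # fraction of dirty objects at which flushing to the base tier starts
    ceph osd pool set cache cache_target_dirty_ratio 0.4
    # fraction of the target at which clean objects start being evicted
    ceph osd pool set cache cache_target_full_ratio 0.8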

Re: [ceph-users] Important security notice regarding release signing key

2015-09-17 Thread Robin H. Johnson
On Thu, Sep 17, 2015 at 09:29:35AM -0700, Sage Weil wrote: > Last week, Red Hat investigated an intrusion on the sites of both the Ceph > community project (ceph.com) and Inktank (download.inktank.com), which > were hosted on a computer system outside of Red Hat infrastructure. > > Ceph.com

Re: [ceph-users] Ceph cluster NO read / write performance :: Ops are blocked

2015-09-17 Thread Nick Fisk
You are getting a fair amount of reads on the disks whilst doing these writes. You're not using cache tiering, are you? > -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Lincoln Bryant > Sent: 17 September 2015 17:42 > To:

Re: [ceph-users] Ceph cluster NO read / write performance :: Ops are blocked

2015-09-17 Thread Nick Fisk
Ah right... this is where it gets interesting. You are probably hitting a cache full on a PG somewhere, which is either making everything wait until it flushes or something like that. What cache settings have you got set? I assume you have SSDs for the cache tier? Can you share the size of

Re: [ceph-users] Ceph cluster NO read / write performance :: Ops are blocked

2015-09-17 Thread Lincoln Bryant
We have CephFS utilizing a cache tier + EC backend. The cache tier and EC pool sit on the same spinners — no SSDs. Our cache tier has a target_max_bytes of 5TB and the total storage is about 1PB. I do have a separate test pool with 3x replication and no cache tier, and I still see significant

Re: [ceph-users] Ceph cluster NO read / write performance :: Ops are blocked

2015-09-17 Thread Lincoln Bryant
Hi Nick, Thanks for the detailed response and insight. SSDs are indeed definitely on the to-buy list. I will certainly try to rule out any hardware issues in the meantime. Cheers, Lincoln > On Sep 17, 2015, at 12:53 PM, Nick Fisk wrote: > > It's probably helped but I fear

Re: [ceph-users] Important security notice regarding release signing key

2015-09-17 Thread Robin H. Johnson
On Thu, Sep 17, 2015 at 11:19:28AM -0700, Sage Weil wrote: > > Please revoke the old keys, so that if they were taken by the attacker, > > they cannot be used (you can't un-revoke a key generally). > Done: > http://pgp.mit.edu/pks/lookup?search=ceph&op=index Thank you! -- Robin Hugh Johnson

Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-09-17 Thread Andrija Panic
Another one bites the dust... This is a Samsung 850 PRO 256GB... (6 journals on this SSD just died...) [root@cs23 ~]# smartctl -a /dev/sda smartctl 5.43 2012-06-30 r3573 [x86_64-linux-3.10.66-1.el6.elrepo.x86_64] (local build) Copyright (C) 2002-12 by Bruce Allen,

Re: [ceph-users] Important security notice regarding release signing key

2015-09-17 Thread Michael Kuriger
Thanks for the notice! Michael Kuriger Sr. Unix Systems Engineer mk7...@yp.com | 818-649-7235 -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Sage Weil Sent: Thursday, September 17, 2015 9:30 AM To: ceph-annou...@ceph.com;

Re: [ceph-users] can't get cluster to become healthy. "stale+undersized+degraded+peered"

2015-09-17 Thread Stefan Eriksson
Thanks, I purged all nodes and did purgedata as well and restarted; after this everything was fine. You are most certainly right. If anyone else has this error, reinitializing the cluster might be the fastest way forward. From: Vasu Kulkarni Sent: 17 September 2015 17:47 To: Stefan
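
For anyone needing the same reset, a sketch of the purge sequence with ceph-deploy; hostnames are illustrative, and this erases all Ceph packages and data on the listed nodes:

    ceph-deploy purge ceph01 ceph02 ceph03
    ceph-deploy purgedata ceph01 ceph02 ceph03
    # discard old keys before redeploying
    ceph-deploy forgetkeys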

Re: [ceph-users] Ceph cluster NO read / write performance :: Ops are blocked

2015-09-17 Thread Nick Fisk
It's probably helped, but I fear that your overall design is not going to work well for you. Cache tier + base tier + journals on the same disks is going to really hurt. The problem when using cache tiering (especially with EC pools in future releases) is that to modify a block that isn't in

Re: [ceph-users] Important security notice regarding release signing key

2015-09-17 Thread Sage Weil
On Thu, 17 Sep 2015, Robin H. Johnson wrote: > On Thu, Sep 17, 2015 at 09:29:35AM -0700, Sage Weil wrote: > > Last week, Red Hat investigated an intrusion on the sites of both the Ceph > > community project (ceph.com) and Inktank (download.inktank.com), which > > were hosted on a computer

Re: [ceph-users] ceph-fuse failed with mount connection time out

2015-09-17 Thread Fulin Sun
Hi there, thanks a lot for your reply. It turns out I forgot to install the Ceph metadata server. Sorry for the trouble. Best, Sun. From: Gregory Farnum Date: 2015-09-17 22:39 To: Fulin Sun CC: ceph-users Subject: Re: [ceph-users] ceph-fuse failed with mount connection time out On Thu, Sep
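
The missing piece, for anyone hitting the same timeout: CephFS needs at least one running metadata server. A sketch with ceph-deploy; the hostname is illustrative:

    # deploy an MDS daemon, then confirm it becomes active
    ceph-deploy mds create mds-node1
    ceph mds stat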

Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-09-17 Thread James (Fei) Liu-SSI
Hi Quentin, Samsung has different types of SSDs for different types of workloads, built on different SSD media like SLC, MLC, TLC, 3D NAND, etc. They were designed for different workloads and purposes. Thanks for your understanding and support. Regards, James From: ceph-users

Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-09-17 Thread Andrija Panic
" came to the conclusion they we put to an "unintended use". " wtf ? : Best to install them inside shutdown workstation... :) On 18 September 2015 at 01:04, Quentin Hartman wrote: > I ended up having 7 total die. 5 while in service, 2 more when I hooked

[ceph-users] Lot of blocked operations

2015-09-17 Thread Olivier Bonvalet
Hi, I have a cluster with a lot of blocked operations each time I try to move data (by slightly reweighting an OSD). It's a full-SSD cluster with a 10GbE network. In the logs, when I have a blocked OSD, on the main OSD I can see this: 2015-09-18 01:55:16.981396 7f89e8cb8700 0 log [WRN] : 2 slow
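
A sketch of the usual first steps for localizing slow requests like these; the OSD id is illustrative, and the daemon command must run on that OSD's host:

    # list which OSDs currently have blocked requests
    ceph health detail
    # inspect the operations in flight on a suspect OSD
    ceph daemon osd.12 dump_ops_in_flight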

[ceph-users] pgmap question

2015-09-17 Thread Robert LeBlanc
My understanding was that pgmap changes only when the location of a PG changes due to backfill or recovery. However, watching ceph -w shows that it increments about every second, even with a healthy cluster and client I/O. If there is no client I/O,

[ceph-users] radosgw and keystone version 3 domains

2015-09-17 Thread Robert Duncan
Hi, it seems that radosgw cannot find users in Keystone v3 domains. That is, when Keystone is configured for domain-specific drivers, radosgw cannot find the users in the Keystone users table (as they are not there). I have a deployment in which Ceph provides object, block, ephemeral, and user
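
For context, a sketch of the hammer-era radosgw Keystone settings in ceph.conf; these are the v2-style options, since v3 domain support is precisely what is in question in this thread, and the URL and token are placeholders:

    [client.radosgw.gateway]
    rgw keystone url = http://keystone-host:35357
    rgw keystone admin token = ADMIN_TOKEN
    rgw keystone accepted roles = Member, admin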

Re: [ceph-users] Lot of blocked operations

2015-09-17 Thread Olivier Bonvalet
Some additional information: - I have 4 SSDs per node. - CPU usage is near 0. - IO wait is near 0 too. - Bandwidth usage is also near 0. The whole cluster seems to be waiting for something... but I don't see what. On Friday 18 September 2015 at 02:35 +0200, Olivier Bonvalet wrote: > Hi, > > I

[ceph-users] erasure pool, ruleset-root

2015-09-17 Thread Deneau, Tom
I see that I can create a crush rule that only selects OSDs from a certain node like this: ceph osd crush rule create-simple byosdn1 myhostname osd If I then create a replicated pool that uses that rule, it does indeed select OSDs only from that node. I would like to do a similar thing with
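
For the erasure case, the profile carries the CRUSH placement parameters; a hedged sketch using the hammer-era parameter names (profile, pool, and bucket names are illustrative):

    # profile whose generated CRUSH rule starts at a specific bucket,
    # with OSD-level failure domain so k+m chunks can land on one host
    ceph osd erasure-code-profile set byhostprofile k=2 m=1 \
        ruleset-root=myhostname ruleset-failure-domain=osd
    # create the erasure-coded pool from that profile
    ceph osd pool create ecpool 128 128 erasure byhostprofile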

Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-09-17 Thread Quentin Hartman
I ended up having 7 total die. 5 while in service, 2 more when I hooked them up to a test machine to collect information from them. To Samsung's credit, they've been great to deal with and are replacing the failed drives, on the condition that I don't use them for ceph again. Apparently they sent

Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-09-17 Thread Quentin Hartman
Well, if you look at the very, very fine print on their warranty statement and some spec sheets, they say the drives are only supposed to be used in "Client PCs", and if the application exceeds certain write amounts per day, even if it's below the total volume of writes the drive is supposed to handle, it

[ceph-users] Using cephfs with hadoop

2015-09-17 Thread Fulin Sun
Hi guys, I am wondering whether I can deploy Ceph and Hadoop on different cluster nodes and still use CephFS as the backend for Hadoop access. For example, Ceph in cluster 1 and Hadoop in cluster 2, where cluster 1 and cluster 2 can access each other. If so, what would be done
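
Separate clusters can work as long as the Hadoop nodes have network reach to the monitors and OSDs. A sketch of the Hadoop-side core-site.xml for the cephfs-hadoop bindings; property names follow the Ceph documentation of the time, and the monitor address is illustrative:

    <!-- point Hadoop at CephFS instead of HDFS -->
    <property><name>fs.default.name</name><value>ceph://mon-host:6789/</value></property>
    <property><name>fs.ceph.impl</name><value>org.apache.hadoop.fs.ceph.CephFileSystem</value></property>
    <property><name>ceph.conf.file</name><value>/etc/ceph/ceph.conf</value></property>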

Re: [ceph-users] Lot of blocked operations

2015-09-17 Thread Christian Balzer
Hello, On Fri, 18 Sep 2015 02:43:49 +0200 Olivier Bonvalet wrote: The items below help, but be as specific as possible, from OS and kernel version to Ceph version, "ceph -s", and any other specific details (pool type, replica size). > Some additional information: > - I have 4 SSDs per node. Type,

Re: [ceph-users] pgmap question

2015-09-17 Thread GuangYang
IIRC, the version gets increased once the stats of the PG change, which is probably the reason why you saw it changing with client I/O. Thanks, Guang > Date: Thu, 17 Sep 2015 16:55:41 -0600 > From: rob...@leblancnet.us > To: ceph-users@lists.ceph.com >

Re: [ceph-users] Lot of blocked operations

2015-09-17 Thread GuangYang
Which version are you using? My guess is that the request (op) is waiting for a lock (it might be the ondisk_read_lock of the object, but debug_osd=20 should be helpful to tell what happened to the op). How do you tell the IO wait is near 0 (by top?)? Thanks, Guang
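
A sketch of raising the OSD debug level as suggested; revert afterwards, since logs grow very quickly at level 20:

    # bump debug logging on all OSDs at runtime
    ceph tell osd.* injectargs '--debug_osd 20'
    # restore the default when done
    ceph tell osd.* injectargs '--debug_osd 0/5'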

Re: [ceph-users] radosgw and keystone version 3 domains

2015-09-17 Thread Abhishek L
On Fri, Sep 18, 2015 at 4:38 AM, Robert Duncan wrote: > > Hi > > > > It seems that radosgw cannot find users in Keystone v3 domains, that is, > > when Keystone is configured for domain-specific drivers radosgw cannot find > the users in the Keystone users table (as they