[ceph-users] ceph-disk prepare : UUID=00000000-0000-0000-0000-000000000000

2014-10-09 Thread SCHAER Frederic
Hi, I am setting up a test ceph cluster, on decommissioned hardware (hence: not optimal, I know). I have installed CentOS7, installed and set up ceph mons and OSD machines using puppet, and now I'm trying to add OSDs with the servers' OSD disks... and I have issues (of course ;) ) I used the
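For readers skimming the thread: the tool involved is ceph-disk, and the workflow under discussion boils down to a few commands. A minimal sketch with placeholder device names (not the ones from the original report):

    # prepare a disk as an OSD, journal co-located on the same device
    ceph-disk prepare /dev/sdb
    # or with the journal on a separate device
    ceph-disk prepare /dev/sdb /dev/sdc
    # show how ceph-disk classifies each disk/partition (data, journal, prepared, active...)
    ceph-disk list
    # activation is normally triggered by udev; it can also be run by hand
    ceph-disk activate /dev/sdb1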

Re: [ceph-users] ceph-disk prepare : UUID=00000000-0000-0000-0000-000000000000

2014-10-09 Thread SCHAER Frederic
::profile::params::osds: '/dev/disk/by-path/pci-\:0a\:00.0-scsi-0\:2\:': (I tried without the backslashes too) -Original Message- From: Loic Dachary [mailto:l...@dachary.org] Sent: Thursday, October 9, 2014 15:01 To: SCHAER Frederic; ceph-users@lists.ceph.com Subject: Re: [ceph-users] ceph

Re: [ceph-users] ceph-disk prepare : UUID=00000000-0000-0000-0000-000000000000

2014-10-09 Thread SCHAER Frederic
(5.0 GiB) Attribute flags: Partition name: 'ceph journal' Puzzling, isn't it ? -Original Message- From: Loic Dachary [mailto:l...@dachary.org] Sent: Thursday, October 9, 2014 15:37 To: SCHAER Frederic; ceph-users@lists.ceph.com Subject: Re: [ceph-users] ceph-disk prepare

Re: [ceph-users] ceph-disk prepare : UUID=00000000-0000-0000-0000-000000000000

2014-10-09 Thread SCHAER Frederic
-Original Message- From: Loic Dachary [mailto:l...@dachary.org] Sent: Thursday, October 9, 2014 16:20 To: SCHAER Frederic; ceph-users@lists.ceph.com Subject: Re: [ceph-users] ceph-disk prepare : UUID=00000000-0000-0000-0000-000000000000 On 09/10/2014 16:04, SCHAER Frederic wrote: Hi

Re: [ceph-users] ceph-disk prepare : UUID=00000000-0000-0000-0000-000000000000

2014-10-10 Thread SCHAER Frederic
-Original Message- From: Loic Dachary [mailto:l...@dachary.org] The failure journal check: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected 244973de-7472-421c-bb25-4b09d3f8d441 and the udev logs DEBUG:ceph-disk:Journal /dev/sdc2 has OSD UUID

Re: [ceph-users] ceph-disk prepare : UUID=00000000-0000-0000-0000-000000000000

2014-10-10 Thread SCHAER Frederic
;) Regards -Original Message- From: Loic Dachary [mailto:l...@dachary.org] Sent: Friday, October 10, 2014 14:37 To: SCHAER Frederic; ceph-users@lists.ceph.com Subject: Re: [ceph-users] ceph-disk prepare : UUID=00000000-0000-0000-0000-000000000000 Hi Frederic, To be 100% sure it would

Re: [ceph-users] ceph-disk prepare : UUID=00000000-0000-0000-0000-000000000000

2014-10-30 Thread SCHAER Frederic
Hi Loic, Back on this issue... Using the EPEL package, I still get prepared-only disks, e.g.: /dev/sdc : /dev/sdc1 ceph data, prepared, cluster ceph, journal /dev/sdc2 /dev/sdc2 ceph journal, for /dev/sdc1 Looking at the udev output, I can see that there is no ACTION=add with

[ceph-users] jbod + SMART : how to identify failing disks ?

2014-11-12 Thread SCHAER Frederic
Hi, I'm used to RAID software giving me the failing disks' slots, and most often blinking the disks in the disk bays. I recently installed a DELL 6GB HBA SAS JBOD card, said to be an LSI 2008 one, and I now have to identify 3 pre-failed disks (so says S.M.A.R.T.). Since this is an LSI, I

Re: [ceph-users] jbod + SMART : how to identify failing disks ?

2014-11-18 Thread SCHAER Frederic
do that) Regards From: Craig Lewis [mailto:cle...@centraldesktop.com] Sent: Monday, November 17, 2014 22:32 To: SCHAER Frederic Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] jbod + SMART : how to identify failing disks ? I use `dd` to force activity to the disk I want to replace
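The `dd` trick mentioned here amounts to generating harmless read traffic so the suspect drive's activity LED stays lit; a sketch, with /dev/sdX standing in for the drive to locate:

    # read-only stream from the suspect disk; interrupt with Ctrl-C once the blinking bay is found
    dd if=/dev/sdX of=/dev/null bs=1M iflag=direct
    # cross-check the physical drive by serial number before pulling it
    smartctl -i /dev/sdX | grep -i serial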

Re: [ceph-users] jbod + SMART : how to identify failing disks ?

2014-11-19 Thread SCHAER Frederic
to find a way. Regards -Original Message- From: Carl-Johan Schenström [mailto:carl-johan.schenst...@gu.se] Sent: Monday, November 17, 2014 14:14 To: SCHAER Frederic; Scottix; Erik Logtenberg Cc: ceph-users@lists.ceph.com Subject: RE: [ceph-users] jbod + SMART : how to identify failing

[ceph-users] rogue mount in /var/lib/ceph/tmp/mnt.eml1yz ?

2014-11-19 Thread SCHAER Frederic
Hi, I rebooted a node (I'm doing some tests, and breaking many things ;) ), I see I have : [root@ceph0 ~]# mount|grep sdp1 /dev/sdp1 on /var/lib/ceph/tmp/mnt.eml1yz type xfs (rw,noatime,attr2,inode64,noquota) /dev/sdp1 on /var/lib/ceph/osd/ceph-55 type xfs (rw,noatime,attr2,inode64,noquota)

Re: [ceph-users] jbod + SMART : how to identify failing disks ?

2014-12-11 Thread SCHAER Frederic
...@uni.lu] Sent: Wednesday, November 19, 2014 13:42 To: SCHAER Frederic Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] jbod + SMART : how to identify failing disks ? Hello again, So whatever magic allows the Dell MD1200 to report the slot position for each disk isn't present in your JBODs

[ceph-users] cephfs filesystem layouts : authentication gotchas ?

2015-03-03 Thread SCHAER Frederic
Hi, I am attempting to test the cephfs filesystem layouts. I created a user with rights to write only in one pool : client.puppet key:zzz caps: [mon] allow r caps: [osd] allow rwx pool=puppet I also created another pool in which I would assume this user is allowed to do
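For reference, the kind of setup being tested reads roughly as follows; the pool, user and path names below are illustrative, not the exact ones from the thread:

    # a client allowed to write only to the "puppet" data pool
    ceph auth get-or-create client.puppet mon 'allow r' mds 'allow' osd 'allow rwx pool=puppet'
    # bind a directory of the mounted cephfs to that pool through a file layout
    setfattr -n ceph.dir.layout.pool -v puppet /mnt/cephfs/puppet
    getfattr -n ceph.dir.layout /mnt/cephfs/puppet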

[ceph-users] XFS recovery on boot : rogue mounts ?

2015-03-02 Thread SCHAER Frederic
Hi, I rebooted a failed server, which is now showing a rogue filesystem mount. Actually, there were also several disks missing in the node, all reported as prepared by ceph-disk, but not activated. [root@ceph2 ~]# grep /var/lib/ceph/tmp /etc/mtab /dev/sdo1 /var/lib/ceph/tmp/mnt.usVRe8 xfs

Re: [ceph-users] cephfs filesystem layouts : authentication gotchas ?

2015-03-04 Thread SCHAER Frederic
] cephfs filesystem layouts : authentication gotchas ? On 03/03/2015 15:21, SCHAER Frederic wrote: By the way : looks like the ceph fs ls command is inconsistent when the cephfs is mounted (I used a locally compiled kmod-ceph rpm): [root@ceph0 ~]# ceph fs ls name: cephfs_puppet, metadata pool

[ceph-users] read performance VS network usage

2015-04-23 Thread SCHAER Frederic
Hi again, On my testbed, I have 5 ceph nodes, each containing 23 OSDs (2TB btrfs drives). For these tests, I've set up a RAID0 on the 23 disks. For now, I'm not using SSDs as I discovered my vendor apparently decreased their performance on purpose... So: 5 server nodes, of which 3 are MONs too. I
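For anyone trying to reproduce these numbers, the benchmark and monitoring referred to throughout the thread are along these lines (pool name and network interface are placeholders):

    # write benchmark, keeping the objects so the read benchmarks have data to read back
    rados bench -p testpool 60 write --no-cleanup
    # sequential and random read benchmarks against those objects
    rados bench -p testpool 60 seq
    rados bench -p testpool 60 rand
    # watch per-disk and per-NIC throughput on the OSD nodes while the bench runs
    dstat -d -n -N eth0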

Re: [ceph-users] read performance VS network usage

2015-04-24 Thread SCHAER Frederic
: Thursday, April 23, 2015 17:21 To: SCHAER Frederic; ceph-users@lists.ceph.com Subject: RE: read performance VS network usage Hi Frederic, If you are using EC pools, the primary OSD requests the remaining shards of the object from the other OSDs, reassembles it and then sends the data to the client

Re: [ceph-users] read performance VS network usage

2015-04-24 Thread SCHAER Frederic
They are also receiving much more data than what rados bench reports (around 275MB/s each)... would that be some sort of data amplification ?? Regards From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of SCHAER Frederic Sent: Friday, April 24, 2015 10:03 To: Nick Fisk

Re: [ceph-users] read performance VS network usage

2015-04-24 Thread SCHAER Frederic
And to reply to myself... The client's apparent network bandwidth is just the fact that dstat aggregates the bridge network interface and the physical interface, thus doubling the data... Ah ah ah. Regards From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of SCHAER Frederic

[ceph-users] ceph-crush-location + SSD detection ?

2015-04-22 Thread SCHAER Frederic
Hi, I've seen and read a few things about ceph-crush-location and I think that's what I need. What I need (want to try) is : a way to have SSDs in non-dedicated hosts, but also to put those SSDs in a dedicated ceph root. From what I read, using ceph-crush-location, I could add a hostname with

Re: [ceph-users] ceph-crush-location + SSD detection ?

2015-04-22 Thread SCHAER Frederic
-Original Message- (...) So I just have to associate the mountpoint with the device... provided the OSD is mounted when the tool is called. Anyone willing to share experience with ceph-crush-location ? Something like this? https://gist.github.com/wido/5d26d88366e28e25e23d I've used
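The linked gist implements roughly the following idea. A simplified, hypothetical hook of the same shape (the 'ssd' root, the '-ssd' host suffix and the paths are assumptions), wired in from ceph.conf with something like osd crush location hook = /usr/local/bin/crush-location-hook:

    #!/bin/bash
    # Sketch of a crush location hook: report SSD-backed OSDs under a dedicated "ssd" root.
    # Ceph calls the hook with --cluster <name> --id <osd id> --type osd and uses its
    # stdout ("key=value" pairs) as the OSD's crush location.
    while [ $# -gt 0 ]; do
      case "$1" in
        --id) OSD_ID="$2"; shift 2 ;;
        *) shift ;;
      esac
    done
    # block device backing the OSD data directory
    DEV=$(findmnt -n -o SOURCE "/var/lib/ceph/osd/ceph-${OSD_ID}")
    # parent device of that partition (e.g. /dev/sdb1 -> sdb)
    PARENT=$(lsblk -n -o PKNAME "$DEV")
    if [ "$(cat /sys/block/${PARENT}/queue/rotational)" = "0" ]; then
      echo "host=$(hostname -s)-ssd root=ssd"
    else
      echo "host=$(hostname -s) root=default"
    fi

As noted above, this only works if the OSD filesystem is already mounted when the hook runs.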

Re: [ceph-users] Ceph 0.94 (and lower) performance on 1 hosts ??

2015-07-28 Thread SCHAER Frederic
-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of SCHAER Frederic Sent: Friday, July 24, 2015 16:04 To: Christian Balzer; ceph-users@lists.ceph.com Subject: [PROVENANCE INTERNET] Re: [ceph-users] Ceph 0.94 (and lower) performance on 1 hosts ?? Hi, Thanks. I did not know about atop

Re: [ceph-users] Ceph 0.94 (and lower) performance on 1 hosts ??

2015-08-07 Thread SCHAER Frederic
From: Jake Young [mailto:jak3...@gmail.com] Sent: Wednesday, July 29, 2015 17:13 To: SCHAER Frederic frederic.sch...@cea.fr Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Ceph 0.94 (and lower) performance on 1 hosts ?? On Tue, Jul 28, 2015 at 11:48 AM, SCHAER Frederic frederic.sch

Re: [ceph-users] Ceph 0.94 (and lower) performance on 1 hosts ??

2015-07-22 Thread SCHAER Frederic
-Original Message- From: Gregory Farnum [mailto:g...@gregs42.com] Sent: Wednesday, July 22, 2015 16:01 To: Florent MONTHEL Cc: SCHAER Frederic; ceph-users@lists.ceph.com Subject: Re: [ceph-users] Ceph 0.94 (and lower) performance on 1 hosts ?? We might also be able to help you

Re: [ceph-users] Ceph 0.94 (and lower) performance on 1 hosts ??

2015-07-24 Thread SCHAER Frederic
: ceph-users@lists.ceph.com Cc: Gregory Farnum; SCHAER Frederic Subject: Re: [ceph-users] Ceph 0.94 (and lower) performance on 1 hosts ?? On Thu, 23 Jul 2015 11:14:22 +0100 Gregory Farnum wrote: Your note that dd can do 2GB/s without networking makes me think that you should explore that. As you

Re: [ceph-users] Ceph 0.94 (and lower) performance on 1 hosts ??

2015-07-23 Thread SCHAER Frederic
Hi, Well I think the journaling would still appear in the dstat output, as that's still I/O: even if the user-side bandwidth is indeed cut in half, that should not be the case for the disk I/O. For instance I just tried a replicated pool for the test, and got around 1300MiB/s in dstat for about

[ceph-users] Ceph 0.94 (and lower) performance on 1 hosts ??

2015-07-20 Thread SCHAER Frederic
Hi, As I explained in various previous threads, I'm having a hard time getting the most out of my test ceph cluster. I'm benching things with rados bench. All Ceph hosts are on the same 10GB switch. Basically, I know I can get about 1GB/s of disk write performance per host, when I bench things

[ceph-users] Erasure Coding pool stuck at creation because of pre-existing crush ruleset ?

2015-09-30 Thread SCHAER Frederic
Hi, With 5 hosts, I could successfully create pools with k=4 and m=1, with the failure domain being set to "host". With 6 hosts, I could also create k=4,m=1 EC pools. But I suddenly failed with 6 hosts k=5 and m=1, or k=4,m=2 : the PGs were never created - I reused the pool name for my tests,
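For context, the commands involved in this kind of test on a pre-Luminous release, where the failure-domain key is still spelled ruleset-failure-domain (profile and pool names are illustrative):

    # define a k=4, m=2 profile with host as the failure domain
    ceph osd erasure-code-profile set ec42 k=4 m=2 ruleset-failure-domain=host
    ceph osd erasure-code-profile get ec42
    # create a pool from that profile
    ceph osd pool create ecpool 1024 1024 erasure ec42
    # when reusing a pool name, check that no stale crush ruleset of the same name is left behind
    ceph osd crush rule ls

With a host failure domain, each of the k+m shards must land on a distinct host, so 6 hosts is the bare minimum for k=5,m=1 or k=4,m=2.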

Re: [ceph-users] Important security notice regarding release signing key

2015-09-21 Thread SCHAER Frederic
, SCHAER Frederic wrote: > Hi, > > Forgive the question if the answer is obvious... It's been more than "an hour > or so" and eu.ceph.com apparently still hasn't been re-signed or at least > what I checked wasn't : > > # rpm -qp --qf '%{RSAHEADER:pgpsig}' > http:/

[ceph-users] unfound objects - why and how to recover ? (bonus : jewel logs)

2016-05-27 Thread SCHAER Frederic
Hi, -- First, let me start with the bonus... I migrated from hammer => jewel and followed the migration instructions... but the migration instructions are missing this: #chown -R ceph:ceph /var/log/ceph I just discovered this was the reason I found no logs anywhere about my current issue :/ -- This

Re: [ceph-users] OSD Restart results in "unfound objects"

2016-06-02 Thread SCHAER Frederic
Hi, Same for me... unsetting the bitwise flag considerably lowered the number of unfound objects. I'll have to wait/check for the remaining 214 though... Cheers -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of Samuel Just Sent: Thursday, June 2

Re: [ceph-users] OSD Restart results in "unfound objects"

2016-06-01 Thread SCHAER Frederic
I do… In my case, I have collocated the MONs with some OSDs, and no later than Saturday when I lost data again, I found out that one of the MON+OSD nodes ran out of memory and started killing ceph-mon on that node… At the same moment, all OSDs started to complain about not being able to see

Re: [ceph-users] osds udev rules not triggered on reboot (jewel, jessie)

2016-06-24 Thread SCHAER Frederic
Hi, I'm facing the same thing after I reinstalled a node directly in jewel... Reading : http://thread.gmane.org/gmane.comp.file-systems.ceph.devel/31917 I can confirm that running : "udevadm trigger -c add -s block " fires the udev rules and gets ceph-osd up. Thing is : I now have reinstalled

[ceph-users] ceph hammer : rbd info/Status : operation not supported (95) (EC+RBD tier pools)

2016-02-24 Thread SCHAER Frederic
Hi, I just started testing VMs inside ceph this week, ceph-hammer 0.94-5 here. I built several pools, using pool tiering: - A small replicated SSD pool (5 SSDs only, but I thought it'd be better for IOPS, I intend to test the difference with disks only) - Overlaying a larger
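The tiering arrangement described here is normally wired up along these lines; the pool names are placeholders, not the ones used on this cluster:

    # put the small replicated SSD pool in front of the larger EC pool as a writeback cache
    ceph osd tier add ec-rbd ssd-cache
    ceph osd tier cache-mode ssd-cache writeback
    ceph osd tier set-overlay ec-rbd ssd-cache
    # hammer expects a hit set on the cache pool for promotion decisions
    ceph osd pool set ssd-cache hit_set_type bloom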

Re: [ceph-users] ceph hammer : rbd info/Status : operation not supported (95) (EC+RBD tier pools)

2016-02-27 Thread SCHAER Frederic
: Wednesday, February 24, 2016 19:16 To: SCHAER Frederic <frederic.sch...@cea.fr> Cc: ceph-us...@ceph.com; HONORE Pierre-Francois <pierre-francois.hon...@cea.fr> Subject: Re: [ceph-users] ceph hammer : rbd info/Status : operation not supported (95) (EC+RBD tier pools) If you run "ra

[ceph-users] ceph startup issues : OSDs don't start

2016-04-21 Thread SCHAER Frederic
Hi, I'm sure I'm doing something wrong, I hope someone can enlighten me... I'm encountering many issues when I restart a ceph server (any ceph server). This is on CentOS 7.2, ceph-0.94.6-0.el7.x86_64. First: I have disabled abrt. I don't need abrt. But when I restart, I see these logs in the

[ceph-users] ceph OSD down+out =>health ok => remove => PGs backfilling... ?

2016-04-26 Thread SCHAER Frederic
Hi, One simple/quick question. In my ceph cluster, I had a disk which was in predicted failure. It was so much in predicted failure that the ceph OSD daemon crashed. After the OSD crashed, ceph moved data correctly (or at least that's what I thought), and a ceph -s was giving a "HEALTH_OK".
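For reference, the usual sequence for retiring a dead OSD, with osd.12 as a stand-in id. The extra backfilling the post asks about is expected: dropping the OSD from the CRUSH map lowers its host's weight, which remaps PGs a second time, independently of the rebalance that already ran when the OSD went out.

    # the daemon has crashed, the OSD is down+out and the cluster reports HEALTH_OK
    ceph osd crush remove osd.12   # removes it from the CRUSH map -> host weight drops, PGs move again
    ceph auth del osd.12
    ceph osd rm 12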

[ceph-users] jewel upgrade : MON unable to start

2016-05-02 Thread SCHAER Frederic
Hi, I'm < sort of > following the upgrade instructions on CentOS 7.2. I upgraded 3 OSD nodes without too many issues, even if I would rewrite those upgrade instructions to : #chrony has ID 167 on my systems... this was set at install time ! but I use NTP anyway. yum remove chrony sed -i -e

Re: [ceph-users] jewel upgrade : MON unable to start

2016-05-02 Thread SCHAER Frederic
I believe this is because I did not read the instructions thoroughly enough... this is my first "live upgrade" -Original Message- From: Oleksandr Natalenko [mailto:oleksa...@natalenko.name] Sent: Monday, May 2, 2016 16:39 To: SCHAER Frederic <frederic.sch...@cea

[ceph-users] ceph crush map rules for EC pools and out OSDs ?

2017-03-01 Thread SCHAER Frederic
Hi, I have 5 data nodes (bluestore, kraken), each with 24 OSDs. I enabled the optimal crush tunables. I'd like to try to "really" use EC pools, but until now I've faced cluster lockups when I was using 3+2 EC pools with a host failure domain. When a host was down for instance ;) Since I'd like
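One approach commonly suggested for this situation (wide EC on few hosts) is a rule that picks hosts first and then several OSDs per host, so that k+m can exceed the host count without losing the per-host failure domain entirely. A sketch for k+m=6 spread as 2 shards on each of 3 hosts; the rule name and numbers are illustrative and the syntax should be checked against the running release:

    # export, decompile, edit, recompile and inject the crush map
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # ... add the rule below to crushmap.txt ...
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new

    rule ec_3hosts_2osds {
        ruleset 2
        type erasure
        min_size 6
        max_size 6
        step set_chooseleaf_tries 5
        step take default
        step choose indep 3 type host
        step choose indep 2 type osd
        step emit
    }

The EC pool is then pointed at the rule (ceph osd pool set <pool> crush_ruleset 2 on pre-Luminous releases).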

[ceph-users] bluestore behavior on disks sector read errors

2017-06-27 Thread SCHAER Frederic
Hi, Every now and then , sectors die on disks. When this happens on my bluestore (kraken) OSDs, I get 1 PG that becomes degraded. The exact status is : HEALTH_ERR 1 pgs inconsistent; 1 scrub errors pg 12.127 is active+clean+inconsistent, acting [141,67,85] If I do a # rados
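Presumably the truncated command above is the inspection step; for reference, the standard way to examine and repair such a PG (12.127 reused from the status output above):

    # list which object(s)/shard(s) the scrub flagged and why (e.g. a read_error on one shard)
    rados list-inconsistent-obj 12.127 --format=json-pretty
    # ask the primary to rebuild the bad copy from the healthy shards
    ceph pg repair 12.127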

Re: [ceph-users] 12.2.7 + osd skip data digest + bluestore + I/O errors

2018-07-25 Thread SCHAER Frederic
: Tuesday, July 24, 2018 16:50 To: SCHAER Frederic Cc: ceph-users Subject: Re: [ceph-users] 12.2.7 + osd skip data digest + bluestore + I/O errors `ceph versions` -- you're sure all the osds are running 12.2.7 ? osd_skip_data_digest = true is supposed to skip any crc checks during reads

Re: [ceph-users] 12.2.7 + osd skip data digest + bluestore + I/O errors

2018-07-25 Thread SCHAER Frederic
On the good side: this update is forcing us to dive into ceph internals: we'll be more ceph-aware tonight than this morning ;) Cheers Fred -Original Message- From: SCHAER Frederic Sent: Wednesday, July 25, 2018 09:57 To: 'Dan van der Ster' Cc: ceph-users Subject: RE: [ceph-user

Re: [ceph-users] 12.2.7 + osd skip data digest + bluestore + I/O errors

2018-07-25 Thread SCHAER Frederic
d I don't see object rbd_data.1920e2238e1f29.0dfc (:head ?) in the unflush-able objects... Cheers -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of SCHAER Frederic Sent: Wednesday, July 25, 2018 10:28 To: Dan van der Ster Cc: ceph-us

Re: [ceph-users] 12.2.7 + osd skip data digest + bluestore + I/O errors

2018-07-24 Thread SCHAER Frederic
ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of SCHAER Frederic Sent: Tuesday, July 24, 2018 15:01 To: ceph-users Subject: [PROVENANCE INTERNET] [ceph-users] 12.2.7 + osd skip data digest + bluestore + I/O errors Hi, I read the 12.2.7 upgrade notes, and set "osd skip data digest =

[ceph-users] 12.2.7 + osd skip data digest + bluestore + I/O errors

2018-07-24 Thread SCHAER Frederic
Hi, I read the 12.2.7 upgrade notes, and set "osd skip data digest = true" before I started upgrading from 12.2.6 on my Bluestore-only cluster. As far as I can tell, my OSDs all got restarted during the upgrade and all got the option enabled : This is what I see for a specific OSD taken at
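Two quick sanity checks for this situation, osd.0 being a stand-in id (the first command talks to the OSD's admin socket, so it has to run on that OSD's host):

    # value actually in effect on a running OSD
    ceph daemon osd.0 config get osd_skip_data_digest
    # confirm every daemon really runs 12.2.7
    ceph versions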

[ceph-users] luminous 12.2.6 -> 12.2.7 active+clean+inconsistent PGs workaround (or wait for 12.2.8+ ?)

2018-09-03 Thread SCHAER Frederic
Hi, For those facing (lots of) active+clean+inconsistent PGs after the luminous 12.2.6 metadata corruption and the 12.2.7 upgrade, I'd like to explain how I finally got rid of those. Disclaimer: my cluster doesn't contain highly valuable data, and I can sort of recreate what it actually contains