[ceph-users] Bluestore zetascale vs rocksdb

2017-02-13 Thread Deepak Naidu
Folks, has anyone been using Bluestore with CephFS? If so, did you test with ZetaScale vs RocksDB? Any install steps/best practices are appreciated. PS: I still see that Bluestore is an "experimental feature"; any timeline for when it will be GA/stable? -- Deepak
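For anyone looking for install steps, a minimal sketch, assuming a Jewel/Kraken-era release where BlueStore still has to be allowed explicitly as an experimental feature (option name and ceph-disk flag as shipped there; verify against your own version):

  # ceph.conf, [global] or [osd] section
  enable experimental unrecoverable data corrupting features = bluestore rocksdb

  # then prepare an OSD with the BlueStore backend, e.g.:
  ceph-disk prepare --bluestore /dev/sdX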

[ceph-users] Ceph server with errors while deployment -- on jewel

2017-02-13 Thread frank
Hi, we have a minimal Ceph cluster setup: 1 admin node, 1 mon node and 2 OSDs. All servers run CentOS 7. Currently, while deploying the servers, we receive the errors below. [root@admin-ceph ~]# ceph health detail 2017-02-13 16:14:49.652786 7f6b8c6b6700 0 --

Re: [ceph-users] High CPU usage by ceph-mgr on idle Ceph cluster

2017-02-13 Thread Brad Hubbard
Could one of the reporters open a tracker for this issue and attach the requested debugging data? On Mon, Feb 13, 2017 at 11:18 PM, Donny Davis wrote: > I am having the same issue. When I looked at my idle cluster this morning, > one of the nodes had 400% cpu utilization,
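For anyone gathering that data, a rough sketch of the kind of CPU evidence that is typically useful (plain Linux tooling, nothing ceph-mgr specific assumed):

  # which threads inside ceph-mgr are spinning
  top -b -H -n 1 -p $(pidof ceph-mgr)

  # backtraces of all threads, to attach to the tracker
  gdb -p $(pidof ceph-mgr) --batch -ex 'thread apply all bt'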

[ceph-users] After upgrading from 0.94.9 to Jewel 10.2.5 on Ubuntu 14.04 OSDs fail to start with a crash dump

2017-02-13 Thread Alfredo Colangelo
Hi Ceph experts, after updating from Ceph 0.94.9 to Ceph 10.2.5 on Ubuntu 14.04, 2 out of 3 OSD processes are unable to start. On another machine the same happened, but only for 1 out of 3 OSDs. The update was done via ceph-deploy 1.5.37. It shouldn't be a permissions problem, because

[ceph-users] radosgw 100-continue problem

2017-02-13 Thread Z Will
Hi: I am using nginx + fastcgi + radosgw, with radosgw configured with "rgw print continue = true". RFC 2616 says: An origin server that sends a 100 (Continue) response MUST ultimately send a final status code, once the request body is received and processed, unless it terminates the
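For context, this behaviour is controlled by a single radosgw option; a hedged ceph.conf sketch (the section name is an example, use your own rgw instance name), since the usual advice when the frontend cannot relay "100 Continue" itself is to turn it off:

  [client.radosgw.gateway]
  # disable if nginx/fastcgi does not forward 100-continue correctly
  rgw print continue = false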

Re: [ceph-users] Re: Re: mon is stuck in leveldb and costs nearly 100% cpu

2017-02-13 Thread Shinobu Kinjo
> 2 active+clean+scrubbing+deep * Set noscrub and nodeep-scrub: # ceph osd set noscrub # ceph osd set nodeep-scrub * Wait for scrubbing+deep to complete * Run `ceph -s` * If you still see high CPU usage, identify which process(es) are eating CPU: * ps aux | sort -rk 3,4 |
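Spelled out as a copy-pasteable sequence (the same commands as above; the `ps` line completed in the usual way of sorting by CPU and memory, plus unsetting the flags afterwards):

  ceph osd set noscrub
  ceph osd set nodeep-scrub
  # wait until "scrubbing+deep" is gone, then check the status
  ceph -s
  # if CPU usage is still high, find the offending processes
  ps aux | sort -rk 3,4 | head -n 10
  # re-enable scrubbing afterwards
  ceph osd unset noscrub
  ceph osd unset nodeep-scrub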

Re: [ceph-users] SMR disks go 100% busy after ~15 minutes

2017-02-13 Thread Bernhard J . M . Grün
Hi Wido, no, I did not set any special flags; I used ceph-deploy without further parameters apart from the journal disk/partition that these OSDs should use. Bernhard. Wido den Hollander wrote on Mon, 13 Feb 2017 at 17:47: > > > On 13 February 2017 at 16:49,

Re: [ceph-users] - permission denied on journal after reboot

2017-02-13 Thread Piotr Dzionek
OK, the Partition GUID code was the same as the Partition unique GUID. I used `sudo sgdisk --new=1:0:+20480M --change-name=1:'ceph journal' --partition-guid=1:$journal_uuid --typecode=1:$journal_uuid --mbrtogpt -- /dev/sdk` to recreate my journal. However, the typecode part should be the
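A sketch of the corrected call, assuming the point being made is that --typecode must carry the Ceph journal *type* GUID (the 45B0969E... value shown by sgdisk elsewhere in this thread) rather than the partition's own unique GUID:

  journal_uuid=$(uuidgen)
  sudo sgdisk --new=1:0:+20480M --change-name=1:'ceph journal' \
       --partition-guid=1:$journal_uuid \
       --typecode=1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 \
       --mbrtogpt -- /dev/sdk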

Re: [ceph-users] SMR disks go 100% busy after ~15 minutes

2017-02-13 Thread Wido den Hollander
> On 13 February 2017 at 16:49, "Bernhard J. M. Grün" wrote: > > > Hi, > > we are using SMR disks for backup purposes in our Ceph cluster. > We have had massive problems with those disks prior to upgrading to kernel > 4.9.x. We also dropped XFS as the filesystem and

Re: [ceph-users] SMR disks go 100% busy after ~15 minutes

2017-02-13 Thread Bernhard J . M . Grün
Hi, we are using SMR disks for backup purposes in our Ceph cluster. We have had massive problems with those disks prior to upgrading to kernel 4.9.x. We also dropped XFS as the filesystem and now use btrfs (only for those disks). Since we did this, we don't have such problems anymore. If you don't

Re: [ceph-users] 1 PG stuck unclean (active+remapped) after OSD replacement

2017-02-13 Thread Eugen Block
Thanks for your quick responses. While I was writing my answer, a rebalance was running because I had started a new crush reweight to get rid of the old re-activated OSDs, and now that it has finished, the cluster is back in a healthy state. Thanks, Eugen. Quoting Gregory Farnum

Re: [ceph-users] - permission denied on journal after reboot

2017-02-13 Thread ulembke
Hi Piotr, is your partition GUID right? Look with sgdisk: # sgdisk --info=2 /dev/sdd Partition GUID code: 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 (Unknown) Partition unique GUID: 396A0C50-738C-449E-9FC6-B2D3A4469E51 First sector: 2048 (at 1024.0 KiB) Last sector: 10485760 (at 5.0 GiB) Partition

Re: [ceph-users] - permission denied on journal after reboot

2017-02-13 Thread Piotr Dzionek
I run it on CentOS Linux release 7.3.1611. After running "udevadm test /sys/block/sda/sda1" I don't see this rule apply to this disk. Hmm, I remember that it used to work properly, but some time ago I retested journal disk recreation. I followed the same tutorial as the one pasted here

Re: [ceph-users] 1 PG stuck unclean (active+remapped) after OSD replacement

2017-02-13 Thread Gregory Farnum
On Mon, Feb 13, 2017 at 7:05 AM Wido den Hollander wrote: > > > On 13 February 2017 at 16:03, Eugen Block wrote: > > > > > > Hi experts, > > > > I have a strange situation right now. We are re-organizing our 4-node > > Hammer cluster from LVM-based OSDs to HDDs.

Re: [ceph-users] 1 PG stuck unclean (active+remapped) after OSD replacement

2017-02-13 Thread Wido den Hollander
> On 13 February 2017 at 16:03, Eugen Block wrote: > > > Hi experts, > > I have a strange situation right now. We are re-organizing our 4-node > Hammer cluster from LVM-based OSDs to HDDs. When we did this on the > first node last week, everything went smoothly: I removed

Re: [ceph-users] SMR disks go 100% busy after ~15 minutes

2017-02-13 Thread Wido den Hollander
> On 13 February 2017 at 15:57, Peter Maloney wrote: > > > Then you're not aware of what SMR disks do. They are just slow for > all writes, having to read the surrounding tracks, then write it all again > instead of just the one thing you really wanted to

[ceph-users] 1 PG stuck unclean (active+remapped) after OSD replacement

2017-02-13 Thread Eugen Block
Hi experts, I have a strange situation right now. We are re-organizing our 4-node Hammer cluster from LVM-based OSDs to HDDs. When we did this on the first node last week, everything went smoothly: I removed the OSDs from the crush map and the rebalancing and recovery finished
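A hedged sketch of the usual first diagnostic steps for a PG stuck in active+remapped (the PG id below is just a placeholder, not taken from this thread):

  ceph health detail
  ceph pg dump_stuck unclean
  # inspect the up/acting sets of the stuck PG; replace 1.2f with the real PG id
  ceph pg 1.2f query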

Re: [ceph-users] SMR disks go 100% busy after ~15 minutes

2017-02-13 Thread Peter Maloney
Then you're not aware of what SMR disks do. They are just slow for all writes, having to read the surrounding tracks, then write it all again instead of just the one thing you really wanted to write, due to the overlap. To partially mitigate this, they have some tiny write buffer, like 8GB of flash,

[ceph-users] SMR disks go 100% busy after ~15 minutes

2017-02-13 Thread Wido den Hollander
Hi, I have an odd case with SMR disks in a Ceph cluster. Before I continue: yes, I am fully aware of SMR and Ceph not playing along well, but there is something happening here which I'm not able to fully explain. On a 2x replica cluster with 8TB Seagate SMR disks I can write at about 30MB/sec to
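For anyone trying to reproduce the observation, a minimal way to watch the disks go 100% busy (plain iostat; the device names are examples):

  # %util hitting 100 on the SMR devices marks the point where the drive's
  # internal write buffer is exhausted and it starts rewriting zones in the background
  iostat -x 1 /dev/sdc /dev/sdd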

Re: [ceph-users] High CPU usage by ceph-mgr on idle Ceph cluster

2017-02-13 Thread Donny Davis
I am having the same issue. When I looked at my idle cluster this morning, one of the nodes had 400% cpu utilization, and ceph-mgr was 300% of that. I have 3 AIO nodes, and only one of them seemed to be affected. On Sat, Jan 14, 2017 at 12:18 AM, Brad Hubbard wrote: > Want

Re: [ceph-users] OSDs cannot match up with fast OSD map changes (epochs) during recovery

2017-02-13 Thread Wido den Hollander
> On 13 February 2017 at 12:57, Muthusamy Muthiah wrote: > > > Hi all, > > We also have the same issue on one of our platforms, which was upgraded from > 11.0.2 to 11.2.0. The issue occurs on one node alone, where CPU hits 100% > and the OSDs of that node are marked down.

[ceph-users] Re: Re: mon is stuck in leveldb and costs nearly 100% cpu

2017-02-13 Thread Chenyehua
Thanks for the response, Shinobu. The warning disappeared thanks to your suggested solution; however, the nearly 100% CPU usage still exists and concerns me a lot. So, do you know why the CPU usage is so high? Are there any solutions or suggestions for this problem? Cheers -----Original Message----- From: Shinobu

Re: [ceph-users] - permission denied on journal after reboot

2017-02-13 Thread koukou73gr
On 2017-02-13 13:47, Wido den Hollander wrote: > > The udev rules of Ceph should chown the journal to ceph:ceph if it's set to > the right partition UUID. > > This blog shows it partially: > http://ceph.com/planet/ceph-recover-osds-after-ssd-journal-failure/ > > This is done by

Re: [ceph-users] OSDs cannot match up with fast OSD map changes (epochs) during recovery

2017-02-13 Thread Muthusamy Muthiah
Hi all, we also have the same issue on one of our platforms, which was upgraded from 11.0.2 to 11.2.0. The issue occurs on one node alone, where CPU hits 100% and the OSDs of that node are marked down. The issue is not seen on a cluster which was installed from scratch with 11.2.0.
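Not a fix for the underlying CPU problem, but a sketch of flags that are often set to keep the down/up flapping from generating even more map epochs while the node is being investigated (assumption: the OSDs are only being marked down because they are slow, not because they crashed):

  ceph osd set nodown
  ceph osd set noout
  # ... investigate the node, then remove the flags again:
  ceph osd unset nodown
  ceph osd unset noout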

Re: [ceph-users] - permission denied on journal after reboot

2017-02-13 Thread Wido den Hollander
> On 13 February 2017 at 12:06, Piotr Dzionek wrote: > > > Hi, > > I am running Ceph Jewel 10.2.5 with separate journals on SSD disks. It > runs pretty smoothly; however, I stumbled upon an issue after a system reboot. > The journal disks become owned by root and ceph failed

Re: [ceph-users] - permission denied on journal after reboot

2017-02-13 Thread Craig Chi
Hi, what is your OS? The permissions of the journal partition should be changed by the udev rules in /lib/udev/rules.d/95-ceph-osd.rules. In this file, it is described as: # JOURNAL_UUID ACTION=="add", SUBSYSTEM=="block", \ ENV{DEVTYPE}=="partition", \

[ceph-users] - permission denied on journal after reboot

2017-02-13 Thread Piotr Dzionek
Hi, I am running Ceph Jewel 10.2.5 with separate journals on SSD disks. It runs pretty smoothly; however, I stumbled upon an issue after a system reboot. The journal disks become owned by root and ceph fails to start: starting osd.4 at :/0 osd_data /var/lib/ceph/osd/ceph-4
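As an immediate workaround (a sketch; the device path is an example, and the real fix is the udev rule / partition typecode discussion in the replies), the journal partition can be chowned back before starting the OSD:

  chown ceph:ceph /dev/disk/by-partuuid/<journal-partuuid>   # or e.g. /dev/sdk1
  systemctl start ceph-osd@4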

Re: [ceph-users] Anyone using LVM or ZFS RAID1 for boot drives?

2017-02-13 Thread Willem Jan Withagen
On 13-2-2017 04:22, Alex Gorbachev wrote: > Hello, with the preference for IT-mode HBAs for OSDs and journals, > what redundancy method do you guys use for the boot drives? Some > options beyond RAID1 at the hardware level we can think of: > > - LVM > > - ZFS RAID1 mode Since it is not quite Ceph,
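For the Linux case, minimal sketches of what either option could look like (volume group, size and device names are made-up examples):

  # LVM: a RAID1 logical volume for the root filesystem
  lvcreate --type raid1 -m 1 -L 50G -n root vg_system

  # ZFS: a mirrored pool for the OS
  zpool create -f rpool mirror /dev/sda3 /dev/sdb3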

Re: [ceph-users] Re: mon is stuck in leveldb and costs nearly 100% cpu

2017-02-13 Thread kefu chai
On Mon, Feb 13, 2017 at 10:53 AM, Shinobu Kinjo wrote: > OK, that's a reasonable answer. Could you run the following on all hosts that the MONs > are running on: > > # ceph --admin-daemon /var/run/ceph/ceph-mon.`hostname -s`.asok > config show | grep leveldb_log > > Anyway, you can compact
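The compaction hinted at at the end of that message can be done either online or at monitor startup; a sketch (both knobs exist in Jewel-era releases):

  # trigger an online compaction of the monitor's store
  ceph tell mon.$(hostname -s) compact

  # or compact every time the mon starts: in ceph.conf, [mon] section
  #   mon compact on start = true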