Re: [ceph-users] Balanced MDS, all as active and recomended client settings.

2018-02-21 Thread Daniel Carrasco
Thanks, I'll check it. I also want to find out whether there is any way to cache file metadata on the client, to lower the MDS load. I suppose the files themselves are cached, but the client still checks with the MDS whether the files have changed. On my server the files are read-only most of the time, so the MDS data could be cached as well
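For ceph-fuse clients there are cache-related options that can be tuned in ceph.conf; a minimal sketch (values are illustrative only, not recommendations):

    [client]
    # number of inodes the ceph-fuse client keeps in its metadata cache
    client cache size = 65536
    # size of the client object cacher (data cache), in bytes
    client oc size = 209715200

Even with cached metadata the MDS hands out capabilities and recalls them when files change, so read-mostly workloads benefit the most.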

Re: [ceph-users] identifying public buckets

2018-02-21 Thread Robin H. Johnson
On Wed, Feb 21, 2018 at 10:19:58AM +, Dave Holland wrote: > Hi, > > We would like to scan our users' buckets to identify those which are > publicly-accessible, to avoid potential embarrassment (or worse), e.g. > http://www.bbc.co.uk/news/technology-42839462 > > I didn't find a way to use

Re: [ceph-users] PG_DAMAGED Possible data damage: 1 pg inconsistent

2018-02-21 Thread Brad Hubbard
On Wed, Feb 21, 2018 at 6:40 PM, Yoann Moulin wrote: > Hello, > > I migrated my cluster from jewel to luminous 3 weeks ago (using ceph-ansible > playbook), a few days after, ceph status told me "PG_DAMAGED > Possible data damage: 1 pg inconsistent", I tried to repair the PG
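For anyone hitting the same warning, the usual inspection steps on Luminous look roughly like this (the PG id 2.1ab is a placeholder):

    ceph health detail                                      # identifies the inconsistent PG
    rados list-inconsistent-obj 2.1ab --format=json-pretty  # shows which object/shard mismatches
    ceph pg repair 2.1ab                                    # asks the primary OSD to repair it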

Re: [ceph-users] Balanced MDS, all as active and recomended client settings.

2018-02-21 Thread Patrick Donnelly
Hello Daniel, On Wed, Feb 21, 2018 at 10:26 AM, Daniel Carrasco wrote: > Is possible to make a better distribution on the MDS load of both nodes?. We are aware of bugs with the balancer which are being worked on. You can also manually create a partition if the workload can
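The manual partitioning mentioned here can be done with directory pinning, available since Luminous; a minimal sketch, assuming a CephFS mount at /mnt/cephfs:

    # pin one subtree to MDS rank 0 and another to rank 1
    setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/static
    setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/uploads
    # a value of -1 removes the pin and hands the subtree back to the balancer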

Re: [ceph-users] PG overdose protection causing PG unavailability

2018-02-21 Thread David Turner
You could set the flag noin to prevent the new osds from being calculated by crush until you are ready for all of them in the host to be marked in. You can also set the initial crush weight to 0 for new osds so that they won't receive any PGs until you're ready for it. On Wed, Feb 21, 2018, 5:46 PM
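A hedged sketch of both approaches (weight and host layout are placeholders):

    ceph osd set noin                  # newly created OSDs come up but are not marked in
    ceph osd unset noin                # once every OSD on the host is ready
    # or, in ceph.conf on the new host before creating the OSDs:
    [osd]
    osd crush initial weight = 0       # OSDs join with weight 0 and receive no PGs
    # later: ceph osd crush reweight osd.<id> <target-weight>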

Re: [ceph-users] How to really change public network in ceph

2018-02-21 Thread David Turner
Osds can change their IP every time they start. When they start and check in with the mons, they tell the mons where they are. Changing your public network requires restarting every daemon. Likely you will want to schedule downtime for this. Clients can be routed and on whatever subnet you want,

Re: [ceph-users] Luminous v12.2.3 released

2018-02-21 Thread Sergey Malinin
Sadly, have to keep going with http://tracker.ceph.com/issues/22510 On Wednesday, February 21, 2018 at 22:50, Abhishek Lekshmanan wrote: > We're happy to announce the third bugfix release of Luminous v12.2.x ___ ceph-users mailing list

Re: [ceph-users] Help with Bluestore WAL

2018-02-21 Thread David Turner
The WAL is a required part of the osd. If you remove it, then the osd is missing a crucial part of itself and it will be unable to start until the WAL is back online. If the SSD were to fail, then all osds using it would need to be removed and recreated on the cluster. On Tue, Feb 20, 2018,
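For context, WAL/DB placement is fixed at OSD creation time, so recreating an OSD with its WAL on an SSD looks roughly like this (device names are placeholders, and partition sizing is up to you):

    ceph-volume lvm create --bluestore --data /dev/sdb \
        --block.db /dev/nvme0n1p1 --block.wal /dev/nvme0n1p2
    # if that SSD dies, every OSD referencing it has to be zapped and recreated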

Re: [ceph-users] Migrating to new pools

2018-02-21 Thread David Turner
I recently migrated several VMs from an HDD pool to an SSD pool without any downtime with proxmox. It is definitely possible with qemu to do no downtime migrations between pools. On Wed, Feb 21, 2018, 8:32 PM Alexandre DERUMIER wrote: > Hi, > > if you use qemu, it's also

Re: [ceph-users] Upgrading inconvenience for Luminous

2018-02-21 Thread David Turner
Having all of the daemons in your cluster able to restart themselves at will sounds terrifying. What's preventing every osd from restarting at the same time? Also, ceph dot releases have been known to break environments. It's the nature of such widely used software. I would recommend pinning the
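One hedged way to pin packages on a yum-based system so that upgrades only happen when you choose (the package globs are assumptions about what is installed):

    yum install yum-plugin-versionlock
    yum versionlock add 'ceph*' 'librados*' 'librbd*'
    # when you are ready to upgrade deliberately:
    yum versionlock delete 'ceph*' 'librados*' 'librbd*'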

Re: [ceph-users] Migrating to new pools

2018-02-21 Thread Alexandre DERUMIER
Hi, if you use qemu, it's also possible to use the drive-mirror feature from qemu (it can mirror and migrate from one storage to another without downtime). I don't know if openstack has implemented it, but it's working fine on proxmox. - Original message - From: "Anthony D'Atri"
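For the curious, the underlying QMP command can be driven through libvirt roughly like this (domain, device and image names are placeholders; treat it as a sketch, not a tested recipe):

    virsh qemu-monitor-command vm100 --pretty \
      '{ "execute": "drive-mirror",
         "arguments": { "device": "drive-virtio-disk0",
                        "target": "rbd:newpool/vm100-disk-1",
                        "format": "raw", "sync": "full" } }'
    # once the mirror job reports ready, finish it with the block-job-complete command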

Re: [ceph-users] rm: cannot remove dir and files (cephfs)

2018-02-21 Thread Deepak Naidu
>> rm: cannot remove '/design/4695/8/6-50kb.jpg': No space left on device  The “No space left on device” issue in CephFS is typically caused by having more than 1 million (10^6) files in a “single directory”. To mitigate this, try increasing the "mds_bal_fragment_size_max" to a higher value, example 7
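A hedged example of raising that limit on a running MDS and making it persistent (the value is illustrative):

    ceph daemon mds.<name> config set mds_bal_fragment_size_max 500000
    # and in ceph.conf so it survives a restart:
    [mds]
    mds bal fragment size max = 500000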

Re: [ceph-users] Migrating to new pools

2018-02-21 Thread Anthony D'Atri
>> I was thinking we might be able to configure/hack rbd mirroring to mirror to >> a pool on the same cluster but I gather from the OP and your post that this >> is not really possible? > > No, it's not really possible currently and we have no plans to add > such support since it would not be of

Re: [ceph-users] ceph-volume activation

2018-02-21 Thread Oliver Freyermuth
On 21.02.2018 at 14:24, Alfredo Deza wrote: [snip] >> Are there plans to have something like >> "ceph-volume discover-and-activate" >> which would effectively do something like: >> ceph-volume list and activate all OSDs which are re-discovered from LVM >> metadata? > This is a good idea, I
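Until such a command exists, something along these lines can approximate it, assuming jq is available and that the JSON from ceph-volume lvm list is keyed by OSD id with the fsid stored in the LV tags (a sketch, not battle-tested):

    ceph-volume lvm list --format json \
      | jq -r 'to_entries[] | "\(.key) \(.value[0].tags["ceph.osd_fsid"])"' \
      | while read id fsid; do
          ceph-volume lvm activate "$id" "$fsid"
        done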

[ceph-users] Upgrading inconvenience for Luminous

2018-02-21 Thread Oliver Freyermuth
Dear Cephalopodians, we had our cluster (still in testing phase) configured for automatic updates so we got 12.2.3 "automagically" when it was released. In /etc/sysconfig/ceph, we still have the default: CEPH_AUTO_RESTART_ON_UPGRADE=no so as expected, services were not restarted. However,
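For reference, the usual manual restart order once the packages are upgraded, using the systemd targets Luminous ships (one daemon class, and one host, at a time, checking cluster health in between):

    systemctl restart ceph-mon.target      # on each mon host, one at a time
    systemctl restart ceph-osd.target      # on each OSD host, waiting for HEALTH_OK in between
    systemctl restart ceph-mds.target
    systemctl restart ceph-radosgw.target
    ceph versions                          # confirm all daemons now report 12.2.3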

[ceph-users] SSD Bluestore Backfills Slow

2018-02-21 Thread Reed Dier
Hi all, I am running into an odd situation that I cannot easily explain. I am currently in the midst of destroy and rebuild of OSDs from filestore to bluestore. With my HDDs, I am seeing expected behavior, but with my SSDs I am seeing unexpected behavior. The HDDs and SSDs are set in crush

Re: [ceph-users] ceph-volume activation

2018-02-21 Thread Oliver Freyermuth
On 21.02.2018 at 15:58, Alfredo Deza wrote: > On Wed, Feb 21, 2018 at 9:40 AM, Dan van der Ster wrote: >> On Wed, Feb 21, 2018 at 2:24 PM, Alfredo Deza wrote: >>> On Tue, Feb 20, 2018 at 9:05 PM, Oliver Freyermuth >>> wrote:

[ceph-users] PG overdose protection causing PG unavailability

2018-02-21 Thread Oliver Freyermuth
Dear Cephalopodians, in a Luminous 12.2.3 cluster with a pool with: - 192 Bluestore OSDs total - 6 hosts (32 OSDs per host) - 2048 total PGs - EC profile k=4, m=2 - CRUSH failure domain = host which results in 2048*6/192 = 64 PGs per OSD on average, I run into issues with PG overdose
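For context, the limits involved are mon_max_pg_per_osd (default 200 in Luminous) and osd_max_pg_per_osd_hard_ratio (default 2); if transient per-OSD PG counts during failure handling exceed them, one hedged workaround is to raise the soft limit in ceph.conf (value is illustrative):

    [global]
    # raise the per-OSD PG ceiling; PGs are refused above this value times the hard ratio
    mon max pg per osd = 300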

[ceph-users] SRV mechanism for looking up mons lacks IPv6 support

2018-02-21 Thread Simon Leinen
We just upgraded our last cluster to Luminous. Since we might need to renumber our mons in the not-too-distant future, it would be nice to remove the literal IP addresses of the mons from ceph.conf. Kraken and above support a DNS-based mechanism for this based on SRV records[1]. Unfortunately
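For completeness, the lookup that does work today is based on a SRV record plus A records (names and addresses below are placeholders), with the service name configurable via mon_dns_srv_name (default "ceph-mon"):

    ; zone example.com
    mon1.example.com.            3600 IN A    192.0.2.11
    mon2.example.com.            3600 IN A    192.0.2.12
    _ceph-mon._tcp.example.com.  3600 IN SRV  10 60 6789 mon1.example.com.
    _ceph-mon._tcp.example.com.  3600 IN SRV  10 60 6789 mon2.example.com.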

Re: [ceph-users] Help rebalancing OSD usage, Luminus 1.2.2

2018-02-21 Thread David Turner
I've been out sick for a couple days. I agree with Bryan Stillwell about setting those flags and doing a rolling restart of all of the osds is a good next step. On Wed, Feb 21, 2018, 3:49 PM Bryan Stillwell wrote: > Bryan, > > > > The good news is that there is progress
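For anyone following along, the rolling-restart pattern being referred to is roughly (a sketch; pick the flags that match your situation):

    ceph osd set noout
    ceph osd set norebalance
    # on each OSD host in turn, waiting for PGs to re-peer before moving on:
    systemctl restart ceph-osd.target
    ceph osd unset norebalance
    ceph osd unset noout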

Re: [ceph-users] Help rebalancing OSD usage, Luminus 1.2.2

2018-02-21 Thread Bryan Stillwell
Bryan, The good news is that there is progress being made on making this harder to screw up. Read this article for example: https://ceph.com/community/new-luminous-pg-overdose-protection/ The bad news is that I don't have a great solution for you regarding your peering problem. I've run

[ceph-users] Ceph auth caps - make it more user error proof

2018-02-21 Thread Enrico Kern
Hey all, I would suggest some changes to the ceph auth caps command. Today I almost fucked up half of one of our openstack regions with i/o errors because of user failure. I tried to add osd blacklist caps to a cinder keyring after the luminous upgrade. I did so by issuing ceph auth caps
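The pitfall is that ceph auth caps replaces the entire cap set rather than appending to it, so every existing cap must be restated; a hedged example for a cinder-style client (pool names are placeholders, and on Luminous the rbd profile already grants the blacklist permission):

    ceph auth get client.cinder        # record ALL current caps before touching them
    ceph auth caps client.cinder \
        mon 'profile rbd' \
        osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd-read-only pool=images'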

Re: [ceph-users] Balanced MDS, all as active and recomended client settings.

2018-02-21 Thread Daniel Carrasco
2018-02-21 19:26 GMT+01:00 Daniel Carrasco : > Hello, > > I've created a Ceph cluster with 3 nodes to serve files to a high-traffic > webpage. I've configured two MDS as active and one as standby, but after > adding the new system to production I've noticed that the MDS are not

[ceph-users] Luminous v12.2.3 released

2018-02-21 Thread Abhishek Lekshmanan
We're happy to announce the third bugfix release of Luminous v12.2.x long term stable release series. It contains a range of bug fixes and stability improvements across Bluestore, CephFS, RBD & RGW. We recommend all the users of 12.2.x series update. Notable Changes --- *CephFS*:

Re: [ceph-users] mon service failed to start

2018-02-21 Thread Brian :
Hello. Wasn't this originally an issue with the mon store? Now you are getting a checksum error from an OSD? I think some hardware in this node is just hosed. On Wed, Feb 21, 2018 at 5:46 PM, Behnam Loghmani wrote: > Hi there, > > I changed SATA port and cable of SSD

[ceph-users] Balanced MDS, all as active and recomended client settings.

2018-02-21 Thread Daniel Carrasco
Hello, I've created a Ceph cluster with 3 nodes to serve files to a high-traffic webpage. I've configured two MDS as active and one as standby, but after adding the new system to production I've noticed that the MDS are not balanced and one server gets most of the client requests (one MDS about 700 or

Re: [ceph-users] mon service failed to start

2018-02-21 Thread Behnam Loghmani
Hi there, I changed the SATA port and cable of the SSD disk, updated ceph to version 12.2.3 and rebuilt the OSDs, but when recovery starts the OSDs fail with this error: 2018-02-21 21:12:18.037974 7f3479fe2d00 -1 bluestore(/var/lib/ceph/osd/ceph-7) _verify_csum bad crc32c/0x1000 checksum at blob offset

Re: [ceph-users] ceph-volume activation

2018-02-21 Thread Alfredo Deza
On Wed, Feb 21, 2018 at 9:40 AM, Dan van der Ster wrote: > On Wed, Feb 21, 2018 at 2:24 PM, Alfredo Deza wrote: >> On Tue, Feb 20, 2018 at 9:05 PM, Oliver Freyermuth >> wrote: >>> Many thanks for your replies! >>> >>> Are

Re: [ceph-users] ceph-volume activation

2018-02-21 Thread Dan van der Ster
On Wed, Feb 21, 2018 at 2:24 PM, Alfredo Deza wrote: > On Tue, Feb 20, 2018 at 9:05 PM, Oliver Freyermuth > wrote: >> Many thanks for your replies! >> >> Are there plans to have something like >> "ceph-volume discover-and-activate" >> which would

Re: [ceph-users] Missing clones

2018-02-21 Thread Karsten Becker
So - here is the feedback. After a long night... The plain copying did not help... it then complains about the snaps of another VM (also with old snapshots). I remembered a thread I read saying that the problem could be solved by converting back to filestore, because you then have access to the data

Re: [ceph-users] mon service failed to start

2018-02-21 Thread Behnam Loghmani
But the disks pass all the tests with smartctl and badblocks, and there isn't any error on the disks. Because the SSD contains the WAL/DB of the OSDs, it's difficult to test it on other cluster nodes. On Wed, Feb 21, 2018 at 4:58 PM, wrote: > Could the problem be related with some faulty

Re: [ceph-users] mon service failed to start

2018-02-21 Thread knawnd
Could the problem be related to some faulty hardware (RAID controller, port, cable) rather than the disk? Does the "faulty" disk work OK on another server? Behnam Loghmani wrote on 21/02/18 16:09: Hi there, I changed the SSD on the problematic node with the new one and reconfigure OSDs and MON service

Re: [ceph-users] Luminous : performance degrade while read operations (ceph-volume)

2018-02-21 Thread Alfredo Deza
On Tue, Feb 20, 2018 at 9:33 PM, nokia ceph wrote: > Hi Alfredo Deza, > > I understand the distinction between lvm and simple, however we still see the issue. Was > it an issue in luminous? Because we use the same ceph config and workload from the > client. The graphs I attached in the previous mail

Re: [ceph-users] ceph-volume activation

2018-02-21 Thread Alfredo Deza
On Tue, Feb 20, 2018 at 9:05 PM, Oliver Freyermuth wrote: > Many thanks for your replies! > > On 21.02.2018 at 02:20, Alfredo Deza wrote: >> On Tue, Feb 20, 2018 at 5:56 PM, Oliver Freyermuth >> wrote: >>> Dear Cephalopodians, >>>

Re: [ceph-users] mon service failed to start

2018-02-21 Thread Behnam Loghmani
Hi there, I replaced the SSD on the problematic node with a new one and reconfigured the OSDs and MON service on it, but the problem occurred again with: "rocksdb: submit_transaction error: Corruption: block checksum mismatch code = 2" I am fully confused now. On Tue, Feb 20, 2018 at 5:16 PM,

Re: [ceph-users] Migrating to new pools

2018-02-21 Thread Jason Dillaman
On Tue, Feb 20, 2018 at 8:35 PM, Rafael Lopez wrote: >> There is also work-in-progress for online >> image migration [1] that will allow you to keep using the image while >> it's being migrated to a new destination image. > > > Hi Jason, > > Is there any recommended

Re: [ceph-users] Luminous 12.2.3 Changelog ?

2018-02-21 Thread Wido den Hollander
On 02/21/2018 01:39 PM, Konstantin Shalygin wrote: Is there any changelog for this release ? https://github.com/ceph/ceph/pull/20503 And this one: https://github.com/ceph/ceph/pull/20500 Wido k ___ ceph-users mailing list

Re: [ceph-users] Luminous 12.2.3 Changelog ?

2018-02-21 Thread Konstantin Shalygin
Is there any changelog for this release ? https://github.com/ceph/ceph/pull/20503 k ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Luminous 12.2.3 Changelog ?

2018-02-21 Thread Wido den Hollander
They aren't there yet: http://docs.ceph.com/docs/master/release-notes/ And no Git commit yet: https://github.com/ceph/ceph/commits/master/doc/release-notes.rst I think the Release Manager is doing its best to release them asap. 12.2.3 packages were released this morning :) Wido On

[ceph-users] Luminous 12.2.3 Changelog ?

2018-02-21 Thread Christoph Adomeit
Hi there, I noticed that luminous 12.2.3 is already released. Is there any changelog for this release ? Thanks Christoph ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] How to really change public network in ceph

2018-02-21 Thread Mario Giammarco
Let me ask a simpler question: when I change the monitors' network and the network of the OSDs, how do the monitors learn the new addresses of the OSDs? Thanks, Mario 2018-02-19 10:22 GMT+01:00 Mario Giammarco : > Hello, > I have a test proxmox/ceph cluster with four servers. > I need to
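As David notes above, the OSDs report their own address to the mons when they start; what has to be updated by hand is how every daemon and client finds the mons, i.e. the mon and network entries in ceph.conf on every host (addresses are placeholders):

    [global]
    mon host = 10.10.0.1,10.10.0.2,10.10.0.3   # new monitor addresses
    public network = 10.10.0.0/24
    # then restart the daemons so they register on the new network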

[ceph-users] identifying public buckets

2018-02-21 Thread Dave Holland
Hi, We would like to scan our users' buckets to identify those which are publicly-accessible, to avoid potential embarrassment (or worse), e.g. http://www.bbc.co.uk/news/technology-42839462 I didn't find a way to use radosgw-admin to report ACL information for a given bucket. And using the API
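One hedged approach with the admin CLI is to dump each bucket's ACL and look for a grant to the all-users group; the exact output of radosgw-admin policy varies by version, so the grep pattern below is only an assumption:

    for b in $(radosgw-admin bucket list | jq -r '.[]'); do
        radosgw-admin policy --bucket="$b" | grep -q 'AllUsers' \
            && echo "possibly public: $b"
    done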

[ceph-users] PG_DAMAGED Possible data damage: 1 pg inconsistent

2018-02-21 Thread Yoann Moulin
Hello, I migrated my cluster from jewel to luminous 3 weeks ago (using ceph-ansible playbook), a few days after, ceph status told me "PG_DAMAGED Possible data damage: 1 pg inconsistent", I tried to repair the PG without success, I tried to stop the OSD, flush the journal and restart the OSDs