Re: [ceph-users] Creating a monmap with V1 & V2 using monmaptool

2019-10-01 Thread Lars Fenneberg
Hey Alberto! Quoting Corona, Alberto (alberto_cor...@comcast.com): > While practicing some disaster recovery I noticed that it currently seems > impossible to add both a v1 and v2 monitor to a monmap using monmaptool. > Is there any way to create a monmap manually to include both protocol >
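
For reference, newer monmaptool builds accept a full address vector per monitor, which records both the v1 and v2 endpoints at once; whether the 14.2.x build in question already ships --addv is an assumption here, and the addresses are placeholders:

    # hypothetical addresses; --addv may be missing from older builds
    monmaptool --create --fsid $(uuidgen) /tmp/monmap
    monmaptool --addv a '[v2:192.168.1.10:3300,v1:192.168.1.10:6789]' /tmp/monmap
    monmaptool --print /tmp/monmap

With the older --add syntax only a single address (and therefore a single protocol) per monitor can be recorded, which matches the limitation described above.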

Re: [ceph-users] OSD crashed during the fio test

2019-10-01 Thread Brad Hubbard
Removed ceph-de...@vger.kernel.org and added d...@ceph.io On Tue, Oct 1, 2019 at 4:26 PM Alex Litvak wrote: > > Hello everyone, > > Can you shed some light on the cause of the crash? Could a client > request actually trigger it? > > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 2019-09-30

[ceph-users] OSD crashed during the fio test

2019-10-01 Thread Alex Litvak
Hello everyone, can you shed some light on the cause of the crash? Could a client request actually trigger it? Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 2019-09-30 22:52:58.867 7f093d71e700 -1 bdev(0x55b72c156000 /var/lib/ceph/osd/ceph-17/block) aio_submit retries 16 Sep 30 22:52:58

Re: [ceph-users] cluster network down

2019-10-01 Thread Lars Täuber
Mon, 30 Sep 2019 15:21:18 +0200 Janne Johansson ==> Lars Täuber : > > > > I don't remember where I read it, but it was said that the cluster > > migrates its complete traffic over to the public network when the cluster > > network goes down. So this seems not to be the case? > > > > Be

[ceph-users] ceph-osd@n crash dumps

2019-10-01 Thread Del Monaco, Andrea
Hi list, After the nodes ran OOM and after reboot, we are not able to restart the ceph-osd@x services anymore. (Details about the setup at the end). I am trying to do this manually, so we can see the error, but all I see are several crash dumps - this is just one of the OSDs which is not
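
One common way to see the actual error is to run the failing OSD in the foreground with debug logging; the osd id and debug levels below are illustrative only:

    # run osd.12 in the foreground, logging to stderr (id is a placeholder)
    ceph-osd -d --cluster ceph --id 12 --setuser ceph --setgroup ceph \
        --debug_osd 20 --debug_bluestore 20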

Re: [ceph-users] NFS

2019-10-01 Thread Marc Roos
Yes indeed, cephfs and rgw backends. I think you can run into problems with a multi-user environment of RGW and nfs-ganesha; I am not getting this working on Luminous. Your rgw config seems ok. Add file logging to debug rgw etc., something like this: LOG { ## Default log level for all
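
For reference, a ganesha.conf LOG block of the kind referred to above typically looks like this; component names, levels and the log path are illustrative:

    LOG {
        ## Default log level for all components
        Default_Log_Level = WARN;

        ## Raise individual components while debugging
        Components {
            FSAL = FULL_DEBUG;
            NFS4 = EVENT;
        }

        ## Write log output to a file
        Facility {
            name = FILE;
            destination = "/var/log/ganesha/ganesha-rgw.log";
            enable = active;
        }
    }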

[ceph-users] ceph pg repair fails...?

2019-10-01 Thread Jake Grimmett
Dear All, I've just found two inconsistent pgs that fail to repair. This might be the same bug as shown here: Cluster is running Nautilus 14.2.2, OS is Scientific Linux 7.6, DB/WAL on NVMe, data on 12TB HDD. Logs below can also be

Re: [ceph-users] NFS

2019-10-01 Thread Daniel Gryniewicz
Ganesha can export CephFS or RGW. It cannot export anything else (like iscsi or RBD). Config for RGW looks like this: EXPORT { Export_ID=1; Path = "/"; Pseudo = "/rgw"; Access_Type = RW; Protocols = 4; Transports = TCP; FSAL {
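
The truncated FSAL block above normally carries the RGW user and its S3 keys; a hedged completion of the same export, with the user id and credentials as placeholders:

    EXPORT {
        Export_ID = 1;
        Path = "/";
        Pseudo = "/rgw";
        Access_Type = RW;
        Protocols = 4;
        Transports = TCP;
        FSAL {
            Name = RGW;
            User_Id = "nfs-user";             # placeholder RGW user
            Access_Key_Id = "ACCESS_KEY";     # placeholder
            Secret_Access_Key = "SECRET_KEY"; # placeholder
        }
    }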

Re: [ceph-users] OSD crashed during the fio test

2019-10-01 Thread Sasha Litvak
It was hardware indeed. Dell server reported a disk being reset with power on. Checking the usual suspects, i.e. controller firmware, controller event log (if I can get one), drive firmware. I will report more when I get a better idea. Thank you! On Tue, Oct 1, 2019 at 2:33 AM Brad Hubbard

Re: [ceph-users] Have you enabled the telemetry module yet?

2019-10-01 Thread Stefan Kooman
Quoting Wido den Hollander (w...@42on.com): > Hi, > > The Telemetry [0] module has been in Ceph since the Mimic release and > when enabled it sends an anonymized JSON report back to > https://telemetry.ceph.com/ every 72 hours with information about the > cluster. > > For example: > > - Version(s)
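
Enabling the module on a Nautilus cluster is a short sequence; previewing the report before opting in is the usual advice (a sketch, not cluster-specific):

    ceph mgr module enable telemetry
    ceph telemetry show     # preview the JSON that would be sent
    ceph telemetry on       # opt in; a report is sent every 72 hours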

Re: [ceph-users] Commit and Apply latency on nautilus

2019-10-01 Thread Robert LeBlanc
On Mon, Sep 30, 2019 at 5:12 PM Sasha Litvak wrote: > > At this point, I ran out of ideas. I changed nr_requests and readahead > parameters to 128->1024 and 128->4096, tuned nodes to performance-throughput. > However, I still get high latency during benchmark testing. I attempted to >
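
For reference, those block-layer parameters are usually changed per device through sysfs; the device name is a placeholder and the values mirror the ones mentioned above:

    echo 1024 > /sys/block/sdb/queue/nr_requests      # was 128
    echo 4096 > /sys/block/sdb/queue/read_ahead_kb    # was 128
    tuned-adm profile throughput-performance          # presumably the tuned profile meant above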

Re: [ceph-users] Commit and Apply latency on nautilus

2019-10-01 Thread Robert LeBlanc
On Tue, Oct 1, 2019 at 7:54 AM Robert LeBlanc wrote: > > On Mon, Sep 30, 2019 at 5:12 PM Sasha Litvak > wrote: > > > > At this point, I ran out of ideas. I changed nr_requests and readahead > > parameters to 128->1024 and 128->4096, tuned nodes to > > performance-throughput. However, I still

Re: [ceph-users] Have you enabled the telemetry module yet?

2019-10-01 Thread Mattia Belluco
Hi all, Same situation here: Ceph 13.2.6 on Ubuntu 16.04. Best Mattia On 10/1/19 4:38 PM, Stefan Kooman wrote: > Quoting Wido den Hollander (w...@42on.com): >> Hi, >> >> The Telemetry [0] module has been in Ceph since the Mimic release and >> when enabled it sends an anonymized JSON report back

Re: [ceph-users] ceph pg repair fails...?

2019-10-01 Thread Mattia Belluco
Hi Jake, I am curious to see if your problem is similar to ours (despite the fact we are still on Luminous). Could you post the output of: rados list-inconsistent-obj and rados list-inconsistent-snapset Thanks, Mattia On 10/1/19 1:08 PM, Jake Grimmett wrote: > Dear All, > > I've just
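
Both commands take a placement group id and only return useful data for a pg that has been scrubbed recently; the pg id below is a placeholder:

    ceph health detail | grep inconsistent            # find the affected pgs
    rados list-inconsistent-obj 2.1a --format=json-pretty
    rados list-inconsistent-snapset 2.1a --format=json-pretty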

Re: [ceph-users] Nautilus minor versions archive

2019-10-01 Thread Volodymyr Litovka
It's quite a pity. We tested a Ceph 14.2.3 + RDMA solution in the lab and got it working, but by the time we moved it to production there was 14.2.4, and we got a few hours of outage trying to launch. On 02.10.2019 00:28, Paul Emmerich wrote: On Tue, Oct 1, 2019 at 11:21 PM Volodymyr Litovka wrote:

Re: [ceph-users] Nautilus minor versions archive

2019-10-01 Thread Volodymyr Litovka
On 02.10.2019 00:56, Paul Emmerich wrote: There's virtually no difference between 14.2.3 and 14.2.4, it's only a bug fix for running ceph-volume without a terminal on stderr on some environments. Nevertheless, there is one more minor change - kernel 5.0.xx -> 5.0.yy (which probably can imply

Re: [ceph-users] Commit and Apply latency on nautilus

2019-10-01 Thread Sasha Litvak
All, Thank you for your suggestions. During last night's test, I had at least one drive on one node doing a power-on reset by the controller. It caused a couple of OSDs asserting / timing out on that node. I am testing and updating the usual suspects on this node and after that on a whole

Re: [ceph-users] NFS

2019-10-01 Thread Brent Kennedy
We might have to back up a step here so I can understand. Are you saying stand up a new VM with just those packages installed, then configure the export file (the file location isn't mentioned in the ceph docs) and supposedly a client can connect to them? (only linux clients or any NFS

[ceph-users] Issues with data distribution on Nautilus / weird filling behavior

2019-10-01 Thread Philippe D'Anjou
Hi, this is a fresh Nautilus cluster, but there is a second old one that was upgraded from Luminous to Nautilus; both experience the same symptoms. First of all, the data distribution on the OSDs is very bad. Now that could be due to low PGs, although I get no recommendation to raise the PG number
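
On Nautilus, the usual follow-up for uneven distribution (besides raising the PG count) is the balancer module in upmap mode; a sketch only, assuming all clients are Luminous or newer:

    ceph osd set-require-min-compat-client luminous
    ceph balancer mode upmap
    ceph balancer on
    ceph osd df tree     # check the spread afterwards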

Re: [ceph-users] ceph pg repair fails...?

2019-10-01 Thread Brad Hubbard
On Wed, Oct 2, 2019 at 1:15 AM Mattia Belluco wrote: > > Hi Jake, > > I am curious to see if your problem is similar to ours (despite the fact > we are still on Luminous). > > Could you post the output of: > > rados list-inconsistent-obj > > and > > rados list-inconsistent-snapset Make sure

Re: [ceph-users] Nautilus minor versions archive

2019-10-01 Thread Paul Emmerich
There's virtually no difference between 14.2.3 and 14.2.4, it's only a bug fix for running ceph-volume without a terminal on stderr on some environments. Unrelated: Do you really want to run the RDMA messenger in production? I wouldn't dare if the system is in any way critical. Paul -- Paul

Re: [ceph-users] OSD crashed during the fio test

2019-10-01 Thread Brad Hubbard
If it is only this one osd I'd be inclined to be taking a hard look at the underlying hardware and how it behaves/performs compared to the hw backing identical osds. The less likely possibility is that you have some sort of "hot spot" causing resource contention for that osd. To investigate that

Re: [ceph-users] OSD crashed during the fio test

2019-10-01 Thread Sasha Litvak
I updated firmware and kernel, running torture tests. So far no assert, but I still noticed this on the same osd as yesterday: Oct 01 19:35:13 storage2n2-la ceph-osd-34[11188]: 2019-10-01 19:35:13.721 7f8d03150700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f8cd05d7700' had timed out

[ceph-users] hanging/stopped recovery/rebalance in Nautilus

2019-10-01 Thread Philippe D'Anjou
Hi, I have often observed now that the recovery/rebalance in Nautilus starts quite fast but gets extremely slow (2-3 objects/s) even if there are like 20 OSDs involved. Right now I am moving (reweighted to 0) 16x8TB disks; it has been running for 4 days and for the last 12h it is kind of stuck at   cluster:
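
When recovery throttles down like this, the backfill and recovery limits are usually the first things checked; the values below are illustrative, not a recommendation for this cluster:

    ceph -s                                          # is recovery still progressing at all?
    ceph config set osd osd_max_backfills 4          # raise cautiously from the default
    ceph config set osd osd_recovery_max_active 8    # likewise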

Re: [ceph-users] MDS / CephFS behaviour with unusual directory layout

2019-10-01 Thread Stefan Kooman
Quoting Stefan Kooman (ste...@bit.nl): > Hi List, > > We are planning to move a filesystem workload (currently nfs) to CephFS. > It's around 29 TB. The unusual thing here is the amount of directories > in use to host the files. In order to combat a "too many files in one > directory" scenario a

Re: [ceph-users] ceph-osd@n crash dumps

2019-10-01 Thread Brad Hubbard
On Tue, Oct 1, 2019 at 10:43 PM Del Monaco, Andrea < andrea.delmon...@atos.net> wrote: > Hi list, > > After the nodes ran OOM and after reboot, we are not able to restart the > ceph-osd@x services anymore. (Details about the setup at the end). > > I am trying to do this manually, so we can see

Re: [ceph-users] how to set osd_crush_initial_weight 0 without restart any service

2019-10-01 Thread Paul Mezzanini
You could also: ceph osd set norebalance -- Paul Mezzanini Sr Systems Administrator / Engineer, Research Computing Information & Technology Services Finance & Administration Rochester Institute of Technology o:(585) 475-3245 | pfm...@rit.edu CONFIDENTIALITY NOTE: The information transmitted,

Re: [ceph-users] Commit and Apply latency on nautilus

2019-10-01 Thread Maged Mokhtar
Some suggestions: monitor raw resources such as cpu %util, raw disk %util/busy, and raw disk iops. Instead of running a mix of workloads at this stage, narrow it down first, for example using rbd random writes and 4k block sizes, then change one parameter at a time, for example the block size. See
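
A narrowed-down run of the kind suggested above could use fio's rbd engine directly; pool, image and client names are placeholders:

    # 4k random writes against one rbd image; watch iostat -x 1 on the OSD nodes meanwhile
    fio --name=rbd-4k-randwrite --ioengine=rbd --clientname=admin \
        --pool=testpool --rbdname=testimg \
        --rw=randwrite --bs=4k --iodepth=32 --numjobs=1 \
        --direct=1 --time_based --runtime=300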

[ceph-users] Nautilus minor versions archive

2019-10-01 Thread Volodymyr Litovka
Dear colleagues, is an archive of previous ceph minor versions available? To be more specific, I'm looking for 14.2.1 or 14.2.2 for Ubuntu, but at download.ceph.com only 14.2.4 is available. Thank you. -- Volodymyr Litovka "Vision without Execution is Hallucination." -- Thomas Edison

[ceph-users] how to set osd_crush_initial_weight 0 without restart any service

2019-10-01 Thread Satish Patel
Folks, Method 1: In my lab I am playing with ceph and trying to understand how to add a new OSD without starting rebalancing. I want to set this option on the fly so I don't need to restart any services or anything. $ ceph tell mon.* injectargs '--osd_crush_initial_weight 0' $ ceph daemon
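
Since osd_crush_initial_weight is consumed by the OSD itself when it first registers in the CRUSH map, injecting it into the mons alone may not have the intended effect; on Nautilus the centralized config store is one way to set it without editing ceph.conf or restarting anything (a sketch, not verified against ceph-ansible):

    # new OSDs read this from the mons when they first start
    ceph config set osd osd_crush_initial_weight 0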

Re: [ceph-users] how to set osd_crush_initial_weight 0 without restart any service

2019-10-01 Thread Satish Patel
You are saying: set "ceph osd set norebalance" before running the ceph-ansible playbook to add the OSD; once the osd is visible in "ceph osd tree", I should reweight it to 0 and then do "ceph osd unset norebalance"? On Tue, Oct 1, 2019 at 2:41 PM Paul Mezzanini wrote: > > You could also: > ceph osd set
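
Put together, the procedure being discussed is roughly the following; the osd id is a placeholder, and whether ceph-ansible interferes with it is exactly the open question in this thread:

    ceph osd set norebalance             # hold off rebalancing
    # ... run the ceph-ansible playbook / ceph-volume to add the OSD ...
    ceph osd tree                        # wait for the new osd to appear
    ceph osd crush reweight osd.42 0     # keep it at CRUSH weight 0 for now
    ceph osd unset norebalance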

[ceph-users] Panic in kernel CephFS client after kernel update

2019-10-01 Thread Kenneth Van Alstyne
All: I’m not sure this should go to LKML or here, but I’ll start here. After upgrading from Linux kernel 4.19.60 to 4.19.75 (or 76), I started running into kernel panics in the “ceph” module. Based on the call trace, I believe I was able to narrow it down to the following commit in the Linux

Re: [ceph-users] how to set osd_crush_initial_weight 0 without restart any service

2019-10-01 Thread Satish Patel
Paul, I have tried your idea but it didn't work; I did set norebalance but it still rebalanced and put lots of data on my new OSD. I believe your option doesn't work with the ceph-ansible playbook. On Tue, Oct 1, 2019 at 2:45 PM Satish Patel wrote: > > You are saying set "ceph osd set

Re: [ceph-users] Nautilus minor versions archive

2019-10-01 Thread Paul Emmerich
On Tue, Oct 1, 2019 at 11:21 PM Volodymyr Litovka wrote: > > Dear colleagues, > > is an archive of previous ceph minor versions available? no, because reprepro doesn't support that (yeah, that's clearly a solvable problem, but the current workflow just uses reprepro and one repository) Paul

Re: [ceph-users] Panic in kernel CephFS client after kernel update

2019-10-01 Thread Ilya Dryomov
On Tue, Oct 1, 2019 at 6:41 PM Kenneth Van Alstyne wrote: > > All: > I’m not sure this should go to LKML or here, but I’ll start here. After > upgrading from Linux kernel 4.19.60 to 4.19.75 (or 76), I started running > into kernel panics in the “ceph” module. Based on the call trace, I