Re: [ceph-users] Nautilus 14.2.1 / 14.2.2 crash

2019-07-19 Thread Alex Litvak
I was planning to upgrade from 14.2.1 to 14.2.2 next week.  Since there are a few reports of crashes, does anyone know if the upgrade somehow triggers the issue?  If not, then what does?  Since this has also been 
reported before the upgrade by some, I'm just wondering if upgrading to 14.2.2 makes the problem worse.


On 7/19/2019 9:09 PM, Nigel Williams wrote:


On Sat, 20 Jul 2019 at 04:28, Nathan Fish <lordci...@gmail.com> wrote:

On further investigation, it seems to be this bug:
http://tracker.ceph.com/issues/38724


We just upgraded to 14.2.2, and had a dozen OSDs at 14.2.2 go down with this bug; we
recovered with:

systemctl reset-failed ceph-osd@160
systemctl start ceph-osd@160





Re: [ceph-users] Nautilus 14.2.1 / 14.2.2 crash

2019-07-19 Thread Nathan Fish
Good to know. I tried reset-failed and restart several times; it
didn't work on any of them. I also rebooted one of the hosts, which didn't
help. Thankfully it seems they failed far enough apart that our
nearly-empty cluster rebuilt in time. But it's rather worrying.

On Fri, Jul 19, 2019 at 10:09 PM Nigel Williams wrote:
>
>
> On Sat, 20 Jul 2019 at 04:28, Nathan Fish  wrote:
>>
>> On further investigation, it seems to be this bug:
>> http://tracker.ceph.com/issues/38724
>
>
> We just upgraded to 14.2.2, and had a dozen OSDs at 14.2.2 go down with this bug; we
> recovered with:
>
> systemctl reset-failed ceph-osd@160
> systemctl start ceph-osd@160
>
>


Re: [ceph-users] Nautilus 14.2.1 / 14.2.2 crash

2019-07-19 Thread Nigel Williams
On Sat, 20 Jul 2019 at 04:28, Nathan Fish  wrote:

> On further investigation, it seems to be this bug:
> http://tracker.ceph.com/issues/38724


We just upgraded to 14.2.2, and had a dozen OSDs at 14.2.2 go down with this
bug; we recovered with:

systemctl reset-failed ceph-osd@160
systemctl start ceph-osd@160
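
In case it helps anyone else hitting this, a minimal sketch of doing the same for
several downed OSDs on one host (the OSD ids below are only examples; check why
they failed before blindly restarting them):

# clear systemd's failed state and start each affected OSD again
for id in 160 161 162; do
    systemctl reset-failed "ceph-osd@${id}"
    systemctl start "ceph-osd@${id}"
done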


Re: [ceph-users] Ceph OSD daemon possibly causes network card issues

2019-07-19 Thread Konstantin Shalygin

On 7/19/19 5:59 PM, Geoffrey Rhodes wrote:


Holding thumbs this helps; however, I still don't understand why the 
issue only occurs on ceph-osd nodes.
ceph-mon and ceph-mds nodes, and even a ceph client with the same 
adapters, do not have these issues.


Because the OSD hosts actually do the data storage work, your 1G NICs are 
under heavy load.




k



Re: [ceph-users] Multiple OSD crashes

2019-07-19 Thread Alex Litvak

The issue should have been resolved by the backport 
https://tracker.ceph.com/issues/40424 in Nautilus; was it merged into 14.2.2?

Also, do you think it is safe to upgrade from 14.2.1 to 14.2.2?

On 7/19/2019 1:05 PM, Paul Emmerich wrote:

I've also encountered a crash just like that after upgrading to 14.2.2

Looks like this issue: http://tracker.ceph.com/issues/37282

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io 
Tel: +49 89 1896585 90


On Fri, Jul 19, 2019 at 11:36 AM Daniel Aberger - Profihost AG <d.aber...@profihost.ag> wrote:

Hello,

we are experiencing crashing OSDs in multiple independent Ceph clusters.

Each OSD has very similar log entries regarding the crash as far as I
can tell.

Example log: https://pastebin.com/raw/vQ2AJ5ud

I can provide you with more log files. They are too large for pastebin
and I'm not aware of this mailing list's email attachment policy.

Every log consists of the following entries:

2019-07-10 21:36:31.903886 7f322aeff700 -1 rocksdb: submit_transaction
error: Corruption: block checksum mismatch code = 2 Rocksdb transaction:
Put( Prefix = M key =
0x08c1'.461231.000125574325' Value size = 184)
Put( Prefix = M key = 0x08c1'._fastinfo' Value size = 186)
Put( Prefix = O key =

0x7f80015806b4'(!rbd_data.7c012a6b8b4567.004e!='0xfffe6f002f'x'
Value size = 325)
Put( Prefix = O key =

0x7f80015806b4'(!rbd_data.7c012a6b8b4567.004e!='0xfffe'o'
Value size = 1608)
Put( Prefix = L key = 0x0226dc7a Value size = 16440)
2019-07-10 21:36:31.913113 7f322aeff700 -1
/build/ceph/src/os/bluestore/BlueStore.cc: In function 'void
BlueStore::_kv_sync_thread()' thread 7f322aeff700 time 2019-07-10
21:36:31.903909
/build/ceph/src/os/bluestore/BlueStore.cc: 8808: FAILED assert(r == 0)

  ceph version 12.2.12-7-g1321c5e91f
(1321c5e91f3d5d35dd5aa5a0029a54b9a8ab9498) luminous (stable)


Unfortunately I'm unable to interpret the dumps. I hope you can help me
with this issue.

Regards,
Daniel



-- 
Mit freundlichen Grüßen

   Daniel Aberger
Ihr Profihost Team

---
Profihost AG
Expo Plaza 1
30539 Hannover
Deutschland

Tel.: +49 (511) 5151 8181     | Fax.: +49 (511) 5151 8282
URL: http://www.profihost.com | E-Mail: i...@profihost.com 


Sitz der Gesellschaft: Hannover, USt-IdNr. DE813460827
Registergericht: Amtsgericht Hannover, Register-Nr.: HRB 202350
Vorstand: Cristoph Bluhm, Sebastian Bluhm, Stefan Priebe
Aufsichtsrat: Prof. Dr. iur. Winfried Huck (Vorsitzender)


Re: [ceph-users] Future of Filestore?

2019-07-19 Thread Stuart Longland
On 19/7/19 8:43 pm, Marc Roos wrote:
>  
> Maybe a bit off topic, just curious what speeds did you get previously? 
> Depending on how you test your native drive of 5400rpm, the performance 
> could be similar. 4k random read of my 7200rpm/5400 rpm results in 
> ~60iops at 260kB/s.

Well, to be honest I never formally tested the performance prior to the
move to Bluestore.  It was working "acceptably" for my needs, thus I
never had a reason to test it.

It was never a speed demon, but it did well enough for my needs.  Had
Filestore on BTRFS remained an option in Ceph v12, I'd have stayed that way.

> I also wonder why filestore could be that much faster, is this not 
> something else? Maybe some dangerous caching method was on?

My understanding is that Bluestore does not benefit from the Linux
kernel filesystem cache.  On paper, Bluestore *should* be faster, but
it's hard to know for sure.

Maybe I should try migrating back to Filestore and see if that improves
things?
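
Before doing that, it would be worth getting a baseline that bypasses the VM layer;
a minimal sketch using rados bench (the pool name is a placeholder, and --no-cleanup
keeps the objects around for the read pass):

rados bench -p testpool 30 write --no-cleanup
rados bench -p testpool 30 seq
rados -p testpool cleanup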
-- 
Stuart Longland (aka Redhatter, VK4MSL)

I haven't lost my mind...
  ...it's backed up on a tape somewhere.


Re: [ceph-users] reproducable rbd-nbd crashes

2019-07-19 Thread Mike Christie
On 07/19/2019 02:42 AM, Marc Schöchlin wrote:
> Hello Jason,
> 
> Am 18.07.19 um 20:10 schrieb Jason Dillaman:
>> On Thu, Jul 18, 2019 at 1:47 PM Marc Schöchlin  wrote:
>>> Hello cephers,
>>>
>>> rbd-nbd crashes in a reproducible way here.
>> I don't see a crash report in the log below. Is it really crashing or
>> is it shutting down? If it is crashing and it's reproducable, can you
>> install the debuginfo packages, attach gdb, and get a full backtrace
>> of the crash?
> 
> I do not get a crash report from rbd-nbd.
> It seems that "rbd-nbd" just terminates, which crashes the XFS filesystem 
> because the nbd device is not available anymore.
> ("rbd nbd ls" shows no mapped device anymore)
> 
>>
>> It seems like your cluster cannot keep up w/ the load and the nbd
>> kernel driver is timing out the IO and shutting down. There is a
>> "--timeout" option on "rbd-nbd" that you can use to increase the
>> kernel IO timeout for nbd.
>>
> I have also a 36TB XFS (non_ec) volume on this virtual system mapped by krbd 
> which is under really heavy read/write usage.
> I never experienced problems like this on this system with similar usage 
> patterns.
> 
> The volume which is involved in the problem only handles a really low load, 
> and I was able to reproduce the error situation by using the simple command "find . 
> -type f -name "*.sql" -exec ionice -c3 nice -n 20 gzip -v {} \;".
> I copied and read ~1.5 TB of data to this volume without a problem - it seems 
> that the gzip command provokes an I/O pattern which leads to the error 
> situation.
> 
> As described, I use a Luminous "12.2.11" client which does not support that 
> "--timeout" option (btw. a backport would be nice).
> Our Ceph system runs with a heavy write load, therefore we already set a 
> 60-second timeout using the following code:
> (https://github.com/OnApp/nbd-kernel_mod/blob/master/nbd_set_timeout.c)
> 
> We have ~500 heavy load rbd-nbd devices in our xen cluster (rbd-nbd 12.2.5, 
> kernel 4.4.0+10, centos clone) and ~20 high load krbd devices (kernel 
> 4.15.0-45, ubuntu 16.04) - we never experienced problems like this.
> We only experience problems like this with rbd-nbd > 12.2.5 on ubuntu 16.04 
> (kernel 4.15) or ubuntu 18.04 (kernel 4.15) with erasure encoding or without.
>

Are you only using the nbd_set_timeout tool with this newer kernel combo
to try to work around the disconnect+io_errors problem in newer kernels,
or did you use that tool to set a timeout with older kernels? I am just
trying to clarify the problem, because the kernel changed behavior and I
am not sure if your issue is the very slow IO or that the kernel now
escalates its error handler by default.

With older kernels no timeout would be set for each command by default,
so if you were not running that tool then you would not see the nbd
disconnect+io_errors+xfs issue. You would just see slow IOs.

With newer kernels, like 4.15, nbd.ko always sets a per command timeout
even if you do not set it via a nbd ioctl/netlink command. By default
the timeout is 30 seconds. After the timeout period then the kernel does
that disconnect+IO_errors error handling which causes xfs to get errors.


Re: [ceph-users] Nautilus 14.2.1 / 14.2.2 crash

2019-07-19 Thread Nathan Fish
On further investigation, it seems to be this bug:
http://tracker.ceph.com/issues/38724

On Fri, Jul 19, 2019 at 1:38 PM Nathan Fish  wrote:
>
> I came in this morning and started to upgrade to 14.2.2, only to
> notice that 3 OSDs had crashed overnight - exactly 1 on each of 3
> hosts. Apparently there was no data loss, which implies they crashed
> at different times, far enough apart to rebuild? Still digging through
> logs to find exactly when they first crashed.
>
> Log from restarting ceph-osd@53:
> https://termbin.com/3e0x
>
> If someone can read this log and get anything out of it I would
> appreciate it. All I can tell is that it wasn't a RocksDB ENOSPC,
> which I have seen before.


Re: [ceph-users] Multiple OSD crashes

2019-07-19 Thread Paul Emmerich
I've also encountered a crash just like that after upgrading to 14.2.2

Looks like this issue: http://tracker.ceph.com/issues/37282

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Fri, Jul 19, 2019 at 11:36 AM Daniel Aberger - Profihost AG <
d.aber...@profihost.ag> wrote:

> Hello,
>
> we are experiencing crashing OSDs in multiple independent Ceph clusters.
>
> Each OSD has very similar log entries regarding the crash as far as I
> can tell.
>
> Example log: https://pastebin.com/raw/vQ2AJ5ud
>
> I can provide you with more log files. They are too large for pastebin
> and I'm not aware of this mailing list's email attachment policy.
>
> Every log consists of the following entries:
>
> 2019-07-10 21:36:31.903886 7f322aeff700 -1 rocksdb: submit_transaction
> error: Corruption: block checksum mismatch code = 2 Rocksdb transaction:
> Put( Prefix = M key =
> 0x08c1'.461231.000125574325' Value size = 184)
> Put( Prefix = M key = 0x08c1'._fastinfo' Value size = 186)
> Put( Prefix = O key =
>
> 0x7f80015806b4'(!rbd_data.7c012a6b8b4567.004e!='0xfffe6f002f'x'
> Value size = 325)
> Put( Prefix = O key =
>
> 0x7f80015806b4'(!rbd_data.7c012a6b8b4567.004e!='0xfffe'o'
> Value size = 1608)
> Put( Prefix = L key = 0x0226dc7a Value size = 16440)
> 2019-07-10 21:36:31.913113 7f322aeff700 -1
> /build/ceph/src/os/bluestore/BlueStore.cc: In function 'void
> BlueStore::_kv_sync_thread()' thread 7f322aeff700 time 2019-07-10
> 21:36:31.903909
> /build/ceph/src/os/bluestore/BlueStore.cc: 8808: FAILED assert(r == 0)
>
>  ceph version 12.2.12-7-g1321c5e91f
> (1321c5e91f3d5d35dd5aa5a0029a54b9a8ab9498) luminous (stable)
>
>
> Unfortunately I'm unable to interpret the dumps. I hope you can help me
> with this issue.
>
> Regards,
> Daniel
>
>
>
> --
> Mit freundlichen Grüßen
>   Daniel Aberger
> Ihr Profihost Team
>
> ---
> Profihost AG
> Expo Plaza 1
> 30539 Hannover
> Deutschland
>
> Tel.: +49 (511) 5151 8181 | Fax.: +49 (511) 5151 8282
> URL: http://www.profihost.com | E-Mail: i...@profihost.com
>
> Sitz der Gesellschaft: Hannover, USt-IdNr. DE813460827
> Registergericht: Amtsgericht Hannover, Register-Nr.: HRB 202350
> Vorstand: Cristoph Bluhm, Sebastian Bluhm, Stefan Priebe
> Aufsichtsrat: Prof. Dr. iur. Winfried Huck (Vorsitzender)


Re: [ceph-users] Future of Filestore?

2019-07-19 Thread Janne Johansson
Den fre 19 juli 2019 kl 12:43 skrev Marc Roos :

>
> Maybe a bit off topic, just curious what speeds did you get previously?
> Depending on how you test your native drive of 5400rpm, the performance
> could be similar. 4k random read of my 7200rpm/5400 rpm results in
> ~60iops at 260kB/s.
> I also wonder why filestore could be that much faster, is this not
> something else? Maybe some dangerous caching method was on?
>

Then again, filestore will use the OS fs caches normally, which bluestore
will not, so unless you tune
your bluestores carefully, it will be far easier to get at least read
caches to work in your favor with
filestore if you have RAM to spare on your OSD hosts.
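
As a concrete illustration, a minimal sketch of the knobs involved (the values are
only examples, and this assumes a Mimic/Nautilus cluster with the centralized config
store; on Luminous the same options go into ceph.conf, and older BlueStore builds use
the bluestore_cache_size_* options instead of osd_memory_target):

# give each BlueStore OSD a bigger memory/cache budget, e.g. ~8 GiB
ceph config set osd osd_memory_target 8589934592
# see what an OSD actually spends on its caches
ceph daemon osd.0 dump_mempools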

-- 
May the most significant bit of your life be positive.


[ceph-users] Nautilus 14.2.1 / 14.2.2 crash

2019-07-19 Thread Nathan Fish
I came in this morning and started to upgrade to 14.2.2, only to
notice that 3 OSDs had crashed overnight - exactly 1 on each of 3
hosts. Apparently there was no data loss, which implies they crashed
at different times, far enough apart to rebuild? Still digging through
logs to find exactly when they first crashed.

Log from restarting ceph-osd@53:
https://termbin.com/3e0x

If someone can read this log and get anything out of it I would
appreciate it. All I can tell is that it wasn't a RocksDB ENOSPC,
which I have seen before.


Re: [ceph-users] Nautilus 14.2.2 release announcement

2019-07-19 Thread Sage Weil
On Fri, 19 Jul 2019, Alex Litvak wrote:
> Dear Ceph developers,
> 
> Please forgive me if this post offends anyone, but it would be nice if this
> and all other releases would be announced before or shortly after they hit the
> repos.

Yep, my fault.  Abhishek normally does this but he's out on vacation 
this week, and I forgot to do it yesterday.  Fixing a missing item in 
the release notes and then I'll send it all out.

Thanks!
sage


Re: [ceph-users] Investigating Config Error, 300x reduction in IOPs performance on RGW layer

2019-07-19 Thread Ravi Patel
Thank you again for reaching out.

Based on your feedback, we decided to try a few more benchmarks. We were
originally doing single-node testing using some internal applications and
this tool:
   - s3bench (https://github.com/igneous-systems/s3bench) to generate our
results.
It looks like the poor benchmarking accounted for a large chunk of the
discrepancy.

We just set up COSBench with 200 workers. This shows better performance, but
still a significant drop from the RADOS layer:
- with COSBench we are seeing ~2700 IOP/s for 4K files.
This still represents a ~18x drop from the RADOS layer in terms of IOP/s.

It would be good to understand what metrics we should look at in ceph to
debug the issue, what to try next from a tuning perspective, and any other
benchmarks that the community could suggest to help us figure this out.
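
For reference, a rough sketch of the kind of counters we plan to watch during a
COSBench run (the admin socket path and instance name here are assumptions; adjust
them for your deployment):

# RGW-side request latency and queue depth
ceph daemon /var/run/ceph/ceph-client.rgw.rgw1.asok perf dump | grep -E 'initial_lat|qlen|qactive'
# frontend thread pool size, in case requests are queueing inside RGW itself
ceph daemon /var/run/ceph/ceph-client.rgw.rgw1.asok config get rgw_thread_pool_size
# per-OSD latency as seen by the cluster
ceph osd perf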


Cheers,
Ravi



---

Ravi Patel, PhD
Machine Learning Systems Lead
Email: r...@kheironmed.com



On Thu, 18 Jul 2019 at 09:42, Paul Emmerich  wrote:

>
>
> On Thu, Jul 18, 2019 at 3:44 AM Robert LeBlanc 
> wrote:
>
>> I'm pretty new to RGW, but I'm needing to get max performance as well.
>> Have you tried moving your RGW metadata pools to nvme? Carve out a bit of
>> NVMe space and then pin the pool to the SSD class in CRUSH, that way the
>> small metadata ops aren't on slow media.
>>
>
> no, don't do that:
>
> 1) a performance difference of 130 vs. 48k iopos is not due to SSD vs.
> NVMe for metadata unless the SSD is absolute crap
> 2) the OSDs already have an NVMe DB device, it's much easier to use it
> directly than by partioning the NVMes to create a separate partition as a
> normal OSD
>
>
> Assuming your NVMe disks are a reasonable size (30GB per OSD): put the
> metadata pools on the HDDs. It's better to have 48 OSDs with 4 NVMes behind
> them handling metadata than only 4 OSDs with SSDs.
>
> Running mons in VMs with gigabit network is fine for small clusters and
> not a performance problem
>
>
> How are you benchmarking?
>
> Paul
>
>
>> 
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>
>>
>> On Wed, Jul 17, 2019 at 5:59 PM Ravi Patel  wrote:
>>
>>> Hello,
>>>
>>> We have deployed ceph cluster and we are trying to debug a massive drop
>>> in performance between the RADOS layer vs the RGW layer
>>>
>>> ## Cluster config
>>> 4 OSD nodes (12 Drives each, NVME Journals, 1 SSD drive) 40GbE NIC
>>> 2 RGW nodes ( DNS RR load balancing) 40GbE NIC
>>> 3 MON nodes 1 GbE NIC
>>>
>>> ## Pool configuration
>>> RGW data pool  - replicated 3x 4M stripe (HDD)
>>> RGW metadata pool - replicated 3x (SSD) pool
>>>
>>> ## Benchmarks
>>> 4K Read IOP/s performance using RADOS Bench 48,000~ IOP/s
>>> 4K Read RGW performance via s3 interface ~ 130 IOP/s
>>>
>>> Really trying to understand how to debug this issue. all the nodes never
>>> break 15% CPU utilization and there is plenty of RAM. The one pathological
>>> issue in our cluster is that the MON nodes are currently on VMs that are
>>> sitting behind a single 1 GbE NIC. (We are in the process of moving them,
>>> but are unsure if that will fix the issue.
>>>
>>> What metrics should we be looking at to debug the RGW layer. Where do we
>>> need to look?
>>>
>>> ---
>>>
>>> Ravi Patel, PhD
>>> Machine Learning Systems Lead
>>> Email: r...@kheironmed.com
>>>
>>>
>>> *Kheiron Medical Technologies*
>>>
>>> kheironmed.com | supporting radiologists with deep learning
>>>
>>> Kheiron Medical Technologies Ltd. is a registered company in England and
>>> Wales. This e-mail and its attachment(s) are intended for the above named
>>> only and are confidential. If they have come to you in error then you must
>>> take no action based upon them but contact us immediately. Any disclosure,
>>> copying, distribution or any action taken or omitted to be taken in
>>> reliance on it is prohibited and may be unlawful. Although this e-mail and
>>> its attachments are believed to be free of any virus, it is the
>>> responsibility of the recipient to ensure that they are virus free. If you
>>> contact us by e-mail then we will store your name and address to facilitate
>>> communications. Any statements contained herein are those of the individual
>>> and not the organisation.
>>>
>>> Registered number: 10184103. Registered office: RocketSpace, 40
>>> Islington High Street, London, N1 8EQ

-- 
Kheiron Medical Technologies
kheironmed.com | supporting radiologists with deep learning

Re: [ceph-users] [Nfs-ganesha-devel] 2.7.3 with CEPH_FSAL Crashing

2019-07-19 Thread David C
Thanks, Jeff. I'll give 14.2.2 a go when it's released.

On Wed, 17 Jul 2019, 22:29 Jeff Layton,  wrote:

> Ahh, I just noticed you were running nautilus on the client side. This
> patch went into v14.2.2, so once you update to that you should be good
> to go.
>
> -- Jeff
>
> On Wed, 2019-07-17 at 17:10 -0400, Jeff Layton wrote:
> > This is almost certainly the same bug that is fixed here:
> >
> > https://github.com/ceph/ceph/pull/28324
> >
> > It should get backported soon-ish but I'm not sure which luminous
> > release it'll show up in.
> >
> > Cheers,
> > Jeff
> >
> > On Wed, 2019-07-17 at 10:36 +0100, David C wrote:
> > > Thanks for taking a look at this, Daniel. Below is the only
> interesting bit from the Ceph MDS log at the time of the crash but I
> suspect the slow requests are a result of the Ganesha crash rather than the
> cause of it. Copying the Ceph list in case anyone has any ideas.
> > >
> > > 2019-07-15 15:06:54.624007 7f5fda5bb700  0 log_channel(cluster) log
> [WRN] : 6 slow requests, 5 included below; oldest blocked for > 34.588509
> secs
> > > 2019-07-15 15:06:54.624017 7f5fda5bb700  0 log_channel(cluster) log
> [WRN] : slow request 33.113514 seconds old, received at 2019-07-15
> 15:06:21.510423: client_request(client.16140784:5571174 setattr
> mtime=2019-07-15 14:59:45.642408 #0x10009079cfb 2019-07
> > > -15 14:59:45.642408 caller_uid=1161, caller_gid=1131{}) currently
> failed to xlock, waiting
> > > 2019-07-15 15:06:54.624020 7f5fda5bb700  0 log_channel(cluster) log
> [WRN] : slow request 34.588509 seconds old, received at 2019-07-15
> 15:06:20.035428: client_request(client.16129440:1067288 create
> #0x1000907442e/filePathEditorRegistryPrefs.melDXAtss 201
> > > 9-07-15 14:59:53.694087 caller_uid=1161,
> caller_gid=1131{1131,4121,2330,2683,4115,2322,2779,2979,1503,3511,2783,2707,2942,2980,2258,2829,1238,1237,2793,1235,1249,2097,1154,2982,2983,3860,4101,1208,3638,3641,3644,3640,3643,3639,3642,3822,3945,4045,3521,35
> > > 22,3520,3523,}) currently failed to wrlock, waiting
> > > 2019-07-15 15:06:54.624025 7f5fda5bb700  0 log_channel(cluster) log
> [WRN] : slow request 34.583918 seconds old, received at 2019-07-15
> 15:06:20.040019: client_request(client.16140784:5570551 getattr pAsLsXsFs
> #0x1000907443b 2019-07-15 14:59:44.171408 cal
> > > ler_uid=1161, caller_gid=1131{}) currently failed to rdlock, waiting
> > > 2019-07-15 15:06:54.624028 7f5fda5bb700  0 log_channel(cluster) log
> [WRN] : slow request 34.580632 seconds old, received at 2019-07-15
> 15:06:20.043305: client_request(client.16129440:1067293 unlink
> #0x1000907442e/filePathEditorRegistryPrefs.melcdzxxc 201
> > > 9-07-15 14:59:53.701964 caller_uid=1161,
> caller_gid=1131{1131,4121,2330,2683,4115,2322,2779,2979,1503,3511,2783,2707,2942,2980,2258,2829,1238,1237,2793,1235,1249,2097,1154,2982,2983,3860,4101,1208,3638,3641,3644,3640,3643,3639,3642,3822,3945,4045,3521,35
> > > 22,3520,3523,}) currently failed to wrlock, waiting
> > > 2019-07-15 15:06:54.624032 7f5fda5bb700  0 log_channel(cluster) log
> [WRN] : slow request 34.538332 seconds old, received at 2019-07-15
> 15:06:20.085605: client_request(client.16129440:1067308 create
> #0x1000907442e/filePathEditorRegistryPrefs.melHHljMk 201
> > > 9-07-15 14:59:53.744266 caller_uid=1161,
> caller_gid=1131{1131,4121,2330,2683,4115,2322,2779,2979,1503,3511,2783,2707,2942,2980,2258,2829,1238,1237,2793,1235,1249,2097,1154,2982,2983,3860,4101,1208,3638,3641,3644,3640,3643,3639,3642,3822,3945,4045,3521,3522,3520,3523,})
> currently failed to wrlock, waiting
> > > 2019-07-15 15:06:55.014073 7f5fdcdc0700  1 mds.mds01 Updating MDS map
> to version 68166 from mon.2
> > > 2019-07-15 15:06:59.624041 7f5fda5bb700  0 log_channel(cluster) log
> [WRN] : 7 slow requests, 2 included below; oldest blocked for > 39.588571
> secs
> > > 2019-07-15 15:06:59.624048 7f5fda5bb700  0 log_channel(cluster) log
> [WRN] : slow request 30.495843 seconds old, received at 2019-07-15
> 15:06:29.128156: client_request(client.16129440:1072227 create
> #0x1000907442e/filePathEditorRegistryPrefs.mel58AQSv 2019-07-15
> 15:00:02.786754 caller_uid=1161,
> caller_gid=1131{1131,4121,2330,2683,4115,2322,2779,2979,1503,3511,2783,2707,2942,2980,2258,2829,1238,1237,2793,1235,1249,2097,1154,2982,2983,3860,4101,1208,3638,3641,3644,3640,3643,3639,3642,3822,3945,4045,3521,3522,3520,3523,})
> currently failed to wrlock, waiting
> > > 2019-07-15 15:06:59.624053 7f5fda5bb700  0 log_channel(cluster) log
> [WRN] : slow request 39.432848 seconds old, received at 2019-07-15
> 15:06:20.191151: client_request(client.16140784:5570649 mknod
> #0x1000907442e/filePathEditorRegistryPrefs.mel3HZLNE 2019-07-15
> 14:59:44.322408 caller_uid=1161, caller_gid=1131{}) currently failed to
> wrlock, waiting
> > > 2019-07-15 15:07:03.014108 7f5fdcdc0700  1 mds.mds01 Updating MDS map
> to version 68167 from mon.2
> > > 2019-07-15 15:07:04.624096 7f5fda5bb700  0 log_channel(cluster) log
> [WRN] : 8 slow requests, 1 included below; oldest blocked for > 

Re: [ceph-users] Legacy BlueStore stats reporting?

2019-07-19 Thread Stig Telfer


> On 19 Jul 2019, at 15:25, Sage Weil  wrote:
> 
> On Fri, 19 Jul 2019, Stig Telfer wrote:
>>> On 19 Jul 2019, at 10:01, Konstantin Shalygin  wrote:
 Using Ceph-Ansible stable-4.0 I did a rolling update from latest Mimic to 
 Nautilus 14.2.2 on a cluster yesterday, and the update ran to completion 
 successfully.
 
 However, in ceph status I see a warning of the form "Legacy BlueStore 
 stats reporting detected” for all OSDs in the cluster.
 
 Can anyone help me with what has gone wrong, and what should be done to 
 fix it?
>>> I think you should start to run a repair for your OSDs - [1]
>>> [1] 
>>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-July/035889.html 
>>> 
>>> k
>>> 
>> Thanks Konstantin - running the BlueStore repair as indicated appears to 
>> be working.
>> 
>> One difference from Sage’s description of the scenario is that I did not 
>> explicitly create new OSDs during or after the Nautilus upgrade. Perhaps 
>> the Ceph-Ansible rolling update script did something to trigger this.
> 
> The warning is just telling you there are legacy bluestore instances that 
> were created before nautilus.  It is perhaps a bit harsh--you can also 
> just silence the warning if you don't care about getting the newer 
> and more accurate per-pool stats.
> 
> sage

Thanks Sage and Paul for clarifying that - good to know.

Best wishes,
Stig




Re: [ceph-users] Need to replace OSD. How do I find physical disk

2019-07-19 Thread Tarek Zegar
On the host with the OSD, run:


 ceph-volume lvm list
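
For example, a short sketch of going from an OSD id to the physical disk and its
serial number (osd.9 and /dev/sdb here are just placeholders):

ceph-volume lvm list | grep -A20 '= osd.9 ='    # shows the LV/devices backing osd.9
lsblk -o NAME,TYPE,SIZE,MODEL,SERIAL            # follow the LV down to the physical drive
smartctl -i /dev/sdb                            # confirm model/serial before pulling the disk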







From:   "☣Adam" 
To: ceph-users@lists.ceph.com
Date:   07/18/2019 03:25 PM
Subject:[EXTERNAL] Re: [ceph-users] Need to replace OSD. How do I find
physical disk
Sent by:"ceph-users" 



The block device can be found in /var/lib/ceph/osd/ceph-$ID/block
# ls -l /var/lib/ceph/osd/ceph-9/block

In my case it links to /dev/sdbvg/sdb which makes it pretty obvious
which drive this is, but the Volume Group and Logical Volume could be
named anything.  To see what physical disk(s) make up this volume group
use lsblk (as Reed suggested)
# lsblk

If that drive needs to be located in a computer with many drives,
smartctl should be able to be used to pull the make, model, and serial
number
# smartctl -i /dev/sdb


I was not aware of ceph-volume, or `ceph-disk list` (which is apparently
now deprecated in favor of ceph-volume), so thank you to all in this
thread for teaching about alternative (arguably more proper) ways of
doing this. :-)

On 7/18/19 12:58 PM, Pelletier, Robert wrote:
> How do I find the physical disk in a Ceph Luminous cluster in order to
> replace it? osd.9 is down in my cluster; it resides on the host ceph-osd1.
>
>
>
> If I run lsblk -io KNAME,TYPE,SIZE,MODEL,SERIAL I can get the serial
> numbers of all the physical disks for example
>
> sdb    disk  1.8T ST2000DM001-1CH1 Z1E5VLRG
>
>
>
> But how do I find out which osd is mapped to sdb and so on?
>
> When I run df -h I get this:
>
> [root@ceph-osd1 ~]# df -h
> Filesystem                   Size  Used Avail Use% Mounted on
> /dev/mapper/ceph--osd1-root   19G  1.9G   17G  10% /
> devtmpfs                      48G     0   48G   0% /dev
> tmpfs                         48G     0   48G   0% /dev/shm
> tmpfs                         48G  9.3M   48G   1% /run
> tmpfs                         48G     0   48G   0% /sys/fs/cgroup
> /dev/sda3                    947M  232M  716M  25% /boot
> tmpfs                         48G   24K   48G   1% /var/lib/ceph/osd/ceph-2
> tmpfs                         48G   24K   48G   1% /var/lib/ceph/osd/ceph-5
> tmpfs                         48G   24K   48G   1% /var/lib/ceph/osd/ceph-0
> tmpfs                         48G   24K   48G   1% /var/lib/ceph/osd/ceph-8
> tmpfs                         48G   24K   48G   1% /var/lib/ceph/osd/ceph-7
> tmpfs                         48G   24K   48G   1% /var/lib/ceph/osd/ceph-33
> tmpfs                         48G   24K   48G   1% /var/lib/ceph/osd/ceph-10
> tmpfs                         48G   24K   48G   1% /var/lib/ceph/osd/ceph-1
> tmpfs                         48G   24K   48G   1% /var/lib/ceph/osd/ceph-38
> tmpfs                         48G   24K   48G   1% /var/lib/ceph/osd/ceph-4
> tmpfs                         48G   24K   48G   1% /var/lib/ceph/osd/ceph-6
> tmpfs                        9.5G     0  9.5G   0% /run/user/0
>
>
>
>
>
> *Robert Pelletier, **IT and Security Specialist***
>
> Eastern Maine Community College
> (207) 974-4782 | 354 Hogan Rd., Bangor, ME 04401
>
>
>
>


Re: [ceph-users] Legacy BlueStore stats reporting?

2019-07-19 Thread Sage Weil
On Fri, 19 Jul 2019, Stig Telfer wrote:
> > On 19 Jul 2019, at 10:01, Konstantin Shalygin  wrote:
> >> Using Ceph-Ansible stable-4.0 I did a rolling update from latest Mimic to 
> >> Nautilus 14.2.2 on a cluster yesterday, and the update ran to completion 
> >> successfully.
> >> 
> >> However, in ceph status I see a warning of the form "Legacy BlueStore 
> >> stats reporting detected” for all OSDs in the cluster.
> >> 
> >> Can anyone help me with what has gone wrong, and what should be done to 
> >> fix it?
> > I think you should start to run a repair for your OSDs - [1]
> > [1] 
> > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-July/035889.html 
> > 
> > k
> > 
> Thanks Konstantin - running the BlueStore repair as indicated appears to 
> be working.
> 
> One difference from Sage’s description of the scenario is that I did not 
> explicitly create new OSDs during or after the Nautilus upgrade. Perhaps 
> the Ceph-Ansible rolling update script did something to trigger this.

The warning is just telling you there are legacy bluestore instances that 
were created before nautilus.  It is perhaps a bit harsh--you can also 
just silence the warning if you don't care about getting the newer 
and more accurate per-pool stats.

sage


Re: [ceph-users] cephfs snapshot scripting questions

2019-07-19 Thread Frank Schilder
This is a question I'm interested as well.

Right now, I'm using cephfs-snap from the storage tools project and am quite 
happy with that. I made a small modification, but will probably not change. It's 
a simple and robust tool.
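
The core of such a tool is small; a minimal sketch of a cron-driven rotation
(the path and retention count are only assumptions - cephfs snapshots are just
mkdir/rmdir inside the .snap directory):

#!/bin/bash
# create a timestamped cephfs snapshot and keep only the newest $KEEP of them
SNAPDIR=/ceph/.snap
KEEP=24
mkdir "${SNAPDIR}/$(date +%Y%m%d-%H%M%S)"
# snapshot names are timestamps, so a lexical sort is chronological
ls -1 "${SNAPDIR}" | sort | head -n -${KEEP} | while read -r snap; do
    rmdir "${SNAPDIR}/${snap}"
done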

About where to take snapshots: there seems to be a bug in cephfs that implies a 
recommended limit on the total number of snapshots of not more than 400. Hence, 
taking as few as possible (i.e. high up in the tree) seems sort of a must. Has this 
changed by now? In case this limit does not exist any more, what would be best practice?

Note that we disabled rolling snapshots due to a not yet fixed bug; see this 
thread: https://www.mail-archive.com/ceph-users@lists.ceph.com/msg54233.html

Best regards,

=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: ceph-users  on behalf of Robert Ruge 

Sent: 17 July 2019 02:44:02
To: ceph-users@lists.ceph.com
Subject: [ceph-users] cephfs snapshot scripting questions

Greetings.

Before I reinvent the wheel has anyone written a script to maintain X number of 
snapshots on a cephfs file system that can be run through cron?
I am aware of the cephfs-snap code but just wondering if there are any other 
options out there.

On a related note which of these options would be better?

1.  Maintain one .snap directory at the root of the cephfs tree - /ceph/.snap

2.  Have a .snap directory for every second-level directory - /ceph/user/.snap

I am thinking the latter might make it more obvious for the users to do their 
own restores, but I am wondering what the resource implications of either approach 
might be.

The documentation indicates that I should use kernel >= 4.17 for cephfs.  I’m 
currently using Mimic 13.2.6 on Ubuntu 18.04 with kernel version 4.15.0. What 
issues might I see with this combination? I’m hesitant to upgrade to an 
unsupported kernel on Ubuntu but wondering if I’m going to be playing Russian 
Roulette with this combo.

Are there any gotcha’s I should be aware of before plunging into full blown 
cephfs snapshotting?

Regards and thanks.
Robert Ruge


Important Notice: The contents of this email are intended solely for the named 
addressee and are confidential; any unauthorised use, reproduction or storage 
of the contents is expressly prohibited. If you have received this email in 
error, please delete it and any attachments immediately and advise the sender 
by return email or telephone.

Deakin University does not warrant that this email and any attachments are 
error or virus free.


Re: [ceph-users] Legacy BlueStore stats reporting?

2019-07-19 Thread Paul Emmerich
On Fri, Jul 19, 2019 at 1:47 PM Stig Telfer wrote:

>
> On 19 Jul 2019, at 10:01, Konstantin Shalygin  wrote:
>
> Using Ceph-Ansible stable-4.0 I did a rolling update from latest Mimic to 
> Nautilus 14.2.2 on a cluster yesterday, and the update ran to completion 
> successfully.
>
> However, in ceph status I see a warning of the form "Legacy BlueStore stats 
> reporting detected” for all OSDs in the cluster.
>
> Can anyone help me with what has gone wrong, and what should be done to fix 
> it?
>
> I think you should start to run a repair for your OSDs - [1]
>
> [1]
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-July/035889.html
>
> k
>
> Thanks Konstantin - running the BlueStore repair as indicated appears to
> be working.
>
> One difference from Sage’s description of the scenario is that I did not
> explicitly create new OSDs during or after the Nautilus upgrade. Perhaps
> the Ceph-Ansible rolling update script did something to trigger this.
>

The known bug is that the stats show up incorrectly; the warning is expected
after the upgrade.

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


>
> Best wishes,
> Stig
>
>


Re: [ceph-users] Nautilus:14.2.2 Legacy BlueStore stats reporting detected

2019-07-19 Thread Paul Emmerich
bluestore warn on legacy statfs = false
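
That is the ceph.conf form; on Nautilus it can also be set at runtime via the
monitors, e.g. (a sketch):

ceph config set global bluestore_warn_on_legacy_statfs false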

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Fri, Jul 19, 2019 at 1:35 PM nokia ceph  wrote:

> Hi Team,
>
> After upgrading our cluster from 14.2.1 to 14.2.2, the cluster moved to a
> warning state with the following error:
>
> cn1.chn6m1c1ru1c1.cdn ~# ceph status
>   cluster:
> id: e9afb5f3-4acf-421a-8ae6-caaf328ef888
> health: HEALTH_WARN
> Legacy BlueStore stats reporting detected on 335 OSD(s)
>
>   services:
> mon: 5 daemons, quorum cn1,cn2,cn3,cn4,cn5 (age 114m)
> mgr: cn4(active, since 2h), standbys: cn3, cn1, cn2, cn5
> osd: 335 osds: 335 up (since 112m), 335 in
>
>   data:
> pools:   1 pools, 8192 pgs
> objects: 129.01M objects, 849 TiB
> usage:   1.1 PiB used, 749 TiB / 1.8 PiB avail
> pgs: 8146 active+clean
>  46   active+clean+scrubbing
>
> I checked the bug list and found that this issue is marked as solved, but it
> still exists:
>
> https://github.com/ceph/ceph/pull/28563
>
> How to disable this warning?
>
> Thanks,
> Muthu


Re: [ceph-users] Legacy BlueStore stats reporting?

2019-07-19 Thread Stig Telfer

> On 19 Jul 2019, at 10:01, Konstantin Shalygin  wrote:
>> Using Ceph-Ansible stable-4.0 I did a rolling update from latest Mimic to 
>> Nautilus 14.2.2 on a cluster yesterday, and the update ran to completion 
>> successfully.
>> 
>> However, in ceph status I see a warning of the form "Legacy BlueStore stats 
>> reporting detected” for all OSDs in the cluster.
>> 
>> Can anyone help me with what has gone wrong, and what should be done to fix 
>> it?
> I think you should start to run a repair for your OSDs - [1]
> [1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-July/035889.html 
> 
> k
> 
Thanks Konstantin - running the BlueStore repair as indicated appears to be 
working.

One difference from Sage’s description of the scenario is that I did not 
explicitly create new OSDs during or after the Nautilus upgrade. Perhaps the 
Ceph-Ansible rolling update script did something to trigger this.

Best wishes,
Stig




[ceph-users] Nautilus:14.2.2 Legacy BlueStore stats reporting detected

2019-07-19 Thread nokia ceph
Hi Team,

After upgrading our cluster from 14.2.1 to 14.2.2, the cluster moved to a
warning state with the following error:

cn1.chn6m1c1ru1c1.cdn ~# ceph status
  cluster:
id: e9afb5f3-4acf-421a-8ae6-caaf328ef888
health: HEALTH_WARN
Legacy BlueStore stats reporting detected on 335 OSD(s)

  services:
mon: 5 daemons, quorum cn1,cn2,cn3,cn4,cn5 (age 114m)
mgr: cn4(active, since 2h), standbys: cn3, cn1, cn2, cn5
osd: 335 osds: 335 up (since 112m), 335 in

  data:
pools:   1 pools, 8192 pgs
objects: 129.01M objects, 849 TiB
usage:   1.1 PiB used, 749 TiB / 1.8 PiB avail
pgs: 8146 active+clean
 46   active+clean+scrubbing

I checked the bug list and found that this issue is marked as solved, but it
still exists:

https://github.com/ceph/ceph/pull/28563

How to disable this warning?

Thanks,
Muthu


[ceph-users] Ceph OSD daemon possibly causes network card issues

2019-07-19 Thread Geoffrey Rhodes
Hi  Konstantin,

Thanks, I've run the following on all four interfaces:
sudo ethtool -K <interface> rx off tx off sg off tso off ufo off gso
off gro off lro off rxvlan off txvlan off ntuple off rxhash off

The following do not seem to be available to change:
Cannot change udp-fragmentation-offload
Cannot change large-receive-offload
Cannot change ntuple-filters

Holding thumbs this helps; however, I still don't understand why the issue
only occurs on ceph-osd nodes.
ceph-mon and ceph-mds nodes, and even a ceph client with the same adapters,
do not have these issues.

Kind regards
Geoffrey Rhodes


On Fri, 19 Jul 2019 at 05:24, Konstantin Shalygin  wrote:

> On 7/18/19 7:43 PM, Geoffrey Rhodes wrote:
> > Sure, also attached.
>
> Try to disable flow control via `ethtool -K <interface> rx off tx off`.
>
>
>
> k
>
>
>


[ceph-users] Ceph OSD daemon possibly causes network card issues

2019-07-19 Thread Geoffrey Rhodes
Hi Paul,

Thanks, I've run the following on all four interfaces:
sudo ethtool -K <interface> rx off tx off sg off tso off ufo off gso off
gro off lro off rxvlan off txvlan off ntuple off rxhash off

The following do not seem to be available to change:
Cannot change udp-fragmentation-offload
Cannot change large-receive-offload
Cannot change ntuple-filters

Holding thumbs this helps; however, I still don't understand why the issue
only occurs on ceph-osd nodes.
ceph-mon and ceph-mds nodes, and even a ceph client with the same adapters,
do not have these issues.

Kind regards
Geoffrey Rhodes


On Thu, 18 Jul 2019 at 18:35, Paul Emmerich  wrote:

> Hi,
>
> Intel 82576 is bad. I've seen quite a few problems with these older
> igb familiy NICs, but losing the PCIe link is a new one.
> I usually see them getting stuck with a message like "tx queue X hung,
> resetting device..."
>
> Try to disable offloading features using ethtool, that sometimes helps
> with the problems that I've seen. Maybe that's just a variant of the stuck
> problem?
>
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
>
> On Thu, Jul 18, 2019 at 12:47 PM Geoffrey Rhodes 
> wrote:
>
>> Hi Cephers,
>>
>> I've been having an issue since upgrading my cluster to Mimic 6 months
>> ago (previously installed with Luminous 12.2.1).
>> All nodes that have the same PCIe network card seem to lose network
>> connectivity randomly. (frequency ranges from a few days to weeks per host
>> node)
>> The affected nodes only have the Intel 82576 LAN Card in common,
>> different motherboards, installed drives, RAM and even PSUs.
>> Nodes that have the Intel I350 cards are not affected by the Mimic
>> upgrade.
>> Each host node has recommended RAM installed and has between 4 and 6 OSDs
>> / sata hard drives installed.
>> The cluster operated for over a year (Luminous) without a single issue,
>> only after the Mimic upgrade did the issues begin with these nodes.
>> The cluster is only used for CephFS (file storage, low intensity usage)
>> and makes use of erasure data pool (K=4, M=2).
>>
>> I've tested many things, different kernel versions, different Ubuntu LTS
>> releases, re-installation and even CENTOS 7, different releases of Mimic,
>> different igb drivers.
>> If I stop the ceph-osd daemons the issue does not occur.  If I swap out
>> the Intel 82576 card with the Intel I350 the issue is resolved.
>> I haven't any more ideas other than replacing the cards but I feel the
>> issue is linked to the ceph-osd daemon and a change in the Mimic release.
>> Below are the various software versions and drivers I've tried and a log
>> extract from a node that lost network connectivity. - Any help or
>> suggestions would be greatly appreciated.
>>
>> *OS:*  Ubuntu 16.04 / 18.04 and recently CENTOS 7
>> *Ceph Version:*Mimic (currently 13.2.6)
>> *Network card:*4-PORT 1GB INTEL 82576 LAN CARD (AOC-SG-I4)
>> *Driver:  *   igb
>> *Driver Versions:* 5.3.0-k / 5.3.5.22s / 5.4.0-k
>> *Network Config:* 2 x bonded (LACP) 1GB nic for public net,   2 x
>> bonded (LACP) 1GB nic for private net
>> *Log errors:*
>> Jun 27 12:10:28 cephnode5 kernel: [497346.638608] igb :03:00.0
>> enp3s0f0: PCIe link lost, device now detached
>> Jun 27 12:10:28 cephnode5 kernel: [497346.686752] igb :04:00.1
>> enp4s0f1: PCIe link lost, device now detached
>> Jun 27 12:10:29 cephnode5 kernel: [497347.550473] igb :03:00.1
>> enp3s0f1: PCIe link lost, device now detached
>> Jun 27 12:10:29 cephnode5 kernel: [497347.646785] igb :04:00.0
>> enp4s0f0: PCIe link lost, device now detached
>> Jun 27 12:10:43 cephnode5 ceph-osd[2575]: 2019-06-27 12:10:43.793
>> 7f73ca637700 -1 osd.15 28497 heartbeat_check: no reply from
>> 10.100.4.1:6809 osd.16 since back 2019-06
>> -27 12:10:27.438961 front 2019-06-27 12:10:23.338012 (cutoff 2019-06-27
>> 12:10:23.796726)
>> Jun 27 12:10:43 cephnode5 ceph-osd[2575]: 2019-06-27 12:10:43.793
>> 7f73ca637700 -1 osd.15 28497 heartbeat_check: no reply from
>> 10.100.6.1:6804 osd.20 since back 2019-06
>> -27 12:10:27.438961 front 2019-06-27 12:10:23.338012 (cutoff 2019-06-27
>> 12:10:23.796726)
>> Jun 27 12:10:43 cephnode5 ceph-osd[2575]: 2019-06-27 12:10:43.793
>> 7f73ca637700 -1 osd.15 28497 heartbeat_check: no reply from
>> 10.100.7.1:6803 osd.25 since back 2019-06
>> -27 12:10:23.338012 front 2019-06-27 12:10:23.338012 (cutoff 2019-06-27
>> 12:10:23.796726)
>> Jun 27 12:10:43 cephnode5 ceph-osd[2575]: 2019-06-27 12:10:43.793
>> 7f73ca637700 -1 osd.15 28497 heartbeat_check: no reply from
>> 10.100.8.1:6803 osd.30 since back 2019-06
>> -27 12:10:27.438961 front 2019-06-27 12:10:23.338012 (cutoff 2019-06-27
>> 12:10:23.796726)
>> Jun 27 12:10:43 cephnode5 ceph-osd[2575]: 2019-06-27 12:10:43.793
>> 7f73ca637700 -1 osd.15 28497 heartbeat_check: no reply from
>> 

Re: [ceph-users] Future of Filestore?

2019-07-19 Thread Marc Roos
 
Maybe a bit off topic, but just curious: what speeds did you get previously? 
Depending on how you test, your native 5400 rpm drive's performance 
could be similar. 4k random read on my 7200 rpm/5400 rpm drives results in 
~60 IOPS at 260 kB/s.
I also wonder why Filestore could be that much faster; is this not 
something else? Maybe some dangerous caching method was on?



-Original Message-
From: Stuart Longland [mailto:stua...@longlandclan.id.au] 
Sent: vrijdag 19 juli 2019 12:22
To: ceph-users
Subject: [ceph-users] Future of Filestore?

Hi all,

Earlier this year, I did a migration from Ceph 10 to 12.  Previously, I 
was happily running Ceph v10 on Filestore with BTRFS, and getting 
reasonable performance.

Moving to Ceph v12 necessitated a migration away from this set-up, and 
reading the documentation, Bluestore seemed to be "the way", so a hasty 
migration was performed and now my then 3-node cluster moved to 
Bluestore.  I've since added two new nodes to that cluster and replaced 
the disks in all systems, so I have 5 WD20SPZX-00Us storing my data.

I'm now getting about 5MB/sec I/O speeds in my VMs.

I'm contemplating whether I migrate back to using Filestore (on XFS this 
time, since BTRFS appears to be a rude word despite Ceph v10 docs 
suggesting it as a good option), but I'm not sure what the road map is 
for supporting Filestore long-term.

Is Filestore likely to have long term support for the next few years or 
should I persevere with tuning Bluestore to get something that won't be 
outperformed by an early 90s PIO mode 0 IDE HDD?
--
Stuart Longland (aka Redhatter, VK4MSL)

I haven't lost my mind...
  ...it's backed up on a tape somewhere.


[ceph-users] Please help: change IP address of a cluster

2019-07-19 Thread ST Wong (ITSC)
Hi all,

Our cluster has to move to a new IP range in the same VLAN: 10.0.7.0/24 -> 
10.0.18.0/23, while the IP addresses on the private network for the OSDs remain unchanged.
I wonder if we can do that in either one of the following ways:

=
1.
   a. Define a static route for 10.0.18.0/23 on each node.
   b. Do it one node at a time:

      For each monitor/mgr:
      -  remove from cluster
      -  change IP address
      -  add static route to original IP range 10.0.7.0/24
      -  delete static route for 10.0.18.0/23
      -  add back to cluster

      For each OSD:
      -  stop OSD daemons
      -  change IP address
      -  add static route to original IP range 10.0.7.0/24
      -  delete static route for 10.0.18.0/23
      -  start OSD daemons

   c. Clean up all static routes defined.

2.
   a. Export and update the monmap using the messy way as described in
      http://docs.ceph.com/docs/mimic/rados/operations/add-or-rm-mons/

      ceph mon getmap -o {tmp}/{filename}

      monmaptool --rm node1 --rm node2 ... --rm nodeN {tmp}/{filename}

      monmaptool --add node1 v2:10.0.18.1:3330,v1:10.0.18.1:6789 --add node2
      v2:10.0.18.2:3330,v1:10.0.18.2:6789 ... --add nodeN
      v2:10.0.18.N:3330,v1:10.0.18.N:6789 {tmp}/{filename}

   b. Stop all cluster daemons and change the IP addresses.

   c. For each mon node: ceph-mon -i {mon-id} --inject-monmap {tmp}/{filename}

   d. Restart the cluster daemons.

3. Or any better method...
=

Would anyone please help?   Thanks a lot.
Rgds
/st wong



[ceph-users] Future of Filestore?

2019-07-19 Thread Stuart Longland
Hi all,

Earlier this year, I did a migration from Ceph 10 to 12.  Previously, I
was happily running Ceph v10 on Filestore with BTRFS, and getting
reasonable performance.

Moving to Ceph v12 necessitated a migration away from this set-up, and
reading the documentation, Bluestore seemed to be "the way", so a hasty
migration was performed and now my then 3-node cluster moved to
Bluestore.  I've since added two new nodes to that cluster and replaced
the disks in all systems, so I have 5 WD20SPZX-00Us storing my data.

I'm now getting about 5MB/sec I/O speeds in my VMs.

I'm contemplating whether I migrate back to using Filestore (on XFS this
time, since BTRFS appears to be a rude word despite Ceph v10 docs
suggesting it as a good option), but I'm not sure what the road map is
for supporting Filestore long-term.

Is Filestore likely to have long term support for the next few years or
should I persevere with tuning Bluestore to get something that won't be
outperformed by an early 90s PIO mode 0 IDE HDD?
-- 
Stuart Longland (aka Redhatter, VK4MSL)

I haven't lost my mind...
  ...it's backed up on a tape somewhere.


[ceph-users] Multiple OSD crashes

2019-07-19 Thread Daniel Aberger - Profihost AG
Hello,

we are experiencing crashing OSDs in multiple independent Ceph clusters.

Each OSD has very similar log entries regarding the crash as far as I
can tell.

Example log: https://pastebin.com/raw/vQ2AJ5ud

I can provide you with more log files. They are too large for pastebin
and I'm not aware of this mailing list's email attachment policy.

Every log consists of the following entries:

2019-07-10 21:36:31.903886 7f322aeff700 -1 rocksdb: submit_transaction
error: Corruption: block checksum mismatch code = 2 Rocksdb transaction:
Put( Prefix = M key =
0x08c1'.461231.000125574325' Value size = 184)
Put( Prefix = M key = 0x08c1'._fastinfo' Value size = 186)
Put( Prefix = O key =
0x7f80015806b4'(!rbd_data.7c012a6b8b4567.004e!='0xfffe6f002f'x'
Value size = 325)
Put( Prefix = O key =
0x7f80015806b4'(!rbd_data.7c012a6b8b4567.004e!='0xfffe'o'
Value size = 1608)
Put( Prefix = L key = 0x0226dc7a Value size = 16440)
2019-07-10 21:36:31.913113 7f322aeff700 -1
/build/ceph/src/os/bluestore/BlueStore.cc: In function 'void
BlueStore::_kv_sync_thread()' thread 7f322aeff700 time 2019-07-10
21:36:31.903909
/build/ceph/src/os/bluestore/BlueStore.cc: 8808: FAILED assert(r == 0)

 ceph version 12.2.12-7-g1321c5e91f
(1321c5e91f3d5d35dd5aa5a0029a54b9a8ab9498) luminous (stable)


Unfortunately I'm unable to interpret the dumps. I hope you can help me
with this issue.

Regards,
Daniel



-- 
Mit freundlichen Grüßen
  Daniel Aberger
Ihr Profihost Team

---
Profihost AG
Expo Plaza 1
30539 Hannover
Deutschland

Tel.: +49 (511) 5151 8181 | Fax.: +49 (511) 5151 8282
URL: http://www.profihost.com | E-Mail: i...@profihost.com

Sitz der Gesellschaft: Hannover, USt-IdNr. DE813460827
Registergericht: Amtsgericht Hannover, Register-Nr.: HRB 202350
Vorstand: Cristoph Bluhm, Sebastian Bluhm, Stefan Priebe
Aufsichtsrat: Prof. Dr. iur. Winfried Huck (Vorsitzender)


Re: [ceph-users] Legacy BlueStore stats reporting?

2019-07-19 Thread Konstantin Shalygin

Using Ceph-Ansible stable-4.0 I did a rolling update from latest Mimic to 
Nautilus 14.2.2 on a cluster yesterday, and the update ran to completion 
successfully.

However, in ceph status I see a warning of the form "Legacy BlueStore stats 
reporting detected” for all OSDs in the cluster.

Can anyone help me with what has gone wrong, and what should be done to fix it?

I think you should start to run a repair for your OSDs - [1]



[1] 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-July/035889.html
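
A minimal sketch of what that repair looks like for a single OSD (the id and path
below are only examples; stop the OSD first):

systemctl stop ceph-osd@12
ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-12
systemctl start ceph-osd@12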


k



[ceph-users] Legacy BlueStore stats reporting?

2019-07-19 Thread Stig Telfer
Hi all - 

Using Ceph-Ansible stable-4.0 I did a rolling update from latest Mimic to 
Nautilus 14.2.2 on a cluster yesterday, and the update ran to completion 
successfully.

However, in ceph status I see a warning of the form "Legacy BlueStore stats 
reporting detected” for all OSDs in the cluster.

Can anyone help me with what has gone wrong, and what should be done to fix it?

Thanks
Stig



Re: [ceph-users] reproducable rbd-nbd crashes

2019-07-19 Thread Marc Schöchlin
Hello Jason,

Am 18.07.19 um 20:10 schrieb Jason Dillaman:
> On Thu, Jul 18, 2019 at 1:47 PM Marc Schöchlin  wrote:
>> Hello cephers,
>>
>> rbd-nbd crashes in a reproducible way here.
> I don't see a crash report in the log below. Is it really crashing or
> is it shutting down? If it is crashing and it's reproducable, can you
> install the debuginfo packages, attach gdb, and get a full backtrace
> of the crash?

I do not get a crash report from rbd-nbd.
It seems that "rbd-nbd" just terminates, which crashes the XFS filesystem because 
the nbd device is not available anymore.
("rbd nbd ls" shows no mapped device anymore)

>
> It seems like your cluster cannot keep up w/ the load and the nbd
> kernel driver is timing out the IO and shutting down. There is a
> "--timeout" option on "rbd-nbd" that you can use to increase the
> kernel IO timeout for nbd.
>
I also have a 36TB XFS (non_ec) volume on this virtual system, mapped by krbd, 
which is under really heavy read/write usage.
I have never experienced problems like this on that system with similar usage 
patterns.

The volume which is involved in the problem only handles a really low load, and 
I was able to reproduce the error situation by using the simple command "find . -type f 
-name "*.sql" -exec ionice -c3 nice -n 20 gzip -v {} \;".
I copied and read ~1.5 TB of data to this volume without a problem - it seems 
that the gzip command provokes an I/O pattern which leads to the error situation.

As described, I use a Luminous "12.2.11" client which does not support that 
"--timeout" option (btw. a backport would be nice).
Our Ceph system runs with a heavy write load, therefore we already set a 
60-second timeout using the following code:
(https://github.com/OnApp/nbd-kernel_mod/blob/master/nbd_set_timeout.c)
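
(With a client that does ship the option, the mapping would look roughly like this;
the pool/image names are placeholders:

rbd-nbd map --timeout 60 rbd/my-image

which sets the kernel-side nbd timeout at map time instead of patching it afterwards.)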

We have ~500 heavy load rbd-nbd devices in our xen cluster (rbd-nbd 12.2.5, 
kernel 4.4.0+10, centos clone) and ~20 high load krbd devices (kernel 
4.15.0-45, ubuntu 16.04) - we never experienced problems like this.
We only experience problems like this with rbd-nbd > 12.2.5 on Ubuntu 16.04 
(kernel 4.15) or Ubuntu 18.04 (kernel 4.15), with or without erasure coding.

Regards
Marc

