[ceph-users] Rados bench behaves oddly

2020-01-22 Thread John Hearns
We have a CEPH storage cluster which is having problems.
When I run a rados bench I get the behaviour below. Has anyone seen this
sort of thing before?

# rados bench -p scbench 10 seq
hints = 1
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
    0       0         0         0         0         0            -          0
    1      16        57        41   163.379       164     0.102551   0.277302
    2      16       102        86   171.653       180    0.0297766   0.273388
    3      16       138       122   162.434       144      0.89867   0.313735
    4      16       172       156    155.82       136    0.0203412   0.327505
    5      16       205       189    151.04       132    0.0997663   0.304306
    6      16       246       230   153.186       164    0.0606922   0.263446
    7      16       269       253   144.444        92    0.0339286   0.247406
    8      16       269       253     126.4         0            -   0.247406
    9      16       269       253   112.363         0            -   0.247406
   10      16       269       253   101.132         0            -   0.247406
   11      16       269       253   91.9418         0            -   0.247406
   12      16       269       253    84.283         0            -   0.247406
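
For reference, a seq run only has something to read if the pool was populated
by a prior write run that kept its objects. A minimal sketch of the full cycle
(pool name and runtimes are illustrative):

# Populate the pool; --no-cleanup leaves the objects behind for the read tests
rados bench -p scbench 60 write --no-cleanup

# Sequential read benchmark over the objects written above
rados bench -p scbench 10 seq

# Remove the benchmark objects afterwards
rados -p scbench cleanup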

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Failed to encode map errors

2019-12-04 Thread John Hearns
The version is Nautilus. There is a small mismatch in some of the OSD
version numbers, but this has been running for a long time and we had not
seen this behaviour before.
It is also worth saying that I removed (ahem) then replaced the key for an
osd yesterday. Thanks to Wido for suggesting the fix to that.
I would say these messages happened after the OSD keys were put back in.

#ceph versions

{
"mon": {
"ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972)
nautilus (stable)": 3
},
"mgr": {
"ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972)
nautilus (stable)": 1
},
"osd": {
"ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972)
nautilus (stable)": 30,
"ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be)
nautilus (stable)": 9
},
"mds": {
"ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972)
nautilus (stable)": 1,
"ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be)
nautilus (stable)": 1
},
"rgw": {
"ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be)
nautilus (stable)": 2
},
"rgw-nfs": {
"ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be)
nautilus (stable)": 1
},
"overall": {
"ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972)
nautilus (stable)": 35,
"ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be)
nautilus (stable)": 13
}
}
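
For completeness, a quick way to see which daemons carry which build, in case
the 14.2.1/14.2.2 mix matters here (a sketch; osd.0 is just an example id):

# Ask every OSD for its version over the admin interface
ceph tell osd.* version

# Or read it from a single daemon's metadata
ceph osd metadata 0 | grep ceph_version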


On Wed, 4 Dec 2019 at 07:58, Martin Verges  wrote:

> Hello,
>
> what versions of Ceph are you running?
>
> --
> Martin Verges
> Managing director
>
> Mobile: +49 174 9335695
> E-Mail: martin.ver...@croit.io
> Chat: https://t.me/MartinVerges
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
>
> Web: https://croit.io
> YouTube: https://goo.gl/PGE1Bx
>
>
> Am Di., 3. Dez. 2019 um 19:05 Uhr schrieb John Hearns  >:
>
>> And me again for the second time in one day.
>>
>> ceph -w is now showing messages like this:
>>
>> 2019-12-03 15:17:22.426988 osd.6 [WRN] failed to encode map e28961 with
>> expected crc
>>
>> Any advice please?
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Failed to encode map errors

2019-12-03 Thread John Hearns
And me again for the second time in one day.

ceph -w is now showing messages like this:

2019-12-03 15:17:22.426988 osd.6 [WRN] failed to encode map e28961 with
expected crc

Any advice please?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Osd auth del

2019-12-03 Thread John Hearns
Thank you. ceph auth add did work.

I did try ceph auth get-or-create, but this does not read from an input file
- it will generate a new key.
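
A sketch of the kind of command involved, for the archive (the keyring path is
the default location, and the caps should mirror what 'ceph auth ls' shows for
the other OSDs):

# Re-import the key that still sits on the OSD host
ceph auth add osd.3 osd 'allow *' mon 'allow profile osd' \
    -i /var/lib/ceph/osd/ceph-3/keyring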

On Tue, 3 Dec 2019 at 13:50, Willem Jan Withagen  wrote:

> On 3-12-2019 11:43, Wido den Hollander wrote:
> >
> >
> > On 12/3/19 11:40 AM, John Hearns wrote:
> >> I had a fat fingered moment yesterday
> >> I typed   ceph auth del osd.3
> >> Where osd.3 is an otherwise healthy little osd
> >> I have not set noout or down on  osd.3 yet
> >>
> >> This is a Nautilus cluster.
> >> ceph health reports everything is OK
> >>
> >
> > Fetch the key from the OSD's datastore on the machine itself. On the OSD
> > machine you'll find a file called keyring.
> >
> > Get that file and import it with the proper caps back into cephx. Then
> > all should be fixed!
>
> The magic incantation there would be:
>
> ceph auth add osd.<id> osd 'allow *' mon 'allow rwx' -i keyring
>
> --WjW
>
>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Osd auth del

2019-12-03 Thread John Hearns
I had a fat fingered moment yesterday
I typed   ceph auth del osd.3
Where osd.3 is an otherwise healthy little osd
I have not set noout or down on  osd.3 yet

This is a Nautilus cluster.
ceph health reports everything is OK

However ceph tell osd.* version hangs when it gets to osd.3
Also the log ceph-osd.3.log is full of these lines:

2019-12-03 10:33:29.503 7f010adf1700  0 cephx: verify_authorizer could not
get service secret for service osd secret_id=10281
2019-12-03 10:33:29.591 7f010adf1700  0 auth: could not find secret_id=10281
2019-12-03 10:33:29.591 7f010adf1700  0 cephx: verify_authorizer could not
get service secret for service osd secret_id=10281
2019-12-03 10:33:29.819 7f010adf1700  0 auth: could not find secret_id=10281

OK, once you have all stopped laughing, some advice would be appreciated.
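
For anyone hitting the same thing, a quick way to confirm the damage before
fixing it (a sketch; paths are the default locations):

# Does the cluster still have a key registered for this OSD?
ceph auth get osd.3

# The key itself is still present on the OSD host and can be re-imported
cat /var/lib/ceph/osd/ceph-3/keyring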

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Erasure coded pools on Ambedded - advice please

2019-10-24 Thread John Hearns
I am setting up a storage cluster on Ambedded ARM hardware, which is nice!
I find that I can set up an erasure coded pool with the default k=2,m=1

The cluster has 9x OSDs with HDD and 12x OSDs with SSD.

If I configure another erasure profile such as k=7, m=2 then the pool is
created, but the PGs stick in creating/incomplete.
Some advice please:

a) what erasure profiles do people suggest for this setup

b) a pool with m=1 will work fine of course, I imagine though a failed OSD
has to be replaced quickly

If anyone else has Ambedded, what crush rule do you select for the metadata
when creating a pool?
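
As a concrete example of the kind of profile in question (a sketch; the
profile name, pool name, PG count and k/m values are illustrative - note
that k+m cannot exceed the number of failure domains, here OSDs of one
device class):

# EC profile constrained to the HDD OSDs, failure domain per OSD
ceph osd erasure-code-profile set ec42hdd k=4 m=2 \
    crush-failure-domain=osd crush-device-class=hdd

# Pool using that profile
ceph osd pool create ecpool 64 64 erasure ec42hdd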
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Cloudstack and CEPH Day London

2019-10-24 Thread John Hearns
I will be attending the Cloudstack and CEPH Day in London today.
Please say hello - rotund Scottish guy, not much hair. Glaswegian accent!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] iostat and dashboard freezing

2019-08-27 Thread John Hearns
Try running  gstack  on the ceph mgr process when it is frozen?
This could be a name resolution problem, as you suspect. Maybe gstack will
show where the process is 'stuck' and this might be a call to your name
resolution service.
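
Something along these lines (a sketch; it assumes gdb/gstack is installed on
the mgr host and only one ceph-mgr process is running):

# Grab a stack trace of the active mgr while iostat/dashboard are hung
gstack $(pidof ceph-mgr) > mgr-stack.txt

# gdb works too if gstack is not available
gdb -batch -ex 'thread apply all bt' -p $(pidof ceph-mgr)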

On Tue, 27 Aug 2019 at 14:25, Jake Grimmett  wrote:

> Whoops, I'm running Scientific Linux 7.6, going to upgrade to 7.7. soon...
>
> thanks
>
> Jake
>
>
> On 8/27/19 2:22 PM, Jake Grimmett wrote:
> > Hi Reed,
> >
> > That exactly matches what I'm seeing:
> >
> > when iostat is working OK, I see ~5% CPU use by ceph-mgr
> > and when iostat freezes, ceph-mgr CPU increases to 100%
> >
> > regarding OS, I'm using Scientific Linux 7.7
> > Kernel 3.10.0-957.21.3.el7.x86_64
> >
> > I'm not sure if the mgr initiates scrubbing, but if so, this could be
> > the cause of the "HEALTH_WARN 20 pgs not deep-scrubbed in time" that we
> see.
> >
> > Anyhow, many thanks for your input, please let me know if you have
> > further ideas :)
> >
> > best,
> >
> > Jake
> >
> > On 8/27/19 2:01 PM, Reed Dier wrote:
> >> Curious what dist you're running on, as I've been having similar issues
> with instability in the mgr as well, curious if any similar threads to pull
> at.
> >>
> >> While the iostat command is running, is the active mgr using 100% CPU
> in top?
> >>
> >> Reed
> >>
> >>> On Aug 27, 2019, at 6:41 AM, Jake Grimmett 
> wrote:
> >>>
> >>> Dear All,
> >>>
> >>> We have a new Nautilus (14.2.2) cluster, with 328 OSDs spread over 40
> nodes.
> >>>
> >>> Unfortunately "ceph iostat" spends most of it's time frozen, with
> >>> occasional periods of working normally for less than a minute, then
> >>> freeze again for a couple of minutes, then come back to life, and so so
> >>> on...
> >>>
> >>> No errors are seen on screen, unless I press CTRL+C when iostat is
> stalled:
> >>>
> >>> [root@ceph-s3 ~]# ceph iostat
> >>> ^CInterrupted
> >>> Traceback (most recent call last):
> >>>  File "/usr/bin/ceph", line 1263, in 
> >>>retval = main()
> >>>  File "/usr/bin/ceph", line 1194, in main
> >>>verbose)
> >>>  File "/usr/bin/ceph", line 619, in new_style_command
> >>>ret, outbuf, outs = do_command(parsed_args, target, cmdargs,
> >>> sigdict, inbuf, verbose)
> >>>  File "/usr/bin/ceph", line 593, in do_command
> >>>return ret, '', ''
> >>> UnboundLocalError: local variable 'ret' referenced before assignment
> >>>
> >>> Observations:
> >>>
> >>> 1) This problem does not seem to be related to load on the cluster.
> >>>
> >>> 2) When iostat is stalled the dashboard is also non-responsive, if
> >>> iostat is working, the dashboard also works.
> >>>
> >>> Presumably the iostat and dashboard problems are due to the same
> >>> underlying fault? Perhaps a problem with the mgr?
> >>>
> >>>
> >>> 3) With iostat working, tailing /var/log/ceph/ceph-mgr.ceph-s3.log
> >>> shows:
> >>>
> >>> 2019-08-27 09:09:56.817 7f8149834700  0 log_channel(audit) log [DBG] :
> >>> from='client.4120202 -' entity='client.admin' cmd=[{"width": 95,
> >>> "prefix": "iostat", "poll": true, "target": ["mgr", ""],
> "print_header":
> >>> false}]: dispatch
> >>>
> >>> 4) When iostat isn't working, we see no obvious errors in the mgr log.
> >>>
> >>> 5) When the dashboard is not working, mgr log sometimes shows:
> >>>
> >>> 2019-08-27 09:18:18.810 7f813e533700  0 mgr[dashboard]
> >>> [:::10.91.192.36:43606] [GET] [500] [2.724s] [jake] [1.6K]
> >>> /api/health/minimal
> >>> 2019-08-27 09:18:18.887 7f813e533700  0 mgr[dashboard] ['{"status":
> "500
> >>> Internal Server Error", "version": "3.2.2", "detail": "The server
> >>> encountered an unexpected condition which prevented it from fulfilling
> >>> the request.", "traceback": "Traceback (most recent call last):\\n
> File
> >>> \\"/usr/lib/python2.7/site-packages/cherrypy/_cprequest.py\\", line
> 656,
> >>> in respond\\nresponse.body = self.handler()\\n  File
> >>> \\"/usr/lib/python2.7/site-packages/cherrypy/lib/encoding.py\\", line
> >>> 188, in __call__\\nself.body = self.oldhandler(*args, **kwargs)\\n
> >>> File \\"/usr/lib/python2.7/site-packages/cherrypy/_cptools.py\\", line
> >>> 221, in wrap\\nreturn self.newhandler(innerfunc, *args,
> **kwargs)\\n
> >>> File \\"/usr/share/ceph/mgr/dashboard/services/exception.py\\", line
> >>> 88, in dashboard_exception_handler\\nreturn handler(*args,
> >>> **kwargs)\\n  File
> >>> \\"/usr/lib/python2.7/site-packages/cherrypy/_cpdispatch.py\\", line
> 34,
> >>> in __call__\\nreturn self.callable(*self.args, **self.kwargs)\\n
> >>> File \\"/usr/share/ceph/mgr/dashboard/controllers/__init__.py\\", line
> >>> 649, in inner\\nret = func(*args, **kwargs)\\n  File
> >>> \\"/usr/share/ceph/mgr/dashboard/controllers/health.py\\", line 192, in
> >>> minimal\\nreturn self.health_minimal.all_health()\\n  File
> >>> \\"/usr/share/ceph/mgr/dashboard/controllers/health.py\\", line 51, in
> >>> all_health\\nresult[\'pools\'] = self.pools()\\n  File
> >>> \\"/usr/share/ceph/mgr/dashboard/controllers/health.py\\", line 

Re: [ceph-users] Ubuntu 19.04

2019-07-07 Thread John Hearns
You can compile from source :-)
I can't comment on the compatibility of the packages between 18.04 and
19.04, sorry.
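
A rough sketch of what a source build looks like, if you go that way (the
tag is illustrative, and the build needs plenty of RAM and disk):

git clone --branch v14.2.0 --recurse-submodules https://github.com/ceph/ceph.git
cd ceph
./install-deps.sh      # pulls in the build dependencies
./do_cmake.sh          # configures a build/ directory
cd build && make -j$(nproc)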

On Sat, 6 Jul 2019 at 15:44, Ashley Merrick 
wrote:

> Hello,
>
> Looking at the possibility of upgrading my personal storage cluster from
> Ubuntu 18.04 -> 19.04 to benefit from a newer version of the Kernel e.t.c
>
> I see CEPH only seems to releases packages for LTS versions which makes
> sense as in production environments most people wouldn't want to upgrade to
> the next OS release every 6 months.
>
> Will the 18.04 packages work fine on 19.04 or shall I hold off and sit on
> 18.04 till 20.04 (next LTS) comes along?
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Massive TCP connection on radosgw

2019-05-20 Thread John Hearns
I found similar behaviour on a Nautilus cluster on Friday - around 300,000
open connections, which I think were the result of a benchmarking run that
was terminated. I restarted the radosgw service to get rid of them.
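
A sketch of what I mean, in case it helps (the systemd instance name depends
on how the gateway was deployed):

# Count TCP connections held by the radosgw process
ss -tnp | grep radosgw | wc -l

# Restarting the gateway drops the stale connections
systemctl restart ceph-radosgw@rgw.$(hostname -s).service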

On Mon, 20 May 2019 at 06:56, Li Wang  wrote:

> Dear ceph community members,
>
> We have a ceph cluster (mimic 13.2.4) with 7 nodes and 130+ OSDs. However,
> we observed over 70 millions active TCP connections on the radosgw host,
> which makes the radosgw very unstable.
>
> After further investigation, we found most of the TCP connections on the
> radosgw are connected to OSDs.
>
> May I ask what might be the possible reason causing the the massive amount
> of TCP connection? And is there anything configuration or tuning work that
> I can do to solve this issue?
>
> Any suggestion is highly appreciated.
>
> Regards,
> Li Wang
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Nautilus upgrade but older releases reported by features

2019-03-27 Thread John Hearns
Sure

# ceph versions
{
"mon": {
"ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc)
nautilus (stable)": 3
},
"mgr": {
"ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc)
nautilus (stable)": 2
},
"osd": {
"ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc)
nautilus (stable)": 12
},
"mds": {
"ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc)
nautilus (stable)": 3
},
"rgw": {
"ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc)
nautilus (stable)": 4
},
"overall": {
"ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc)
nautilus (stable)": 24
}
}
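
For anyone following along, the related setting can be inspected like this
(a sketch; this governs the minimum OSD release the cluster insists on,
which is not necessarily the same thing 'ceph features' reports):

# What the OSD map currently requires
ceph osd dump | grep require_osd_release

# Once every daemon is on nautilus it can be raised
ceph osd require-osd-release nautilus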


On Wed, 27 Mar 2019 at 11:20, Konstantin Shalygin  wrote:

> We recently updated a cluster to the Nautilus release by updating Debian
> packages from the Ceph site. Then rebooted all servers.
>
> ceph features still reports older releases, for example the osd
>
> "osd": [
> {
> "features": "0x3ffddff8ffac",
> "release": "luminous",
> "num": 12
> }
>
> I think I am not understanding what is exactly meant by release here.
> Can we alter the osd (mon, clients etc.) such that they report nautilus?
>
> Show your `ceph versions` please.
>
>
>
> k
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Nautilus upgrade but older releases reported by features

2019-03-27 Thread John Hearns
We recently updated a cluster to the Nautilus release by updating Debian
packages from the Ceph site. Then rebooted all servers.

ceph features still reports older releases, for example the osd

"osd": [
{
"features": "0x3ffddff8ffac",
"release": "luminous",
"num": 12
}

I think I am not understanding what is exactly meant by release here.
Can we alter the osd (mon, clients etc.) such that they report nautilus?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v14.2.0 Nautilus released

2019-03-21 Thread John Hearns
Martin, my thanks to Croit for making this repository available.
I have been building Ceph from source on Ubuntu Cosmic for the last few
days.
It is much more convenient to use a repo.

On Thu, 21 Mar 2019 at 09:32, Martin Verges  wrote:

> Hello,
>
> we strongly believe it would be good for Ceph to have the packaged
> directly on the official Debian mirrors, but for everyone out there
> having trouble with Ceph on Debian we are glad to help.
> If Ceph is not available on Debian, it might affect a lot of other
> Software, for example Proxmox.
>
> You can find Ceph Nautilus 14.2.0 for Debian 10 Buster on our public
> mirror.
>
> $ curl https://mirror.croit.io/keys/release.asc | apt-key add -
> $ echo 'deb https://mirror.croit.io/debian-nautilus/ buster main' >>
> /etc/apt/sources.list.d/croit-ceph.list
>
> If we can help to get the packages on the official mirrors, please
> feel free contact us!
>
> --
> Martin Verges
> Managing director
>
> Mobile: +49 174 9335695
> E-Mail: martin.ver...@croit.io
> Chat: https://t.me/MartinVerges
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
>
> Web: https://croit.io
> YouTube: https://goo.gl/PGE1Bx
>
>
> Am Mi., 20. März 2019 um 20:49 Uhr schrieb Ronny Aasen
> :
> >
> >
> > with Debian buster frozen, If there are issues with ceph on debian that
> > would best be fixed in debian, now is the last chance to get anything
> > into buster before the next release.
> >
> > it is also important to get mimic and luminous packages built for
> > Buster. Since you want to avoid a situation where you have to upgrade
> > both the OS and ceph at the same time.
> >
> > kind regards
> > Ronny Aasen
> >
> >
> >
> > On 20.03.2019 07:09, Alfredo Deza wrote:
> > > There aren't any Debian packages built for this release because we
> > > haven't updated the infrastructure to build (and test) Debian packages
> > > yet.
> > >
> > > On Tue, Mar 19, 2019 at 10:24 AM Sean Purdy 
> wrote:
> > >> Hi,
> > >>
> > >>
> > >> Will debian packages be released?  I don't see them in the nautilus
> repo.  I thought that Nautilus was going to be debian-friendly, unlike
> Mimic.
> > >>
> > >>
> > >> Sean
> > >>
> > >> On Tue, 19 Mar 2019 14:58:41 +0100
> > >> Abhishek Lekshmanan  wrote:
> > >>
> > >>> We're glad to announce the first release of Nautilus v14.2.0 stable
> > >>> series. There have been a lot of changes across components from the
> > >>> previous Ceph releases, and we advise everyone to go through the
> release
> > >>> and upgrade notes carefully.
> > >> ___
> > >> ceph-users mailing list
> > >> ceph-users@lists.ceph.com
> > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Nautilus for Ubuntu Cosmic?

2019-03-18 Thread John Hearns
Thank you, Marc.
I cloned the GitHub repo and am building the packages. No biggie really and
hey, I do like living on the edge.

On Mon, 18 Mar 2019 at 16:04, Marc Roos  wrote:

>
>
> If you want the excitement, can I then wish you my possible future ceph
> cluster problems, so I won't have them ;)
>
>
>
>
> -Original Message-
> From: John Hearns
> Sent: 18 March 2019 17:00
> To: ceph-users
> Subject: [ceph-users] Ceph Nautilus for Ubuntu Cosmic?
>
> May I ask if there is a repository for the latest Ceph Nautilus for
> Ubuntu?
> Specifically Ubuntu 18.10 Cosmic Cuttlefish.
>
> Perhaps I am paying a penalty for living on the bleeding edge. But one
> does have to have some excitement in life.
>
> Thanks
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Nautilus for Ubuntu Cosmic?

2019-03-18 Thread John Hearns
May I ask if there is a repository for the latest Ceph Nautilus for Ubuntu?
Specifically Ubuntu 18.10 Cosmic Cuttlefish.

Perhaps I am paying a penalty for living on the bleeding edge. But one does
have to have some excitement in life.

Thanks
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph migration

2019-02-27 Thread John Hearns
We did a similar upgrade on a test system yesterday, from Mimic to Nautilus.
All of the PGs stayed offline until we issued this command:

ceph osd require-osd-release nautilus --yes-i-really-mean-it

On Wed, 27 Feb 2019 at 12:19, Zhenshi Zhou  wrote:

> Hi,
>
> The servers have moved to the new datacenter and I got it online
> following the instruction.
>
> # ceph -s
>   cluster:
> id: 7712ab7e-3c38-44b3-96d3-4e1de9da0ff6
> health: HEALTH_OK
>
>   services:
> mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3
> mgr: ceph-mon3(active), standbys: ceph-mon1, ceph-mon2
> mds: cephfs-1/1/1 up  {0=ceph-mds=up:active}, 1 up:standby
> osd: 63 osds: 63 up, 63 in
>
>   data:
> pools:   4 pools, 640 pgs
> objects: 108.6 k objects, 379 GiB
> usage:   1.3 TiB used, 228 TiB / 229 TiB avail
> pgs: 640 active+clean
>
> Thanks guys:)
>
> Eugen Block  于2019年2月27日周三 上午2:45写道:
>
>> Hi,
>>
>> > Well, I've just reacted to all the text at the beginning of
>> >
>> http://docs.ceph.com/docs/luminous/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address-the-messy-way
>> > including the title "the messy way". If the cluster is clean I see no
>> > reason for doing brain surgery on monmaps
>> > just to "save" a few minutes of redoing correctly from scratch.
>>
>> with that I would agree. Careful planning and an installation
>> following the docs should be first priority. But I would also
>> encourage users to experiment with ceph before going into production.
>> Dealing with failures and outages on a production cluster causes much
>> more headache than on a test cluster. ;-)
>>
>> If the cluster is empty anyway, I would also rather reinstall it, it
>> doesn't take that much time. I just wanted to point out that there is
>> a way that worked for me, although that was only a test cluster.
>>
>> Regards,
>> Eugen
>>
>>
>> Zitat von Janne Johansson :
>>
>> > Den mån 25 feb. 2019 kl 13:40 skrev Eugen Block :
>> >> I just moved a (virtual lab) cluster to a different network, it worked
>> >> like a charm.
>> >> In an offline method - you need to:
>> >>
>> >> - set osd noout, ensure there are no OSDs up
>> >> - Change the MONs IP, See the bottom of [1] "CHANGING A MONITOR’S IP
>> >> ADDRESS", MONs are the only ones really
>> >> sticky with the IP
>> >> - Ensure ceph.conf has the new MON IPs and network IPs
>> >> - Start MONs with new monmap, then start OSDs
>> >>
>> >> > No, certain ips will be visible in the databases, and those will
>> >> not change.
>> >> I'm not sure where old IPs will be still visible, could you clarify
>> >> that, please?
>> >
>> > Well, I've just reacted to all the text at the beginning of
>> >
>> http://docs.ceph.com/docs/luminous/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address-the-messy-way
>> > including the title "the messy way". If the cluster is clean I see no
>> > reason for doing brain surgery on monmaps
>> > just to "save" a few minutes of redoing correctly from scratch. What
>> > if you miss some part, some command gives you an error
>> > you really aren't comfortable with, something doesn't really feel
>> > right after doing it, then the whole lifetime of that cluster
>> > will be followed by a small nagging feeling that it might have been
>> > that time you followed a guide that tries to talk you out of
>> > doing it that way, for a cluster with no data.
>> >
>> > I think that is the wrong way to learn how to run clusters.
>> >
>> > --
>> > May the most significant bit of your life be positive.
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Diskprediction - smart returns

2019-02-27 Thread John Hearns
To answer my own question: version 7.0 of the smartmontools package is
needed. This has the --json flag.
See:
http://debian.2.n7.nabble.com/Bug-918535-smartmontools-New-upstream-release-7-0-td4447595.html
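
A quick check that the installed smartctl is new enough (a sketch; /dev/sdb
is just the device from the error above):

smartctl --version | head -1     # needs 7.0 or newer for --json
sudo smartctl --json -a /dev/sdb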

On Wed, 27 Feb 2019 at 11:09, John Hearns  wrote:

> I am looking at the diskprediction health metrics in Nautilus. Looks very
> useful. http://docs.ceph.com/docs/nautilus/mgr/diskprediction/
>
> On a Debian 9 system with smartctl version  6.6 2016-05-31 I get this:
>
> # ceph device get-health-metrics  SEAGATE_ST1000NM0023_Z1W1ZB0P
> {
> "20190227-104719": {
> "nvme_smart_health_information_add_log_error_code": -22,
> "nvme_vendor": "seagate",
> "nvme_smart_health_information_add_log_error": "nvme returned an
> error: sudo: exit status: 1",
> "dev": "/dev/sdb",
>     "error": "smartctl returned invalid JSON"
> }
>
> I am guessing a more up to date smartmontools is needed?
>
> John Hearns
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Diskprediction - smart returns

2019-02-27 Thread John Hearns
I am looking at the diskprediction health metrics in Nautilus. Looks very
useful. http://docs.ceph.com/docs/nautilus/mgr/diskprediction/

On a Debian 9 system with smartctl version  6.6 2016-05-31 I get this:

# ceph device get-health-metrics  SEAGATE_ST1000NM0023_Z1W1ZB0P
{
"20190227-104719": {
"nvme_smart_health_information_add_log_error_code": -22,
"nvme_vendor": "seagate",
"nvme_smart_health_information_add_log_error": "nvme returned an
error: sudo: exit status: 1",
"dev": "/dev/sdb",
"error": "smartctl returned invalid JSON"
}

I am guessing a more up to date smartmontools is needed?

John Hearns
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs kernel client - page cache being invaildated.

2018-10-14 Thread John Hearns
This is a general question for the ceph list.
Should Jesper be looking at these vm tunables?
vm.dirty_ratio
vm.dirty_expire_centisecs / vm.dirty_writeback_centisecs

What effect do they have when using CephFS?
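
For reference, the current values can be read like this (a sketch; these
control page-cache writeback on the client, which may or may not be what is
evicting the cache here):

sysctl vm.dirty_ratio vm.dirty_background_ratio \
       vm.dirty_expire_centisecs vm.dirty_writeback_centisecs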

On Sun, 14 Oct 2018 at 14:24, John Hearns  wrote:

> Hej Jesper.
> Sorry I do not have a direct answer to your question.
> When looking at memory usage, I often use this command:
>
> watch cat /proc/meminfo
>
>
>
>
>
>
> On Sun, 14 Oct 2018 at 13:22,  wrote:
>
>> Hi
>>
>> We have a dataset of ~300 GB on CephFS which as being used for
>> computations
>> over and over agian .. being refreshed daily or similar.
>>
>> When hosting it on NFS after refresh, they are transferred, but from
>> there - they would be sitting in the kernel page cache of the client
>> until they are refreshed serverside.
>>
>> On CephFS it look "similar" but "different". Where the "steady state"
>> operation over NFS would give a client/server traffic of < 1MB/s ..
>> CephFS contantly pulls 50-100MB/s over the network.  This has
>> implications for the clients that end up spending unnessary time waiting
>> for IO in the execution.
>>
>> This is in a setting where the CephFS client mem look like this:
>>
>> $ free -h
>>   totalusedfree  shared  buff/cache
>> available
>> Mem:   377G 17G340G1.2G 19G
>> 354G
>> Swap:  8.8G430M8.4G
>>
>>
>> If I just repeatedly run (within a few minute) something that is using the
>> files, then
>> it is fully served out of client page cache (2GB'ish / s) ..  but it looks
>> like
>> it is being evicted way faster than in the NFS setting?
>>
>> This is not scientific .. but the CMD is a cat /file/on/ceph > /dev/null -
>> type on a total of 24GB data in 300'ish files.
>>
>> $ free -h; time CMD ; sleep 1800; free -h; time CMD ; free -h; sleep 3600;
>> time CMD ;
>>
>>   totalusedfree  shared  buff/cache
>> available
>> Mem:   377G 16G312G1.2G 48G
>> 355G
>> Swap:  8.8G430M8.4G
>>
>> real0m8.997s
>> user0m2.036s
>> sys 0m6.915s
>>   totalusedfree  shared  buff/cache
>> available
>> Mem:   377G 17G277G1.2G 82G
>> 354G
>> Swap:  8.8G430M8.4G
>>
>> real3m25.904s
>> user0m2.794s
>> sys 0m9.028s
>>   totalusedfree  shared  buff/cache
>> available
>> Mem:   377G 17G283G1.2G 76G
>> 353G
>> Swap:  8.8G430M8.4G
>>
>> real6m18.358s
>> user0m2.847s
>> sys 0m10.651s
>>
>>
>> Munin graphs of the system confirms that there has been zero memory
>> pressure over the period.
>>
>> Is there things in the CephFS case that can cause the page-cache to be
>> invailated?
>> Could less agressive "read-ahead" play a role?
>>
>> Other thoughts on what root cause on the different behaviour could be?
>>
>> Clients are using 4.15 kernel.. Anyone aware of newer patches in this area
>> that could impact ?
>>
>> Jesper
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs kernel client - page cache being invaildated.

2018-10-14 Thread John Hearns
Hej Jesper.
Sorry I do not have a direct answer to your question.
When looking at memory usage, I often use this command:

watch cat /proc/meminfo






On Sun, 14 Oct 2018 at 13:22,  wrote:

> Hi
>
> We have a dataset of ~300 GB on CephFS which as being used for computations
> over and over agian .. being refreshed daily or similar.
>
> When hosting it on NFS after refresh, they are transferred, but from
> there - they would be sitting in the kernel page cache of the client
> until they are refreshed serverside.
>
> On CephFS it look "similar" but "different". Where the "steady state"
> operation over NFS would give a client/server traffic of < 1MB/s ..
> CephFS contantly pulls 50-100MB/s over the network.  This has
> implications for the clients that end up spending unnessary time waiting
> for IO in the execution.
>
> This is in a setting where the CephFS client mem look like this:
>
> $ free -h
>   totalusedfree  shared  buff/cache
> available
> Mem:   377G 17G340G1.2G 19G
> 354G
> Swap:  8.8G430M8.4G
>
>
> If I just repeatedly run (within a few minute) something that is using the
> files, then
> it is fully served out of client page cache (2GB'ish / s) ..  but it looks
> like
> it is being evicted way faster than in the NFS setting?
>
> This is not scientific .. but the CMD is a cat /file/on/ceph > /dev/null -
> type on a total of 24GB data in 300'ish files.
>
> $ free -h; time CMD ; sleep 1800; free -h; time CMD ; free -h; sleep 3600;
> time CMD ;
>
>   totalusedfree  shared  buff/cache
> available
> Mem:   377G 16G312G1.2G 48G
> 355G
> Swap:  8.8G430M8.4G
>
> real0m8.997s
> user0m2.036s
> sys 0m6.915s
>   totalusedfree  shared  buff/cache
> available
> Mem:   377G 17G277G1.2G 82G
> 354G
> Swap:  8.8G430M8.4G
>
> real3m25.904s
> user0m2.794s
> sys 0m9.028s
>   totalusedfree  shared  buff/cache
> available
> Mem:   377G 17G283G1.2G 76G
> 353G
> Swap:  8.8G430M8.4G
>
> real6m18.358s
> user0m2.847s
> sys 0m10.651s
>
>
> Munin graphs of the system confirms that there has been zero memory
> pressure over the period.
>
> Is there things in the CephFS case that can cause the page-cache to be
> invailated?
> Could less agressive "read-ahead" play a role?
>
> Other thoughts on what root cause on the different behaviour could be?
>
> Clients are using 4.15 kernel.. Anyone aware of newer patches in this area
> that could impact ?
>
> Jesper
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SAN or DAS for Production ceph

2018-08-28 Thread John Hearns
James, you also use the words "enterprise" and "production ready".
Is Redhat support important to you?




On Tue, 28 Aug 2018 at 23:56, John Hearns  wrote:

> James, well for a start don't use a SAN. I speak as someone who managed a
> SAN with Brocade switches and multipathing for an F1 team. CEPH is Software
> Defined Storage. You want discreet storage servers with a high bandwidth
> Ethernet (or maybe Infiniband) fabric.
>
> Fibrechannel still has it place here though if you want servers with FC
> attached JBODs.
>
> Also you ask about the choice between spinning disks, SSDs and NVMe
> drives. Think about the COST for your petabyte archive.
> True, these days you can argue that all SSD could be comparable to
> spinning disks. But NVMe? Yes you get the best performance.. but do you
> really want all that video data on $$$ NVMe? You need tiering.
>
> Also dont forget low and slow archive tiers - shingled archive disks and
> perhaps tape.
>
> Me, I would start from the building blocks of Supermicro 36 bay storage
> servers. Fill them with 12 Tbyte helium drives.
> Two slots in the back for SSDs for your journaling.
> For a higher performance tier, look at the 'double double' storage servers
> from Supermicro. Or even nicer the new 'ruler'form factor servers.
> For a higher density archiving tier the 90 bay Supermicro servers.
>
> Please get in touch with someone for advice. If you are in the UK I am
> happy to help and point you in the right direction.
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> On Tue, 28 Aug 2018 at 21:05, James Watson  wrote:
>
>> Dear cephers,
>>
>> I am new to the storage domain.
>> Trying to get my head around the enterprise - production-ready setup.
>>
>> The following article helps a lot here: (Yahoo ceph implementation)
>> https://yahooeng.tumblr.com/tagged/object-storage
>>
>> But a couple of questions:
>>
>> What HDD would they have used here? NVMe / SATA /SAS etc (with just 52
>> storage node they got 3.2 PB of capacity !! )
>> I try to calculate a similar setup with HGST Ultrastar He12 (12TB and
>> it's more recent ) and would need 86 HDDs that adds up to 1 PB only!!
>>
>> How is the HDD drive attached is it DAS or a SAN (using Fibre Channel
>> Switches, Host Bus Adapters etc)?
>>
>> Do we need a proprietary hashing algorithm to implement multi-cluster
>> based setup of ceph to contain CPU/Memory usage within the cluster when
>> rebuilding happens during device failure?
>>
>> If proprietary hashing algorithm is required to setup multi-cluster ceph
>> using load balancer - then what could be the alternative setup we can
>> deploy to address the same issue?
>>
>> The aim is to design a similar architecture but with upgraded products
>> and higher performance. - Any suggestions or thoughts are welcome
>>
>>
>>
>> Thanks in advance
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SAN or DAS for Production ceph

2018-08-28 Thread John Hearns
James, well for a start don't use a SAN. I speak as someone who managed a
SAN with Brocade switches and multipathing for an F1 team. CEPH is Software
Defined Storage. You want discrete storage servers with a high-bandwidth
Ethernet (or maybe Infiniband) fabric.

Fibre Channel still has its place here though if you want servers with FC
attached JBODs.

Also you ask about the choice between spinning disks, SSDs and NVMe drives.
Think about the COST for your petabyte archive.
True, these days you can argue that all SSD could be comparable to spinning
disks. But NVMe? Yes you get the best performance.. but do you really want
all that video data on $$$ NVMe? You need tiering.

Also don't forget low and slow archive tiers - shingled archive disks and
perhaps tape.

Me, I would start from the building blocks of Supermicro 36 bay storage
servers. Fill them with 12 Tbyte helium drives.
Two slots in the back for SSDs for your journaling.
For a higher performance tier, look at the 'double double' storage servers
from Supermicro. Or, even nicer, the new 'ruler' form factor servers.
For a higher density archiving tier the 90 bay Supermicro servers.

Please get in touch with someone for advice. If you are in the UK I am
happy to help and point you in the right direction.














On Tue, 28 Aug 2018 at 21:05, James Watson  wrote:

> Dear cephers,
>
> I am new to the storage domain.
> Trying to get my head around the enterprise - production-ready setup.
>
> The following article helps a lot here: (Yahoo ceph implementation)
> https://yahooeng.tumblr.com/tagged/object-storage
>
> But a couple of questions:
>
> What HDD would they have used here? NVMe / SATA /SAS etc (with just 52
> storage node they got 3.2 PB of capacity !! )
> I try to calculate a similar setup with HGST Ultrastar He12 (12TB and it's
> more recent ) and would need 86 HDDs that adds up to 1 PB only!!
>
> How is the HDD drive attached is it DAS or a SAN (using Fibre Channel
> Switches, Host Bus Adapters etc)?
>
> Do we need a proprietary hashing algorithm to implement multi-cluster
> based setup of ceph to contain CPU/Memory usage within the cluster when
> rebuilding happens during device failure?
>
> If proprietary hashing algorithm is required to setup multi-cluster ceph
> using load balancer - then what could be the alternative setup we can
> deploy to address the same issue?
>
> The aim is to design a similar architecture but with upgraded products and
> higher performance. - Any suggestions or thoughts are welcome
>
>
>
> Thanks in advance
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Design a PetaByte scale CEPH object storage

2018-08-27 Thread John Hearns
James, I would recommend that you do the following:

a) write out a clear set of requirements and use cases for this system. Do
not mention any specific technology.
b) plan to install and test a small proof-of-concept system. You can then
assess whether it meets the requirements in (a).

On Mon, 27 Aug 2018 at 09:14, Marc Roos  wrote:

>
>
> > I am a software developer and am new to this domain.
>
> So maybe first get some senior system admin or so? You also do not want
> me to start doing some amateur brain surgery, do you?
>
> > each file has approx 15 TB
>  Pfff, maybe rethink/work this to
>
>
>
>
>
>
>
> -Original Message-
> From: James Watson [mailto:import.me...@gmail.com]
> Sent: zondag 26 augustus 2018 20:24
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] Design a PetaByte scale CEPH object storage
>
> Hi CEPHers,
>
> I need to design an HA CEPH object storage system. The scenario is that
> we are recording HD Videos and end of the day we need to copy all these
> video files (each file has approx 15 TB ) to our storage system.
>
> 1)Which would be the best tech in storage to transfer these PBs size
> loads of videos to CEPH based object storage wirelessly.
>
> 2)How should I design my CEPH in the scale of PBs and make sure its
> future proof.
>
> 3)What are the latest hardware components I might require to accomplish
> this task?
>
>
> I am a software developer and am new to this domain. Kindly request all
> to provide the name of even the most basic of hardware components
> required for the setup so that I can do a cost estimation and compare
> with other techs.
>
> My novice solution so far:
>
> 1. Transmitting module if using WiFi (802.11ac (aka Gigabit Wifi) max
> 200 Mbps speed) to transfer a file of size 15 TB to CEPH Storage takes 7
> days !!
>
> 2.CEPH needs to be configured with High Availability A SAN with FC
> networking in place (GEN 6 SANS) using NVMe SSD with HBAs that support
> NVMe over Fibre Channel giving a transfer rate of 16 Gbps to Host
> Server.
>
>
> Thanks for your help in advance.
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Design a PetaByte scale CEPH object storage

2018-08-26 Thread John Hearns
James, I echo what Christian Balzer says. Do not fixate on Ceph at this
stage; we need to look at what the requirements are. There are alternatives
such as Spectrum Scale and Minio. Also, depending on how often the videos
are to be recalled, look at a tape-based solution.

Regarding hardware, Supermicro is a good place to start. I helped supply 20
Petabytes of CEPH storage to an STFC site based on the 36 bay Supermicro
storage servers.

Regarding the requirement for wireless transfer of the data, we would have
to examine this.  I am sure that you have a good reason for this, though
there should be a way to do this using a high bandwidth connection - 40 or
100Gbps Ethernet. I would look at basing your solution around a modern high
bandwidth network anyway.
Are you based in the UK? If so, we should talk off list.

John Hearns




On Mon, 27 Aug 2018 at 03:39, Christian Balzer  wrote:

>
> Hello,
>
>
> On Sun, 26 Aug 2018 22:23:53 +0400 James Watson wrote:
>
> > Hi CEPHers,
> >
> > I need to design an HA CEPH object storage system.
>
> The first question that comes to mind is why?
> Why does it need to be Ceph and why object based (RGW)?
>
> From what's stated below it seems that nobody at your end has in depth
> experience with Ceph and related HW and the bit about getting that amount
> of data in via WiFi boggles the mind.
>
> HA is implied with Ceph, the bits that feed into it or read out if it will
> be your problem/responsibility.
>
> > The scenario is that we
> > are recording HD Videos and end of the day we need to copy all these
> video
> > files (each file has approx 15 TB ) to our storage system.
> >
> How many of these 15TB files, how long do you need to keep them,
> estimated future growth, etc?
>
> > 1)Which would be the best tech in storage to transfer these PBs size
> loads
> > of videos to CEPH based object storage wirelessly.
> >
> As you already figured out, wireless is going to be an issue.
> Are those 15TB files accumulated locally (sounds odd for remote cameras to
> have such storage)?
> Unless you deploy something beefier in the network department OR go to
> sneakernet this won't work well.
>
> > 2)How should I design my CEPH in the scale of PBs and make sure its
> future
> > proof.
> >
> Lots of examples around if you're looking around here and on google.
> Dense nodes (lots of HDDs) work if you have enough of them, otherwise
> smaller is preferable.
>
> > 3)What are the latest hardware components I might require to accomplish
> > this task?
> >
> You're entering the realm of paid consulting here, which I'm sure some
> people present might be willing to do.
> Vendors like Supermicro probably, too.
> Heck, you can buy it off the shelf even (not the only one out there):
> http://www.fujitsu.com/global/products/computing/storage/eternus-cd/s2/
>
> In general your needs sound a lot like large streaming (in and out), which
> has less demands on the HW then lots of small IOPS.
>
> >
> > I am a software developer and am new to this domain. Kindly request all
> to
> > provide the name of even the most basic of hardware components required
> for
> > the setup so that I can do a cost estimation and compare with other
> techs.
> >
> You can do cheap with Ceph, _if_ you know what you're doing.
> OTOH Ceph emphasizes redundancy first, so compared with other solutions it
> may not be as cheap as it could be, depending on configuration choices and
> performance requirements.
>
>
> > My novice solution so far:
> >
> > 1. Transmitting module if using WiFi (802.11ac (aka Gigabit Wifi) max 200
> > Mbps speed) to transfer a file of size 15 TB to CEPH Storage takes 7
> days !!
> >
> > 2.CEPH needs to be configured with High Availability
> > A SAN with FC networking in place (GEN 6 SANS) using NVMe SSD with HBAs
> > that support NVMe over Fibre Channel giving a transfer rate of 16 Gbps to
> > Host Server.
> >
> Ceph is all about local storage per node, SAN and FC are anathema and NVMe
> is likely not needed in your scenario, at least not for actual storage
> space.
>
>
> Christian
> --
> Christian BalzerNetwork/Systems Engineer
> ch...@gol.com   Rakuten Communications
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] active directory integration with cephfs

2018-07-26 Thread John Hearns
NFS Ganesha certainly works with CephFS. I would investigate that also.
http://docs.ceph.com/docs/master/cephfs/nfs/

Regarding Active Directory, I have done a lot of work recently with sssd.
Not entirely relevant to this list - please send me a mail offline.

Not sure if this is any direct use
https://github.com/MI-OSiRIS/docker-nfs-ganesha-ceph









On Thu, 26 Jul 2018 at 08:34, Serkan Çoban  wrote:

> You can do it by exporting cephfs by samba. I don't think any other
> way exists for cephfs.
>
> On Thu, Jul 26, 2018 at 9:12 AM, Manuel Sopena Ballesteros
>  wrote:
> > Dear Ceph community,
> >
> >
> >
> > I am quite new to Ceph but trying to learn as much quick as I can. We are
> > deploying our first Ceph production cluster in the next few weeks, we
> choose
> > luminous and our goal is to have cephfs. One of the question I have been
> > asked by other members of our team is if there is a possibility to
> integrate
> > ceph authentication/authorization with Active Directory. I have seen in
> the
> > documentations that objct gateway can do this but I am not about cephfs.
> >
> >
> >
> > Anyone has any idea if I can integrate cephfs with AD?
> >
> >
> >
> > Thank you very much
> >
> >
> >
> > Manuel Sopena Ballesteros | Big data Engineer
> > Garvan Institute of Medical Research
> > The Kinghorn Cancer Centre, 370 Victoria Street, Darlinghurst, NSW 2010
> > T: + 61 (0)2 9355 5760 | F: +61 (0)2 9295 8507 | E:
> manuel...@garvan.org.au
> >
> >
> >
> > NOTICE
> > Please consider the environment before printing this email. This message
> and
> > any attachments are intended for the addressee named and may contain
> legally
> > privileged/confidential/copyright information. If you are not the
> intended
> > recipient, you should not read, use, disclose, copy or distribute this
> > communication. If you have received this message in error please notify
> us
> > at once by return email and then delete both messages. We accept no
> > liability for the distribution of viruses or similar in electronic
> > communications. This notice should not be removed.
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] iSCSI to a Ceph node with 2 network adapters - how to ?

2018-06-01 Thread John Hearns
It is worth asking - why do you want to have two interfaces?
If you have 1Gbps interfaces and this is a bandwidth requirement then
10Gbps cards and switches are very cheap these days.

On 1 June 2018 at 10:37, Panayiotis Gotsis  wrote:

> Hello
>
> Bonding and iscsi are not a best practice architecture. Multipath is,
> however I can attest to problems with the multipathd and debian.
>
> In any case, what you should try to do and check is:
>
> 1) Use two vlans, one for each ethernet port, with different ip
> address space. Your initiators on the hosts will then be able to
> discover two iscsi targets.
> 2) You should ensure that ping between host interfaces and iscsi
> targets is working. You should ensure that the iscsi target daemon is
> up (through the use of netstat for example) for each one of the two
> ip addresses/ethernet interfaces
> 3) Check multipath configuration
>
>
> On 18-06-01 05:08 +0200, Marc Roos wrote:
>
>>
>>
>> Indeed, you have to add routes and rules to routing table. Just bond
>> them.
>>
>>
>> -Original Message-
>> From: John Hearns [mailto:hear...@googlemail.com]
>> Sent: vrijdag 1 juni 2018 10:00
>> To: ceph-users
>> Subject: Re: [ceph-users] iSCSI to a Ceph node with 2 network adapters -
>> how to ?
>>
>> Errr   is this very wise ?
>>
>> I have both its Ethernets connected to the same LAN,
>>with different IPs in the same subnet
>>(like, 192.168.200.230/24 and 192.168.200.231/24)
>>
>>
>> In my experience setting up to interfaces on the same subnet means that
>> your ssystem doesnt know which one to route traffic through...
>>
>>
>>
>>
>>
>>
>>
>> On 1 June 2018 at 09:01, Wladimir Mutel  wrote:
>>
>>
>> Dear all,
>>
>> I am experimenting with Ceph setup. I set up a single node
>> (Asus P10S-M WS, Xeon E3-1235 v5, 64 GB RAM, 8x3TB SATA
>> HDDs,
>> Ubuntu 18.04 Bionic, Ceph packages from
>> http://download.ceph.com/debian-luminous/dists/xenial/
>> <http://download.ceph.com/debian-luminous/dists/xenial/>
>> and iscsi parts built manually per
>> http://docs.ceph.com/docs/master/rbd/iscsi-target-cli-manual
>> -instal
>> l/
>> <http://docs.ceph.com/docs/master/rbd/iscsi-target-cli-manual-install/> )
>> Also i changed 'chooseleaf ... host' into 'chooseleaf ...
>> osd'
>> in the CRUSH map to run with single host.
>>
>> I have both its Ethernets connected to the same LAN,
>> with different IPs in the same subnet
>> (like, 192.168.200.230/24 and 192.168.200.231/24)
>> mon_host in ceph.conf is set to 192.168.200.230,
>> and ceph daemons (mgr, mon, osd) are listening to this IP.
>>
>> What I would like to finally achieve, is to provide
>> multipath
>> iSCSI access through both these Ethernets to Ceph RBDs,
>> and apparently, gwcli does not allow me to add a second
>> gateway to the same target. It is going like this :
>>
>> /iscsi-target> create iqn.2018-06.host.test:test
>> ok
>> /iscsi-target> cd iqn.2018-06.host.test:test/gateways
>> /iscsi-target...test/gateways> create p10s 192.168.200.230
>> skipchecks=true
>> OS version/package checks have been bypassed
>> Adding gateway, sync'ing 0 disk(s) and 0 client(s)
>> ok
>> /iscsi-target...test/gateways> create p10s2 192.168.200.231
>> skipchecks=true
>> OS version/package checks have been bypassed
>> Adding gateway, sync'ing 0 disk(s) and 0 client(s)
>> Failed : Gateway creation failed, gateway(s)
>> unavailable:192.168.200.231(UNKNOWN state)
>>
>> host names are defined in /etc/hosts as follows :
>>
>> 192.168.200.230 p10s
>> 192.168.200.231 p10s2
>>
>> so I suppose that something does not listen on
>> 192.168.200.231, but I don't have an idea what that thing is or how to
>> make it listen there. Or how to achieve this goal (utilization of both
>> Ethernets for iSCSI) in a different way. Should I aggregate Ethernets into
>> a 'bond' interface with a single IP ? Should I build and use the 'lrbd' tool
>> instead of 'gwcli' ? Is it acceptable that I run kernel 4.15, not 4.16+
>> ?
>> What other directions

Re: [ceph-users] iSCSI to a Ceph node with 2 network adapters - how to ?

2018-06-01 Thread John Hearns
Errr   is this very wise ?

I have both its Ethernets connected to the same LAN,
with different IPs in the same subnet
(like, 192.168.200.230/24 and 192.168.200.231/24)


In my experience, setting up two interfaces on the same subnet means that
your system doesn't know which one to route traffic through...
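
If you really must keep both ports un-bonded on the same subnet, source-based
policy routing is the usual workaround. A minimal sketch, using the addresses
from your mail but with made-up interface names and table numbers:

# give each interface its own routing table, selected by source address
ip route add 192.168.200.0/24 dev eth0 src 192.168.200.230 table 100
ip rule add from 192.168.200.230/32 table 100
ip route add 192.168.200.0/24 dev eth1 src 192.168.200.231 table 101
ip rule add from 192.168.200.231/32 table 101

That said, bonding the two ports into one interface avoids the problem
entirely.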






On 1 June 2018 at 09:01, Wladimir Mutel  wrote:

> Dear all,
>
> I am experimenting with Ceph setup. I set up a single node
> (Asus P10S-M WS, Xeon E3-1235 v5, 64 GB RAM, 8x3TB SATA HDDs,
> Ubuntu 18.04 Bionic, Ceph packages from
> http://download.ceph.com/debian-luminous/dists/xenial/
> and iscsi parts built manually per
> http://docs.ceph.com/docs/master/rbd/iscsi-target-cli-manual-install/)
> Also i changed 'chooseleaf ... host' into 'chooseleaf ... osd'
> in the CRUSH map to run with single host.
>
> I have both its Ethernets connected to the same LAN,
> with different IPs in the same subnet
> (like, 192.168.200.230/24 and 192.168.200.231/24)
> mon_host in ceph.conf is set to 192.168.200.230,
> and ceph daemons (mgr, mon, osd) are listening to this IP.
>
> What I would like to finally achieve, is to provide multipath
> iSCSI access through both these Ethernets to Ceph RBDs,
> and apparently, gwcli does not allow me to add a second
> gateway to the same target. It is going like this :
>
> /iscsi-target> create iqn.2018-06.host.test:test
> ok
> /iscsi-target> cd iqn.2018-06.host.test:test/gateways
> /iscsi-target...test/gateways> create p10s 192.168.200.230 skipchecks=true
> OS version/package checks have been bypassed
> Adding gateway, sync'ing 0 disk(s) and 0 client(s)
> ok
> /iscsi-target...test/gateways> create p10s2 192.168.200.231 skipchecks=true
> OS version/package checks have been bypassed
> Adding gateway, sync'ing 0 disk(s) and 0 client(s)
> Failed : Gateway creation failed, gateway(s) 
> unavailable:192.168.200.231(UNKNOWN
> state)
>
> host names are defined in /etc/hosts as follows :
>
> 192.168.200.230 p10s
> 192.168.200.231 p10s2
>
> so I suppose that something does not listen on 192.168.200.231,
> but I don't have an idea what that thing is or how to make it listen
> there. Or how to achieve this goal (utilization of both Ethernets for
> iSCSI) in a different way. Should I aggregate Ethernets into a 'bond'
> interface with a single IP ? Should I build and use the 'lrbd' tool instead of
> 'gwcli' ? Is it acceptable that I run kernel 4.15, not 4.16+ ?
> What other directions could you give me on this task ?
> Thanks in advance for your replies.


Re: [ceph-users] multi site with cephfs

2018-05-16 Thread John Hearns
The answer given at the seminar yesterday was that a practical limit was
around 60km.
I don't think 100km is that much longer.  I defer to the experts here.






On 16 May 2018 at 15:24, Up Safe <upands...@gmail.com> wrote:

> Hi,
>
> About a 100 km.
> I have a 2-4ms latency between them.
>
> Leon
>
> On Wed, May 16, 2018, 16:13 John Hearns <hear...@googlemail.com> wrote:
>
>> Leon,
>> I was at a Lenovo/SuSE seminar yesterday and asked a similar question
>> regarding separated sites.
>> How far apart are these two geographical locations?   It does matter.
>>
>> On 16 May 2018 at 15:07, Up Safe <upands...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I'm trying to build a multi site setup.
>>> But the only guides I've found on the net were about building it with
>>> object storage or rbd.
>>> What I need is cephfs.
>>>
>>> I.e. I need to have 2 synced file storages at 2 geographical locations.
>>> Is this possible?
>>>
>>> Also, if I understand correctly - cephfs is just a component on top of
>>> the object storage.
>>> Following this logic - it should be possible, right?
>>>
>>> Or am I totally off here?
>>>
>>> Thanks,
>>> Leon
>>>


Re: [ceph-users] multi site with cephfs

2018-05-16 Thread John Hearns
Leon,
I was at a Lenovo/SuSE seminar yesterday and asked a similar question
regarding separated sites.
How far apart are these two geographical locations?   It does matter.

On 16 May 2018 at 15:07, Up Safe  wrote:

> Hi,
>
> I'm trying to build a multi site setup.
> But the only guides I've found on the net were about building it with
> object storage or rbd.
> What I need is cephfs.
>
> I.e. I need to have 2 synced file storages at 2 geographical locations.
> Is this possible?
>
> Also, if I understand correctly - cephfs is just a component on top of the
> object storage.
> Following this logic - it should be possible, right?
>
> Or am I totally off here?
>
> Thanks,
> Leon
>


Re: [ceph-users] Intel Xeon Scalable and CPU frequency scaling on NVMe/SSD Ceph OSDs

2018-05-16 Thread John Hearns
Blair,
   methinks someone is doing bitcoin mining on your systems when they are
idle   :-)

I WAS going to say that maybe the cpupower utility needs an update to cope
with that generation of CPUs.
But /proc/cpuinfo never lies (does it ?)
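
For example, a quick way to watch the live clocks without any cpufreq driver
loaded is something like:

$ watch -n1 "grep MHz /proc/cpuinfo"   # per-core 'cpu MHz' readout, refreshed every second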




On 16 May 2018 at 13:22, Blair Bethwaite  wrote:

> On 15 May 2018 at 08:45, Wido den Hollander  wrote:
>>
>> > We've got some Skylake Ubuntu based hypervisors that we can look at to
>> > compare tomorrow...
>> >
>>
>> Awesome!
>
>
> Ok, so results still inconclusive I'm afraid...
>
> The Ubuntu machines we're looking at (Dell R740s and C6420s running with
> Performance BIOS power profile, which amongst other things disables cstates
> and enables turbo) are currently running either a 4.13 or a 4.15 HWE kernel
> - we needed 4.13 to support PERC10 and even get them booting from local
> storage, then 4.15 to get around a prlimit bug that was breaking Nova
> snapshots, so here we are. Where are you getting 4.16,
> http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.16/ ?
>
> So interestingly in our case we seem to have no cpufreq driver loaded.
> After installing linux-generic-tools (cause cpupower is supposed to
> supersede cpufrequtils I think?):
>
> rr42-03:~$ uname -a
> Linux rcgpudc1rr42-03 4.15.0-13-generic #14~16.04.1-Ubuntu SMP Sat Mar 17
> 03:04:59 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>
> rr42-03:~$ cat /proc/cmdline
> BOOT_IMAGE=/vmlinuz-4.15.0-13-generic root=/dev/mapper/vg00-root ro
> intel_iommu=on iommu=pt intel_idle.max_cstate=0 processor.max_cstate=1
>
> rr42-03:~$ lscpu
> Architecture:  x86_64
> CPU op-mode(s):32-bit, 64-bit
> Byte Order:Little Endian
> CPU(s):36
> On-line CPU(s) list:   0-35
> Thread(s) per core:1
> Core(s) per socket:18
> Socket(s): 2
> NUMA node(s):  2
> Vendor ID: GenuineIntel
> CPU family:6
> Model: 85
> Model name:Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz
> Stepping:  4
> CPU MHz:   3400.956
> BogoMIPS:  5401.45
> Virtualization:VT-x
> L1d cache: 32K
> L1i cache: 32K
> L2 cache:  1024K
> L3 cache:  25344K
> NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34
> NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35
> Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
> syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts
> rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64
> monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca
> sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c
> rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3
> invpcid_single pti intel_ppin mba tpr_shadow vnmi flexpriority ept vpid
> fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a
> avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw
> avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total
> cqm_mbm_local ibpb ibrs stibp dtherm ida arat pln pts pku ospke
>
> rr42-03:~$ sudo cpupower frequency-info
> analyzing CPU 0:
>   no or unknown cpufreq driver is active on this CPU
>   CPUs which run at the same hardware frequency: Not Available
>   CPUs which need to have their frequency coordinated by software: Not
> Available
>   maximum transition latency:  Cannot determine or is not supported.
> Not Available
>   available cpufreq governors: Not Available
>   Unable to determine current policy
>   current CPU frequency: Unable to call hardware
>   current CPU frequency:  Unable to call to kernel
>   boost state support:
> Supported: yes
> Active: yes
>
>
> And of course there is nothing under sysfs (/sys/devices/system/cpu*). But
> /proc/cpuinfo and cpupower-monitor show that we seem to be hitting turbo
> freqs:
>
> rr42-03:~$ sudo cpupower monitor
>   |Nehalem|| Mperf
> PKG |CORE|CPU | C3   | C6   | PC3  | PC6  || C0   | Cx   | Freq
>0|   0|   0|  0.00|  0.00|  0.00|  0.00||  0.05| 99.95|  3391
>0|   1|   4|  0.00|  0.00|  0.00|  0.00||  0.02| 99.98|  3389
>0|   2|   8|  0.00|  0.00|  0.00|  0.00||  0.14| 99.86|  3067
>0|   3|   6|  0.00|  0.00|  0.00|  0.00||  0.01| 99.99|  3385
>0|   4|   2|  0.00|  0.00|  0.00|  0.00||  0.09| 99.91|  3119
>0|   8|  12|  0.00|  0.00|  0.00|  0.00||  0.03| 99.97|  3312
>0|   9|  16|  0.00|  0.00|  0.00|  0.00||  0.11| 99.89|  3157
>0|  10|  14|  0.00|  0.00|  0.00|  0.00||  0.01| 99.99|  3352
>0|  11|  10|  0.00|  0.00|  0.00|  0.00||  0.05| 99.95|  3390
>0|  16|  20|  0.00|  0.00|  0.00|  0.00||  0.00|100.00|  3387
>0|  17|  24|  0.00|  0.00|  0.00|  0.00||  0.22| 99.78|  3115
>0|  18|  26|  0.00|  0.00|  

Re: [ceph-users] ceph's UID/GID 65045 in conflict with user's UID/GID in a ldap

2018-05-15 Thread John Hearns
Hello Yoann. I am working with similar issues at the moment in a biotech
company in Denmark.

First of all what authentication setup are you using?
If you are using sssd there is a very simple and useful utility called
sss_override.
You can 'override' the uid which you get from LDAP with the genuine one.

Oops. On reading your email more closely.
Why not just add ceph to your /etc/group  file?
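
A rough sketch of both options (the gid below just mirrors the uid shown in
your /etc/passwd output; the LDAP user name and new uid are placeholders):

# create the missing local group and make it the ceph user's primary group
groupadd -g 64045 ceph        # add -o if an LDAP group already owns that gid
usermod -g ceph ceph
# or, with sssd, move the clashing LDAP account's uid out of the way
sss_override user-add someldapuser -u 70001
systemctl restart sssd        # overrides only take effect after an sssd restart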





On 15 May 2018 at 08:58, Yoann Moulin  wrote:

> Hello,
>
> I'm facing an issue with ceph's UID/GID 65045 on an LDAPized server, I
> have to install ceph-common to mount a cephfs filesystem but ceph-common
> fails because a user with uid 65045 already exist with a group also set at
> 65045.
>
> Server under Ubuntu 16.04.4 LTS
>
> > Setting up ceph-common (12.2.5-1xenial) ...
> > Adding system user cephdone
> > Setting system user ceph properties..usermod: group 'ceph' does not exist
> > dpkg: error processing package ceph-common (--configure):
> >  subprocess installed post-installation script returned error exit
> status 6
>
> The user is correctly created but the group not.
>
> > # grep ceph /etc/passwd
> > ceph:x:64045:64045::/home/ceph:/bin/false
> > # grep ceph /etc/group
> > #
> Is there a workaround for that?
>
> --
> Yoann Moulin
> EPFL IC-IT


Re: [ceph-users] Intel Xeon Scalable and CPU frequency scaling on NVMe/SSD Ceph OSDs

2018-05-14 Thread John Hearns
Wido, I am going to put my rather large foot in it here.
I am sure it is understood that the Turbo mode will not keep all cores at
the maximum frequency at any given time.
There is a thermal envelope for the chip, and the chip works to keep  the
power dissipation within that envelope.
From what I gather there is a range of thermal limits even within a given
processor SKU, so every chip will exhibit
different Turbo mode behaviour.
And I am sure we all know that when AVX comes into use the Turbo limit is
lower.

I guess what I am saying is that if you care about reproducible behaviour,
for timings etc., Turbo
can be switched off.
Before you say it: in this case you want to achieve the minimum latency, and
reproducibility at the MHz level is not important.
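
For what it's worth, with the intel_pstate driver I believe Turbo can be
toggled at runtime through sysfs:

# 1 disables turbo, 0 re-enables it
echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo

(With acpi-cpufreq the equivalent knob should be
/sys/devices/system/cpu/cpufreq/boost, if I remember correctly.)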

Also worth saying that cooling is important when Turbo Boost comes into
play. I heard a paper at an HPC Advisory Council meeting
where a Russian setup by Lenovo got significantly more performance at the
HPC acceptance testing stage when cooling was turned up.

I guess my rambling has not added much to this debate, sorry.
cue a friendly Intel engineer to wander in and tell us exactly what is
going on.



On 14 May 2018 at 15:13, Wido den Hollander  wrote:

>
>
> On 05/01/2018 10:19 PM, Nick Fisk wrote:
> > 4.16 required?
> > https://www.phoronix.com/scan.php?page=news_item=Skylake-X-P-State-Linux-4.16
> >
>
> I've been trying with the 4.16 kernel for the last few days, but still,
> it's not working.
>
> The CPU's keep clocking down to 800Mhz
>
> I've set scaling_min_freq=scaling_max_freq in /sys, but that doesn't
> change a thing. The CPUs keep scaling down.
>
> Still not close to the 1ms latency with these CPUs :(
>
> Wido
>
> >
> > -Original Message-
> > From: ceph-users  On Behalf Of Blair
> > Bethwaite
> > Sent: 01 May 2018 16:46
> > To: Wido den Hollander 
> > Cc: ceph-users ; Nick Fisk 
> > Subject: Re: [ceph-users] Intel Xeon Scalable and CPU frequency scaling
> on
> > NVMe/SSD Ceph OSDs
> >
> > Also curious about this over here. We've got a rack's worth of R740XDs
> with
> > Xeon 4114's running RHEL 7.4 and intel-pstate isn't even active on them,
> > though I don't believe they are any different at the OS level to our
> > Broadwell nodes (where it is loaded).
> >
> > Have you tried poking the kernel's pmqos interface for your use-case?
> >
> > On 2 May 2018 at 01:07, Wido den Hollander  wrote:
> >> Hi,
> >>
> >> I've been trying to get the lowest latency possible out of the new
> >> Xeon Scalable CPUs and so far I got down to 1.3ms with the help of Nick.
> >>
> >> However, I can't seem to pin the CPUs to always run at their maximum
> >> frequency.
> >>
> >> If I disable power saving in the BIOS they stay at 2.1Ghz (Silver
> >> 4110), but that disables the boost.
> >>
> >> With the Power Saving enabled in the BIOS and when giving the OS all
> >> control for some reason the CPUs keep scaling down.
> >>
> >> $ echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
> >>
> >> cpufrequtils 008: cpufreq-info (C) Dominik Brodowski 2004-2009 Report
> >> errors and bugs to cpuf...@vger.kernel.org, please.
> >> analyzing CPU 0:
> >>   driver: intel_pstate
> >>   CPUs which run at the same hardware frequency: 0
> >>   CPUs which need to have their frequency coordinated by software: 0
> >>   maximum transition latency: 0.97 ms.
> >>   hardware limits: 800 MHz - 3.00 GHz
> >>   available cpufreq governors: performance, powersave
> >>   current policy: frequency should be within 800 MHz and 3.00 GHz.
> >>   The governor "performance" may decide which speed to
> use
> >>   within this range.
> >>   current CPU frequency is 800 MHz.
> >>
> >> I do see the CPUs scale up to 2.1Ghz, but they quickly scale down
> >> again to 800Mhz and that hurts latency. (50% difference!)
> >>
> >> With the CPUs scaling down to 800Mhz my latency jumps from 1.3ms to
> >> 2.4ms on avg. With turbo enabled I hope to get down to 1.1~1.2ms on avg.
> >>
> >> $ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
> >> performance
> >>
> >> Everything seems to be OK and I would expect the CPUs to stay at
> >> 2.10Ghz, but they aren't.
> >>
> >> C-States are also pinned to 0 as a boot parameter for the kernel:
> >>
> >> processor.max_cstate=1 intel_idle.max_cstate=0
> >>
> >> Running Ubuntu 16.04.4 with the 4.13 kernel from the HWE from Ubuntu.
> >>
> >> Has anybody tried this yet with the recent Intel Xeon Scalable CPUs?
> >>
> >> Thanks,
> >>
> >> Wido
> >
> >
> >
> > --
> > Cheers,
> > ~Blairo

Re: [ceph-users] CentOS release 7.4.1708 and selinux-policy-base >= 3.13.1-166.el7_4.9

2018-05-03 Thread John Hearns
Anton

if you still cannot install the ceph RPMs because of that dependency,
do as Ruben suggests - install selinux-policy-targeted.

Then you can use the RPM option --nodeps, which will ignore the dependency
requirements.
Do not be afraid to use this option - but do not use it blindly either.
Sometimes you need it.

You are probably using yum - I don't think there is an equivalent option. I
may be wrong.
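
Something like this is what I have in mind (the package file name below is
only illustrative - use whatever version yum was trying to pull in):

# provides the selinux-policy-base dependency
yum install selinux-policy-targeted
# last resort: install the ceph package while skipping the dependency check
rpm -Uvh --nodeps ceph-selinux-12.2.5-0.el7.x86_64.rpm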












On 3 May 2018 at 10:57, Ruben Kerkhof  wrote:

> On Thu, May 3, 2018 at 1:33 AM,   wrote:
> >
> > Hi all.
>
> Hi Anton,
>
> >
> > We try to setup our first CentOS 7.4.1708 CEPH cluster, based on
> Luminous 12.2.5. What we get is:
> >
> >
> > Error: Package: 2:ceph-selinux-12.2.5-0.el7.x86_64 (Ceph-Luminous)
> >Requires: selinux-policy-base >= 3.13.1-166.el7_4.9
> >
> >
> > __Host infos__:
> >
> > root> lsb_release -d
> > Description:CentOS Linux release 7.4.1708 (Core)
> >
> > root@> uname -a
> > Linux  3.10.0-693.11.1.el7.x86_64 #1 SMP Mon Dec 4 23:52:40
> UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
> >
> > __Question__:
> > Where can I find the elinux-policy-base-3.13.1-166.el7_4.9 package?
>
> It is provided by selinux-policy-targeted:
>
> ruben@localhost: ~$ rpm -q --provides selinux-policy-targeted
> config(selinux-policy-targeted) = 3.13.1-166.el7_4.9
> selinux-policy-base = 3.13.1-166.el7_4.9
> selinux-policy-targeted = 3.13.1-166.el7_4.9
>
> >
> >
> > Regards
> >  Anton
>
> Kind regards,
>
> Ruben Kerkhof


Re: [ceph-users] Please help me get rid of Slow / blocked requests

2018-05-01 Thread John Hearns
>Sounds like one of the following could be happening:
> 1) RBD write caching doing the 37K IOPS, which will need to flush at some
point which causes the drop.

I am not sure this will help, Shantur, but you could try running 'watch cat
/proc/meminfo' during a benchmark run.
You might be able to spot caches being flushed.
iostat is probably a better tool.
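
Something like this, run on the OSD nodes while the benchmark stalls, should
show whether the disks themselves are the bottleneck:

$ iostat -xm 2   # extended per-device stats every 2 seconds; watch await and %util on the OSD disks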




On 1 May 2018 at 13:13, Van Leeuwen, Robert  wrote:

> > On 5/1/18, 12:02 PM, "ceph-users on behalf of Shantur Rathore" <
> ceph-users-boun...@lists.ceph.com on behalf of shantur.rath...@gmail.com>
> wrote:
> >I am not sure if the benchmark is overloading the cluster as 3 out of
> >   5 runs the benchmark goes around 37K IOPS and suddenly for the
> >problematic runs it drops to 0 IOPS for a couple of minutes and then
> >   resumes. This is a test cluster so nothing else is running off it.
>
> Sounds like one of the following could be happening:
> 1) RBD write caching doing the 37K IOPS, which will need to flush at some
> point which causes the drop.
>
> 2) Hardware performance drops over time.
> You could be hitting hardware write cache on RAID or disk controllers.
> Especially SSDs can have a performance drop after writing to them for a
> while due to either SSD housekeeping or caches filling up.
> So always run benchmarks over longer periods to make sure you get the
> actual sustainable performance of your cluster.
>
> Cheers,
> Robert van Leeuwen
>


Re: [ceph-users] Cluster degraded after Ceph Upgrade 12.2.1 => 12.2.2

2018-04-26 Thread John Hearns
Ronny, talking about reboots, has anyone had experience of live kernel
patching with CEPH?  I am asking out of simple curiosity.


On 25 April 2018 at 19:40, Ronny Aasen  wrote:

> the difference in cost between 2 and 3 servers is not HUGE, but the
> reliability difference between a size 2/1 pool and a 3/2 pool is massive.
> A 2/1 pool is just a single fault during maintenance away from data loss,
> but you need multiple simultaneous faults, and very bad luck, to break
> a 3/2 pool.
>
> I would rather recommend using 2/2 pools if you are willing to accept a
> little downtime when a disk dies.  The cluster io would stop until the
> disks backfill to cover for the lost disk,
> but that is better than having inconsistent pg's or data loss because a disk
> crashed during a routine reboot, or 2 disks failed.
>
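
For reference, I believe changing an existing pool between these schemes is
just the following (the pool name is a placeholder):

ceph osd pool set mypool size 2       # number of replicas to keep
ceph osd pool set mypool min_size 2   # replicas required before the pool accepts I/O
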
> It is also worth reading this link: https://www.spinics.net/lists/ceph-users/msg32895.html - a good explanation.
>
> You have good backups and are willing to restore the whole pool. And it is
> of course your privilege to run 2/1 pools, but be mindful of the risks of
> doing so.
>
>
> kind regards
> Ronny Aasen
>
> BTW: I did not know Ubuntu automagically rebooted after an upgrade. You can
> probably avoid that reboot somehow in Ubuntu and do the restarts of
> services manually if you wish to maintain service during the upgrade.
>
>
>
>
>
> On 25.04.2018 11:52, Ranjan Ghosh wrote:
>
>> Thanks a lot for your detailed answer. The problem for us, however, was
>> that we use the Ceph packages that come with the Ubuntu distribution. If
>> you do a Ubuntu upgrade, all packages are upgraded in one go and the server
>> is rebooted. You cannot influence anything or start/stop services
>> one-by-one etc. This was concerning me, because the upgrade instructions
>> didn't mention anything about an alternative or what to do in this case.
>> But someone here enlightened me that - in general - it all doesn't matter
>> that much *if you are just accepting a downtime*. And, indeed, it all
>> worked nicely. We stopped all services on all servers, upgraded the Ubuntu
>> version, rebooted all servers and were ready to go again. Didn't encounter
>> any problems there. The only problem turned out to be our own fault and
>> simply a firewall misconfiguration.
>>
>> And, yes, we're running a "size:2 min_size:1" because we're on a very
>> tight budget. If I understand correctly, this means: Make changes of files
>> to one server. *Eventually* copy them to the other server. I hope this
>> *eventually* means after a few minutes. Up until now I've never experienced
>> *any* problems with file integrity with this configuration. In fact, Ceph
>> is incredibly stable. Amazing. I have never ever had any issues whatsoever
>> with broken files/partially written files, files that contain garbage etc.
>> Even after starting/stopping services, rebooting etc. With GlusterFS and
>> other Cluster file system I've experienced many such problems over the
>> years, so this is what makes Ceph so great. I have now a lot of trust in
>> Ceph, that it will eventually repair everything :-) And: If a file that has
>> been written a few seconds ago is really lost it wouldnt be that bad for
>> our use-case. It's a web-server. Most important stuff is in the DB. We have
>> hourly backups of everything. In a huge emergency, we could even restore
>> the backup from an hour ago if we really had to. Not nice, but if it
>> happens every 6 years or sth due to some freak hardware failure, I think it
>> is manageable. I accept it's not the recommended/perfect solution if you
>> have infinite amounts of money at your hands, but in our case, I think it's
>> not extremely audacious either to do it like this, right?
>>
>>
>> Am 11.04.2018 um 19:25 schrieb Ronny Aasen:
>>
>>> ceph upgrades are usually not a problem:
>>> ceph have to be upgraded in the right order. normally when each service
>>> is on its own machine this is not difficult.
>>> but when you have mon, mgr, osd, mds, and klients on the same host you
>>> have to do it a bit carefully..
>>>
>>> I tend to have a terminal open with "watch ceph -s" running, and I never
>>> do another service until the health is ok again.
>>>
>>> first apt upgrade the packages on all the hosts. This only update the
>>> software on disk and not the running services.
>>> then do the restart of services in the right order.  and only on one
>>> host at the time
>>>
>>> mons: first you restart the mon service on all mon running hosts.
>>> all the 3 mons are active at the same time, so there is no "shifting
>>> around" but make sure the quorum is ok again before you do the next mon.
>>>
>>> mgr: then restart mgr on all hosts that run mgr. there is only one
>>> active mgr at the time now, so here there will be a bit of shifting around.
>>> but it is only for statistics/management so it may affect your ceph -s
>>> command, but not the cluster operation.
>>>
>>> osd: restart osd processes one osd at the time, make sure 

Re: [ceph-users] Bluestore caching, flawed by design?

2018-04-02 Thread John Hearns
Christian, you mention single socket systems for storage servers.
I often thought that the Xeon-D would be ideal as a building block for
storage servers
https://www.intel.com/content/www/us/en/products/processors/xeon/d-processors.html
Low power, and a complete System-On-Chip with 10gig Ethernet.

I haven't been following these processors lately. Is anyone building Ceph
clusters using them?

On 2 April 2018 at 02:59, Christian Balzer  wrote:

>
> Hello,
>
> firstly, Jack pretty much correctly correlated my issues to Mark's points,
> more below.
>
> On Sat, 31 Mar 2018 08:24:45 -0500 Mark Nelson wrote:
>
> > On 03/29/2018 08:59 PM, Christian Balzer wrote:
> >
> > > Hello,
> > >
> > > my crappy test cluster was rendered inoperational by an IP renumbering
> > > that wasn't planned and forced on me during a DC move, so I decided to
> > > start from scratch and explore the fascinating world of
> Luminous/bluestore
> > > and all the assorted bugs. ^_-
> > > (yes I could have recovered the cluster by setting up a local VLAN with
> > > the old IPs, extract the monmap, etc, but I consider the need for a
> > > running monitor a flaw, since all the relevant data was present in the
> > > leveldb).
> > >
> > > Anyways, while I've read about bluestore OSD cache in passing here, the
> > > back of my brain was clearly still hoping that it would use
> pagecache/SLAB
> > > like other filesystems.
> > > Which after my first round of playing with things clearly isn't the
> case.
> > >
> > > This strikes me as a design flaw and regression because:
> >
> > Bluestore's cache is not broken by design.
> >
>
> During further tests I verified something that caught my attention out of
> the corner of my eye when glancing at atop output of the OSDs during my fio
> runs.
>
> Consider this fio run, after having done the same with write to populate
> the file and caches (1GB per OSD default on the test cluster, 20 OSDs
> total on 5 nodes):
> ---
> $ fio --size=8G --ioengine=libaio --invalidate=1 --direct=1 --numjobs=1
> --rw=randread --name=fiojob --blocksize=4M --iodepth=32
> ---
>
> This is being run against a kernel mounted RBD image.
> On the Luminous test cluster it will read the data from the disks,
> completely ignoring the pagecache on the host (as expected and desired)
> AND the bluestore cache.
>
> On a Jewel based test cluster with filestore the reads will be served from
> the pagecaches of the OSD nodes, not only massively improving speed but
> more importantly spindle contention.
>
> My guess is that bluestore treats "direct" differently than the kernel
> accessing a filestore based OSD and I'm not sure what the "correct"
> behavior here is.
> But somebody migrating to bluestore with such a use case and plenty of RAM
> on their OSD nodes is likely to notice this and not going to be happy about
> it.
>
>
> > I'm not totally convinced that some of the trade-offs we've made with
> > bluestore's cache implementation are optimal, but I think you should
> > consider cooling your rhetoric down.
> >
> > > 1. Completely new users may think that bluestore defaults are fine and
> > > waste all that RAM in their machines.
> >
> > What does "wasting" RAM mean in the context of a node running ceph? Are
> > you upset that other applications can't come in and evict bluestore
> > onode, OMAP, or object data from cache?
> >
> As Jack pointed out, unless you go around and start tuning things,
> all available free RAM won't be used for caching.
>
> This raises another point, it being per process data and from skimming
> over some bluestore threads here, if you go and raise the cache to use
> most RAM during normal ops you're likely to be visited by the evil OOM
> witch during heavy recovery OPS.
>
> Whereas the good ole pagecache would just get evicted in that scenario.
>
> > > 2. Having a per OSD cache is inefficient compared to a common cache
> like
> > > pagecache, since an OSD that is busier than others would benefit from a
> > > shared cache more.
> >
> > It's only "inefficient" if you assume that using the pagecache, and more
> > generally, kernel syscalls, is free.  Yes the pagecache is convenient
> > and yes it gives you a lot of flexibility, but you pay for that
> > flexibility if you are trying to do anything fast.
> >
> > For instance, take the new KPTI patches in the kernel for meltdown. Look
> > at how badly it can hurt MyISAM database performance in MariaDB:
> >
> I, like many others here, have decided that all the Meltdown and Spectre
> patches are a bit pointless on pure OSD nodes, because if somebody on the
> node is running random code you're already in deep doodoo.
>
> That being said, I will totally concur that syscalls aren't free.
> However given the latencies induced by the rather long/complex code IOPS
> have to transverse within Ceph, how much of a gain would you say
> eliminating these particular calls did achieve?
>
> > https://mariadb.org/myisam-table-scan-performance-kpti/
> >
> > MyISAM does not have 

Re: [ceph-users] Bluestore caching, flawed by design?

2018-04-02 Thread John Hearns
> A long time ago I was responsible for validating the performance of CXFS
> on an SGI Altix UV distributed shared-memory supercomputer.  As it turns
> out, we could achieve about 22GB/s writes with XFS (a huge number at the
> time), but CXFS was 5-10x slower.  A big part of that turned out to be the
> kernel distributing page cache across the Numalink5 interconnects to remote
> memory.
> The problem can potentially happen on any NUMA system to varying degrees.

That's very interesting. I used to manage Itanium Altixes and then a UV
system, so that work sounds very familiar.
I set up cpusets on the UV system, which gave a big performance increase
since user jobs had CPUs and memory close to each other.
I also had a boot cpuset on the first blade, which had the fibrechannel
HBA, so I guess that had a similar effect in that the CXFS processes were
local to the IO card.
UV was running SuSE - sorry.

On the subject of memory allocation, GPFS uses an amount of pagepool
memory. The advice given always seems to be to make this large.
There is one fixed pagepool on a server, even if it has multiple NSDs.
How does this compare to Ceph's memory allocation?
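
For comparison, my understanding is that Bluestore's cache is sized per OSD,
e.g. in ceph.conf with something like the following (the values are only
examples, not recommendations):

[osd]
# per-OSD cache, not shared across OSDs the way the pagecache (or a GPFS pagepool) is
bluestore_cache_size_hdd = 1073741824   # 1 GiB for HDD-backed OSDs
bluestore_cache_size_ssd = 3221225472   # 3 GiB for SSD-backed OSDs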







On 31 March 2018 at 15:24, Mark Nelson  wrote:

> On 03/29/2018 08:59 PM, Christian Balzer wrote:
>
> Hello,
>>
>> my crappy test cluster was rendered inoperational by an IP renumbering
>> that wasn't planned and forced on me during a DC move, so I decided to
>> start from scratch and explore the fascinating world of Luminous/bluestore
>> and all the assorted bugs. ^_-
>> (yes I could have recovered the cluster by setting up a local VLAN with
>> the old IPs, extract the monmap, etc, but I consider the need for a
>> running monitor a flaw, since all the relevant data was present in the
>> leveldb).
>>
>> Anyways, while I've read about bluestore OSD cache in passing here, the
>> back of my brain was clearly still hoping that it would use pagecache/SLAB
>> like other filesystems.
>> Which after my first round of playing with things clearly isn't the case.
>>
>> This strikes me as a design flaw and regression because:
>>
>
> Bluestore's cache is not broken by design.
>
> I'm not totally convinced that some of the trade-offs we've made with
> bluestore's cache implementation are optimal, but I think you should
> consider cooling your rhetoric down.
>
> 1. Completely new users may think that bluestore defaults are fine and
>> waste all that RAM in their machines.
>>
>
> What does "wasting" RAM mean in the context of a node running ceph? Are
> you upset that other applications can't come in and evict bluestore onode,
> OMAP, or object data from cache?
>
> 2. Having a per OSD cache is inefficient compared to a common cache like
>> pagecache, since an OSD that is busier than others would benefit from a
>> shared cache more.
>>
>
> It's only "inefficient" if you assume that using the pagecache, and more
> generally, kernel syscalls, is free.  Yes the pagecache is convenient and
> yes it gives you a lot of flexibility, but you pay for that flexibility if
> you are trying to do anything fast.
>
> For instance, take the new KPTI patches in the kernel for meltdown. Look
> at how badly it can hurt MyISAM database performance in MariaDB:
>
> https://mariadb.org/myisam-table-scan-performance-kpti/
>
> MyISAM does not have a dedicated row cache and instead caches row data in
> the page cache as you suggest Bluestore should do for it's data.  Look at
> how badly KPTI hurts performance (~40%). Now look at ARIA with a dedicated
> 128MB cache (less than 1%).  KPTI is a really good example of how much this
> stuff can hurt you, but syscalls, context switches, and page faults were
> already expensive even before meltdown.  Not to mention that right now
> bluestore keeps onodes and buffers stored in it's cache in an unencoded
> form.
>
> Here's a couple of other articles worth looking at:
>
> https://eng.uber.com/mysql-migration/
> https://www.scylladb.com/2018/01/07/cost-of-avoiding-a-meltdown/
> http://www.brendangregg.com/blog/2018-02-09/kpti-kaiser-meltdown-performance.html
>
> 3. A uniform OSD cache size of course will be a nightmare when having
>> non-uniform HW, either with RAM or number of OSDs.
>>
>
> Non-Uniform hardware is a big reason that pinning dedicated memory to
> specific cores/sockets is really nice vs relying on potentially remote
> memory page cache reads.  A long time ago I was responsible for validating
> the performance of CXFS on an SGI Altix UV distributed shared-memory
> supercomputer.  As it turns out, we could achieve about 22GB/s writes with
> XFS (a huge number at the time), but CXFS was 5-10x slower.  A big part of
> that turned out to be the kernel distributing page cache across the
> Numalink5 interconnects to remote memory.  The problem can potentially
> happen on any NUMA system to varying degrees.
>
> Personally I have two primary issues with bluestore's memory configuration
> right now:
>
> 1) It's too complicated for users to figure