[ceph-users] Incomplete pgs and no data movement ( cluster appears readonly )

2018-01-09 Thread Brent Kennedy
As per a previous thread, my pgs are set too high.  I tried adjusting the
"mon max pg per osd" up higher and higher, which did clear the
error (restarting monitors and managers each time), but it seems that data
simply won't move around the cluster.  If I stop the primary OSD of an
incomplete pg, the cluster just shows those affected PGs as
active+undersized+degraded:

 

services:

mon: 3 daemons, quorum mon1,mon2,mon3

mgr: mon3(active), standbys: mon1, mon2

osd: 43 osds: 43 up, 43 in

 

data:

pools:   11 pools, 36896 pgs

objects: 8148k objects, 10486 GB

usage:   21532 GB used, 135 TB / 156 TB avail

pgs: 0.043% pgs unknown

 0.011% pgs not active

 362942/16689272 objects degraded (2.175%)

 34483 active+clean

 2393  active+undersized+degraded

 16    unknown

 3     incomplete

 1     down

 

The 16 unknown are from me trying to set up a new pool, which was successful,
but when I tried to copy an existing pool to it, the command just sat there.
I did this in the hope of copying the existing oversized-pg pools to new
pools and then deleting the old pools.  I really didn't want to move the
data, but the issue needs to be dealt with.
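For context, the pool-copy attempt above would typically look like the sketch below (pool names and PG count are placeholders; note that `rados cppool` does not copy snapshots and is ordinary client I/O, so it will hang exactly as described while PGs are inactive):

```shell
# Create a replacement pool with a sane PG count, copy the data,
# then swap the names so clients keep using the original pool name.
ceph osd pool create mypool-new 256
rados cppool mypool mypool-new
ceph osd pool rename mypool mypool-old
ceph osd pool rename mypool-new mypool
```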

 

If I start the OSD back up, the cluster goes back to:

services:

mon: 3 daemons, quorum mon1,mon2,mon3

mgr: mon3(active), standbys: mon1, mon2

osd: 43 osds: 43 up, 43 in

 

  data:

pools:   11 pools, 36896 pgs

objects: 8148k objects, 10486 GB

usage:   21533 GB used, 135 TB / 156 TB avail

pgs: 0.041% pgs unknown

 0.014% pgs not active

 36876 active+clean

 16    unknown

 4 incomplete

 

The cluster was upgraded from Hammer .94 without issues to Jewel and then
Luminous 12.2.2 last week using the latest ceph-deploy.

 

I guess the issue at the moment is that data is not moving, either for
recovery or for newly added data (new writes just time out).
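The limit overrides mentioned in this thread can be applied either persistently or at runtime (a sketch; the numeric values here are arbitrary examples, not recommendations):

```shell
# Persistent: add under [global] in ceph.conf on the affected daemons,
# then restart them:
#   mon max pg per osd = 1000
#   osd max pg per osd hard ratio = 5
# Runtime injection (takes effect immediately, but is lost on restart):
ceph tell mon.\* injectargs '--mon_max_pg_per_osd=1000'
ceph tell osd.\* injectargs '--osd_max_pg_per_osd_hard_ratio=5'
```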

 

I also adjusted "osd max pg per osd hard ratio" to 5, but that didn't seem
to trigger any data movement.  I did restart the OSDs each time I changed it.
The data just won't finish moving.  "ceph -w" shows this:

2018-01-10 07:49:27.715163 osd.20 [WRN] slow request 960.675164 seconds old,
received at 2018-01-10 07:33:27.039907: osd_op(client.3542508.0:4097 14.0
14.50e8d0b0 (undecoded) ondisk+write+known_if_redirected e125984) currently
queued_for_pg

 

"ceph health detail" shows this:

HEALTH_ERR Reduced data availability: 20 pgs inactive, 4 pgs incomplete;
Degraded data redundancy: 20 pgs unclean; 2 slow requests are blocked > 32
sec; 66 stuck requests are blocked > 4096 sec

PG_AVAILABILITY Reduced data availability: 20 pgs inactive, 4 pgs incomplete

pg 11.720 is incomplete, acting [21,10]

pg 11.9ab is incomplete, acting [14,2]

pg 11.9fb is incomplete, acting [32,43]

pg 11.c13 is incomplete, acting [42,26]

pg 14.0 is stuck inactive for 1046.844458, current state unknown, last
acting []

pg 14.1 is stuck inactive for 1046.844458, current state unknown, last
acting []

pg 14.2 is stuck inactive for 1046.844458, current state unknown, last
acting []

pg 14.3 is stuck inactive for 1046.844458, current state unknown, last
acting []

pg 14.4 is stuck inactive for 1046.844458, current state unknown, last
acting []

pg 14.5 is stuck inactive for 1046.844458, current state unknown, last
acting []

pg 14.6 is stuck inactive for 1046.844458, current state unknown, last
acting []

pg 14.7 is stuck inactive for 1046.844458, current state
creating+activating, last acting [21,40,5]

pg 14.8 is stuck inactive for 1046.844458, current state unknown, last
acting []

pg 14.9 is stuck inactive for 1046.844458, current state unknown, last
acting []

pg 14.a is stuck inactive for 1046.844458, current state unknown, last
acting []

pg 14.b is stuck inactive for 1046.844458, current state unknown, last
acting []

pg 14.c is stuck inactive for 1046.844458, current state unknown, last
acting []

pg 14.d is stuck inactive for 1046.844458, current state unknown, last
acting []

pg 14.e is stuck inactive for 1046.844458, current state unknown, last
acting []

pg 14.f is stuck inactive for 1046.844458, current state unknown, last
acting []

PG_DEGRADED Degraded data redundancy: 20 pgs unclean

pg 11.720 is stuck unclean since forever, current state incomplete, last
acting [21,10]

pg 11.9ab is stuck unclean since forever, current state incomplete, last
acting [14,2]

pg 11.9fb is stuck unclean since forever, current state incomplete, last
acting [32,43]

pg 11.c13 is stuck unclean since forever, current state incomplete, last
acting [42,26]

pg 14.0 is stuck unclean for 1046.844458, current state unknown, last
acting []

pg 14.1 is stuck unclean for 1046.844458, current state unknown, last
acting []

pg 14.2 is stuck unclean for 

Re: [ceph-users] ceph-volume lvm deactivate/destroy/zap

2018-01-09 Thread Fabian Grünbichler
On Tue, Jan 09, 2018 at 02:14:51PM -0500, Alfredo Deza wrote:
> On Tue, Jan 9, 2018 at 1:35 PM, Reed Dier  wrote:
> > I would just like to mirror what Dan van der Ster’s sentiments are.
> >
> > As someone attempting to move an OSD to bluestore, with limited/no LVM
> > experience, it is a completely different beast and complexity level compared
> > to the ceph-disk/filestore days.
> >
> > ceph-deploy was a very simple tool that did exactly what I was looking to
> > do, but now we have deprecated ceph-disk halfway into a release, ceph-deploy
> > doesn’t appear to fully support ceph-volume, which is now the official way
> > to manage OSDs moving forward.
> 
> ceph-deploy now fully supports ceph-volume, we should get a release soon
> 
> >
> > My ceph-volume create statement ‘succeeded’ but the OSD doesn’t start, so
> > now I am trying to zap the disk to try to recreate the OSD, and the zap is
> > failing as Dan’s did.
> 
> I would encourage you to open a ticket in the tracker so that we can
> improve on what failed for you
> 
> http://tracker.ceph.com/projects/ceph-volume/issues/new
> 
> ceph-volume keeps thorough logs in /var/log/ceph/ceph-volume.log and
> /var/log/ceph/ceph-volume-systemd.log
> 
> If you create a ticket, please make sure to add all the output and
> steps that you can
> >
> > And yes, I was able to get it zapped using the lvremove, vgremove, pvremove
> > commands, but that is not obvious to someone who hasn’t used LVM extensively
> > for storage management before.
> >
> > I also want to mirror Dan’s sentiments about the unnecessary complexity
> > imposed on what I expect is the default use case of an entire disk being
> > used. I can’t see anything more than the ‘entire disk’ method being the
> > largest use case for users of ceph, especially the smaller clusters trying
> > to maximize hardware/spend.
> 
> We don't take lightly the introduction of LVM here. The new tool is
> addressing several insurmountable issues with how ceph-disk operated.
> 
> Although using an entire disk might be easier in the use case you are
> in, it is certainly not the only thing we have to support, so then
> again, we can't
> reliably decide what strategy would be best to destroy that volume, or
> group, or if the PV should be destroyed as well.

Wouldn't it be possible to detect on creation that it is a full physical
disk that gets initialized completely by ceph-volume, store that in the
metadata somewhere, and clean up accordingly when destroying the OSD?
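For anyone hitting the same wall, the manual cleanup Reed refers to looks roughly like this (a sketch: the device and LV/VG names are placeholders, and this irreversibly destroys the OSD's data):

```shell
# Find the LVM structures that ceph-volume created on the device:
lvs -o lv_name,vg_name,devices
# Tear them down top-down, then clear the on-disk labels:
lvremove -f <vg_name>/<lv_name>
vgremove -f <vg_name>
pvremove -ff /dev/sdX
wipefs -a /dev/sdX
```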

> 
> The 'zap' sub-command will allow that lv to be reused for an OSD and
> that should work. Again, if it isn't sufficient, we really do need
> more information and a
> ticket in the tracker is the best way.
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSDs going down/up at random

2018-01-09 Thread Mike O'Connor
On 10/01/2018 4:24 PM, Sam Huracan wrote:
> Hi Mike,
>
> Could you show the system log from the moment the OSD goes down and up?
OK, so I have no idea how I missed this each time I looked, but the syslog
does show a problem.

I've created the dump file mentioned in the log; it's 29M compressed, so
anyone who wants it will need me to send it directly.

Mike

--
Jan 10 15:56:31 pve ceph-osd[2722]: 2018-01-10 15:56:31.338068
7efe5eac1700 -1 abort: Corruption: block checksum mismatch
Jan 10 15:56:31 pve ceph-osd[2722]: *** Caught signal (Aborted) **
Jan 10 15:56:31 pve ceph-osd[2722]:  in thread 7efe5eac1700
thread_name:tp_osd_tp
Jan 10 15:56:31 pve ceph-osd[2722]:  ceph version 12.2.2
(215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)
Jan 10 15:56:31 pve ceph-osd[2722]:  1: (()+0xa16664) [0x55a8b396b664]
Jan 10 15:56:31 pve ceph-osd[2722]:  2: (()+0x110c0) [0x7efe796b70c0]
Jan 10 15:56:31 pve ceph-osd[2722]:  3: (gsignal()+0xcf) [0x7efe7867efcf]
Jan 10 15:56:31 pve ceph-osd[2722]:  4: (abort()+0x16a) [0x7efe786803fa]
Jan 10 15:56:31 pve ceph-osd[2722]:  5:
(RocksDBStore::get(std::__cxx11::basic_string const&, char const*,
unsigned long, ceph::buffer::list*)+0x29f) [0x55a8b38a995f]
Jan 10 15:56:31 pve ceph-osd[2722]:  6:
(BlueStore::Collection::get_onode(ghobject_t const&, bool)+0x5ae)
[0x55a8b382d2ae]
Jan 10 15:56:31 pve ceph-osd[2722]:  7:
(BlueStore::getattr(boost::intrusive_ptr&,
ghobject_t const&, char const*, ceph::buffer::ptr&)+0xf6) [0x55a8b382e326]
Jan 10 15:56:31 pve ceph-osd[2722]:  8:
(PGBackend::objects_get_attr(hobject_t const&,
std::__cxx11::basic_string const&, ceph::buffer::list*)+0x106) [0x55a8b35bde26]
Jan 10 15:56:31 pve ceph-osd[2722]:  9:
(PrimaryLogPG::get_snapset_context(hobject_t const&, bool,
std::map, ceph::buffer::list,
std::less >,
std::allocator const,
ceph::buffer::list> > > const*, bool)+0x3fb) [0x55a8b35081db]
Jan 10 15:56:31 pve ceph-osd[2722]:  10:
(PrimaryLogPG::get_object_context(hobject_t const&, bool,
std::map, ceph::buffer::list,
std::less >,
std::allocator const,
ceph::buffer::list> > > const*)+0xc39) [0x55a8b352fec9]
Jan 10 15:56:31 pve ceph-osd[2722]:  11:
(PrimaryLogPG::find_object_context(hobject_t const&,
std::shared_ptr*, bool, bool, hobject_t*)+0x387)
[0x55a8b3533687]
Jan 10 15:56:31 pve ceph-osd[2722]:  12:
(PrimaryLogPG::do_op(boost::intrusive_ptr&)+0x2214)
[0x55a8b3571694]
Jan 10 15:56:31 pve ceph-osd[2722]:  13:
(PrimaryLogPG::do_request(boost::intrusive_ptr&,
ThreadPool::TPHandle&)+0xec6) [0x55a8b352c436]
Jan 10 15:56:31 pve ceph-osd[2722]:  14:
(OSD::dequeue_op(boost::intrusive_ptr,
boost::intrusive_ptr, ThreadPool::TPHandle&)+0x3ab)
[0x55a8b33a99eb]
Jan 10 15:56:31 pve ceph-osd[2722]:  15:
(PGQueueable::RunVis::operator()(boost::intrusive_ptr
const&)+0x5a) [0x55a8b3647eba]
Jan 10 15:56:31 pve ceph-osd[2722]:  16:
(OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x103d) [0x55a8b33d0f4d]
Jan 10 15:56:31 pve ceph-osd[2722]:  17:
(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8ef)
[0x55a8b39b806f]
Jan 10 15:56:31 pve ceph-osd[2722]:  18:
(ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55a8b39bb370]
Jan 10 15:56:31 pve ceph-osd[2722]:  19: (()+0x7494) [0x7efe796ad494]
Jan 10 15:56:31 pve ceph-osd[2722]:  20: (clone()+0x3f) [0x7efe78734aff]
Jan 10 15:56:31 pve ceph-osd[2722]: 2018-01-10 15:56:31.343532
7efe5eac1700 -1 *** Caught signal (Aborted) **
Jan 10 15:56:31 pve ceph-osd[2722]:  in thread 7efe5eac1700
thread_name:tp_osd_tp
Jan 10 15:56:31 pve ceph-osd[2722]:  ceph version 12.2.2
(215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)
Jan 10 15:56:31 pve ceph-osd[2722]:  1: (()+0xa16664) [0x55a8b396b664]
Jan 10 15:56:31 pve ceph-osd[2722]:  2: (()+0x110c0) [0x7efe796b70c0]
Jan 10 15:56:31 pve ceph-osd[2722]:  3: (gsignal()+0xcf) [0x7efe7867efcf]
Jan 10 15:56:31 pve ceph-osd[2722]:  4: (abort()+0x16a) [0x7efe786803fa]
Jan 10 15:56:31 pve ceph-osd[2722]:  5:
(RocksDBStore::get(std::__cxx11::basic_string const&, char const*,
unsigned long, ceph::buffer::list*)+0x29f) [0x55a8b38a995f]
Jan 10 15:56:31 pve ceph-osd[2722]:  6:
(BlueStore::Collection::get_onode(ghobject_t const&, bool)+0x5ae)
[0x55a8b382d2ae]
Jan 10 15:56:31 pve ceph-osd[2722]:  7:
(BlueStore::getattr(boost::intrusive_ptr&,
ghobject_t const&, char const*, ceph::buffer::ptr&)+0xf6) [0x55a8b382e326]
Jan 10 15:56:31 pve ceph-osd[2722]:  8:
(PGBackend::objects_get_attr(hobject_t const&,
std::__cxx11::basic_string

Re: [ceph-users] OSDs going down/up at random

2018-01-09 Thread Sam Huracan
Hi Mike,

Could you show the system log from the moment the OSD goes down and up?

On Jan 10, 2018 12:52, "Mike O'Connor"  wrote:

> On 10/01/2018 3:52 PM, Linh Vu wrote:
> >
> > Have you checked your firewall?
> >
> There are no iptables rules at this time, but connection tracking is
> enabled. I would expect errors about running out of table space if that
> was an issue.
>
> Thanks
> Mike


Re: [ceph-users] OSDs going down/up at random

2018-01-09 Thread Mike O'Connor
On 10/01/2018 3:52 PM, Linh Vu wrote:
>
> Have you checked your firewall?
>
There are no iptables rules at this time, but connection tracking is
enabled. I would expect errors about running out of table space if that
was an issue.
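For completeness, the table-space theory can be checked directly (these are the standard nf_conntrack sysctls; an exhausted table logs "nf_conntrack: table full, dropping packet" in the kernel log):

```shell
# Current entries versus the table limit:
sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max
# Any drops would also be visible in the kernel log:
dmesg | grep -i conntrack
```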

Thanks
Mike


Re: [ceph-users] OSDs going down/up at random

2018-01-09 Thread Linh Vu
Have you checked your firewall?


From: ceph-users  on behalf of Mike O'Connor 

Sent: Wednesday, 10 January 2018 3:40:30 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] OSDs going down/up at random

Hi All

I have a ceph host (12.2.2) with 14 OSDs which seem to go down and then
come back up; what should I look at to try to identify the issue?
The system has three LSI SAS9201-8i cards, currently connected to 14
drives (with the option of 24 drives).
I have three of these chassis, but only one is running right now, so I
have Ceph set up for a single node.

I have very carefully looked at the log files and not found anything
which indicates any issues with the controllers or the drives.

dmesg has these messages.
---
[78752.708932] libceph: osd3 10.1.6.2:6834 socket closed (con state OPEN)
[78752.710319] libceph: osd3 10.1.6.2:6834 socket closed (con state
CONNECTING)
[78753.426244] libceph: osd3 down
[78753.426640] libceph: osd3 down
[78776.496962] libceph: osd5 10.1.6.2:6810 socket closed (con state OPEN)
[78776.498626] libceph: osd5 10.1.6.2:6810 socket closed (con state
CONNECTING)
[78777.446384] libceph: osd5 down
[78777.446720] libceph: osd5 down
[78806.466973] libceph: osd3 up
[78806.467429] libceph: osd3 up
[78855.565098] libceph: osd10 10.1.6.2:6801 socket closed (con state OPEN)
[78855.567062] libceph: osd10 10.1.6.2:6801 socket closed (con state
CONNECTING)
[78856.554209] libceph: osd10 down
[78856.554357] libceph: osd10 down
[78868.265665] libceph: osd1 10.1.6.2:6830 socket closed (con state OPEN)
[78868.266347] libceph: osd1 10.1.6.2:6830 socket closed (con state
CONNECTING)
[78868.529575] libceph: osd1 down
[78869.469264] libceph: osd1 down
[78899.538533] libceph: osd10 up
[78899.538808] libceph: osd10 up
[78903.556418] libceph: osd5 up
[78905.309401] libceph: osd5 up
[78909.755499] libceph: osd1 up
[78912.008581] libceph: osd1 up
[78912.040872] libceph: osd4 10.1.6.2:6850 socket error on write
[78924.736964] libceph: osd8 10.1.6.2:6809 socket closed (con state OPEN)
[78924.738402] libceph: osd8 10.1.6.2:6809 socket closed (con state
CONNECTING)
[78925.602597] libceph: osd8 down
[78925.602942] libceph: osd8 down
[78988.648108] libceph: osd8 up
[78988.648462] libceph: osd8 up
[79010.808917] libceph: osd4 10.1.6.2:6850 socket closed (con state OPEN)
[79010.810722] libceph: osd4 10.1.6.2:6850 socket closed (con state
CONNECTING)
[79011.617598] libceph: osd4 down
[79011.617861] libceph: osd4 down
[79072.772966] libceph: osd14 10.1.6.2:6854 socket closed (con state OPEN)
[79072.773434] libceph: osd14 10.1.6.2:6854 socket closed (con state OPEN)
[79072.774219] libceph: osd14 10.1.6.2:6854 socket closed (con state
CONNECTING)
[79073.657383] libceph: osd14 down
[79073.657552] libceph: osd14 down
[79082.565025] libceph: osd13 10.1.6.2:6846 socket closed (con state OPEN)
[79082.565814] libceph: osd13 10.1.6.2:6846 socket closed (con state OPEN)
[79082.566279] libceph: osd13 10.1.6.2:6846 socket closed (con state
CONNECTING)
[79082.670861] libceph: osd13 down
[79082.671023] libceph: osd13 down
[79115.435180] libceph: osd14 up
[79115.435989] libceph: osd14 up
[79117.603991] libceph: osd13 up
[79118.557601] libceph: osd13 up
[79154.719547] libceph: osd4 up
[79154.720232] libceph: osd4 up
[79175.900935] libceph: osd12 10.1.6.2:6822 socket closed (con state OPEN)
[79175.902922] libceph: osd12 10.1.6.2:6822 socket closed (con state
CONNECTING)
[79176.650847] libceph: osd12 down
[79176.651138] libceph: osd12 down
[79219.762665] libceph: osd12 up
[79219.763090] libceph: osd12 up
[79252.405666] libceph: osd11 10.1.6.2:6805 socket closed (con state OPEN)
[79252.406349] libceph: osd11 10.1.6.2:6805 socket closed (con state
CONNECTING)
[79252.462748] libceph: osd11 down
[79252.462855] libceph: osd11 down
[79285.656850] libceph: osd11 up
[79285.657341] libceph: osd11 up
[80558.024975] libceph: osd13 10.1.6.2:6854 socket closed (con state OPEN)
[80558.025751] libceph: osd13 10.1.6.2:6854 socket closed (con state OPEN)
[80558.026341] libceph: osd13 10.1.6.2:6854 socket closed (con state
CONNECTING)
[80558.652903] libceph: osd13 10.1.6.2:6854 socket error on write
[80558.734330] libceph: osd13 down
[80558.734501] libceph: osd13 down
[80590.753493] libceph: osd13 up
[80592.884936] libceph: osd13 up
[80592.897062] libceph: osd12 10.1.6.2:6822 socket closed (con state OPEN)
[90351.841800] libceph: osd1 down
[90371.299988] libceph: osd1 down
[90391.238370] libceph: osd1 up
[90391.778979] libceph: osd1 up

Thanks for any help/ideas
Mike


[ceph-users] OSDs going down/up at random

2018-01-09 Thread Mike O'Connor
Hi All

I have a ceph host (12.2.2) with 14 OSDs which seem to go down and then
come back up; what should I look at to try to identify the issue?
The system has three LSI SAS9201-8i cards, currently connected to 14
drives (with the option of 24 drives).
I have three of these chassis, but only one is running right now, so I
have Ceph set up for a single node.

I have very carefully looked at the log files and not found anything
which indicates any issues with the controllers or the drives.

dmesg has these messages.
---
[78752.708932] libceph: osd3 10.1.6.2:6834 socket closed (con state OPEN)
[78752.710319] libceph: osd3 10.1.6.2:6834 socket closed (con state
CONNECTING)
[78753.426244] libceph: osd3 down
[78753.426640] libceph: osd3 down
[78776.496962] libceph: osd5 10.1.6.2:6810 socket closed (con state OPEN)
[78776.498626] libceph: osd5 10.1.6.2:6810 socket closed (con state
CONNECTING)
[78777.446384] libceph: osd5 down
[78777.446720] libceph: osd5 down
[78806.466973] libceph: osd3 up
[78806.467429] libceph: osd3 up
[78855.565098] libceph: osd10 10.1.6.2:6801 socket closed (con state OPEN)
[78855.567062] libceph: osd10 10.1.6.2:6801 socket closed (con state
CONNECTING)
[78856.554209] libceph: osd10 down
[78856.554357] libceph: osd10 down
[78868.265665] libceph: osd1 10.1.6.2:6830 socket closed (con state OPEN)
[78868.266347] libceph: osd1 10.1.6.2:6830 socket closed (con state
CONNECTING)
[78868.529575] libceph: osd1 down
[78869.469264] libceph: osd1 down
[78899.538533] libceph: osd10 up
[78899.538808] libceph: osd10 up
[78903.556418] libceph: osd5 up
[78905.309401] libceph: osd5 up
[78909.755499] libceph: osd1 up
[78912.008581] libceph: osd1 up
[78912.040872] libceph: osd4 10.1.6.2:6850 socket error on write
[78924.736964] libceph: osd8 10.1.6.2:6809 socket closed (con state OPEN)
[78924.738402] libceph: osd8 10.1.6.2:6809 socket closed (con state
CONNECTING)
[78925.602597] libceph: osd8 down
[78925.602942] libceph: osd8 down
[78988.648108] libceph: osd8 up
[78988.648462] libceph: osd8 up
[79010.808917] libceph: osd4 10.1.6.2:6850 socket closed (con state OPEN)
[79010.810722] libceph: osd4 10.1.6.2:6850 socket closed (con state
CONNECTING)
[79011.617598] libceph: osd4 down
[79011.617861] libceph: osd4 down
[79072.772966] libceph: osd14 10.1.6.2:6854 socket closed (con state OPEN)
[79072.773434] libceph: osd14 10.1.6.2:6854 socket closed (con state OPEN)
[79072.774219] libceph: osd14 10.1.6.2:6854 socket closed (con state
CONNECTING)
[79073.657383] libceph: osd14 down
[79073.657552] libceph: osd14 down
[79082.565025] libceph: osd13 10.1.6.2:6846 socket closed (con state OPEN)
[79082.565814] libceph: osd13 10.1.6.2:6846 socket closed (con state OPEN)
[79082.566279] libceph: osd13 10.1.6.2:6846 socket closed (con state
CONNECTING)
[79082.670861] libceph: osd13 down
[79082.671023] libceph: osd13 down
[79115.435180] libceph: osd14 up
[79115.435989] libceph: osd14 up
[79117.603991] libceph: osd13 up
[79118.557601] libceph: osd13 up
[79154.719547] libceph: osd4 up
[79154.720232] libceph: osd4 up
[79175.900935] libceph: osd12 10.1.6.2:6822 socket closed (con state OPEN)
[79175.902922] libceph: osd12 10.1.6.2:6822 socket closed (con state
CONNECTING)
[79176.650847] libceph: osd12 down
[79176.651138] libceph: osd12 down
[79219.762665] libceph: osd12 up
[79219.763090] libceph: osd12 up
[79252.405666] libceph: osd11 10.1.6.2:6805 socket closed (con state OPEN)
[79252.406349] libceph: osd11 10.1.6.2:6805 socket closed (con state
CONNECTING)
[79252.462748] libceph: osd11 down
[79252.462855] libceph: osd11 down
[79285.656850] libceph: osd11 up
[79285.657341] libceph: osd11 up
[80558.024975] libceph: osd13 10.1.6.2:6854 socket closed (con state OPEN)
[80558.025751] libceph: osd13 10.1.6.2:6854 socket closed (con state OPEN)
[80558.026341] libceph: osd13 10.1.6.2:6854 socket closed (con state
CONNECTING)
[80558.652903] libceph: osd13 10.1.6.2:6854 socket error on write
[80558.734330] libceph: osd13 down
[80558.734501] libceph: osd13 down
[80590.753493] libceph: osd13 up
[80592.884936] libceph: osd13 up
[80592.897062] libceph: osd12 10.1.6.2:6822 socket closed (con state OPEN)
[90351.841800] libceph: osd1 down
[90371.299988] libceph: osd1 down
[90391.238370] libceph: osd1 up
[90391.778979] libceph: osd1 up
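A quick way to summarise a log like the one above is to tally the down events per OSD (a small sketch over a captured sample; in practice you would pipe `dmesg` straight into the awk filter):

```shell
# Tally "down" transitions per OSD from libceph kernel-log lines.
sample='[78753.426244] libceph: osd3 down
[78777.446384] libceph: osd5 down
[78869.469264] libceph: osd1 down
[78753.426640] libceph: osd3 down'
printf '%s\n' "$sample" \
  | awk '/libceph: osd[0-9]+ down/ { n[$3]++ } END { for (o in n) print o, n[o] }' \
  | sort
# -> osd1 1
#    osd3 2
#    osd5 1
```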

Thanks for any help/ideas
Mike


[ceph-users] 'lost' cephfs filesystem?

2018-01-09 Thread Mark Schouten
Hi,

While upgrading a server with a CephFS mount tonight, it stalled on installing 
a new kernel, because it was waiting for `sync`. 

I'm pretty sure it has something to do with the CephFS filesystem which caused
some issues last week. I think the kernel still has a reference to the
probably lazily-unmounted CephFS filesystem.
Unmounting the filesystem 'works', in that it is no longer available, but
the umount command seems to be waiting for sync() as well. Mounting the
filesystem again doesn't work either.

I know the simple solution is to just reboot the server, but the server holds
quite a lot of VMs and containers, so I'd prefer to fix this without a reboot.

Anybody with some clever ideas? :)
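The usual escalation short of a reboot is roughly the following (a sketch; /mnt/cephfs is a placeholder, and a kernel thread already stuck in sync() may still pin the superblock, in which case only a reboot clears it):

```shell
# See which processes still hold the mount:
fuser -vm /mnt/cephfs
# Lazy detach, then force; for network filesystems -f can
# succeed where a plain umount hangs on an unreachable server:
umount -l /mnt/cephfs
umount -f /mnt/cephfs
```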

-- 
Kerio Operator in de Cloud? https://www.kerioindecloud.nl/
Mark Schouten  | Tuxis Internet Engineering
KvK: 61527076  | http://www.tuxis.nl/
T: 0318 200208 | i...@tuxis.nl



Re: [ceph-users] Dashboard runs on all manager instances?

2018-01-09 Thread John Spray
On Tue, Jan 9, 2018 at 6:34 PM, Tim Bishop  wrote:
> Hi,
>
> I've recently upgraded from Jewel to Luminous and I'm therefore new to
> using the Dashboard. I noted this section in the documentation:
>
> http://docs.ceph.com/docs/master/mgr/dashboard/#load-balancer
>
> "Please note that the dashboard will only start on the
> manager which is active at that moment. Query the Ceph
> cluster status to see which manager is active (e.g., ceph
> mgr dump). In order to make the dashboard available via a
> consistent URL regardless of which manager daemon is currently
> active, you may want to set up a load balancer front-end
> to direct traffic to whichever manager endpoint is available.
> If you use a reverse http proxy that forwards a subpath to
> the dashboard, you need to configure url_prefix (see above)."
>
> However, from what I can see the dashboard is actually started on all
> manager instances. On the standby instances it simply has a redirect to
> the active instance. So the above documentation would look to be
> incorrect?

Oops, missed that bit of the documentation when adding the standby
functionality.  Will fix that.

John
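Until the docs are fixed, the active manager can also be discovered on the fly, which is handy for a load-balancer health check (a sketch; assumes `jq` is available):

```shell
# Name and address of the currently active mgr:
ceph mgr dump | jq -r '.active_name, .active_addr'
# Or, without jq, from the status summary:
ceph -s | grep mgr
```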

>
> I was planning to use Apache's mod_proxy_balancer to just pick the
> active instance, but the above makes that tricky. How have others solved
> this? Or do you just pick the right instance and go direct? Or maybe I'm
> missing a config option to make the dashboard only run on the active
> manager?
>
> Thanks,
> Tim.
>
> --
> Tim Bishop
> PGP Key: 0x6C226B37FDF38D55
>


Re: [ceph-users] OSD Bluestore Migration Issues

2018-01-09 Thread Alfredo Deza
On Tue, Jan 9, 2018 at 3:27 PM, Reed Dier  wrote:
> After removing the --osd-id flag, everything came up normally.

We just verified this is a bug when using --osd-id and that ID is no
longer available in the cluster. I've created

http://tracker.ceph.com/issues/22642 to get this fixed properly.

>
>  -2   21.82448   host node24
>   0   hdd   7.28450   osd.0    up  1.0  1.0
>   8   hdd   7.26999   osd.8    up  1.0  1.0
>  16   hdd   7.26999   osd.16   up  1.0  1.0
>
>
> Given the vanilla-ness of this ceph-volume command, is this something
> ceph-deploy-able?
>
> I’m seeing ceph-deploy 1.5.39 as the latest stable release.
>

1.5.39 is going to be the last release that supports ceph-disk. We are
planning on releasing 2.0.0 (which breaks backwards compatibility) for
later this week or early next week.

> ceph-deploy --username root disk zap $NODE:$HDD
>
> ceph-deploy --username root osd create $NODE:$HDD:$SSD
>
>
> In that example $HDD is the main OSD device, and $SSD is the NVMe partition
> I want to use for block.db (and block.wal). Or is the syntax different from
> the filestore days?

The upcoming version does not support re-using an OSD ID, but that
might be easy to add. The docs in master are already displaying how
the new API looks:

http://docs.ceph.com/ceph-deploy/docs/#deploying-osds

> And I am assuming that no --bluestore would be necessary given that I am
> reading that bluestore is the default and filestore requires intervention.

bluestore is the default, correct. The API will change a lot, you might want to
>
> Thanks,
>
> Reed
>
> On Jan 9, 2018, at 2:10 PM, Reed Dier  wrote:
>
> -2   21.81000   host node24
>   0   hdd   7.26999   osd.0    destroyed    0  1.0
>   8   hdd   7.26999   osd.8    up         1.0  1.0
>  16   hdd   7.26999   osd.16   up         1.0  1.0
>
>
> Should I do these prior to running without the osd-id specified?
>
> # ceph osd crush remove osd.$ID
> # ceph auth del osd.$ID
> # ceph osd rm osd.$ID
>
>
> And then it will fill in the missing osd.0.
> Will set norebalance flag first to prevent data reshuffle upon the osd being
> removed from the crush map.
>
> Thanks,
>
> Reed
>
> On Jan 9, 2018, at 2:05 PM, Alfredo Deza  wrote:
>
> On Tue, Jan 9, 2018 at 2:19 PM, Reed Dier  wrote:
>
> Hi ceph-users,
>
> Hoping that this is something small that I am overlooking, but could use the
> group mind to help.
>
> Ceph 12.2.2, Ubuntu 16.04 environment.
> OSD (0) is an 8TB spinner (/dev/sda) and I am moving from a filestore
> journal to a blocks.db and WAL device on an NVMe partition (/dev/nvme0n1p5).
>
> I have an OSD that I am trying to convert to bluestore and running into some
> trouble.
>
> Started here until the ceph-volume create statement, which doesn’t work.
> http://docs.ceph.com/docs/master/rados/operations/bluestore-migration/
> Worth mentioning I also flushed the journal on the nvme partition before
> nuking the OSD.
>
> $ sudo ceph-osd -i 0 --flush-journal
>
>
> So I first started with this command:
>
> $ sudo ceph-volume lvm create --bluestore --data /dev/sda --block.db
> /dev/nvme0n1p5 --osd-id 0
>
>
> Pastebin to the ceph-volume log: https://pastebin.com/epkM3aP6
>
> However the OSD doesn’t start.
>
>
> I was just able to replicate this by using an ID that doesn't exist in
> the cluster. On a cluster with just one OSD (with an ID of 0) I
> created
> an OSD with --osd-id 3, and had the exact same results.
>
>
> Pastebin to ceph-osd log: https://pastebin.com/9qEsAJzA
>
> I tried restarting the process, by deleting the LVM structures, zapping the
> disk using ceph-volume.
> This time using prepare and activate instead of create.
>
> $ sudo ceph-volume lvm prepare --bluestore --data /dev/sda --block.db
> /dev/nvme0n1p5 --osd-id 0
>
> $ sudo ceph-volume lvm activate --bluestore 0
> 227e1721-cd2e-4d7e-bb48-bc2bb715a038
>
>
> Also ran the enable on the ceph-volume systemd unit per
> http://docs.ceph.com/docs/master/install/manual-deployment/
>
> $ sudo systemctl enable
> ceph-volume@lvm-0-227e1721-cd2e-4d7e-bb48-bc2bb715a038
>
>
> Same results.
>
> Any help is greatly appreciated.
>
>
> Could you try without passing --osd-id ?
>
>
> Thanks,
>
> Reed
>


Re: [ceph-users] OSD Bluestore Migration Issues

2018-01-09 Thread Reed Dier
After removing the --osd-id flag, everything came up normally.

>  -2   21.82448   host node24
>   0   hdd   7.28450   osd.0    up  1.0  1.0
>   8   hdd   7.26999   osd.8    up  1.0  1.0
>  16   hdd   7.26999   osd.16   up  1.0  1.0


Given the vanilla-ness of this ceph-volume command, is this something
ceph-deploy-able?

I’m seeing ceph-deploy 1.5.39 as the latest stable release.

> ceph-deploy --username root disk zap $NODE:$HDD
> ceph-deploy --username root osd create $NODE:$HDD:$SSD

In that example $HDD is the main OSD device, and $SSD is the NVMe partition I 
want to use for block.db (and block.wal). Or is the syntax different from the 
filestore days?
And I am assuming that no --bluestore would be necessary given that I am 
reading that bluestore is the default and filestore requires intervention.
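For the record, the sequence for recreating an OSD under its old ID is roughly the following (a sketch based on the Luminous bluestore-migration docs; the ID and devices are taken from the example above, and per the bug discussed in this thread --osd-id may need to be omitted until it is fixed):

```shell
ID=0
# Mark the OSD destroyed but keep its ID and CRUSH entry:
ceph osd destroy $ID --yes-i-really-mean-it
# Wipe the old data and recreate on the same devices:
ceph-volume lvm zap /dev/sda
ceph-volume lvm create --bluestore --data /dev/sda \
    --block.db /dev/nvme0n1p5 --osd-id $ID
```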

Thanks,

Reed

> On Jan 9, 2018, at 2:10 PM, Reed Dier  wrote:
> 
>> -2   21.81000   host node24
>>  0   hdd   7.26999   osd.0    destroyed    0  1.0
>>  8   hdd   7.26999   osd.8    up         1.0  1.0
>> 16   hdd   7.26999   osd.16   up         1.0  1.0
> 
> Should I do these prior to running without the osd-id specified?
>> # ceph osd crush remove osd.$ID
>> # ceph auth del osd.$ID
>> # ceph osd rm osd.$ID
> 
> 
> And then it will fill in the missing osd.0.
> Will set norebalance flag first to prevent data reshuffle upon the osd being 
> removed from the crush map.
> 
> Thanks,
> 
> Reed
> 
>> On Jan 9, 2018, at 2:05 PM, Alfredo Deza > > wrote:
>> 
>> On Tue, Jan 9, 2018 at 2:19 PM, Reed Dier > > wrote:
>>> Hi ceph-users,
>>> 
>>> Hoping that this is something small that I am overlooking, but could use the
>>> group mind to help.
>>> 
>>> Ceph 12.2.2, Ubuntu 16.04 environment.
>>> OSD (0) is an 8TB spinner (/dev/sda) and I am moving from a filestore
>>> journal to a blocks.db and WAL device on an NVMe partition (/dev/nvme0n1p5).
>>> 
>>> I have an OSD that I am trying to convert to bluestore and running into some
>>> trouble.
>>> 
>>> Started here until the ceph-volume create statement, which doesn’t work.
>>> http://docs.ceph.com/docs/master/rados/operations/bluestore-migration/ 
>>> 
>>> Worth mentioning I also flushed the journal on the nvme partition before
>>> nuking the OSD.
>>> 
>>> $ sudo ceph-osd -i 0 --flush-journal
>>> 
>>> 
>>> So I first started with this command:
>>> 
>>> $ sudo ceph-volume lvm create --bluestore --data /dev/sda --block.db
>>> /dev/nvme0n1p5 --osd-id 0
>>> 
>>> 
>>> Pastebin to the ceph-volume log: https://pastebin.com/epkM3aP6
>>> 
>>> However the OSD doesn’t start.
>> 
>> I was just able to replicate this by using an ID that doesn't exist in
>> the cluster. On a cluster with just one OSD (with an ID of 0) I
>> created
>> an OSD with --osd-id 3, and had the exact same results.
>> 
>>> 
>>> Pastebin to ceph-osd log: https://pastebin.com/9qEsAJzA 
>>> 
>>> 
>>> I tried restarting the process, by deleting the LVM structures, zapping the
>>> disk using ceph-volume.
>>> This time using prepare and activate instead of create.
>>> 
>>> $ sudo ceph-volume lvm prepare --bluestore --data /dev/sda --block.db
>>> /dev/nvme0n1p5 --osd-id 0
>>> 
>>> $ sudo ceph-volume lvm activate --bluestore 0
>>> 227e1721-cd2e-4d7e-bb48-bc2bb715a038
>>> 
>>> 
>>> Also ran the enable on the ceph-volume systemd unit per
>>> http://docs.ceph.com/docs/master/install/manual-deployment/ 
>>> 
>>> 
>>> $ sudo systemctl enable
>>> ceph-volume@lvm-0-227e1721-cd2e-4d7e-bb48-bc2bb715a038
>>> 
>>> 
>>> Same results.
>>> 
>>> Any help is greatly appreciated.
>> 
>> Could you try without passing --osd-id ?
>>> 
>>> Thanks,
>>> 
>>> Reed
>>> 
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com 
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
>>> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD Bluestore Migration Issues

2018-01-09 Thread Reed Dier
> -2        21.81000     host node24
>  0   hdd   7.26999         osd.0    destroyed    0    1.0
>  8   hdd   7.26999         osd.8           up  1.0    1.0
> 16   hdd   7.26999         osd.16          up  1.0    1.0

Should I do these prior to running without the osd-id specified?
> # ceph osd crush remove osd.$ID
> # ceph auth del osd.$ID
> # ceph osd rm osd.$ID


And then will it fill in the missing osd.0?
Will set norebalance flag first to prevent data reshuffle upon the osd being 
removed from the crush map.
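For reference, the bluestore-migration doc linked earlier in this thread describes an id-preserving flow that avoids the crush remove / auth del / osd rm purge steps entirely: `ceph osd destroy` keeps the OSD id reserved in the crush map (shown as "destroyed") so ceph-volume can reuse it. A dry-run sketch, printing the plan for review rather than executing it; the device paths are the ones from this thread and should be adjusted:

```shell
# Dry-run sketch of the id-preserving replacement flow from the
# bluestore-migration doc; this only PRINTS the commands for review.
ID=0
DATA=/dev/sda
DB=/dev/nvme0n1p5
cat <<EOF
ceph osd set norebalance
ceph osd destroy $ID --yes-i-really-mean-it
ceph-volume lvm zap $DATA
ceph-volume lvm create --bluestore --data $DATA --block.db $DB --osd-id $ID
ceph osd unset norebalance
EOF
```

Once the printed plan looks right, the lines can be run one at a time; `osd destroy` removes the cephx key but leaves the id and crush entry in place for reuse.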

Thanks,

Reed

> On Jan 9, 2018, at 2:05 PM, Alfredo Deza  wrote:
> 
> On Tue, Jan 9, 2018 at 2:19 PM, Reed Dier  wrote:
>> Hi ceph-users,
>> 
>> Hoping that this is something small that I am overlooking, but could use the
>> group mind to help.
>> 
>> Ceph 12.2.2, Ubuntu 16.04 environment.
>> OSD (0) is an 8TB spinner (/dev/sda) and I am moving from a filestore
>> journal to a blocks.db and WAL device on an NVMe partition (/dev/nvme0n1p5).
>> 
>> I have an OSD that I am trying to convert to bluestore and running into some
>> trouble.
>> 
>> Started here until the ceph-volume create statement, which doesn’t work.
>> http://docs.ceph.com/docs/master/rados/operations/bluestore-migration/
>> Worth mentioning I also flushed the journal on the nvme partition before
>> nuking the OSD.
>> 
>> $ sudo ceph-osd -i 0 --flush-journal
>> 
>> 
>> So I first started with this command:
>> 
>> $ sudo ceph-volume lvm create --bluestore --data /dev/sda --block.db
>> /dev/nvme0n1p5 --osd-id 0
>> 
>> 
>> Pastebin to the ceph-volume log: https://pastebin.com/epkM3aP6
>> 
>> However the OSD doesn’t start.
> 
> I was just able to replicate this by using an ID that doesn't exist in
> the cluster. On a cluster with just one OSD (with an ID of 0) I
> created
> an OSD with --osd-id 3, and had the exact same results.
> 
>> 
>> Pastebin to ceph-osd log: https://pastebin.com/9qEsAJzA
>> 
>> I tried restarting the process, by deleting the LVM structures, zapping the
>> disk using ceph-volume.
>> This time using prepare and activate instead of create.
>> 
>> $ sudo ceph-volume lvm prepare --bluestore --data /dev/sda --block.db
>> /dev/nvme0n1p5 --osd-id 0
>> 
>> $ sudo ceph-volume lvm activate --bluestore 0
>> 227e1721-cd2e-4d7e-bb48-bc2bb715a038
>> 
>> 
>> Also ran the enable on the ceph-volume systemd unit per
>> http://docs.ceph.com/docs/master/install/manual-deployment/
>> 
>> $ sudo systemctl enable
>> ceph-volume@lvm-0-227e1721-cd2e-4d7e-bb48-bc2bb715a038
>> 
>> 
>> Same results.
>> 
>> Any help is greatly appreciated.
> 
> Could you try without passing --osd-id ?
>> 
>> Thanks,
>> 
>> Reed
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com 
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
>> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD Bluestore Migration Issues

2018-01-09 Thread Alfredo Deza
On Tue, Jan 9, 2018 at 2:19 PM, Reed Dier  wrote:
> Hi ceph-users,
>
> Hoping that this is something small that I am overlooking, but could use the
> group mind to help.
>
> Ceph 12.2.2, Ubuntu 16.04 environment.
> OSD (0) is an 8TB spinner (/dev/sda) and I am moving from a filestore
> journal to a blocks.db and WAL device on an NVMe partition (/dev/nvme0n1p5).
>
> I have an OSD that I am trying to convert to bluestore and running into some
> trouble.
>
> Started here until the ceph-volume create statement, which doesn’t work.
> http://docs.ceph.com/docs/master/rados/operations/bluestore-migration/
> Worth mentioning I also flushed the journal on the nvme partition before
> nuking the OSD.
>
> $ sudo ceph-osd -i 0 --flush-journal
>
>
> So I first started with this command:
>
> $ sudo ceph-volume lvm create --bluestore --data /dev/sda --block.db
> /dev/nvme0n1p5 --osd-id 0
>
>
> Pastebin to the ceph-volume log: https://pastebin.com/epkM3aP6
>
> However the OSD doesn’t start.

I was just able to replicate this by using an ID that doesn't exist in
the cluster. On a cluster with just one OSD (with an ID of 0) I
created
an OSD with --osd-id 3, and had the exact same results.

>
> Pastebin to ceph-osd log: https://pastebin.com/9qEsAJzA
>
> I tried restarting the process, by deleting the LVM structures, zapping the
> disk using ceph-volume.
> This time using prepare and activate instead of create.
>
> $ sudo ceph-volume lvm prepare --bluestore --data /dev/sda --block.db
> /dev/nvme0n1p5 --osd-id 0
>
> $ sudo ceph-volume lvm activate --bluestore 0
> 227e1721-cd2e-4d7e-bb48-bc2bb715a038
>
>
> Also ran the enable on the ceph-volume systemd unit per
> http://docs.ceph.com/docs/master/install/manual-deployment/
>
> $ sudo systemctl enable
> ceph-volume@lvm-0-227e1721-cd2e-4d7e-bb48-bc2bb715a038
>
>
> Same results.
>
> Any help is greatly appreciated.

Could you try without passing --osd-id ?
>
> Thanks,
>
> Reed
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSD Bluestore Migration Issues

2018-01-09 Thread Reed Dier
Hi ceph-users,

Hoping that this is something small that I am overlooking, but could use the 
group mind to help.

Ceph 12.2.2, Ubuntu 16.04 environment.
OSD (0) is an 8TB spinner (/dev/sda) and I am moving from a filestore journal 
to a blocks.db and WAL device on an NVMe partition (/dev/nvme0n1p5).

I have an OSD that I am trying to convert to bluestore and running into some 
trouble.

Started here until the ceph-volume create statement, which doesn’t work. 
http://docs.ceph.com/docs/master/rados/operations/bluestore-migration/ 

Worth mentioning I also flushed the journal on the nvme partition before nuking 
the OSD.
> $ sudo ceph-osd -i 0 --flush-journal


So I first started with this command:
> $ sudo ceph-volume lvm create --bluestore --data /dev/sda --block.db 
> /dev/nvme0n1p5 --osd-id 0


Pastebin to the ceph-volume log: https://pastebin.com/epkM3aP6 


However the OSD doesn’t start.

Pastebin to ceph-osd log: https://pastebin.com/9qEsAJzA 


I tried restarting the process, by deleting the LVM structures, zapping the 
disk using ceph-volume.
This time using prepare and activate instead of create.
> $ sudo ceph-volume lvm prepare --bluestore --data /dev/sda --block.db 
> /dev/nvme0n1p5 --osd-id 0
> $ sudo ceph-volume lvm activate --bluestore 0 
> 227e1721-cd2e-4d7e-bb48-bc2bb715a038

Also ran the enable on the ceph-volume systemd unit per 
http://docs.ceph.com/docs/master/install/manual-deployment/ 

> $ sudo systemctl enable ceph-volume@lvm-0-227e1721-cd2e-4d7e-bb48-bc2bb715a038

Same results.

Any help is greatly appreciated.

Thanks,

Reed
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-volume lvm deactivate/destroy/zap

2018-01-09 Thread Alfredo Deza
On Tue, Jan 9, 2018 at 1:35 PM, Reed Dier  wrote:
> I would just like to mirror what Dan van der Ster’s sentiments are.
>
> As someone attempting to move an OSD to bluestore, with limited/no LVM
> experience, it is a completely different beast and complexity level compared
> to the ceph-disk/filestore days.
>
> ceph-deploy was a very simple tool that did exactly what I was looking to
> do, but now ceph-disk has been deprecated halfway into a release, and
> ceph-deploy doesn’t appear to fully support ceph-volume, which is now the
> official way to manage OSDs moving forward.

ceph-deploy now fully supports ceph-volume; we should get a release soon

>
> My ceph-volume create statement ‘succeeded’ but the OSD doesn’t start, so
> now I am trying to zap the disk to try to recreate the OSD, and the zap is
> failing as Dan’s did.

I would encourage you to open a ticket in the tracker so that we can
improve on what failed for you

http://tracker.ceph.com/projects/ceph-volume/issues/new

ceph-volume keeps thorough logs in /var/log/ceph/ceph-volume.log and
/var/log/ceph/ceph-volume-systemd.log

If you create a ticket, please make sure to add all the output and
steps that you can
>
> And yes, I was able to get it zapped using the lvremove, vgremove, pvremove
> commands, but that is not obvious to someone who hasn’t used LVM extensively
> for storage management before.
>
> I also want to mirror Dan’s sentiments about the unnecessary complexity
> imposed on what I expect is the default use case of an entire disk being
> used. I can’t see anything more than the ‘entire disk’ method being the
> largest use case for users of ceph, especially the smaller clusters trying
> to maximize hardware/spend.

We don't take lightly the introduction of LVM here. The new tool is
addressing several insurmountable issues with how ceph-disk operated.

Although using an entire disk might be easier in the use case you are
in, it is certainly not the only thing we have to support, so then
again, we can't
reliably decide what strategy would be best to destroy that volume, or
group, or if the PV should be destroyed as well.

The 'zap' sub-command will allow that lv to be reused for an OSD and
that should work. Again, if it isn't sufficient, we really do need
more information and a
ticket in the tracker is the best way.
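For the whole-disk case discussed above, the manual teardown Stefan posted earlier in the thread can be wrapped in a small guarded script so nothing destructive runs until you have reviewed it. The VG/LV names below are placeholders; set DRY_RUN=0 only once the printed commands match your layout:

```shell
# Manual LVM teardown, following the lvremove/vgremove/pvremove steps
# quoted earlier in this thread. With DRY_RUN=1 the commands are only
# printed; VG/LV/PV names are placeholders for your environment.
DRY_RUN=1
VG=ceph-block-0
LV=block-0
PV=/dev/sda
run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }
run lvremove -f "$VG/$LV"
run vgremove "$VG"
run pvremove "$PV"       # should wipe the LVM labels
run wipefs -a "$PV"      # clear any remaining signatures
```

This is a sketch of the administrator-side cleanup the thread describes, not something ceph-volume does for you.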

>
> Just wanted to piggy back this thread to echo Dan’s frustration.
>
> Thanks,
>
> Reed
>
> On Jan 8, 2018, at 10:41 AM, Alfredo Deza  wrote:
>
> On Mon, Jan 8, 2018 at 10:53 AM, Dan van der Ster 
> wrote:
>
> On Mon, Jan 8, 2018 at 4:37 PM, Alfredo Deza  wrote:
>
> On Thu, Dec 21, 2017 at 11:35 AM, Stefan Kooman  wrote:
>
> Quoting Dan van der Ster (d...@vanderster.com):
>
> Thanks Stefan. But isn't there also some vgremove or lvremove magic
> that needs to bring down these /dev/dm-... devices I have?
>
>
> Ah, you want to clean up properly before that. Sure:
>
> lvremove -f /
> vgremove 
> pvremove /dev/ceph-device (should wipe labels)
>
> So ideally there should be a ceph-volume lvm destroy / zap option that
> takes care of this:
>
> 1) Properly remove LV/VG/PV as shown above
> 2) wipefs to get rid of LVM signatures
> 3) dd zeroes to get rid of signatures that might still be there
>
>
> ceph-volume does have a 'zap' subcommand, but it does not remove
> logical volumes or groups. It is intended to leave those in place for
> re-use. It uses wipefs, but
> not in a way that would end up removing LVM signatures.
>
> Docs for zap are at: http://docs.ceph.com/docs/master/ceph-volume/lvm/zap/
>
> The reason for not attempting removal is that an LV might not be a
> 1-to-1 device to volume group. It is being suggested here to "vgremove
> "
> but what if the group has several other LVs that should not get
> removed? Similarly, what if the logical volume is not a single PV but
> many?
>
> We believe that these operations should be up to the administrator
> with better context as to what goes where and what (if anything)
> really needs to be removed
> from LVM.
>
>
> Maybe I'm missing something, but aren't most (almost all?) use-cases just
>
>   ceph-volume lvm create /dev/
>
>
> No
>
>
> ? Or do you expect most deployments to do something more complicated with
> lvm?
>
>
> Yes, we do. For example dmcache, which to ceph-volume looks like a
> plain logical volume, but it can vary in how it is implemented
> behind the scenes
>
> In that above whole-disk case, I think it would be useful to have a
> very simple cmd to tear down whatever ceph-volume created, so that
> ceph admins don't need to reverse engineer what ceph-volume is doing
> with lvm.
>
>
> Right, that would work if that was the only supported way of dealing
> with lvm. We aren't imposing this, we added it as a convenience if a
> user did not want
> to deal with lvm at all. LVM has a plethora of ways to create an LV,
> and we don't want to either restrict users to our view of LVM or
> 

Re: [ceph-users] Reduced data availability: 4 pgs inactive, 4 pgs incomplete

2018-01-09 Thread Jens-U. Mozdzen

Hi Brent,

Brent Kennedy wrote to the mailing list:
Unfortunately, I don't see that setting documented anywhere other  
than the release notes.  It's hard to find guidance for questions in  
that case, but luckily you noted it in your blog post.  I wish I  
knew what value to set it to.  I did use the deprecated one  
after moving to hammer a while back due to the mis-calculated PGs.  I  
have now set the new option, but used 0 as the value, which cleared the  
error in the status, but the stuck incomplete pgs persist.


per your earlier message, you currently have at max 2549 PGs per OSD  
("too many PGs per OSD (2549 > max 200)"). Therefore, you might try  
setting mon_max_pg_per_osd to 2600 (to give some room for minor growth  
during backfills) and restart the OSDs.
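For anyone wondering where a figure like "2549 PGs per OSD" comes from, it is roughly total PG instances (PG count times pool size, summed over pools) divided by the OSD count. A back-of-envelope check with the numbers from this thread, assuming size 3 everywhere (which slightly overshoots, since some pools likely have a smaller size):

```shell
# Rough PG-per-OSD estimate: total PG instances / OSD count.
# Assumes all 36896 PGs are replicated size 3, which is an
# approximation; the cluster reports 2549.
PGS=36896
SIZE=3
OSDS=43
echo $(( PGS * SIZE / OSDS ))
```

That lands near the reported value, which is why mon_max_pg_per_osd needs to be in the ~2600 range for the OSDs to accept their current load.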


Of course, reducing the number of PGs per OSD should somehow be on  
your list, but I do understand that that's not always as easy as it's  
written... especially given the fact that Ceph seems to still lack a  
few mechanisms to clean up certain situations (like lossless migration  
of pool contents to another pool, for RBD or CephFS).


Regards,
Jens

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd: map failed

2018-01-09 Thread Karun Josy
Hello,

We have a user "testuser" with below permissions :

$ ceph auth get client.testuser
exported keyring for client.testuser
[client.testuser]
key = ==
caps mon = "profile rbd"
caps osd = "profile rbd pool=ecpool, profile rbd pool=cv, profile
rbd-read-only pool=templates"


But when we try to map an image in pool 'templates' we get the below error
:
--
# rbd map templates/centos.7-4.x86-64.2017 --id testuser
rbd: sysfs write failed
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (1) Operation not permitted


Is it because that user has only read permission in templates pool ?
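One thing worth trying (an untested suggestion, not a confirmed fix): krbd refuses a read-write map when the user only has rbd-read-only caps on the pool, but rbd map accepts an explicit --read-only flag. Printed here for review before running:

```shell
# Hypothesis: the default map is read-write, which "profile rbd-read-only"
# on the templates pool does not permit; an explicit read-only map may work.
IMAGE=templates/centos.7-4.x86-64.2017
CEPH_USER=testuser
CMD="rbd map --read-only $IMAGE --id $CEPH_USER"
echo "$CMD"   # print for review before running on the client
```

If a read-only map also fails, `dmesg | tail` on the client should show the specific cap that was denied.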



Karun Josy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Dashboard runs on all manager instances?

2018-01-09 Thread Janne Johansson
2018-01-09 19:34 GMT+01:00 Tim Bishop :

> Hi,
>
> I've recently upgraded from Jewel to Luminous and I'm therefore new to
> using the Dashboard. I noted this section in the documentation:
>
> http://docs.ceph.com/docs/master/mgr/dashboard/#load-balancer
>
> "Please note that the dashboard will only start on the
> manager which is active at that moment.
>



> However, from what I can see the dashboard is actually started on all
> manager instances. On the standby instances it simply has a redirect to
> the active instance. So the above documentation would look to be
> incorrect?
>


The statement is wrong. I think the change came in 12.2.2, so that non-active
MGR nodes redirect to the current active manager instead of requiring users
to set up load balancers/proxies to get a consistent URL.
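If you still want to point a proxy (or yourself) at the active instance directly, `ceph mgr dump` reports it in the `active_name` field. A sketch that parses a sample JSON with the same shape the command emits, so the pipeline itself is testable without a cluster:

```shell
# Find the active mgr from `ceph mgr dump` output. SAMPLE mimics the
# real command's JSON shape; on a live cluster, pipe the command instead:
#   ceph mgr dump | python3 -c '...'
SAMPLE='{"active_name":"mon3","standbys":[{"name":"mon1"},{"name":"mon2"}]}'
echo "$SAMPLE" | python3 -c 'import sys, json; print(json.load(sys.stdin)["active_name"])'
```

With the redirect behavior described above, though, a balancer that hits any mgr should end up at the right place anyway.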

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-volume lvm deactivate/destroy/zap

2018-01-09 Thread Reed Dier
I would just like to mirror what Dan van der Ster’s sentiments are.

As someone attempting to move an OSD to bluestore, with limited/no LVM 
experience, it is a completely different beast and complexity level compared to 
the ceph-disk/filestore days.

ceph-deploy was a very simple tool that did exactly what I was looking to do, 
but now ceph-disk has been deprecated halfway into a release, and ceph-deploy 
doesn’t appear to fully support ceph-volume, which is now the official way to 
manage OSDs moving forward.

My ceph-volume create statement ‘succeeded’ but the OSD doesn’t start, so now I 
am trying to zap the disk to try to recreate the OSD, and the zap is failing as 
Dan’s did.

And yes, I was able to get it zapped using the lvremove, vgremove, pvremove 
commands, but that is not obvious to someone who hasn’t used LVM extensively 
for storage management before.

I also want to mirror Dan’s sentiments about the unnecessary complexity imposed 
on what I expect is the default use case of an entire disk being used. I can’t 
see anything more than the ‘entire disk’ method being the largest use case for 
users of ceph, especially the smaller clusters trying to maximize 
hardware/spend.

Just wanted to piggy back this thread to echo Dan’s frustration.

Thanks,

Reed

> On Jan 8, 2018, at 10:41 AM, Alfredo Deza  wrote:
> 
>> On Mon, Jan 8, 2018 at 10:53 AM, Dan van der Ster  wrote:
>> On Mon, Jan 8, 2018 at 4:37 PM, Alfredo Deza  wrote:
>>> On Thu, Dec 21, 2017 at 11:35 AM, Stefan Kooman  wrote:
 Quoting Dan van der Ster (d...@vanderster.com):
> Thanks Stefan. But isn't there also some vgremove or lvremove magic
> that needs to bring down these /dev/dm-... devices I have?
 
 Ah, you want to clean up properly before that. Sure:
 
 lvremove -f /
 vgremove 
 pvremove /dev/ceph-device (should wipe labels)
 
 So ideally there should be a ceph-volume lvm destroy / zap option that
 takes care of this:
 
 1) Properly remove LV/VG/PV as shown above
 2) wipefs to get rid of LVM signatures
 3) dd zeroes to get rid of signatures that might still be there
>>> 
>>> ceph-volume does have a 'zap' subcommand, but it does not remove
>>> logical volumes or groups. It is intended to leave those in place for
>>> re-use. It uses wipefs, but
>>> not in a way that would end up removing LVM signatures.
>>> 
>>> Docs for zap are at: http://docs.ceph.com/docs/master/ceph-volume/lvm/zap/
>>> 
>>> The reason for not attempting removal is that an LV might not be a
>>> 1-to-1 device to volume group. It is being suggested here to "vgremove
>>> "
>>> but what if the group has several other LVs that should not get
>>> removed? Similarly, what if the logical volume is not a single PV but
>>> many?
>>> 
>>> We believe that these operations should be up to the administrator
>>> with better context as to what goes where and what (if anything)
>>> really needs to be removed
>>> from LVM.
>> 
>> Maybe I'm missing something, but aren't most (almost all?) use-cases just
>> 
>>   ceph-volume lvm create /dev/
> 
> No
>> 
>> ? Or do you expect most deployments to do something more complicated with 
>> lvm?
>> 
> 
> Yes, we do. For example dmcache, which to ceph-volume looks like a
> plain logical volume, but it can vary in how it is implemented
> behind the scenes
> 
>> In that above whole-disk case, I think it would be useful to have a
>> very simple cmd to tear down whatever ceph-volume created, so that
>> ceph admins don't need to reverse engineer what ceph-volume is doing
>> with lvm.
> 
> Right, that would work if that was the only supported way of dealing
> with lvm. We aren't imposing this, we added it as a convenience if a
> user did not want
> to deal with lvm at all. LVM has a plethora of ways to create an LV,
> and we don't want to either restrict users to our view of LVM or
> attempt to understand all the many different
> ways that may be and assume some behavior is desired (like removing a VG)
> 
>> 
>> Otherwise, perhaps it would be useful to document the expected normal
>> lifecycle of an lvm osd: create, failure / replacement handling,
>> decommissioning.
>> 
>> Cheers, Dan
>> 
>> 
>> 
>>> 
 
 Gr. Stefan
 
 --
 | BIT BV  http://www.bit.nl/Kamer van Koophandel 09090351
 | GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Dashboard runs on all manager instances?

2018-01-09 Thread Tim Bishop
Hi,

I've recently upgraded from Jewel to Luminous and I'm therefore new to
using the Dashboard. I noted this section in the documentation:

http://docs.ceph.com/docs/master/mgr/dashboard/#load-balancer

"Please note that the dashboard will only start on the
manager which is active at that moment. Query the Ceph
cluster status to see which manager is active (e.g., ceph
mgr dump). In order to make the dashboard available via a
consistent URL regardless of which manager daemon is currently
active, you may want to set up a load balancer front-end
to direct traffic to whichever manager endpoint is available.
If you use a reverse http proxy that forwards a subpath to
the dashboard, you need to configure url_prefix (see above)."

However, from what I can see the dashboard is actually started on all
manager instances. On the standby instances it simply has a redirect to
the active instance. So the above documentation would look to be
incorrect?

I was planning to use Apache's mod_proxy_balancer to just pick the
active instance, but the above makes that tricky. How have others solved
this? Or do you just pick the right instance and go direct? Or maybe I'm
missing a config option to make the dashboard only run on the active
manager?

Thanks,
Tim.

-- 
Tim Bishop
PGP Key: 0x6C226B37FDF38D55

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] nfs-ganesha rpm build script has not been adapted for this -

2018-01-09 Thread Daniel Gryniewicz
This was fixed on next (for 2.6, currently in -rc1) but not backported 
to 2.5.


Daniel

On 01/09/2018 12:41 PM, Marc Roos wrote:
  
The script has not been adapted for this - at the end

http://download.ceph.com/nfs-ganesha/rpm-V2.5-stable/luminous/x86_64/

  
nfs-ganesha-rgw-2.5.4-.el7.x86_64.rpm

  ^





-Original Message-
From: Marc Roos
Sent: dinsdag 29 augustus 2017 12:10
To: amare...@redhat.com; Marc Roos; wooer...@gmail.com
Cc: ceph-us...@ceph.com
Subject: RE: [ceph-users] Cephfs fsal + nfs-ganesha + el7/centos7

  
nfs-ganesha-2.5.2-.el7.x86_64.rpm

  ^
Is this correct?

-Original Message-
From: Marc Roos
Sent: dinsdag 29 augustus 2017 11:40
To: amaredia; wooertim
Cc: ceph-users
Subject: Re: [ceph-users] Cephfs fsal + nfs-ganesha + el7/centos7

  
Ali, Very very nice! I was creating the rpm's based on an old rpm source

spec, and it was a hassle to get them to build; I am not sure I even
used the correct compile settings.



-Original Message-
From: Ali Maredia [mailto:amare...@redhat.com]
Sent: maandag 28 augustus 2017 22:29
To: TYLin
Cc: Marc Roos; ceph-us...@ceph.com
Subject: Re: [ceph-users] Cephfs fsal + nfs-ganesha + el7/centos7

Marc,

These rpms (and debs) are built with the latest ganesha 2.5 stable
release and the latest luminous release on download.ceph.com:

http://download.ceph.com/nfs-ganesha/

I just put them up late last week, and I will be maintaining them in the
future.

-Ali

- Original Message -

From: "TYLin" 
To: "Marc Roos" 
Cc: ceph-us...@ceph.com
Sent: Sunday, August 20, 2017 11:58:05 PM
Subject: Re: [ceph-users] Cephfs fsal + nfs-ganesha + el7/centos7

You can get rpm from here

https://download.gluster.org/pub/gluster/glusterfs/nfs-ganesha/old/2.3
.0/CentOS/nfs-ganesha.repo

You have to fix the path mismatch error in the repo file manually.


On Aug 20, 2017, at 5:38 AM, Marc Roos 

wrote:




Where can you get the nfs-ganesha-ceph rpm? Is there a repository
that has these?




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] nfs-ganesha rpm build script has not been adapted for this -

2018-01-09 Thread Marc Roos
 
The script has not been adapted for this - at the end
http://download.ceph.com/nfs-ganesha/rpm-V2.5-stable/luminous/x86_64/

 
nfs-ganesha-rgw-2.5.4-.el7.x86_64.rpm  
 ^





-Original Message-
From: Marc Roos 
Sent: dinsdag 29 augustus 2017 12:10
To: amare...@redhat.com; Marc Roos; wooer...@gmail.com
Cc: ceph-us...@ceph.com
Subject: RE: [ceph-users] Cephfs fsal + nfs-ganesha + el7/centos7

 
nfs-ganesha-2.5.2-.el7.x86_64.rpm 
 ^
Is this correct?

-Original Message-
From: Marc Roos
Sent: dinsdag 29 augustus 2017 11:40
To: amaredia; wooertim
Cc: ceph-users
Subject: Re: [ceph-users] Cephfs fsal + nfs-ganesha + el7/centos7

 
Ali, Very very nice! I was creating the rpm's based on an old rpm source 
spec, and it was a hassle to get them to build; I am not sure I even used 
the correct compile settings.



-Original Message-
From: Ali Maredia [mailto:amare...@redhat.com]
Sent: maandag 28 augustus 2017 22:29
To: TYLin
Cc: Marc Roos; ceph-us...@ceph.com
Subject: Re: [ceph-users] Cephfs fsal + nfs-ganesha + el7/centos7

Marc,

These rpms (and debs) are built with the latest ganesha 2.5 stable 
release and the latest luminous release on download.ceph.com:

http://download.ceph.com/nfs-ganesha/

I just put them up late last week, and I will be maintaining them in the 
future.

-Ali

- Original Message -
> From: "TYLin" 
> To: "Marc Roos" 
> Cc: ceph-us...@ceph.com
> Sent: Sunday, August 20, 2017 11:58:05 PM
> Subject: Re: [ceph-users] Cephfs fsal + nfs-ganesha + el7/centos7
> 
> You can get rpm from here
> 
> https://download.gluster.org/pub/gluster/glusterfs/nfs-ganesha/old/2.3
> .0/CentOS/nfs-ganesha.repo
> 
> You have to fix the path mismatch error in the repo file manually.
> 
> > On Aug 20, 2017, at 5:38 AM, Marc Roos 
wrote:
> > 
> > 
> > 
> > Where can you get the nfs-ganesha-ceph rpm? Is there a repository 
> > that has these?
> > 
> > 
> > 
> > 
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS cache size limits

2018-01-09 Thread John Spray
On Mon, Jan 8, 2018 at 8:02 PM, Marc Roos  wrote:
>
> I guess the mds cache holds files, attributes etc., but how many files
> will the default "mds_cache_memory_limit": "1073741824" hold?

We always used to get asked how much memory a given mds_cache_size (in
inodes) would require, I guess it was only a matter of time until the
reverse question was asked :-)
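For a very rough answer to the reverse question: divide the byte limit by an assumed per-inode cost. The 2500 bytes/inode below is a ballpark guess, not an official figure; real cost varies with caps, dentries, and pinned state (and, per the tracker issue discussed downthread, actual usage overshoots the limit by a constant factor anyway):

```shell
# Back-of-envelope inode count for the default MDS cache limit.
# BYTES_PER_INODE is an assumed ballpark, not a documented constant.
CACHE_BYTES=1073741824   # default mds_cache_memory_limit (1 GiB)
BYTES_PER_INODE=2500
echo $(( CACHE_BYTES / BYTES_PER_INODE ))
```

So the default limit is on the order of a few hundred thousand cached inodes, give or take.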

John

>
>
> -Original Message-
> From: Stefan Kooman [mailto:ste...@bit.nl]
> Sent: vrijdag 5 januari 2018 12:54
> To: Patrick Donnelly
> Cc: Ceph Users
> Subject: Re: [ceph-users] MDS cache size limits
>
> Quoting Patrick Donnelly (pdonn...@redhat.com):
>>
>> It's expected but not desired: http://tracker.ceph.com/issues/21402
>>
>> The memory usage tracking is off by a constant factor. I'd suggest
>> just lowering the limit so it's about where it should be for your
>> system.
>
> Thanks for the info. Yeah, we did exactly that (observed and adjusted the
> setting accordingly). Is this something worth mentioning in the
> documentation? Especially if this "factor" is a constant? Over time
> (with issue 21402 being worked on) things will change. Ceph operators
> will want to make use of as much cache as possible without
> overcommitting (the MDS won't notice until there is no more memory left,
> restarts, and loses all its cache :/).
>
> Gr. Stefan
>
> --
> | BIT BV  http://www.bit.nl/Kamer van Koophandel 09090351
> | GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Real life EC+RBD experience is required

2018-01-09 Thread Konstantin Shalygin

Hello.


My real life experience tells me that this kind of setup will use much more
hardware resources and will show lower benchmarks compared to recommended
replicated pools on the same hardware.


Writes to ec in some cases better than replicated pools.

http://en.community.dell.com/cfs-file/__key/telligent-evolution-components-attachments/13-4624-00-00-20-44-29-13/Dell_5F00_R730xd_5F00_RedHat_5F00_Ceph_5F00_Performance_5F00_SizingGuide_5F00_WhitePaper.pdf?forcedownload=true



  But I am wondering, are there some
real-life companies that are using EC-encoded pools to host RBD images?



Yes, with a replicated cache tier pool, because direct writes to an EC pool 
were not possible before the Luminous release.
This is acceptable performance for mail boxes or archives (terabytes of 
screenshots, documents, logs).
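For completeness, the Luminous-era setup that makes the cache tier unnecessary looks roughly like the plan below: enable overwrites on the EC pool (requires BlueStore OSDs) and keep image metadata in a replicated pool via --data-pool. Pool and image names are placeholders; printed here for review rather than executed:

```shell
# Sketch of direct RBD-on-EC in Luminous: EC pool holds the data objects,
# the replicated rbd pool holds metadata. Names are placeholders; this
# only PRINTS the commands.
ECPOOL=ecpool
printf '%s\n' \
  "ceph osd pool set $ECPOOL allow_ec_overwrites true" \
  "rbd create --size 10G --data-pool $ECPOOL rbd/myimage"
```

Note that allow_ec_overwrites is only supported on BlueStore-backed pools.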




k
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph on Public IP

2018-01-09 Thread nithish B
Hello John,
Thank you for the clarification. I am using Google Cloud Platform for this
setup, and I don't think I can assign a public IP directly to an interface
there. Hence the question.

Thanks


On Jan 8, 2018 1:51 PM, "John Petrini"  wrote:

> ceph will always bind to the local IP. It can't bind to an IP that isn't
> assigned directly to the server such as a NAT'd IP. So your public network
> should be the local network that's configured on each server. If you
> cluster network is 10.128.0.0/16 for instance your public network might
> be 10.129.0.0/16.
>
> The public bind addr allows you to specify a NAT'd IP for each of your
> monitors. Your monitors will then advertise this IP address so that your
> clients know to reach them at their NAT'd IPs rather than their local
> IPs.
>
> This does NOT apply to OSD IPs. Your clients must be able to route to
> the OSDs directly. If your OSD servers are behind a NAT, I don't think that
> configuration is possible, nor do I think it would be a good idea to route
> your storage traffic through a NAT.
>
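For reference, the advertise/bind distinction John describes might look like this in ceph.conf; the addresses, networks, and monitor name below are purely illustrative:

```ini
# Hypothetical sketch: a monitor behind NAT advertising its external address.
[global]
cluster network = 10.128.0.0/16
public network  = 10.129.0.0/16

[mon.mon1]
# Address advertised to clients (the NAT'd IP they can actually reach):
public addr = 203.0.113.10:6789
# Local address the daemon actually binds to:
public bind addr = 10.129.0.5:6789
```

As John notes, this split exists for monitors only; OSD traffic must be directly routable.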


Re: [ceph-users] C++17 and C++ ABI on master

2018-01-09 Thread kefu chai
On Tue, Jan 9, 2018 at 6:14 AM, Sage Weil  wrote:
> On Mon, 8 Jan 2018, Adam C. Emerson wrote:
>> Good day,
>>
>> I've just merged some changes into master that set us up to compile
>> with C++17. This will require a reasonably new compiler to build
>> master.
>
> Yay!
>
>> Due to a change in how 'noexcept' is handled (it is now part of the type
>> signature of a function), mangled symbol names of noexcept functions are
>> different, so if you have custom clients using the C++ libraries, you may
>> need to recompile.
>>
>> Do not worry, there should be no change to the C ABI. Any C clients
>> should be unaffected.
>
> I added cards to the backlog for libradospp, librbdpp, and libcephfspp.

librados and librbd are a little bit complicated: most of the C++ API is
implemented using the C API and the underlying RadosClient, but there are
a couple of C functions which are implemented using librados' public C++
interface. For example, IoCtx::from_rados_ioctx_t() is used by
rados_lock_exclusive(); the same applies to librbd. So we need to
decouple the C and C++ APIs so that they do not depend on each other. I
have a WIP branch at
https://github.com/tchaikov/ceph/tree/wip-librados-cxx, in case anyone
would like to take a look. Or we can go another route: hide all the C++
symbols in librados, and expose all C and C++ symbols in its C++
counterpart. This turns the problem into a symbol-visibility problem,
but the downside of this solution is that we would have two copies of
the APIs in each compiled library.

> Anybody out there interested in working on that?  libcephfspp might be the
> one to start with since it doesn't depend on librados and is a much
> simpler API.

libcephfs is a pure C API, so I think there is no need to split out a
C++ library from it.

>
> sage
>
>



-- 
Regards
Kefu Chai


Re: [ceph-users] formatting bytes and object counts in ceph status output

2018-01-09 Thread Sage Weil
On Tue, 9 Jan 2018, Jan Fajerski wrote:
> On Tue, Jan 02, 2018 at 04:54:55PM +, John Spray wrote:
> > On Tue, Jan 2, 2018 at 10:43 AM, Jan Fajerski  wrote:
> > > Hi lists,
> > > Currently the ceph status output formats all numbers with binary unit
> > > prefixes, i.e. 1MB equals 1048576 bytes and an object count of 1M equals
> > > 1048576 objects.  I received a bug report from a user that printing object
> > > counts with a base 2 multiplier is confusing (I agree) so I opened a bug
> > > and
> > > https://github.com/ceph/ceph/pull/19117.
> > > In the PR discussion a couple of questions arose that I'd like to get some
> > > opinions on:
> > 
> > > - Should we print binary unit prefixes (MiB, GiB, ...) since that would be
> > > technically correct?
> > 
> > I'm not a fan of the technically correct base 2 units -- they're still
> > relatively rarely used, and I've spent most of my life using kB to
> > mean 1024, not 1000.
> We could start changing the "rarely used" part ;) But I can certainly live
> with keeping the old units.
> > 
> > > - Should counters (like object counts) be formatted with a base 10
> > > multiplier or a multiplier with base 2?
> > 
> > I prefer base 2 for any dimensionless quantities (or rates thereof) in
> > computing.  Metres and kilograms go in base 10, bytes go in base 2.
> > 
> > It's all very subjective and a matter of opinion of course, and my
> > feelings aren't particularly strong :-)
> As far as I understand, the standards regarding this (IEC 60027, ISO/IEC 8,
> probably more) talk about base 2 prefixes only for units related to digital
> data. I might of course misunderstand.
> What I find problematic is that other tools will (mostly?) use base 10
> units for everything not data related. Say I plot the object count of ceph in
> Grafana.  It'll use base 10 multipliers for a dimensionless number. Since
> Grafana (and I imagine other tools like this) consume raw numbers we'll end up
> with Grafana displaying a different object count than "ceph -s". Say 1.04M vs
> 1M. Now this is not terrible, but it'll get worse with higher counts quickly.
> In the original tracker issue it's noted that this was reported with a cluster
> containing 7150896726 objects. The difference from Grafana to "ceph -s" was
> 7150M vs 6835M.

Right.

I find the *iB units annoying myself, and I'm not sure I'll ever be able 
to say "pebibyte" out loud, but I can't think of a good reason not to be 
correct and precise.

As a practical matter, I wonder if the PR should eliminate si_t entirely 
and replace it with dec_si_t and bin_si_t.  Or, since the binary units 
aren't actually SI units, replace si_t with dec_si_t (to be explicit!) and 
bin_unit_t, or {dec,bin}_unit_t, or similar.  I suspect that si_t vs iec_t 
or similar won't be sufficient for the developer to choose the right 
thing.

sage


[ceph-users] Real life EC+RBD experience is required

2018-01-09 Thread Алексей Ступников
Hello ceph users, I have a question for you.

I have checked ceph documentation and a number of online conversations in
order to find some details and real-life experience about RBD image hosted
on EC-encoded pools. I understand that it is essential to use Bluestore
storage and create separate replicated metadata pool [1], and I also have
found information that this kind of implementation is still suboptimal [2].
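For anyone evaluating this, the Luminous-era setup referenced in [1] is roughly the following. The pool names, sizes, and PG counts below are purely illustrative, and the exact commands should be checked against your release:

```shell
# Replicated pool for RBD metadata (headers, OMAP), plus an EC data pool.
ceph osd pool create rbd-meta 64 64 replicated
ceph osd pool create rbd-data 256 256 erasure
# EC overwrites require BlueStore OSDs (Luminous and later).
ceph osd pool set rbd-data allow_ec_overwrites true
ceph osd pool application enable rbd-meta rbd
ceph osd pool application enable rbd-data rbd
# Image metadata lives in the replicated pool; data objects land in the EC pool.
rbd create --size 100G --data-pool rbd-data rbd-meta/myimage
```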

My real life experience tells me that this kind of setup will use much more
hardware resources and will show lower benchmarks compared to recommended
replicated pools on the same hardware. But I am wondering, are there some
real-life companies that are using EC-encoded pools to host RBD images?
What do they think about it? When is it a good idea to use such a setup?

Any answers are appreciated.

[1] http://docs.ceph.com/docs/luminous/rados/operations/erasure-code/
[2] http://docs.ceph.com/docs/luminous/rados/operations/cache-tiering/

BR,
Alexey Stupnikov.


Re: [ceph-users] formatting bytes and object counts in ceph status output

2018-01-09 Thread Jan Fajerski

On Tue, Jan 02, 2018 at 04:54:55PM +, John Spray wrote:

On Tue, Jan 2, 2018 at 10:43 AM, Jan Fajerski  wrote:

Hi lists,
Currently the ceph status output formats all numbers with binary unit
prefixes, i.e. 1MB equals 1048576 bytes and an object count of 1M equals
1048576 objects.  I received a bug report from a user that printing object
counts with a base 2 multiplier is confusing (I agree) so I opened a bug and
https://github.com/ceph/ceph/pull/19117.
In the PR discussion a couple of questions arose that I'd like to get some
opinions on:



- Should we print binary unit prefixes (MiB, GiB, ...) since that would be
technically correct?


I'm not a fan of the technically correct base 2 units -- they're still
relatively rarely used, and I've spent most of my life using kB to
mean 1024, not 1000.
We could start changing the "rarely used" part ;) But I can certainly live with 
keeping the old units.



- Should counters (like object counts) be formatted with a base 10
multiplier or a multiplier with base 2?


I prefer base 2 for any dimensionless quantities (or rates thereof) in
computing.  Metres and kilograms go in base 10, bytes go in base 2.

It's all very subjective and a matter of opinion of course, and my
feelings aren't particularly strong :-)
As far as I understand, the standards regarding this (IEC 60027, ISO/IEC 8, 
probably more) talk about base 2 prefixes only for units related to digital 
data. I might of course misunderstand.
What I find problematic is that other tools will (mostly?) use base 10 units 
for everything not data related. Say I plot the object count of ceph in Grafana.  
It'll use base 10 multipliers for a dimensionless number. Since Grafana (and I 
imagine other tools like this) consume raw numbers we'll end up with Grafana 
displaying a different object count than "ceph -s". Say 1.04M vs 1M. Now this is 
not terrible, but it'll get worse with higher counts quickly.
In the original tracker issue it's noted that this was reported with a cluster 
containing 7150896726 objects. The difference from Grafana to "ceph -s" was 
7150M vs 6835M.


John


My proposal would be to both use binary unit prefixes and use base 10
multipliers for counters. I think this aligns with user expectations as well
as the relevant standard(s?).

Best,
Jan
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html




--
Jan Fajerski
Engineer Enterprise Storage
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)


Re: [ceph-users] cephfs degraded on ceph luminous 12.2.2

2018-01-09 Thread Burkhard Linke

Hi,


On 01/08/2018 05:40 PM, Alessandro De Salvo wrote:

Thanks Lincoln,

indeed, as I said the cluster is recovering, so there are pending ops:


    pgs: 21.034% pgs not active
 1692310/24980804 objects degraded (6.774%)
 5612149/24980804 objects misplaced (22.466%)
 458 active+clean
 329 active+remapped+backfill_wait
 159 activating+remapped
 100 active+undersized+degraded+remapped+backfill_wait
 58  activating+undersized+degraded+remapped
 27  activating
 22  active+undersized+degraded+remapped+backfilling
 6   active+remapped+backfilling
 1   active+recovery_wait+degraded


If it's just a matter of waiting for the system to complete the recovery, 
that's fine, I'll deal with that, but I was wondering if there is a 
more subtle problem here.


OK, I'll wait for the recovery to complete and see what happens, thanks.


The blocked MDS might be caused by the 'activating' PGs. Do you have a 
warning about too many PGs per OSD? If that is the case, 
activating/creating/peering/whatever on the affected OSDs is blocked, 
which leads to blocked requests etc.


You can resolve this by increasing the number of allowed PGs per OSD 
('mon_max_pg_per_osd'). AFAIK it needs to be set for mon, mgr and osd 
instances. There has also been some discussion about this setting on the 
mailing list in the last few weeks.
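If anyone wants a concrete starting point, a hedged ceph.conf sketch; the value 400 is illustrative only and should be sized to your actual PG-per-OSD ratio:

```ini
# Hypothetical fragment: raise the allowed PG count per OSD.
# Applies to mon, mgr and osd daemons reading this [global] section.
[global]
mon max pg per osd = 400
```

At runtime the setting can usually be injected with something like `ceph tell mon.* injectargs '--mon_max_pg_per_osd=400'` (and likewise for the other daemons), though restarting the daemons with the updated config is the safer path.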


Regards,
Burkhard