Re: [ceph-users] iSCSI on Ubuntu and HA / Multipathing

2019-07-10 Thread Michael Christie
On 07/11/2019 05:34 AM, Edward Kalk wrote:
> The Docs say : http://docs.ceph.com/docs/nautilus/rbd/iscsi-targets/
> 
>   * Red Hat Enterprise Linux/CentOS 7.5 (or newer); Linux kernel v4.16
> (or newer)
> 
> ^^Is there a version combination of CEPH and Ubuntu that works? Is
> anyone running iSCSI on Ubuntu ?
> -Ed
> 

There is still one rpm dep left. See attached hack patch.

We just need to make that kernel check generic and not use the rpm
labelCompare call. For RH/tcmu-runner based distros we do not need a
specific kernel version check on top of the distro version check;
tcmu-runner/target_core_user should return an error if the kernel does
not support something.

Ricardo,

For SUSE/target_core_rbd, I think we just need to add a check for if
target_core_rbd is available, right? Maybe we want a backstore specific
callout/check?
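
For illustration, a distro-agnostic version of that check could compare plain
numeric kernel components instead of rpm's labelCompare. A rough sketch only
(not the attached patch), with 4.16 used as an example minimum:

import os

def kernel_at_least(minimum):
    # e.g. uname release "4.16.3-301.fc28.x86_64" -> (4, 16, 3)
    running = os.uname()[2].split('-')[0]
    to_tuple = lambda v: tuple(int(p) for p in v.split('.') if p.isdigit())
    return to_tuple(running) >= to_tuple(minimum)

if not kernel_at_least('4.16'):
    print("Kernel version too old - 4.16 or above needed")
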
diff --git a/rbd-target-api.py b/rbd-target-api.py
index 690a045..8dce0e8 100755
--- a/rbd-target-api.py
+++ b/rbd-target-api.py
@@ -16,7 +16,6 @@ import inspect
 import copy
 
 from functools import (reduce, wraps)
-from rpm import labelCompare
 import rados
 import rbd
 
@@ -2637,19 +2636,6 @@ def pre_reqs_errors():
     else:
         errors_found.append("OS is unsupported")
 
-    # check the running kernel is OK (required kernel has patches to rbd.ko)
-    os_info = os.uname()
-    this_arch = os_info[-1]
-    this_kernel = os_info[2].replace(".{}".format(this_arch), '')
-    this_ver, this_rel = this_kernel.split('-', 1)
-
-    # use labelCompare from the rpm module to handle the comparison
-    if labelCompare(('1', this_ver, this_rel), ('1', k_vers, k_rel)) < 0:
-        logger.error("Kernel version check failed")
-        errors_found.append("Kernel version too old - {}-{} "
-                            "or above needed".format(k_vers,
-                                                     k_rel))
-
     return errors_found
 
 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] out of date python-rtslib repo on https://shaman.ceph.com/

2019-06-19 Thread Michael Christie
On 06/17/2019 03:41 AM, Matthias Leopold wrote:
> thank you very much for updating python-rtslib!!
> could you maybe also do this for tcmu-runner (version 1.4.1)?

I am just about to make a new 1.5 release. Give me a week. I am working
on a last feature/bug for the gluster team, and then I am going to pass
the code to the gluster tcmu-runner devs for some review and testing.

> shaman repos are very convenient for installing and updating the ceph
> iscsi stack, I would be very happy if I could continue using it
> 
> matthias
> 
> Am 14.06.19 um 18:08 schrieb Matthias Leopold:
>> Hi,
>>
>> to the people running https://shaman.ceph.com/:
>> please update the repo for python-rtslib so recent ceph-iscsi packages
>> can be installed which need python-rtslib >= 2.1.fb68
>>
>> shaman python-rtslib version is 2.1.fb67
>> upstream python-rtslib version is 2.1.fb69
>>
>> thanks + thanks for running this service at all
>> matthias
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ISCSI Setup

2019-06-19 Thread Michael Christie
On 06/19/2019 12:34 AM, Brent Kennedy wrote:
> Recently upgraded a ceph cluster to nautilus 14.2.1 from Luminous, no
> issues.  One of the reasons for doing so was to take advantage of some
> of the new ISCSI updates that were added in Nautilus.  I installed
> CentOS 7.6 and did all the basic stuff to get the server online.  I then
> tried to use the
> http://docs.ceph.com/docs/nautilus/rbd/iscsi-target-cli/ document and
> hit a hard stop.  Apparently, neither the required package versions listed at
> the top nor the ceph-iscsi package exist yet in any repositories.

I am in the process of updating the upstream docs (Aaron wrote up the
changes to the RHCS docs and I am just converting them to the upstream
docs and making them into patches for a PR) and ceph-ansible
(https://github.com/ceph/ceph-ansible/pull/3977) for the transition from
ceph-iscsi-cli/config to ceph-iscsi.

The upstream GH for ceph-iscsi is here
https://github.com/ceph/ceph-iscsi

and it is built here:
https://shaman.ceph.com/repos/ceph-iscsi/

I think we are just waiting on one last patch for fqdn support from SUSE
so we can make a new ceph-iscsi release.


> Reminds me of when I first tried to setup RGWs.  Is there a hidden
> repository somewhere that hosts these required packages?  Also, I found
> a thread talking about those packages and the instructions being off,
> which concerns me.  Is there a good tutorial online somewhere?  I saw
> the ceph-ansible bits, but wasn’t sure if that would even work because
> of the package issue.  I use ansible to deploy machines all the time.  I
> also wonder if the ISCSI bits are considered production or Test ( I see
> RedHat has a bunch of docs talking about using iscsi, so I would think
> production ).
> 
>  
> 
> Thoughts anyone?
> 
>  
> 
> Regards,
> 
> -Brent
> 
>  
> 
> Existing Clusters:
> 
> Test: Nautilus 14.2.1 with 3 osd servers, 1 mon/man, 1 gateway ( all
> virtual on SSD )
> 
> US Production(HDD): Nautilus 14.2.1 with 11 osd servers, 3 mons, 4
> gateways behind haproxy LB
> 
> UK Production(HDD): Luminous 12.2.11 with 25 osd servers, 3 mons/man, 3
> gateways behind haproxy LB
> 
> US Production(SSD): Luminous 12.2.11 with 6 osd servers, 3 mons/man, 3
> gateways behind haproxy LB
> 
>  
> 
>  
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Grow bluestore PV/LV

2019-05-16 Thread Michael Andersen
Thanks! I'm on mimic for now, but I'll give it a shot on a test nautilus
cluster.
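
A rough sketch of that procedure for an LVM-backed OSD on Nautilus (the OSD id,
VG/LV names and the size below are placeholders, not a tested recipe):

systemctl stop ceph-osd@12
lvextend -L +100G /dev/ceph-vg/osd-block-12
ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-12
systemctl start ceph-osd@12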

On Wed, May 15, 2019 at 10:58 PM Yury Shevchuk  wrote:

> Hello Michael,
>
> growing (expanding) bluestore OSD is possible since Nautilus (14.2.0)
> using bluefs-bdev-expand tool as discussed in this thread:
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-April/034116.html
>
> -- Yury
>
> On Wed, May 15, 2019 at 10:03:29PM -0700, Michael Andersen wrote:
> > Hi
> >
> > After growing the size of an OSD's PV/LV, how can I get bluestore to see
> > the new space as available? It does notice the LV has changed size, but
> it
> > sees the new space as occupied.
> >
> > This is the same question as:
> >
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-January/023893.html
> > and
> > that original poster spent a lot of effort in explaining exactly what he
> > meant, but I could not find a reply to his email.
> >
> > Thanks
> > Michael
>
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Grow bluestore PV/LV

2019-05-15 Thread Michael Andersen
Hi

After growing the size of an OSD's PV/LV, how can I get bluestore to see
the new space as available? It does notice the LV has changed size, but it
sees the new space as occupied.

This is the same question as:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-January/023893.html
and
that original poster spent a lot of effort in explaining exactly what he
meant, but I could not find a reply to his email.

Thanks
Michael
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] x pgs not deep-scrubbed in time

2019-04-04 Thread Michael Sudnick
Thanks, I'll mess around with them and see what I can do.
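
For reference, the settings in question can be changed at runtime with the
centralized config; the values below are purely illustrative, not
recommendations:

ceph config set osd osd_scrub_max_interval   1209600   # 2 weeks
ceph config set osd osd_deep_scrub_interval  1209600
ceph config set osd osd_max_scrubs           2
ceph config set osd osd_scrub_load_threshold 5.0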

-Michael

On Thu, 4 Apr 2019 at 05:58, Alexandru Cucu  wrote:

> Hi,
>
> You are limited by your drives so not much can be done but it should
> alt least catch up a bit and reduce the number of pgs that have not
> been deep scrubbed in time.
>
>
> On Wed, Apr 3, 2019 at 8:13 PM Michael Sudnick
>  wrote:
> >
> > Hi Alex,
> >
> > I'm okay myself with the number of scrubs performed, would you expect
> > tweaking any of those values to let the deep-scrubs finish in time?
> >
> > Thanks,
> >   Michael
> >
> > On Wed, 3 Apr 2019 at 10:30, Alexandru Cucu  wrote:
> >>
> >> Hello,
> >>
> >> You can increase *osd scrub max interval* and *osd deep scrub
> >> interval* if you don't want at least one scrub/deep scrub per week.
> >>
> >> I would also play with *osd max scrubs* and *osd scrub load threshold*
> >> to do more scrubbing work, but be careful as it will have a huge
> >> impact on performance.
> >>
> >> ---
> >> Alex Cucu
> >>
> >> On Wed, Apr 3, 2019 at 3:46 PM Michael Sudnick
> >>  wrote:
> >> >
> >> > Hello, was on IRC yesterday about this and got some input, but
> haven't figured out a solution yet. I have a 5 node, 41 OSD cluster which
> currently has the warning "295 pgs not deep-scrubbed in time". The number
> slowly increases as deep scrubs happen. In my cluster I'm primarily using
> 5400 RPM 2.5" disks, and that's my general bottleneck. Processors are 8/16
> core Intel® Xeon processor D-1541. 8 OSDs per node (one has 9), and each
> node hosts a MON, MGR and MDS.
> >> >
> >> > My CPU usage is low, it's a very low traffic cluster, just a home
> lab. CPU usage rarely spikes around 30%. RAM is fine, each node has 64GiB,
> and only about 33GiB is used. Network is overkill, 2x1GbE public, and
> 2x10GbE cluster. Disk %util when deep scrubs are happening can hit 80%, so
> that seems to be my bottleneck.
> >> >
> >> > I am running Nautilus 14.2.0. I've been running fine since release up
> to about 3 days ago where I had a disk die and replaced it.
> >> >
> >> > Any suggestions on what I can do? Thank you for any suggestions.
> >> >
> >> > -Michael
> >> > ___
> >> > ceph-users mailing list
> >> > ceph-users@lists.ceph.com
> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] x pgs not deep-scrubbed in time

2019-04-03 Thread Michael Sudnick
Hi Alex,

I'm okay myself with the number of scrubs performed, would you expect
tweaking any of those values to let the deep-scrubs finish in time?

Thanks,
  Michael

On Wed, 3 Apr 2019 at 10:30, Alexandru Cucu  wrote:

> Hello,
>
> You can increase *osd scrub max interval* and *osd deep scrub
> interval* if you don't want at least one scrub/deep scrub per week.
>
> I would also play with *osd max scrubs* and *osd scrub load threshold*
> to do more scrubbing work, but be careful as it will have a huge
> impact on performance.
>
> ---
> Alex Cucu
>
> On Wed, Apr 3, 2019 at 3:46 PM Michael Sudnick
>  wrote:
> >
> > Hello, was on IRC yesterday about this and got some input, but haven't
> figured out a solution yet. I have a 5 node, 41 OSD cluster which currently
> has the warning "295 pgs not deep-scrubbed in time". The number slowly
> increases as deep scrubs happen. In my cluster I'm primarily using 5400 RPM
> 2.5" disks, and that's my general bottleneck. Processors are 8/16 core
> Intel® Xeon processor D-1541. 8 OSDs per node (one has 9), and each node
> hosts a MON, MGR and MDS.
> >
> > My CPU usage is low, it's a very low traffic cluster, just a home lab.
> CPU usage rarely spikes around 30%. RAM is fine, each node has 64GiB, and
> only about 33GiB is used. Network is overkill, 2x1GbE public, and 2x10GbE
> cluster. Disk %util when deep scrubs are happening can hit 80%, so that
> seems to be my bottleneck.
> >
> > I am running Nautilus 14.2.0. I've been running fine since release up to
> about 3 days ago where I had a disk die and replaced it.
> >
> > Any suggestions on what I can do? Thank you for any suggestions.
> >
> > -Michael
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] x pgs not deep-scrubbed in time

2019-04-03 Thread Michael Sudnick
Hello, was on IRC yesterday about this and got some input, but haven't
figured out a solution yet. I have a 5 node, 41 OSD cluster which currently
has the warning "295 pgs not deep-scrubbed in time". The number slowly
increases as deep scrubs happen. In my cluster I'm primarily using 5400 RPM
2.5" disks, and that's my general bottleneck. Processors are 8/16 core
Intel® Xeon processor D-1541. 8 OSDs per node (one has 9), and each node
hosts a MON, MGR and MDS.

My CPU usage is low, it's a very low traffic cluster, just a home lab. CPU
usage rarely spikes around 30%. RAM is fine, each node has 64GiB, and only
about 33GiB is used. Network is overkill, 2x1GbE public, and 2x10GbE
cluster. Disk %util when deep scrubs are happening can hit 80%, so that
seems to be my bottleneck.

I am running Nautilus 14.2.0. I've been running fine since release up to
about 3 days ago where I had a disk die and replaced it.

Any suggestions on what I can do? Thank you for any suggestions.
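
For context, the PGs behind the warning can be listed and manually
deep-scrubbed one at a time while tuning (the pgid below is a placeholder):

ceph health detail | grep 'not deep-scrubbed'
ceph pg deep-scrub <pgid>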

-Michael
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bluestore 32bit max_object_size limit

2019-01-18 Thread KEVIN MICHAEL HRPCEK


On 1/18/19 7:26 AM, Igor Fedotov wrote:

Hi Kevin,

On 1/17/2019 10:50 PM, KEVIN MICHAEL HRPCEK wrote:
Hey,

I recall reading about this somewhere but I can't find it in the docs or list
archive, and confirmation from a dev or someone who knows for sure would be
nice. What I recall is that bluestore has a max 4GB object size limit based on
the design of bluestore, not the osd_max_object_size setting. The bluestore
source seems to suggest this by setting OBJECT_MAX_SIZE to the 32-bit max,
giving an error if osd_max_object_size is > OBJECT_MAX_SIZE, and not writing
the data if offset+length >= OBJECT_MAX_SIZE. So it seems like the per-object
size int inside the OSD can't exceed 32 bits, which is 4GB, like FAT32. Am I
correct, or maybe I'm reading all this wrong?

You're correct, BlueStore doesn't support objects larger than
OBJECT_MAX_SIZE (i.e. 4GB)

Thanks for confirming that!


If bluestore has a hard 4GB object limit using radosstriper to break up an 
object would work, but does using an EC pool that breaks up the object to 
shards smaller than OBJECT_MAX_SIZE have the same effect as radosstriper to get 
around a 4GB limit? We use rados directly and would like to move to bluestore 
but we have some large objects <= 13G that may need attention if this 4GB limit 
does exist and an ec pool doesn't get around it.
Theoretically object splitting using EC might help. But I'm not sure whether one
needs to set osd_max_object_size greater than 4GB to permit 13GB objects in an
EC pool. If that's needed, then the osd_max_object_size <= OBJECT_MAX_SIZE
constraint is violated and BlueStore wouldn't start.
In my experience I had to increase osd_max_object_size from the 128M default
(which it changed to a couple of versions ago) to ~20G to be able to write our
largest objects with some margin. Do you think there is another way to handle
osd_max_object_size > OBJECT_MAX_SIZE so that bluestore will still start, and
EC pools or striping can be used to write objects that are greater than
OBJECT_MAX_SIZE as long as each stripe/shard ends up smaller than
OBJECT_MAX_SIZE?
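
For a rough sense of scale (illustrative numbers only, assuming an EC profile
of k=8, m=3):

13 GiB object / k=8 data shards  ->  ~1.6 GiB per data shard (coding shards are the same size)
1.6 GiB per shard  <  4 GiB OBJECT_MAX_SIZE

so each per-OSD shard would stay well under the BlueStore limit; the open
question is only whether osd_max_object_size itself may be raised above it.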



https://github.com/ceph/ceph/blob/master/src/os/bluestore/BlueStore.cc#L88
#define OBJECT_MAX_SIZE 0xffffffff // 32 bits

https://github.com/ceph/ceph/blob/master/src/os/bluestore/BlueStore.cc#L4395

 // sanity check(s)
  auto osd_max_object_size =
cct->_conf.get_val("osd_max_object_size");
  if (osd_max_object_size >= (size_t)OBJECT_MAX_SIZE) {
derr << __func__ << " osd_max_object_size >= 0x" << std::hex << 
OBJECT_MAX_SIZE
  << "; BlueStore has hard limit of 0x" << OBJECT_MAX_SIZE << "." <<  
std::dec << dendl;
return -EINVAL;
  }


https://github.com/ceph/ceph/blob/master/src/os/bluestore/BlueStore.cc#L12331
  if (offset + length >= OBJECT_MAX_SIZE) {
r = -E2BIG;
  } else {
_assign_nid(txc, o);
r = _do_write(txc, c, o, offset, length, bl, fadvise_flags);
txc->write_onode(o);
  }

Thanks!
Kevin


--
Kevin Hrpcek
NASA SNPP Atmosphere SIPS
Space Science & Engineering Center
University of Wisconsin-Madison



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Thanks,

Igor

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Bluestore 32bit max_object_size limit

2019-01-17 Thread KEVIN MICHAEL HRPCEK
Hey,

I recall reading about this somewhere but I can't find it in the docs or list
archive, and confirmation from a dev or someone who knows for sure would be
nice. What I recall is that bluestore has a max 4GB object size limit based on
the design of bluestore, not the osd_max_object_size setting. The bluestore
source seems to suggest this by setting OBJECT_MAX_SIZE to the 32-bit max,
giving an error if osd_max_object_size is > OBJECT_MAX_SIZE, and not writing
the data if offset+length >= OBJECT_MAX_SIZE. So it seems like the per-object
size int inside the OSD can't exceed 32 bits, which is 4GB, like FAT32. Am I
correct, or maybe I'm reading all this wrong?

If bluestore has a hard 4GB object limit using radosstriper to break up an 
object would work, but does using an EC pool that breaks up the object to 
shards smaller than OBJECT_MAX_SIZE have the same effect as radosstriper to get 
around a 4GB limit? We use rados directly and would like to move to bluestore 
but we have some large objects <= 13G that may need attention if this 4GB limit 
does exist and an ec pool doesn't get around it.


https://github.com/ceph/ceph/blob/master/src/os/bluestore/BlueStore.cc#L88
#define OBJECT_MAX_SIZE 0xffffffff // 32 bits

https://github.com/ceph/ceph/blob/master/src/os/bluestore/BlueStore.cc#L4395

 // sanity check(s)
  auto osd_max_object_size =
cct->_conf.get_val("osd_max_object_size");
  if (osd_max_object_size >= (size_t)OBJECT_MAX_SIZE) {
derr << __func__ << " osd_max_object_size >= 0x" << std::hex << 
OBJECT_MAX_SIZE
  << "; BlueStore has hard limit of 0x" << OBJECT_MAX_SIZE << "." <<  
std::dec << dendl;
return -EINVAL;
  }


https://github.com/ceph/ceph/blob/master/src/os/bluestore/BlueStore.cc#L12331
  if (offset + length >= OBJECT_MAX_SIZE) {
r = -E2BIG;
  } else {
_assign_nid(txc, o);
r = _do_write(txc, c, o, offset, length, bl, fadvise_flags);
txc->write_onode(o);
  }

Thanks!
Kevin


--
Kevin Hrpcek
NASA SNPP Atmosphere SIPS
Space Science & Engineering Center
University of Wisconsin-Madison
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HDD spindown problem

2019-01-04 Thread Nieporte, Michael
Hello,


we likely faced the same issue with spindowns.


We set max spindown timers on all HDDs and disabled the tuned.service, which we 
were told might change back/affect the set timers.

To correctly disable tuned:
tuned-adm stop
tuned-adm off
systemctl stop tuned.service
systemctl disable tuned.service



Our spindown problem is 'fixed'.


From: ceph-users  on behalf of Florian
Engelmann 
Sent: Monday, 3 December 2018 11:02:28
To: ceph-users@lists.ceph.com
Subject: [ceph-users] HDD spindown problem

Hello,

we have been fighting an HDD spin-down problem on our production ceph cluster
for two weeks now. The problem is not ceph related, but I guess this
topic is interesting to the list and, to be honest, I hope to find a
solution here.

We do use 6 OSD Nodes like:
OS: Suse 12 SP3
Ceph: SES 5.5 (12.2.8)
Server: Supermicro 6048R-E1CR36L
Controller: LSI 3008 (LSI3008-IT)
Disk: 12x Seagate ST8000NM0055-1RM112 8TB (SN05 firmware, some still
SN02 and SN04)
NVMe: 1x Intel DC P3700 800GB (used for 80GB RocksDB and 2GB WAL for
each OSD; only 7 disks are online right now - up to 9 disks will have
their RocksDB/WAL on one NVMe SSD)


Problem:
This Ceph cluster is used for objectstorage (RadosGW) only and is mostly
used for backups to S3 (RadosGW). There is not that much activity -
mostly at night time. We do not want any HDD to spin down but they do.
We tried to disable the spindown timers by using sdparm and also with
the Seagate tool SeaChest but "something" does re-enable them:


Disable standby on all HDD:
for i in sd{c..n}; do
/root/SeaChestUtilities/Linux/Lin64/SeaChest_PowerControl_191_1183_64 -d
/dev/$i --onlySeagate --changePower --disableMode --powerMode standby ;
done


Monitor standby timer status:

while true; do for i in sd{c..n}; do echo  "$(date) $i
$(/root/SeaChestUtilities/Linux/Lin64/SeaChest_PowerControl_191_1183_64
-d /dev/$i --onlySeagate --showEPCSettings -v0 | grep Stand)";  done;
sleep 1 ; done

This will show:
Mon Dec  3 10:42:54 CET 2018 sdc Standby Z   0 9000 65535120   Y Y
Mon Dec  3 10:42:54 CET 2018 sdd Standby Z   0 9000 65535120   Y Y
Mon Dec  3 10:42:54 CET 2018 sde Standby Z   0 9000 65535120   Y Y
Mon Dec  3 10:42:54 CET 2018 sdf Standby Z   0 9000 65535120   Y Y
Mon Dec  3 10:42:54 CET 2018 sdg Standby Z   0 9000 65535120   Y Y
Mon Dec  3 10:42:54 CET 2018 sdh Standby Z   0 9000 65535120   Y Y
Mon Dec  3 10:42:54 CET 2018 sdi Standby Z   0 9000 65535120   Y Y
Mon Dec  3 10:42:55 CET 2018 sdj Standby Z   0 9000 65535120   Y Y
Mon Dec  3 10:42:55 CET 2018 sdk Standby Z   0 9000 65535120   Y Y
Mon Dec  3 10:42:55 CET 2018 sdl Standby Z   0 9000 65535120   Y Y
Mon Dec  3 10:42:55 CET 2018 sdm Standby Z   0 9000 65535120   Y Y
Mon Dec  3 10:42:55 CET 2018 sdn Standby Z   0 9000 65535120   Y Y


So everything is fine right now. The standby timer is 0 and disabled (no *
shown) while the default value is 9000 and the saved timer is  (we
saved this value so the disks have a huge timeout after reboots). But after
an unknown amount of time (in this case ~7 minutes) things start to get
weird:

Mon Dec  3 10:47:52 CET 2018 sdc Standby Z  *3500  9000 65535120   Y Y
[...]
65535120   Y Y
Mon Dec  3 10:48:07 CET 2018 sdc Standby Z  *3500  9000 65535120   Y Y
Mon Dec  3 10:48:09 CET 2018 sdc Standby Z  *3500  9000 65535120   Y Y
Mon Dec  3 10:48:12 CET 2018 sdc Standby Z  *4500  9000 65535120   Y Y
Mon Dec  3 10:48:14 CET 2018 sdc Standby Z  *4500  9000 65535120   Y Y
Mon Dec  3 10:48:16 CET 2018 sdc Standby Z  *4500  9000 65535120   Y Y
Mon Dec  3 10:48:19 CET 2018 sdc Standby Z  *4500  9000 65535120   Y Y
Mon Dec  3 10:48:21 CET 2018 sdc Standby Z  *4500  9000 65535120   Y Y
Mon Dec  3 10:48:23 CET 2018 sdc Standby Z  *5500  9000 65535120   Y Y
Mon Dec  3 10:48:26 CET 2018 sdc Standby Z  *5500  9000 65535120   Y Y
Mon Dec  3 10:48:28 CET 2018 sdc Standby Z  *5500  9000 65535120   Y Y
Mon Dec  3 10:48:30 CET 2018 sdc Standby Z  *5500  9000 65535120   Y Y
Mon Dec  3 10:48:32 CET 2018 sdc Standby Z  *5500  9000 65535120   Y Y
Mon Dec  3 10:48:35 CET 2018 sdc Standby Z  *5500  9000 65535120   Y Y
Mon Dec  3 10:48:37 CET 2018 sdc Standby Z  *5500  9000 65535120   Y Y
Mon Dec  3 10:48:40 CET 2018 sdc Standby Z  *5500  9000 65535120   Y Y
Mon Dec  3 

Re: [ceph-users] RDMA/RoCE enablement failed with (113) No route to host

2018-12-21 Thread Michael Green
I was informed today that the CEPH environment I’ve been working on is no 
longer available. Unfortunately this happened before I could try any of your 
suggestions, Roman. 

Thank you for all the attention and advice. 

--
Michael Green


> On Dec 20, 2018, at 08:21, Roman Penyaev  wrote:
> 
>> On 2018-12-19 22:01, Marc Roos wrote:
> I would be interested in learning about the performance increase it has
> compared to 10Gbit. I got the ConnectX-3 Pro but I am not using the rdma
> because support is not available by default.
> 
> Not too much, the following is the comparison on latest master using
> fio engine, which measures bare ceph messenger performance (no disk IO):
> https://github.com/ceph/ceph/pull/24678
> 
> 
> Mellanox MT27710 Family [ConnectX-4 Lx] 25gb/s:
> 
> 
>   bs    iodepth=8,  async+posix             iodepth=8,  async+rdma
> ----------------------------------------------------------------------------
>   4k    IOPS=30.0k  BW=121MiB/s   0.257ms    IOPS=47.9k  BW=187MiB/s  0.166ms
>   8k    IOPS=30.8k  BW=240MiB/s   0.259ms    IOPS=46.3k  BW=362MiB/s  0.172ms
>  16k    IOPS=25.1k  BW=392MiB/s   0.318ms    IOPS=45.2k  BW=706MiB/s  0.176ms
>  32k    IOPS=23.1k  BW=722MiB/s   0.345ms    IOPS=37.5k  BW=1173MiB/s 0.212ms
>  64k    IOPS=18.0k  BW=1187MiB/s  0.420ms    IOPS=41.0k  BW=2624MiB/s 0.189ms
> 128k    IOPS=12.1k  BW=1518MiB/s  0.657ms    IOPS=20.9k  BW=2613MiB/s 0.381ms
> 256k    IOPS=3530   BW=883MiB/s   2.265ms    IOPS=4624   BW=1156MiB/s 1.729ms
> 512k    IOPS=2084   BW=1042MiB/s  3.387ms    IOPS=2406   BW=1203MiB/s 3.32ms
>   1m    IOPS=1119   BW=1119MiB/s  7.145ms    IOPS=1277   BW=1277MiB/s 6.26ms
>   2m    IOPS=551    BW=1101MiB/s  14.51ms    IOPS=631    BW=1263MiB/s 12.66ms
>   4m    IOPS=272    BW=1085MiB/s  29.45ms    IOPS=318    BW=1268MiB/s 25.17ms
> 
> 
> 
>   bs    iodepth=128,  async+posix            iodepth=128,  async+rdma
> ----------------------------------------------------------------------------
>   4k    IOPS=75.9k  BW=297MiB/s  1.683ms     IOPS=83.4k  BW=326MiB/s   1.535ms
>   8k    IOPS=64.3k  BW=502MiB/s  1.989ms     IOPS=70.3k  BW=549MiB/s   1.819ms
>  16k    IOPS=53.9k  BW=841MiB/s  2.376ms     IOPS=57.8k  BW=903MiB/s   2.214ms
>  32k    IOPS=42.2k  BW=1318MiB/s 3.034ms     IOPS=59.4k  BW=1855MiB/s  2.154ms
>  64k    IOPS=30.0k  BW=1934MiB/s 4.135ms     IOPS=42.3k  BW=2645MiB/s  3.023ms
> 128k    IOPS=18.1k  BW=2268MiB/s 7.052ms     IOPS=21.2k  BW=2651MiB/s  6.031ms
> 256k    IOPS=5186   BW=1294MiB/s 24.71ms     IOPS=5253   BW=1312MiB/s  24.39ms
> 512k    IOPS=2897   BW=1444MiB/s 44.19ms     IOPS=2944   BW=1469MiB/s  43.48ms
>   1m    IOPS=1306   BW=1297MiB/s 97.98ms     IOPS=1421   BW=1415MiB/s  90.27ms
>   2m    IOPS=612    BW=1199MiB/s 208.6ms     IOPS=862    BW=1705MiB/s  148.9ms
>   4m    IOPS=316    BW=1235MiB/s 409.1ms     IOPS=416    BW=1664MiB/s  307.4ms
> 
> 
> 1. As you can see there is no big difference between posix and rdma.
> 
> 2. Even though a 25gb/s card is used we barely reach 20gb/s.  I also have
>   results on 100gb/s qlogic cards, no difference, because the bottleneck is
>   not the network.  This is especially visible on loads with a bigger
>   iodepth: bandwidth does not change significantly. So even if you increase
>   the number of requests in-flight you reach the limit of how fast those
>   requests are processed.
> 
> 3. Keep in mind this is only messenger performance, so on real ceph loads you
>   will get less, because of the whole IO stack involved.
> 
> 
> --
> Roman
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RDMA/RoCE enablement failed with (113) No route to host

2018-12-19 Thread Michael Green
 
7f9ab0202700 -1 RDMAStack polling poll failed -4
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 0> 2018-12-20 02:27:00.343 
7f9ab0202700 -1 *** Caught signal (Aborted) **
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: in thread 7f9ab0202700 
thread_name:rdma-polling
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: ceph version 13.2.2 
(02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 1: (()+0x902970) [0x55c38d155970]
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 2: (()+0xf6d0) [0x7f9ab80f46d0]
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 3: (gsignal()+0x37) [0x7f9ab7114277]
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 4: (abort()+0x148) [0x7f9ab7115968]
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 5: 
(RDMADispatcher::polling()+0x1084) [0x7f9abb6e4c14]
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 6: (()+0x6afaef) [0x7f9abb9bdaef]
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 7: (()+0x7e25) [0x7f9ab80ece25]
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: 8: (clone()+0x6d) [0x7f9ab71dcbad]
Dec 20 02:27:00 bonjovi0 ceph-osd[837488]: NOTE: a copy of the executable, or 
`objdump -rdS <executable>` is needed to interpret this.

The above block repeats for each OSD.

Any advice where to go from here will be much appreciated.
--
Michael Green
Customer Support & Integration
Tel. +1 (518) 9862385
gr...@e8storage.com

E8 Storage has a new look, find out more 
<https://e8storage.com/when-performance-matters-a-new-look-for-e8-storage/> 










> On Dec 19, 2018, at 3:26 PM, Roman Penyaev  wrote:
> 
> On 2018-12-19 21:00, Michael Green wrote:
>> Thanks for the insights Mohammad and Roman. Interesting read.
>> My interest in RDMA is purely from testing perspective.
>> Still I would be interested if somebody who has RDMA enabled and
>> running, to share their ceph.conf.
> 
> Nothing special in my ceph.conf, only one line ms_cluster_type = async+rdma
> 
>> My RDMA related entries are taken from Mellanox blog here
>> https://community.mellanox.com/s/article/bring-up-ceph-rdma---developer-s-guide
>> <https://community.mellanox.com/s/article/bring-up-ceph-rdma---developer-s-guide>.
>> They used Luminous and built it from source. I'm running binary
>> distribution of Mimic here.
>> ms_type = async+rdma
>> ms_cluster = async+rdma
>> ms_async_rdma_device_name = mlx5_0
>> ms_async_rdma_polling_us = 0
>> ms_async_rdma_local_gid=
> 
> 
> ms_type = async+rdma should be enough, or ms_cluster_type=async+rdma,
> i.e. all osds will be connected over rdma, but public network stays on
> tcp sockets.
> 
> Others are optional.
> 
>> Or, if somebody with knowledge of the code could tell me when is this
>> "RDMAConnectedSocketImpl" error is printed might also be helpful.
>> 2018-12-19 21:45:32.757 7f52b8548140  0 mon.rio@-1(probing).osd e25981
>> crush map has features 288514051259236352, adjusting msgr requires
>> 2018-12-19 21:45:32.757 7f52b8548140  0 mon.rio@-1(probing).osd e25981
>> crush map has features 288514051259236352, adjusting msgr requires
>> 2018-12-19 21:45:32.757 7f52b8548140  0 mon.rio@-1(probing).osd e25981
>> crush map has features 1009089991638532096, adjusting msgr requires
>> 2018-12-19 21:45:32.757 7f52b8548140  0 mon.rio@-1(probing).osd e25981
>> crush map has features 288514051259236352, adjusting msgr requires
>> 2018-12-19 21:45:33.138 7f52b8548140  0 mon.rio@-1(probing) e5  my
>> rank is now 0 (was -1)
>> 2018-12-19 21:45:33.141 7f529f3fe700 -1  RDMAConnectedSocketImpl
>> activate failed to transition to RTR state: (113) No route to host
> 
> The error means: no route to host :)  peers do not see each other,
> I suggest first try to install (or build) perftest and run ib_send_bw
> to test connectivity between client and server.  Also there are some
> testing examples from libuverbs (rdma-core), e.g. ibv_rc_pingpong,
> also good for benchmarking, testing, etc.
> 
> --
> Roman
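
For example, a quick perftest connectivity check between two of the nodes might
look like this (device, port and address are placeholders):

server$ ib_send_bw -d mlx5_0 -i 1 -R
client$ ib_send_bw -d mlx5_0 -i 1 -R <server_ip>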
> 
> 
> 
>> 2018-12-19 21:45:33.142 7f529f3fe700 -1
>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.2/rpm/el7/BUILD/ceph-13.2.2/src/msg/async/rdma/RDMAConnectedSocketImpl.cc:
>> In function 'void RDMAConnectedSocketImpl::handle_connection()' thread
>> 7f529f3fe700 time 2018-12-19 21:45:33.141972
>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.2/rpm/el7/BUILD/ceph-13.2.2/src/msg/async/rdma/RDMAConnectedSocketImpl.cc:
>> 224: FAILED assert(!r)
>> --
>> Michael Green
>>> On Dec 19, 2018, at 5:21 AM, Roman Penyaev  wrote:
>>> Well, I am playing with ceph rdma implementation quite a while
>>> a

Re: [ceph-users] RDMA/RoCE enablement failed with (113) No route to host

2018-12-19 Thread Michael Green
Thanks for the insights Mohammad and Roman. Interesting read.

My interest in RDMA is purely from testing perspective. 

Still, I would be interested if somebody who has RDMA enabled and running
could share their ceph.conf.

My RDMA related entries are taken from Mellanox blog here 
https://community.mellanox.com/s/article/bring-up-ceph-rdma---developer-s-guide 
<https://community.mellanox.com/s/article/bring-up-ceph-rdma---developer-s-guide>.
 They used Luminous and built it from source. I'm running binary distribution 
of Mimic here.

ms_type = async+rdma
ms_cluster = async+rdma
ms_async_rdma_device_name = mlx5_0
ms_async_rdma_polling_us = 0
ms_async_rdma_local_gid=
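
By comparison, the minimal messenger configuration mentioned elsewhere in this
thread would be roughly this sketch - cluster traffic on RDMA, public/client
traffic left on the default TCP messenger:

[global]
ms_cluster_type = async+rdma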

Or, if somebody with knowledge of the code could tell me when this
"RDMAConnectedSocketImpl" error is printed, that might also be helpful.

2018-12-19 21:45:32.757 7f52b8548140  0 mon.rio@-1(probing).osd e25981 crush 
map has features 288514051259236352, adjusting msgr requires
2018-12-19 21:45:32.757 7f52b8548140  0 mon.rio@-1(probing).osd e25981 crush 
map has features 288514051259236352, adjusting msgr requires
2018-12-19 21:45:32.757 7f52b8548140  0 mon.rio@-1(probing).osd e25981 crush 
map has features 1009089991638532096, adjusting msgr requires
2018-12-19 21:45:32.757 7f52b8548140  0 mon.rio@-1(probing).osd e25981 crush 
map has features 288514051259236352, adjusting msgr requires
2018-12-19 21:45:33.138 7f52b8548140  0 mon.rio@-1(probing) e5  my rank is now 
0 (was -1)
2018-12-19 21:45:33.141 7f529f3fe700 -1  RDMAConnectedSocketImpl activate 
failed to transition to RTR state: (113) No route to host
2018-12-19 21:45:33.142 7f529f3fe700 -1 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.2/rpm/el7/BUILD/ceph-13.2.2/src/msg/async/rdma/RDMAConnectedSocketImpl.cc:
 In function 'void RDMAConnectedSocketImpl::handle_connection()' thread 
7f529f3fe700 time 2018-12-19 21:45:33.141972
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.2/rpm/el7/BUILD/ceph-13.2.2/src/msg/async/rdma/RDMAConnectedSocketImpl.cc:
 224: FAILED assert(!r)
--
Michael Green



> On Dec 19, 2018, at 5:21 AM, Roman Penyaev  wrote:
> 
> 
> Well, I am playing with ceph rdma implementation quite a while
> and it has unsolved problems, thus I would say the status is
> "not completely broken", but "you can run it on your own risk
> and smile":
> 
> 1. On disconnect of previously active (high write load) connection
>   there is a race that can lead to osd (or any receiver) crash:
> 
>   https://github.com/ceph/ceph/pull/25447 
> <https://github.com/ceph/ceph/pull/25447>
> 
> 2. Recent qlogic hardware (qedr drivers) does not support
>   IBV_EVENT_QP_LAST_WQE_REACHED, which is used in ceph rdma
>   implementation, pull request from 1. also targets this
>   incompatibility.
> 
> 3. On high write load and many connections there is a chance,
>   that osd can run out of receive WRs and rdma connection (QP)
>   on sender side will get IBV_WC_RETRY_EXC_ERR, thus disconnected.
>   This is fundamental design problem, which has to be fixed on
>   protocol level (e.g. propagate backpressure to senders).
> 
> 4. Unfortunately neither rdma or any other 0-latency network can
>   bring significant value, because the bottle neck is not a
>   network, please consider this for further reading regarding
>   transport performance in ceph:
> 
>   https://www.spinics.net/lists/ceph-devel/msg43555.html 
> <https://www.spinics.net/lists/ceph-devel/msg43555.html>
> 
>   Problems described above have quite a big impact on overall
>   transport performance.
> 
> --
> Roman
>>> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RDMA/RoCE enablement failed with (113) No route to host

2018-12-18 Thread Michael Green
I don't know. 
The Ceph documentation for Mimic doesn't appear to go into much detail on RDMA
in general, but it's still mentioned in the docs here and there.  Some
examples:
Change log - http://docs.ceph.com/docs/master/releases/mimic/ 
<http://docs.ceph.com/docs/master/releases/mimic/>
Async messenger options - 
http://docs.ceph.com/docs/master/rados/configuration/ms-ref/ 
<http://docs.ceph.com/docs/master/rados/configuration/ms-ref/>

I want to believe that the official docs wouldn't mention something that's 
completely broken?

There are multiple posts in this very mailing list from people trying to make 
it work. 
--
Michael Green
Customer Support & Integration
Tel. +1 (518) 9862385
gr...@e8storage.com

E8 Storage has a new look, find out more 
<https://e8storage.com/when-performance-matters-a-new-look-for-e8-storage/> 










> On Dec 18, 2018, at 6:55 AM, Виталий Филиппов  wrote:
> 
> Is RDMA officially supported? I'm asking because I recently tried to use DPDK 
> and it seems it's broken... i.e the code is there, but does not compile until 
> I fix cmake scripts, and after fixing the build OSDs just get segfaults and 
> die after processing something like 40-50 incoming packets.
> 
> Maybe RDMA is in the same state?
> 
> On 13 December 2018 at 2:42:23 GMT+03:00, Michael Green
> wrote:
> Sorry for bumping the thread. I refuse to believe there are no people on this 
> list who have successfully enabled and run RDMA with Mimic. :)
> 
> Mike
> 
>> Hello collective wisdom,
>> 
>> ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic 
>> (stable) here.
>> 
>> I have a working cluster here consisting of 3 monitor hosts,  64 OSD 
>> processes across 4 osd hosts, plus 2 MDSs, plus 2 MGRs. All of that is 
>> consumed by 10 client nodes.
>> 
>> Every host in the cluster, including clients is 
>> RHEL 7.5
>> Mellanox OFED 4.4-2.0.7.0
>> RoCE NICs are either MCX416A-CCAT or MCX414A-CCAT @ 50Gbit/sec
>> The NICs are all mlx5_0 port 1
>> 
>> rping and ib_send_bw work fine both ways on any two nodes in the cluster.
>> 
>> Full configuration of the cluster is pasted below, but RDMA related 
>> parameters are configured as following:
>> 
>> 
>> ms_public_type = async+rdma
>> ms_cluster = async+rdma
>> # Exclude clients for now 
>> ms_type = async+posix
>> 
>> ms_async_rdma_device_name = mlx5_0
>> ms_async_rdma_polling_us = 0
>> ms_async_rdma_port_num=1
>> 
>> When I try to start MON, it immediately fails as below. Has anybody seen
>> this, or could you give any pointers on what/where to look next?
>> 
>> 
>> --ceph-mon.rio.log--begin--
>> 2018-12-12 22:35:30.011 7f515dc39140  0 set uid:gid to 167:167 (ceph:ceph)
>> 2018-12-12 22:35:30.011 7f515dc39140  0 ceph version 13.2.2 
>> (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable), process ceph-mon, 
>> pid 2129843
>> 2018-12-12 22:35:30.011 7f515dc39140  0 pidfile_write: ignore empty 
>> --pid-file
>> 2018-12-12 22:35:30.036 7f515dc39140  0 load: jerasure load: lrc load: isa
>> 2018-12-12 22:35:30.036 7f515dc39140  0  set rocksdb option compression = 
>> kNoCompression
>> 2018-12-12 22:35:30.036 7f515dc39140  0  set rocksdb option 
>> level_compaction_dynamic_level_bytes = true
>> 2018-12-12 22:35:30.036 7f515dc39140  0  set rocksdb option 
>> write_buffer_size = 33554432
>> 2018-12-12 22:35:30.036 7f515dc39140  0  set rocksdb option compression = 
>> kNoCompression
>> 2018-12-12 22:35:30.036 7f515dc39140  0  set rocksdb option 
>> level_compaction_dynamic_level_bytes = true
>> 2018-12-12 22:35:30.036 7f515dc39140  0  set rocksdb option 
>> write_buffer_size = 33554432
>> 2018-12-12 22:35:30.147 7f51442ed700  2 Event(0x55d927e95700 nevent=5000 
>> time_id=1).set_owner idx=1 owner=139987012998912
>> 2018-12-12 22:35:30.147 7f51442ed700 10 stack operator() starting
>> 2018-12-12 22:35:30.147 7f5143aec700  2 Event(0x55d927e95200 nevent=5000 
>> time_id=1).set_owner idx=0 owner=139987004606208
>> 2018-12-12 22:35:30.147 7f5144aee700  2 Event(0x55d927e95c00 nevent=5000 
>> time_id=1).set_owner idx=2 owner=139987021391616
>> 2018-12-12 22:35:30.147 7f5143aec700 10 stack operator() starting
>> 2018-12-12 22:35:30.147 7f5144aee700 10 stack operator() starting
>> 2018-12-12 22:35:30.147 7f515dc39140  0 starting mon.rio rank 0 at public 
>> addr 192.168.1.58:6789/0 at bind addr 192.168.1.58:6789/0 mon_data 
>> /var/lib/ceph/mon/ceph-rio fsid 376540c8-a362-41cc-9a58-9c8ceca0e4ee
>> 2018-12-12 22:35:30.147 7f515dc39140 10 -- - bind bind 192.168.1.58:6789/0
>> 2018-12-12 22:35:

[ceph-users] RDMA/RoCE enablement failed with (113) No route to host

2018-12-12 Thread Michael Green
8:01ae
#
## BONJOVI1
#
#[osd.16]
#ms_async_rdma_local_gid=::::::c0a8:01af
#
#[osd.17]
#ms_async_rdma_local_gid=::::::c0a8:01af
#
#[osd.18]
#ms_async_rdma_local_gid=::::::c0a8:01af
#
#[osd.19]
#ms_async_rdma_local_gid=::::::c0a8:01af
#
#[osd.20]
#ms_async_rdma_local_gid=::::::c0a8:01af
#
#[osd.21]
#ms_async_rdma_local_gid=::::::c0a8:01af
#
#[osd.22]
#ms_async_rdma_local_gid=::::::c0a8:01af
#
#[osd.23]
#ms_async_rdma_local_gid=::::::c0a8:01af
#
#[osd.24]
#ms_async_rdma_local_gid=::::::c0a8:01af
#
#[osd.25]
#ms_async_rdma_local_gid=::::::c0a8:01af
#
#[osd.26]
#ms_async_rdma_local_gid=::::::c0a8:01af
#
#[osd.27]
#ms_async_rdma_local_gid=::::::c0a8:01af
#
#[osd.28]
#ms_async_rdma_local_gid=::::::c0a8:01af
#
#[osd.29]
#ms_async_rdma_local_gid=::::::c0a8:01af
#
#[osd.30]
#ms_async_rdma_local_gid=::::::c0a8:01af
#
#[osd.31]
#ms_async_rdma_local_gid=::::::c0a8:01af
#
## PRINCE
#
#[osd.32]
#ms_async_rdma_local_gid=::::::c0a8:0198
#
#[osd.33]
#ms_async_rdma_local_gid=::::::c0a8:0198
#[osd.34]
#ms_async_rdma_local_gid=::::::c0a8:0198
#
#[osd.35]
#ms_async_rdma_local_gid=::::::c0a8:0198
#
#[osd.36]
#ms_async_rdma_local_gid=::::::c0a8:0198
#
#[osd.37]
#ms_async_rdma_local_gid=::::::c0a8:0198
#
#[osd.38]
#ms_async_rdma_local_gid=::::::c0a8:0198
#
#[osd.39]
#ms_async_rdma_local_gid=::::::c0a8:0198
#
#[osd.40]
#ms_async_rdma_local_gid=::::::c0a8:0198
#
#[osd.41]
#ms_async_rdma_local_gid=::::::c0a8:0198
#
#[osd.42]
#ms_async_rdma_local_gid=::::::c0a8:0198
#
#[osd.43]
#ms_async_rdma_local_gid=::::::c0a8:0198
#
#[osd.44]
#ms_async_rdma_local_gid=::::::c0a8:0198
#
#[osd.45]
#ms_async_rdma_local_gid=::::::c0a8:0198
#
#[osd.46]
#ms_async_rdma_local_gid=::::::c0a8:0198
#
#[osd.47]
#ms_async_rdma_local_gid=::::::c0a8:0198
#
#
## RINGO
#
#[osd.48]
#ms_async_rdma_local_gid=::::::c0a8:018e
#
#[osd.49]
#ms_async_rdma_local_gid=::::::c0a8:018e
#
#[osd.50]
#ms_async_rdma_local_gid=::::::c0a8:018e
#
#[osd.51]
#ms_async_rdma_local_gid=::::::c0a8:018e
#
#[osd.52]
#ms_async_rdma_local_gid=::::::c0a8:018e
#
#[osd.53]
#ms_async_rdma_local_gid=::::::c0a8:018e
#
#[osd.54]
#ms_async_rdma_local_gid=::::::c0a8:018e
#
#[osd.55]
#ms_async_rdma_local_gid=::::::c0a8:018e
#
#[osd.56]
#ms_async_rdma_local_gid=::::::c0a8:018e
#
#[osd.57]
#ms_async_rdma_local_gid=::::::c0a8:018e
#
#[osd.58]
#ms_async_rdma_local_gid=::::::c0a8:018e
#
#[osd.59]
#ms_async_rdma_local_gid=::::::c0a8:018e
#
#[osd.60]
#ms_async_rdma_local_gid=::::::c0a8:018e
#
#[osd.61]
#ms_async_rdma_local_gid=::::::c0a8:018e
#
#[osd.62]
#ms_async_rdma_local_gid=::::::c0a8:018e
#
#[osd.63]
#ms_async_rdma_local_gid=::::::c0a8:018e
#
#[mon.rio]
#ms_async_rdma_local_gid=::::::c0a8:013a
#
#[mon.salvador]
#ms_async_rdma_local_gid=::::::c0a8:013b
#
#[mon.medellin]
#ms_async_rdma_local_gid=::::::c0a8:0141

-ceph.conf---end-


--
Michael Green










___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd IO monitoring

2018-12-04 Thread Michael Green
Interesting, thanks for sharing.

I'm looking at the example output in the PR 25114:

write_bytes
 409600/107
 409600/107

 write_latency
2618503617/107

How should these values be interpreted?
--
Michael Green








> On Dec 3, 2018, at 2:47 AM, Jan Fajerski  wrote:
> 
>>  Question: what tools are available to monitor IO stats on RBD level?
>>  That is, IOPS, Throughput, IOs inflight and so on?
> There is some brand new code for rbd io monitoring. This PR 
> (https://github.com/ceph/ceph/pull/25114) added rbd client side perf counters 
> and this PR (https://github.com/ceph/ceph/pull/25358) will add those counters 
> as prometheus metrics. There is also room for an "rbd top" tool, though I 
> haven't seen any code for this.
> I'm sure Mykola (the author of both PRs) could go into more detail if needed. 
> I expect this functionality to land in nautilus.
> 
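
In Nautilus, rbd-level IO stats ended up being exposed through mgr-backed
commands along these lines (mentioned only as a pointer, since the cluster
above is on Mimic):

rbd perf image iostat <pool>
rbd perf image iotop <pool>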

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd IO monitoring

2018-11-29 Thread Michael Green
Hello collective wisdom,

Ceph neophyte here, running v13.2.2 (mimic).

Question: what tools are available to monitor IO stats on RBD level? That is, 
IOPS, Throughput, IOs inflight and so on?
I'm testing with FIO and want to verify independently the IO load on each RBD 
image.

--
Michael Green
Customer Support & Integration
gr...@e8storage.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Don't upgrade to 13.2.2 if you use cephfs

2018-10-17 Thread Michael Sudnick
What exactly are the symptoms of the problem? I use cephfs with 13.2.2 with
two active MDS daemons and at least on the surface everything looks fine.
Is there anything I should avoid doing until 13.2.3?

On Wed, Oct 17, 2018, 14:10 Patrick Donnelly  wrote:

> On Wed, Oct 17, 2018 at 11:05 AM Alexandre DERUMIER 
> wrote:
> >
> > Hi,
> >
> > Is it possible to have more infos or announce about this problem ?
> >
> > I'm currently waiting to migrate from luminious to mimic, (I need new
> quota feature for cephfs)
> >
> > is it safe to upgrade to 13.2.2 ?
> >
> > or better to wait to 13.2.3 ? or install 13.2.1 for now ?
>
> Upgrading to 13.2.1 would be safe.
>
> --
> Patrick Donnelly
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: [Ceph-community] After Mimic upgrade OSD's stuck at booting.

2018-09-26 Thread KEVIN MICHAEL HRPCEK
Hey, don't lose hope. I just went through two 3-5 day outages after a mimic
upgrade with no data loss. I'd recommend looking through the thread about it to
see how close it is to your issue. From my point of view there seem to be some
similarities.
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-September/029649.html.

At a similar point of desperation with my cluster I would shut all ceph 
processes down and bring them up in order. Doing this had my cluster almost 
healthy a few times until it fell over again due to mon issues. So solving any 
mon issues is the first priority. It seems like you may also benefit from 
setting mon_osd_cache_size to a very large number if you have enough memory on 
your mon servers.

I'll hop on the irc today.

Kevin

On 09/25/2018 05:53 PM, by morphin wrote:

After trying too many things with lots of help on IRC, my pool
health is still in ERROR and I think I can't recover from this.
https://paste.ubuntu.com/p/HbsFnfkYDT/
In the end 2 of the 3 mons crashed and started at the same time and the pool
is offline. Recovery takes more than 12 hours and it is way too slow.
Somehow recovery seems to not be working.

If I can reach my data I will re-create the pool easily.
If I run the ceph-object-tool script to regenerate the mon store.db, can I
access the RBD pool again?
by morphin , on Tue, 25 Sep 2018 at 20:03, wrote:



Hi,

Cluster is still down :(

Up to now we have managed to compensate the OSDs. 118 of 160 OSDs are
stable and the cluster is still in the process of settling. Thanks to
the guy Be-El in the ceph IRC channel. Be-El helped a lot to make the
flapping OSDs stable.

What we learned up to now is that the cause of this was the unexpected death of
2 of the 3 monitor servers. And when they come back, if they do not start
one by one (each after joining the cluster), this can happen. The cluster can
be unhealthy and it can take countless hours to come back.

Right now here is our status:
ceph -s : https://paste.ubuntu.com/p/6DbgqnGS7t/
health detail: https://paste.ubuntu.com/p/w4gccnqZjR/

Since the OSD disks are NL-SAS it can take up to 24 hours to get an online
cluster. What is more, it has been said that we would be extremely lucky
if all the data is rescued.

Most unhappily our strategy is just to sit and wait :(. As soon as the
peering and activating count drops to 300-500 pgs we will restart the
stopped OSDs one by one, and for each OSD we will wait for the cluster to
settle down. The amount of data stored in the OSDs is 33TB. Our biggest
concern is to export our rbd pool data to an outside backup space. Then
we will start again with a clean one.

I hope to verify our analysis with an expert. Any help or advice
would be greatly appreciated.
by morphin , on Tue, 25 Sep 2018 at 15:08, wrote:



After reducing the recovery parameter values did not change much.
There are a lot of OSD still marked down.

I don't know what I need to do after this point.

[osd]
osd recovery op priority = 63
osd client op priority = 1
osd recovery max active = 1
osd max scrubs = 1


ceph -s
  cluster:
id: 89569e73-eb89-41a4-9fc9-d2a5ec5f4106
health: HEALTH_ERR
42 osds down
1 host (6 osds) down
61/8948582 objects unfound (0.001%)
Reduced data availability: 3837 pgs inactive, 1822 pgs
down, 1900 pgs peering, 6 pgs stale
Possible data damage: 18 pgs recovery_unfound
Degraded data redundancy: 457246/17897164 objects degraded
(2.555%), 213 pgs degraded, 209 pgs undersized
2554 slow requests are blocked > 32 sec
3273 slow ops, oldest one blocked for 1453 sec, daemons
[osd.0,osd.1,osd.10,osd.100,osd.101,osd.102,osd.103,osd.104,osd.105,osd.106]...
have slow ops.

  services:
mon: 3 daemons, quorum SRV-SEKUARK3,SRV-SBKUARK2,SRV-SBKUARK3
mgr: SRV-SBKUARK2(active), standbys: SRV-SEKUARK2, SRV-SEKUARK3,
SRV-SEKUARK4
osd: 168 osds: 118 up, 160 in

  data:
pools:   1 pools, 4096 pgs
objects: 8.95 M objects, 17 TiB
usage:   33 TiB used, 553 TiB / 586 TiB avail
pgs: 93.677% pgs not active
 457246/17897164 objects degraded (2.555%)
 61/8948582 objects unfound (0.001%)
 1676 down
 1372 peering
 528  stale+peering
 164  active+undersized+degraded
 145  stale+down
 73   activating
 40   active+clean
 29   stale+activating
 17   active+recovery_unfound+undersized+degraded
 16   stale+active+clean
 16   stale+active+undersized+degraded
 9activating+undersized+degraded
 3active+recovery_wait+degraded
 2activating+undersized
 2activating+degraded
 1creating+down
 1stale+active+recovery_unfound+undersized+degraded
 1stale+active+clean+scrubbing+deep
 1

Re: [ceph-users] Mimic upgrade failure

2018-09-24 Thread KEVIN MICHAEL HRPCEK
The cluster is healthy and stable. I'll leave a summary for the archive in case 
anyone else has a similar problem.

centos 7.5
ceph mimic 13.2.1
3 mon/mgr/mds hosts, 862 osd (41 hosts)

This was all triggered by an unexpected ~1 min network blip on our 10Gbit 
switch. The ceph cluster lost connectivity to each other and obviously tried to 
remap everything once connectivity returned and tons of OSDs were being marked 
down. This was made worse by the OSDs trying to use large amounts of memory 
while recovering and ending up swapping, hanging, and me ipmi resetting hosts.  
All of this caused a lot of osd map changes and the mons will have stored all 
of them without trimming due to the unhealthy PGs. I was able to get almost all 
PGs active and clean on a few occasions but the cluster would fall over again 
after about 2 hours with cephx auth errors or OSDs trying to mark each other 
down (the mons seemed to not be rotating cephx auth keys). Setting 
'osd_heartbeat_interval = 30' helped a bit, but I eventually disabled process 
cephx auth with 'auth_cluster_required = none'. Setting that stopped the OSDs 
from falling over after 2 hours. From the beginning of this the MONs were 
running 100% on the ms_dispatch thread and constantly reelecting a leader every 
minute and not holding a consistent quorum with paxos lease_timeouts in the 
logs. The ms_dispatch was reading through the 
/var/lib/ceph/mon/mon-$hostname/store.db/*.sst constantly and strace showed 
this taking anywhere from 60 seconds to a couple minutes. This was almost all 
cpu user time and not much iowait. I think what was happening is that the mons 
failed health checks due to spending so much time constantly reading through 
the db and that held up other mon tasks which caused constant reelections.

We eventually reduced the MON reelections by finding the average ms_dispatch 
sst read time on the rank 0 mon took 65 seconds and setting 'mon_lease = 75' so 
that the paxos lease would last longer than ms_dispatch running 100%.  I also 
greatly increased the rocksdb_cache_size and leveldb_cache_size on the mons to 
be big enough to cache the entire db, but that didn't seem to make much 
difference initially. After working with Sage, he set the mon_osd_cache_size = 
20 (default 10). The huge mon_osd_cache_size let the mons cache all osd 
maps on the first read and the ms_dispatch thread was able to use this cache 
instead of spinning 100% on rereading them every minute. This stopped the 
constant elections because the mon stopped failing health checks and was able 
to complete other tasks. Lastly there were some self inflicted osd corruptions 
from the ipmi resets that needed to be dealt with to get all PGs active+clean, 
and the cephx change was rolled back to operate normally.

Sage, thanks again for your assistance with this.

Kevin

tl;dr Cache as much as you can.
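
Expressed as config, the mon-side settings touched above would live roughly
here (the cache values are placeholders that have to be sized to the osdmap
history and available RAM, not recommendations):

[mon]
mon_lease = 75
mon_osd_cache_size = <large enough to hold the full osdmap history>
rocksdb_cache_size = <large enough to cache the mon store.db>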



On 09/24/2018 09:24 AM, Sage Weil wrote:

Hi Kevin,

Do you have an update on the state of the cluster?

I've opened a ticket http://tracker.ceph.com/issues/36163 to track the
likely root cause we identified, and have a PR open at
https://github.com/ceph/ceph/pull/24247

Thanks!
sage


On Thu, 20 Sep 2018, Sage Weil wrote:


On Thu, 20 Sep 2018, KEVIN MICHAEL HRPCEK wrote:


Top results when both were taken with ms_dispatch at 100%. The mon one
changes alot so I've included 3 snapshots of those. I'll update
mon_osd_cache_size.

After disabling auth_cluster_required and a cluster reboot I am having
less problems keeping OSDs in the cluster since they seem to not be
having auth problems around the 2 hour uptime mark. The mons still have
their problems but 859/861 OSDs are up with 2 crashing. I found a brief
mention on a forum or somewhere that the mons will only trim their
storedb when the cluster is healthy. If that's true do you think it is
likely that once all osds are healthy and unset some no* cluster flags
the mons will be able to trim their db and the result will be that
ms_dispatch no longer takes to long to churn through the db? Our primary
theory here is that ms_dispatch is taking too long and the mons reach a
timeout and then reelect in a nonstop cycle.



It's the PGs that need to all get healthy (active+clean) before the
osdmaps get trimmed.  Other health warnigns (e.g. about noout being set)
aren't related.



ceph-mon
34.24%34.24%  libpthread-2.17.so[.] pthread_rwlock_rdlock
+   34.00%34.00%  libceph-common.so.0   [.] crush_hash32_3



If this is the -g output you need to hit enter on lines like this to see
the call graph...  Or you can do 'perf record -g -p <pid>' and then 'perf
report --stdio' (or similar) to dump it all to a file, fully expanded.

Thanks!
sage



+5.01% 5.01%  libceph-common.so.0   [.] ceph::decode >, 
std::less, mempool::pool_allocator<(mempool::pool_index_t)15, 
std::pair > >, 
std::_Select1st > > >, std::less > >, 
std::_Select1st > > >, std::less::copy
+0.79% 0.79%  libceph-c

Re: [ceph-users] Mimic upgrade failure

2018-09-20 Thread KEVIN MICHAEL HRPCEK
Select1st > > >,
std::less, mempoo
2.92%  libceph-common.so.0   [.] ceph::buffer::ptr::release
2.65%  libceph-common.so.0   [.]
ceph::buffer::list::iterator_impl::advance
2.57%  ceph-mon  [.] std::_Rb_tree > >, std::_Select1st > > >,
std::less, mempoo
2.27%  libceph-common.so.0   [.] ceph::buffer::ptr::ptr
1.99%  libstdc++.so.6.0.19   [.] std::_Rb_tree_increment
1.93%  libc-2.17.so  [.] __memcpy_ssse3_back
1.91%  libceph-common.so.0   [.] ceph::buffer::ptr::append
1.87%  libceph-common.so.0   [.] crush_hash32_3@plt
1.84%  libceph-common.so.0   [.]
ceph::buffer::list::iterator_impl::copy
1.75%  libtcmalloc.so.4.4.5  [.]
tcmalloc::CentralFreeList::FetchFromOneSpans
1.63%  libceph-common.so.0   [.] ceph::encode >,
std::less, mempool::pool_allocator<(mempool::pool_index_t)15,
std::pair > >
1.57%  libceph-common.so.0   [.] ceph::buffer::ptr::copy_out
1.55%  libstdc++.so.6.0.19   [.] std::_Rb_tree_insert_and_rebalance
1.47%  libceph-common.so.0   [.] ceph::buffer::ptr::raw_length
1.33%  libtcmalloc.so.4.4.5  [.] tc_deletearray_nothrow
1.09%  libceph-common.so.0   [.] ceph::decode >,
denc_traits >, void> >
1.07%  libtcmalloc.so.4.4.5  [.] operator new[]
1.02%  libceph-common.so.0   [.] ceph::buffer::list::iterator::copy
1.01%  libtcmalloc.so.4.4.5  [.] tc_posix_memalign
0.85%  ceph-mon  [.] ceph::buffer::ptr::release@plt
0.76%  libceph-common.so.0   [.] ceph::buffer::ptr::copy_out@plt
0.74%  libceph-common.so.0   [.] crc32_iscsi_00

strace
munmap(0x7f2eda736000, 2463941) = 0
open("/var/lib/ceph/mon/ceph-sephmon1/store.db/26299339.sst", O_RDONLY) =
429
stat("/var/lib/ceph/mon/ceph-sephmon1/store.db/26299339.sst",
{st_mode=S_IFREG|0644, st_size=1658656, ...}) = 0
mmap(NULL, 1658656, PROT_READ, MAP_SHARED, 429, 0) = 0x7f2eea87e000
close(429)  = 0
munmap(0x7f2ea8c97000, 2468005) = 0
open("/var/lib/ceph/mon/ceph-sephmon1/store.db/26299338.sst", O_RDONLY) =
429
stat("/var/lib/ceph/mon/ceph-sephmon1/store.db/26299338.sst",
{st_mode=S_IFREG|0644, st_size=2484001, ...}) = 0
mmap(NULL, 2484001, PROT_READ, MAP_SHARED, 429, 0) = 0x7f2eda74b000
close(429)  = 0
munmap(0x7f2ee21dc000, 2472343) = 0

Kevin


On 09/19/2018 06:50 AM, Sage Weil wrote:

On Wed, 19 Sep 2018, KEVIN MICHAEL HRPCEK wrote:


Sage,

Unfortunately the mon election problem came back yesterday and it makes
it really hard to get a cluster to stay healthy. A brief unexpected
network outage occurred and sent the cluster into a frenzy and when I
had it 95% healthy the mons started their nonstop reelections. In the
previous logs I sent were you able to identify why the mons are
constantly electing? The elections seem to be triggered by the below
paxos message but do you know which lease timeout is being reached or
why the lease isn't renewed instead of calling for an election?

One thing I tried was to shutdown the entire cluster and bring up only
the mon and mgr. The mons weren't able to hold their quorum with no osds
running and the ceph-mon ms_dispatch thread runs at 100% for > 60s at a
time.



This is odd... with no other dameons running I'm not sure what would be
eating up the CPU.  Can you run a 'perf top -p `pidof ceph-mon`' (or
similar) on the machine to see what the process is doing?  You might need
to install ceph-mon-dbg or ceph-debuginfo to get better symbols.



2018-09-19 03:56:21.729 7f4344ec1700 1 mon.sephmon2@1(peon).paxos(paxos
active c 133382665..133383355) lease_timeout -- calling new election



A workaround is probably to increase the lease timeout.  Try setting
mon_lease = 15 (default is 5... could also go higher than 15) in the
ceph.conf for all of the mons.  This is a bit of a band-aid but should
help you keep the mons in quorum until we sort out what is going on.
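Concretely, something like this on each mon host, followed by a mon restart (a minimal sketch):

# /etc/ceph/ceph.conf
[mon]
mon lease = 15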

sage





Thanks
Kevin

On 09/10/2018 07:06 AM, Sage Weil wrote:

I took a look at the mon log you sent.  A few things I noticed:

- The frequent mon elections seem to get only 2/3 mons about half of the
time.
- The messages coming in are mostly osd_failure, and half of those seem to
be recoveries (cancellation of the failure message).

It does smell a bit like a networking issue, or some tunable that relates
to the messaging layer.  It might be worth looking at an OSD log for an
osd that reported a failure and seeing what error code it is coming up with
on the failed ping connection?  That might provide a useful hint (e.g.,
ECONNREFUSED vs EMFILE or something).
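For example, something along these lines on the reporting OSD's host (osd.49 here is just taken from the failure reports quoted later in this thread, and the exact log strings vary by version):

grep -i heartbeat_check /var/log/ceph/ceph-osd.49.log | tail -20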

I'd also confirm that with nodown set the mon quorum stabilizes...
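i.e. something like:

ceph osd set nodown      # while debugging, watch whether the mon quorum holds
ceph osd unset nodown    # drop the flag again once things settle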

sage




On Mon, 10 Sep 2018, Kevin Hrpcek wrote:



Update for the list archive.

I went ahead and finished the mimic upgrade with the osds in a fluctuating
state of up and down. The cluster did start to normalize a lot easier after
everything was on mimic since the random mass OSD heartbeat failures stopped
and the constant mon election problem went away.

Re: [ceph-users] Mimic upgrade failure

2018-09-20 Thread KEVIN MICHAEL HRPCEK
0.19   [.] std::_Rb_tree_increment
   1.93%  libc-2.17.so  [.] __memcpy_ssse3_back
   1.91%  libceph-common.so.0   [.] ceph::buffer::ptr::append
   1.87%  libceph-common.so.0   [.] crush_hash32_3@plt
   1.84%  libceph-common.so.0   [.] 
ceph::buffer::list::iterator_impl::copy
   1.75%  libtcmalloc.so.4.4.5  [.] tcmalloc::CentralFreeList::FetchFromOneSpans
   1.63%  libceph-common.so.0   [.] ceph::encode >, std::less, 
mempool::pool_allocator<(mempool::pool_index_t)15, std::pair > >
   1.57%  libceph-common.so.0   [.] ceph::buffer::ptr::copy_out
   1.55%  libstdc++.so.6.0.19   [.] std::_Rb_tree_insert_and_rebalance
   1.47%  libceph-common.so.0   [.] ceph::buffer::ptr::raw_length
   1.33%  libtcmalloc.so.4.4.5  [.] tc_deletearray_nothrow
   1.09%  libceph-common.so.0   [.] ceph::decode >, 
denc_traits >, void> >
   1.07%  libtcmalloc.so.4.4.5  [.] operator new[]
   1.02%  libceph-common.so.0   [.] ceph::buffer::list::iterator::copy
   1.01%  libtcmalloc.so.4.4.5  [.] tc_posix_memalign
   0.85%  ceph-mon  [.] ceph::buffer::ptr::release@plt
   0.76%  libceph-common.so.0   [.] ceph::buffer::ptr::copy_out@plt
   0.74%  libceph-common.so.0   [.] crc32_iscsi_00

strace
munmap(0x7f2eda736000, 2463941) = 0
open("/var/lib/ceph/mon/ceph-sephmon1/store.db/26299339.sst", O_RDONLY) = 429
stat("/var/lib/ceph/mon/ceph-sephmon1/store.db/26299339.sst", 
{st_mode=S_IFREG|0644, st_size=1658656, ...}) = 0
mmap(NULL, 1658656, PROT_READ, MAP_SHARED, 429, 0) = 0x7f2eea87e000
close(429)  = 0
munmap(0x7f2ea8c97000, 2468005) = 0
open("/var/lib/ceph/mon/ceph-sephmon1/store.db/26299338.sst", O_RDONLY) = 429
stat("/var/lib/ceph/mon/ceph-sephmon1/store.db/26299338.sst", 
{st_mode=S_IFREG|0644, st_size=2484001, ...}) = 0
mmap(NULL, 2484001, PROT_READ, MAP_SHARED, 429, 0) = 0x7f2eda74b000
close(429)  = 0
munmap(0x7f2ee21dc000, 2472343) = 0

Kevin


On 09/19/2018 06:50 AM, Sage Weil wrote:

On Wed, 19 Sep 2018, KEVIN MICHAEL HRPCEK wrote:


Sage,

Unfortunately the mon election problem came back yesterday and it makes
it really hard to get a cluster to stay healthy. A brief unexpected
network outage occurred and sent the cluster into a frenzy and when I
had it 95% healthy the mons started their nonstop reelections. In the
previous logs I sent were you able to identify why the mons are
constantly electing? The elections seem to be triggered by the below
paxos message but do you know which lease timeout is being reached or
why the lease isn't renewed instead of calling for an election?

One thing I tried was to shutdown the entire cluster and bring up only
the mon and mgr. The mons weren't able to hold their quorum with no osds
running and the ceph-mon ms_dispatch thread runs at 100% for > 60s at a
time.



This is odd... with no other dameons running I'm not sure what would be
eating up the CPU.  Can you run a 'perf top -p `pidof ceph-mon`' (or
similar) on the machine to see what the process is doing?  You might need
to install ceph-mon-dbg or ceph-debuginfo to get better symbols.



2018-09-19 03:56:21.729 7f4344ec1700 1 mon.sephmon2@1(peon).paxos(paxos
active c 133382665..133383355) lease_timeout -- calling new election



A workaround is probably to increase the lease timeout.  Try setting
mon_lease = 15 (default is 5... could also go higher than 15) in the
ceph.conf for all of the mons.  This is a bit of a band-aid but should
help you keep the mons in quorum until we sort out what is going on.

sage





Thanks
Kevin

On 09/10/2018 07:06 AM, Sage Weil wrote:

I took a look at the mon log you sent.  A few things I noticed:

- The frequent mon elections seem to get only 2/3 mons about half of the
time.
- The messages coming in a mostly osd_failure, and half of those seem to
be recoveries (cancellation of the failure message).

It does smell a bit like a networking issue, or some tunable that relates
to the messaging layer.  It might be worth looking at an OSD log for an
osd that reported a failure and seeing what error code it coming up on the
failed ping connection?  That might provide a useful hint (e.g.,
ECONNREFUSED vs EMFILE or something).

I'd also confirm that with nodown set the mon quorum stabilizes...

sage




On Mon, 10 Sep 2018, Kevin Hrpcek wrote:



Update for the list archive.

I went ahead and finished the mimic upgrade with the osds in a fluctuating
state of up and down. The cluster did start to normalize a lot easier after
everything was on mimic since the random mass OSD heartbeat failures stopped
and the constant mon election problem went away. I'm still battling with the
cluster reacting poorly to host reboots or small map changes, but I feel like
my current pg:osd ratio may be playing a factor in that since we are 2x normal
pg count while migrating data to new EC pools.

I'm not sure of the root cause but it seems like the mix of luminous and mimic
did not play well together for some reason.

Re: [ceph-users] Mimic upgrade failure

2018-09-19 Thread KEVIN MICHAEL HRPCEK
t;, void> >
   1.07%  libtcmalloc.so.4.4.5  [.] operator new[]
   1.02%  libceph-common.so.0   [.] ceph::buffer::list::iterator::copy
   1.01%  libtcmalloc.so.4.4.5  [.] tc_posix_memalign
   0.85%  ceph-mon  [.] ceph::buffer::ptr::release@plt
   0.76%  libceph-common.so.0   [.] ceph::buffer::ptr::copy_out@plt
   0.74%  libceph-common.so.0   [.] crc32_iscsi_00

strace
munmap(0x7f2eda736000, 2463941) = 0
open("/var/lib/ceph/mon/ceph-sephmon1/store.db/26299339.sst", O_RDONLY) = 429
stat("/var/lib/ceph/mon/ceph-sephmon1/store.db/26299339.sst", 
{st_mode=S_IFREG|0644, st_size=1658656, ...}) = 0
mmap(NULL, 1658656, PROT_READ, MAP_SHARED, 429, 0) = 0x7f2eea87e000
close(429)  = 0
munmap(0x7f2ea8c97000, 2468005) = 0
open("/var/lib/ceph/mon/ceph-sephmon1/store.db/26299338.sst", O_RDONLY) = 429
stat("/var/lib/ceph/mon/ceph-sephmon1/store.db/26299338.sst", 
{st_mode=S_IFREG|0644, st_size=2484001, ...}) = 0
mmap(NULL, 2484001, PROT_READ, MAP_SHARED, 429, 0) = 0x7f2eda74b000
close(429)  = 0
munmap(0x7f2ee21dc000, 2472343) = 0

Kevin


On 09/19/2018 06:50 AM, Sage Weil wrote:

On Wed, 19 Sep 2018, KEVIN MICHAEL HRPCEK wrote:


Sage,

Unfortunately the mon election problem came back yesterday and it makes
it really hard to get a cluster to stay healthy. A brief unexpected
network outage occurred and sent the cluster into a frenzy and when I
had it 95% healthy the mons started their nonstop reelections. In the
previous logs I sent were you able to identify why the mons are
constantly electing? The elections seem to be triggered by the below
paxos message but do you know which lease timeout is being reached or
why the lease isn't renewed instead of calling for an election?

One thing I tried was to shutdown the entire cluster and bring up only
the mon and mgr. The mons weren't able to hold their quorum with no osds
running and the ceph-mon ms_dispatch thread runs at 100% for > 60s at a
time.



This is odd... with no other dameons running I'm not sure what would be
eating up the CPU.  Can you run a 'perf top -p `pidof ceph-mon`' (or
similar) on the machine to see what the process is doing?  You might need
to install ceph-mon-dbg or ceph-debuginfo to get better symbols.



2018-09-19 03:56:21.729 7f4344ec1700 1 mon.sephmon2@1(peon).paxos(paxos
active c 133382665..133383355) lease_timeout -- calling new election



A workaround is probably to increase the lease timeout.  Try setting
mon_lease = 15 (default is 5... could also go higher than 15) in the
ceph.conf for all of the mons.  This is a bit of a band-aid but should
help you keep the mons in quorum until we sort out what is going on.

sage





Thanks
Kevin

On 09/10/2018 07:06 AM, Sage Weil wrote:

I took a look at the mon log you sent.  A few things I noticed:

- The frequent mon elections seem to get only 2/3 mons about half of the
time.
- The messages coming in a mostly osd_failure, and half of those seem to
be recoveries (cancellation of the failure message).

It does smell a bit like a networking issue, or some tunable that relates
to the messaging layer.  It might be worth looking at an OSD log for an
osd that reported a failure and seeing what error code it coming up on the
failed ping connection?  That might provide a useful hint (e.g.,
ECONNREFUSED vs EMFILE or something).

I'd also confirm that with nodown set the mon quorum stabilizes...

sage




On Mon, 10 Sep 2018, Kevin Hrpcek wrote:



Update for the list archive.

I went ahead and finished the mimic upgrade with the osds in a fluctuating
state of up and down. The cluster did start to normalize a lot easier after
everything was on mimic since the random mass OSD heartbeat failures stopped
and the constant mon election problem went away. I'm still battling with the
cluster reacting poorly to host reboots or small map changes, but I feel like
my current pg:osd ratio may be playing a factor in that since we are 2x normal
pg count while migrating data to new EC pools.

I'm not sure of the root cause but it seems like the mix of luminous and mimic
did not play well together for some reason. Maybe it has to do with the scale
of my cluster, 871 osd, or maybe I've missed some tuning as my cluster
has scaled to this size.

Kevin


On 09/09/2018 12:49 PM, Kevin Hrpcek wrote:


Nothing too crazy for non default settings. Some of those osd settings were
in place while I was testing recovery speeds and need to be brought back
closer to defaults. I was setting nodown before but it seems to mask the
problem. While it's good to stop the osdmap changes, OSDs would come up, get
marked up, but at some point go down again (but the process is still
running) and still stay up in the map. Then when I'd unset nodown the
cluster would immediately mark 250+ osd down again and I'd be back where I
started.

This morning I went ahead and finished the osd upgrades to mimic to remove
that variable.

Re: [ceph-users] Mimic upgrade failure

2018-09-18 Thread KEVIN MICHAEL HRPCEK
Sage,

Unfortunately the mon election problem came back yesterday and it makes it 
really hard to get a cluster to stay healthy. A brief unexpected network outage 
occurred and sent the cluster into a frenzy and when I had it 95% healthy the 
mons started their nonstop reelections. In the previous logs I sent were you 
able to identify why the mons are constantly electing? The elections seem to be 
triggered by the below paxos message but do you know which lease timeout is 
being reached or why the lease isn't renewed instead of calling for an election?

One thing I tried was to shutdown the entire cluster and bring up only the mon 
and mgr. The mons weren't able to hold their quorum with no osds running and 
the ceph-mon ms_dispatch thread runs at 100% for > 60s at a time.

2018-09-19 03:56:21.729 7f4344ec1700  1 mon.sephmon2@1(peon).paxos(paxos active 
c 133382665..133383355) lease_timeout -- calling new election

Thanks
Kevin

On 09/10/2018 07:06 AM, Sage Weil wrote:

I took a look at the mon log you sent.  A few things I noticed:

- The frequent mon elections seem to get only 2/3 mons about half of the
time.
- The messages coming in a mostly osd_failure, and half of those seem to
be recoveries (cancellation of the failure message).

It does smell a bit like a networking issue, or some tunable that relates
to the messaging layer.  It might be worth looking at an OSD log for an
osd that reported a failure and seeing what error code it coming up on the
failed ping connection?  That might provide a useful hint (e.g.,
ECONNREFUSED vs EMFILE or something).

I'd also confirm that with nodown set the mon quorum stabilizes...

sage




On Mon, 10 Sep 2018, Kevin Hrpcek wrote:



Update for the list archive.

I went ahead and finished the mimic upgrade with the osds in a fluctuating
state of up and down. The cluster did start to normalize a lot easier after
everything was on mimic since the random mass OSD heartbeat failures stopped
and the constant mon election problem went away. I'm still battling with the
cluster reacting poorly to host reboots or small map changes, but I feel like
my current pg:osd ratio may be playing a factor in that since we are 2x normal
pg count while migrating data to new EC pools.

I'm not sure of the root cause but it seems like the mix of luminous and mimic
did not play well together for some reason. Maybe it has to do with the scale
of my cluster, 871 osd, or maybe I've missed some tuning as my cluster
has scaled to this size.

Kevin


On 09/09/2018 12:49 PM, Kevin Hrpcek wrote:


Nothing too crazy for non default settings. Some of those osd settings were
in place while I was testing recovery speeds and need to be brought back
closer to defaults. I was setting nodown before but it seems to mask the
problem. While it's good to stop the osdmap changes, OSDs would come up, get
marked up, but at some point go down again (but the process is still
running) and still stay up in the map. Then when I'd unset nodown the
cluster would immediately mark 250+ osd down again and I'd be back where I
started.

This morning I went ahead and finished the osd upgrades to mimic to remove
that variable. I've looked for networking problems but haven't found any. 2
of the mons are on the same switch. I've also tried combinations of shutting
down a mon to see if a single one was the problem, but they keep electing no
matter the mix of them that are up. Part of it feels like a networking
problem but I haven't been able to find a culprit yet as everything was
working normally before starting the upgrade. Other than the constant mon
elections, yesterday I had the cluster 95% healthy 3 or 4 times, but it
doesn't last long since at some point the OSDs start trying to fail each
other through their heartbeats.
2018-09-09 17:37:29.079 7eff774f5700  1 mon.sephmon1@0(leader).osd e991282
prepare_failure osd.39 10.1.9.2:6802/168438 from osd.49 10.1.9.3:6884/317908
is reporting failure:1
2018-09-09 17:37:29.079 7eff774f5700  0 log_channel(cluster) log [DBG] :
osd.39 10.1.9.2:6802/168438 reported failed by osd.49 10.1.9.3:6884/317908
2018-09-09 17:37:29.083 7eff774f5700  1 mon.sephmon1@0(leader).osd e991282
prepare_failure osd.93 10.1.9.9:6853/287469 from osd.372
10.1.9.13:6801/275806 is reporting failure:1

I'm working on getting things mostly good again with everything on mimic and
will see if it behaves better.

Thanks for your input on this David.


[global]
mon_initial_members = sephmon1, sephmon2, sephmon3
mon_host = 10.1.9.201,10.1.9.202,10.1.9.203
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
public_network = 10.1.0.0/16
osd backfill full ratio = 0.92
osd failsafe nearfull ratio = 0.90
osd max object size = 21474836480
mon max pg per osd = 350

[mon]
mon warn on legacy crush tunables = false
mon pg warn max per osd = 300
mon osd down out subtree limit = host
mon osd nearfull ratio = 0.90
mon osd full ratio = 0.97
mon health 

[ceph-users] Error-code 2002/API 405 S3 REST API. Creating a new bucket

2018-09-17 Thread Michael Schäfer
Hi, 

We have a problem with the radosgw using the S3 REST API.
Trying to create a new bucket does not work.
We got a 405 at the API level and the log indicates a 2002 error.
Does anybody know what this error code means? Find the radosgw log attached.

Bests,
Michael

2018-09-17 11:58:03.388 7f65250c2700  1 == starting new request 
req=0x7f65250b9830 =
2018-09-17 11:58:03.388 7f65250c2700  2 req 20:0.20::GET 
/egobackup::initializing for trans_id = 
tx00014-005b9f88bb-d393-default
2018-09-17 11:58:03.388 7f65250c2700 10 rgw api priority: s3=5 s3website=4
2018-09-17 11:58:03.388 7f65250c2700 10 host=85.214.24.54
2018-09-17 11:58:03.388 7f65250c2700 20 subdomain= domain= in_hosted_domain=0 
in_hosted_domain_s3website=0
2018-09-17 11:58:03.388 7f65250c2700 20 final domain/bucket subdomain= domain= 
in_hosted_domain=0 in_hosted_domain_s3website=0 s->info.domain= s->info.request_
uri=/egobackup
2018-09-17 11:58:03.388 7f65250c2700 20 get_handler 
handler=25RGWHandler_REST_Bucket_S3
2018-09-17 11:58:03.388 7f65250c2700 10 handler=25RGWHandler_REST_Bucket_S3
2018-09-17 11:58:03.388 7f65250c2700  2 req 20:0.81:s3:GET 
/egobackup::getting op 0
2018-09-17 11:58:03.388 7f65250c2700 10 op=32RGWGetBucketLocation_ObjStore_S3
2018-09-17 11:58:03.388 7f65250c2700  2 req 20:0.86:s3:GET 
/egobackup:get_bucket_location:verifying requester
2018-09-17 11:58:03.388 7f65250c2700 20 
rgw::auth::StrategyRegistry::s3_main_strategy_t: trying 
rgw::auth::s3::AWSAuthStrategy
2018-09-17 11:58:03.388 7f65250c2700 20 rgw::auth::s3::AWSAuthStrategy: trying 
rgw::auth::s3::S3AnonymousEngine
2018-09-17 11:58:03.388 7f65250c2700 20 rgw::auth::s3::S3AnonymousEngine denied 
with reason=-1
2018-09-17 11:58:03.388 7f65250c2700 20 rgw::auth::s3::AWSAuthStrategy: trying 
rgw::auth::s3::LocalEngine
2018-09-17 11:58:03.388 7f65250c2700 10 get_canon_resource(): 
dest=/egobackup?location
2018-09-17 11:58:03.388 7f65250c2700 10 string_to_sign:
GET
1B2M2Y8AsgTpgAmY7PhCfg==

Mon, 17 Sep 2018 10:58:03 GMT
/egobackup?location
2018-09-17 11:58:03.388 7f65250c2700 15 string_to_sign=GET
1B2M2Y8AsgTpgAmY7PhCfg==

Mon, 17 Sep 2018 10:58:03 GMT
/egobackup?location
2018-09-17 11:58:03.388 7f65250c2700 15 server 
signature=fbEd2DlKyKC8JOXTgMZSXV68ngc=
2018-09-17 11:58:03.388 7f65250c2700 15 client 
signature=fbEd2DlKyKC8JOXTgMZSXV68ngc=
2018-09-17 11:58:03.388 7f65250c2700 15 compare=0
2018-09-17 11:58:03.388 7f65250c2700 20 rgw::auth::s3::LocalEngine granted 
access
2018-09-17 11:58:03.388 7f65250c2700 20 rgw::auth::s3::AWSAuthStrategy granted 
access
2018-09-17 11:58:03.388 7f65250c2700  2 req 20:0.000226:s3:GET 
/egobackup:get_bucket_location:normalizing buckets and tenants
2018-09-17 11:58:03.388 7f65250c2700 10 s->object= s->bucket=egobackup
2018-09-17 11:58:03.388 7f65250c2700  2 req 20:0.000235:s3:GET 
/egobackup:get_bucket_location:init permissions
2018-09-17 11:58:03.388 7f65250c2700 20 get_system_obj_state: 
rctx=0x7f65250b7a30 obj=default.rgw.meta:root:egobackup state=0x55b1bc2e1220 
s->prefetch_data=0
2018-09-17 11:58:03.388 7f65250c2700 10 cache get: 
name=default.rgw.meta+root+egobackup : miss
2018-09-17 11:58:03.388 7f65250c2700 10 cache put: 
name=default.rgw.meta+root+egobackup info.flags=0x0
2018-09-17 11:58:03.388 7f65250c2700 10 adding default.rgw.meta+root+egobackup 
to cache LRU end
2018-09-17 11:58:03.388 7f65250c2700 10 init_permissions on egobackup[] failed, 
ret=-2002
2018-09-17 11:58:03.388 7f65250c2700 20 op->ERRORHANDLER: err_no=-2002 
new_err_no=-2002
2018-09-17 11:58:03.388 7f65250c2700 30 AccountingFilter::send_status: e=0, 
sent=24, total=0
2018-09-17 11:58:03.388 7f65250c2700 30 AccountingFilter::send_header: e=0, 
sent=0, total=0
2018-09-17 11:58:03.388 7f65250c2700 30 AccountingFilter::send_content_length: 
e=0, sent=21, total=0
2018-09-17 11:58:03.388 7f65250c2700 30 AccountingFilter::send_header: e=0, 
sent=0, total=0
2018-09-17 11:58:03.388 7f65250c2700 30 AccountingFilter::send_header: e=0, 
sent=0, total=0
2018-09-17 11:58:03.388 7f65250c2700 30 AccountingFilter::complete_header: e=0, 
sent=159, total=0
2018-09-17 11:58:03.388 7f65250c2700 30 AccountingFilter::set_account: e=1
2018-09-17 11:58:03.388 7f65250c2700 30 AccountingFilter::send_body: e=1, 
sent=219, total=0
2018-09-17 11:58:03.388 7f65250c2700 30 AccountingFilter::complete_request: 
e=1, sent=0, total=219
2018-09-17 11:58:03.388 7f65250c2700  2 req 20:0.001272:s3:GET 
/egobackup:get_bucket_location:op status=0
2018-09-17 11:58:03.388 7f65250c2700  2 req 20:0.001276:s3:GET 
/egobackup:get_bucket_location:http status=404
2018-09-17 11:58:03.388 7f65250c2700  1 == req done req=0x7f65250b9830 op 
status=0 http_status=404 ==
2018-09-17 11:58:03.388 7f65250c2700 20 process_request() returned -2002
2018-09-17 11:58:03.388 7f65250c2700  1 civetweb: 0x55b1bc68e000: 
81.169.156.122 - - [17/Sep/2018:11:58:03 +0100] "GET /egobackup?location 
HTTP/1.1" 404 423 - -
2018-09-17 11:58:03.388 7f65

Re: [ceph-users] Cephfs kernel driver availability

2018-07-23 Thread Michael Kuriger
If you're using CentOS/RHEL you can try the elrepo kernels

Mike Kuriger 



-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of John 
Spray
Sent: Monday, July 23, 2018 5:07 AM
To: Bryan Henderson
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Cephfs kernel driver availability

On Sun, Jul 22, 2018 at 9:03 PM Bryan Henderson  wrote:
>
> Is there some better place to get a filesystem driver for the longterm
> stable Linux kernel (3.16) than the regular kernel.org source distribution?

The general advice[1] on this is not to try and use a 3.x kernel with
CephFS.  The only exception is if your distro provider is doing
special backports (latest RHEL releases have CephFS backports).  This
causes some confusion, because a number of distros have shipped
"stable" kernels with older, known-unstable CephFS code.

If you're building your own kernels then you definitely want to be on
a recent 4.x

John

1. 
https://urldefense.proofpoint.com/v2/url?u=http-3A__docs.ceph.com_docs_master_cephfs_best-2Dpractices_-23which-2Dkernel-2Dversion=DwICAg=5m9CfXHY6NXqkS7nN5n23w=5r9bhr1JAPRaUcJcU-FfGg=1d2qC0CtsbiZASGIWepKvVV0aMaJAXZZmg2_NDncscw=nAbQCNqk5k58F3w1fk-APMYb49ODP3WlGtdkQNjwU4Q=

> The reason I ask is that I have been trying to get some clients running
> Linux kernel 3.16 (the current long term stable Linux kernel) and so far
> I have run into two serious bugs that, it turns out, were found and fixed
> years ago in more current mainline kernels.
>
> In both cases, I emailed Ben Hutchings, the apparent maintainer of 3.16,
> asking if the fixes could be added to 3.16, but was met with silence.  This
> leads me to believe that there are many more bugs in the 3.16 cephfs
> filesystem driver waiting for me.  Indeed, I've seen panics not yet explained.
>
> So what are other people using?  A less stable kernel?  An out-of-tree driver?
> FUSE?  Is there a working process for getting known bugs fixed in 3.16?
>
> --
> Bryan Henderson   San Jose, California
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.ceph.com_listinfo.cgi_ceph-2Dusers-2Dceph.com=DwICAg=5m9CfXHY6NXqkS7nN5n23w=5r9bhr1JAPRaUcJcU-FfGg=1d2qC0CtsbiZASGIWepKvVV0aMaJAXZZmg2_NDncscw=lLPHUayL4gqcIGSbOL6XkIuUPBs14rsGI6hFq1UtXvI=
___
ceph-users mailing list
ceph-users@lists.ceph.com
https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.ceph.com_listinfo.cgi_ceph-2Dusers-2Dceph.com=DwICAg=5m9CfXHY6NXqkS7nN5n23w=5r9bhr1JAPRaUcJcU-FfGg=1d2qC0CtsbiZASGIWepKvVV0aMaJAXZZmg2_NDncscw=lLPHUayL4gqcIGSbOL6XkIuUPBs14rsGI6hFq1UtXvI=
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSDs for data drives

2018-07-16 Thread Michael Kuriger
I dunno, to me benchmark tests are only really useful to compare different 
drives.


From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Paul 
Emmerich
Sent: Monday, July 16, 2018 8:41 AM
To: Satish Patel
Cc: ceph-users
Subject: Re: [ceph-users] SSDs for data drives

This doesn't look like a good benchmark:

(from the blog post)

dd if=/dev/zero of=/mnt/rawdisk/data.bin bs=1G count=20 oflag=direct
1. it writes compressible data which some SSDs might compress, you should use 
urandom
2. that workload does not look like something Ceph will do to your disk, like 
not at all
If you want a quick estimate of an SSD in worst-case scenario: run the usual 4k 
oflag=direct,dsync test (or better: fio).
A bad SSD will get < 1k IOPS, a good one > 10k
But that doesn't test everything. In particular, performance might degrade as 
the disks fill up. Also, it's the absolute
worst-case, i.e., a disk used for multiple journal/wal devices
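For reference, a rough sketch of that worst-case 4k sync-write test (the device name is a placeholder, and it writes to the raw device, so it will destroy data on it):

fio --name=sync-write-test --filename=/dev/sdX --direct=1 --sync=1 \
    --rw=write --bs=4k --iodepth=1 --numjobs=1 --runtime=60 --time_based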


Paul

2018-07-16 10:09 GMT-04:00 Satish Patel 
mailto:satish@gmail.com>>:
https://blog.cypressxt.net/hello-ceph-and-samsung-850-evo/

On Thu, Jul 12, 2018 at 3:37 AM, Adrian Saul
mailto:adrian.s...@tpgtelecom.com.au>> wrote:
>
>
> We started our cluster with consumer (Samsung EVO) disks and the write
> performance was pitiful, they had periodic spikes in latency (average of
> 8ms, but much higher spikes) and just did not perform anywhere near where we
> were expecting.
>
>
>
> When replaced with SM863 based devices the difference was night and day.
> The DC grade disks held a nearly constant low latency (constantly sub-ms), no
> spiking and performance was massively better.   For a period I ran both
> disks in the cluster and was able to graph them side by side with the same
> workload.  This was not even a moderately loaded cluster so I am glad we
> discovered this before we went full scale.
>
>
>
> So while you certainly can do cheap and cheerful and let the data
> availability be handled by Ceph, don’t expect the performance to keep up.
>
>
>
>
>
>
>
> From: ceph-users 
> [mailto:ceph-users-boun...@lists.ceph.com]
>  On Behalf Of
> Satish Patel
> Sent: Wednesday, 11 July 2018 10:50 PM
> To: Paul Emmerich mailto:paul.emmer...@croit.io>>
> Cc: ceph-users mailto:ceph-users@lists.ceph.com>>
> Subject: Re: [ceph-users] SSDs for data drives
>
>
>
> Prices going way up if I am picking Samsung SM863a for all data drives.
>
>
>
> We have many servers running on consumer grade ssd drives and we never
> noticed any performance issues or any faults so far (but we never used ceph before)
>
>
>
> I thought that is the whole point of ceph: to provide high availability if a
> drive goes down, plus parallel reads from multiple osd nodes
>
>
>
> Sent from my iPhone
>
>
> On Jul 11, 2018, at 6:57 AM, Paul Emmerich 
> mailto:paul.emmer...@croit.io>> wrote:
>
> Hi,
>
>
>
> we‘ve no long-term data for the SM variant.
>
> Performance is fine as far as we can tell, but the main difference between
> these two models should be endurance.
>
>
>
>
>
> Also, I forgot to mention that my experiences are only for the 1, 2, and 4
> TB variants. Smaller SSDs are often proportionally slower (especially below
> 500GB).
>
>
>
> Paul
>
>
> Robert Stanford mailto:rstanford8...@gmail.com>>:
>
> Paul -
>
>
>
>  That's extremely helpful, thanks.  I do have another cluster that uses
> Samsung SM863a just for journal (spinning disks for data).  Do you happen to
> have an opinion on those as well?
>
>
>
> On Wed, Jul 11, 2018 at 4:03 AM, Paul Emmerich 
> mailto:paul.emmer...@croit.io>>
> wrote:
>
> PM/SM863a are usually great disks and should be the default go-to option,
> they outperform
>
> even the more expensive PM1633 in our experience.
>
> (But that really doesn't matter if it's for the full OSD and not as
> dedicated WAL/journal)
>
>
>
> We got a cluster with a few hundred SanDisk Ultra II (discontinued, i
> believe) that was built on a budget.
>
> Not the best disk but great value. They have been running since ~3 years now
> with very few failures and
>
> okayish overall performance.
>
>
>
> We also got a few clusters with a few hundred SanDisk Extreme Pro, but we
> are not yet sure about their
>
> long-time durability as they are only ~9 months old (average of ~1000 write
> IOPS on each disk over that time).
>
> Some of them report only 50-60% lifetime left.
>
>
>
> For NVMe, the Intel NVMe 750 is still a great disk
>
>
>
> Be carefuly to get these exact models. Seemingly similar disks might be just
> completely bad, for
>
> example, the Samsung PM961 is just unusable for Ceph in our experience.
>
>
>
> Paul
>
>
>
> 2018-07-11 10:14 GMT+02:00 Wido den Hollander 
> mailto:w...@42on.com>>:
>
>
>
> On 07/11/2018 10:10 AM, 

Re: [ceph-users] Ceph Mimic on CentOS 7.5 dependency issue (liboath)

2018-06-23 Thread Michael Kuriger
CentOS 7.5 is pretty new.  Have you tried CentOS 7.4?

Mike Kuriger 
Sr. Unix Systems Engineer 



-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Brian :
Sent: Saturday, June 23, 2018 1:41 AM
To: Stefan Kooman
Cc: ceph-users
Subject: Re: [ceph-users] Ceph Mimic on CentOS 7.5 dependency issue (liboath)

Hi Stefan

$ sudo yum provides liboath
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirror.strencom.net
 * epel: mirror.sax.uk.as61049.net
 * extras: mirror.strencom.net
 * updates: mirror.strencom.net
liboath-2.4.1-9.el7.x86_64 : Library for OATH handling
Repo: epel
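So on a stock CentOS box something like this should be enough (a minimal sketch, assuming EPEL is acceptable in your environment):

sudo yum install -y epel-release
sudo yum install -y ceph-common     # liboath should now resolve from EPEL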



On Sat, Jun 23, 2018 at 9:02 AM, Stefan Kooman  wrote:
> Hi list,
>
> I'm trying to install "Ceph mimic" on a CentOS 7.5 client (base
> install). I Added the "rpm-mimic" repo from our mirror and tried to
> install ceph-common, but I run into a dependency problem:
>
> --> Finished Dependency Resolution
> Error: Package: 2:ceph-common-13.2.0-0.el7.x86_64 
> (ceph.download.bit.nl_rpm-mimic_el7_x86_64)
>Requires: liboath.so.0()(64bit)
> Error: Package: 2:ceph-common-13.2.0-0.el7.x86_64 
> (ceph.download.bit.nl_rpm-mimic_el7_x86_64)
>Requires: liboath.so.0(LIBOATH_1.10.0)(64bit)
> Error: Package: 2:ceph-common-13.2.0-0.el7.x86_64 
> (ceph.download.bit.nl_rpm-mimic_el7_x86_64)
>Requires: liboath.so.0(LIBOATH_1.2.0)(64bit)
> Error: Package: 2:librgw2-13.2.0-0.el7.x86_64 
> (ceph.download.bit.nl_rpm-mimic_el7_x86_64)
>
> Is this "oath" package something I need to install from a 3rd party repo?
>
> Gr. Stefan
>
>
> --
> | BIT BV  
> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.bit.nl_=DwICAg=5m9CfXHY6NXqkS7nN5n23w=5r9bhr1JAPRaUcJcU-FfGg=7oT7QCZjOE1RiCQwuYT5PejOv8n637nUi2yb5vE1aaQ=aPpOV3zxQyodG4OBQXMWTPfFJBgGMq-9tNFoSSEhMxQ=
> Kamer van Koophandel 09090351
> | GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.ceph.com_listinfo.cgi_ceph-2Dusers-2Dceph.com=DwICAg=5m9CfXHY6NXqkS7nN5n23w=5r9bhr1JAPRaUcJcU-FfGg=7oT7QCZjOE1RiCQwuYT5PejOv8n637nUi2yb5vE1aaQ=TPzdw4kJULbx2F0LQ1N-L3aQsxWzkKkW0X6b6NJJ5OI=
___
ceph-users mailing list
ceph-users@lists.ceph.com
https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.ceph.com_listinfo.cgi_ceph-2Dusers-2Dceph.com=DwICAg=5m9CfXHY6NXqkS7nN5n23w=5r9bhr1JAPRaUcJcU-FfGg=7oT7QCZjOE1RiCQwuYT5PejOv8n637nUi2yb5vE1aaQ=TPzdw4kJULbx2F0LQ1N-L3aQsxWzkKkW0X6b6NJJ5OI=
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Install ceph manually with some problem

2018-06-18 Thread Michael Kuriger
Don’t use the installer scripts.  Try  yum install ceph
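Roughly (a sketch only; the release rpm path below follows the usual upstream layout on download.ceph.com, so adjust the release name and distro to match your setup):

sudo yum install -y epel-release
sudo rpm -Uvh https://download.ceph.com/rpm-luminous/el7/noarch/ceph-release-1-1.el7.noarch.rpm
sudo yum install -y ceph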

Mike Kuriger
Sr. Unix Systems Engineer
T: 818-649-7235 M: 818-434-6195

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Ch Wan
Sent: Monday, June 18, 2018 2:40 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Install ceph manually with some problem

Hi, recently I've been trying to build ceph luminous on centos-7, following
the documentation:
sudo ./install-deps.sh
./do_cmake.sh
cd build && sudo make install

But when I run /usr/local/bin/ceph -v, it failed with this error:
Traceback (most recent call last):
  File "/usr/local/bin/ceph", line 125, in 
import rados
ImportError: No module named rados

I noticed that there are some warning messages while running make install:
Copying /data/ceph/ceph/build/src/pybind/rgw/rgw.egg-info to 
/usr/local/lib64/python2.7/site-packages/rgw-2.0.0-py2.7.egg-info
running install_scripts
writing list of installed files to '/dev/null'
running install
Checking .pth file support in /usr/local/lib/python2.7/site-packages/
/bin/python2.7 -E -c pass
TEST FAILED: /usr/local/lib/python2.7/site-packages/ does NOT support .pth files
error: bad install directory or PYTHONPATH
You are attempting to install a package to a directory that is not
on PYTHONPATH and which Python does not read ".pth" files from.  The
installation directory you specified (via --install-dir, --prefix, or
the distutils default setting) was:
/usr/local/lib/python2.7/site-packages/
and your PYTHONPATH environment variable currently contains:
''
Here are some of your options for correcting the problem:
* You can choose a different installation directory, i.e., one that is
  on PYTHONPATH or supports .pth files
* You can add the installation directory to the PYTHONPATH environment
  variable.  (It must then also be on PYTHONPATH whenever you run
  Python and want to use the package(s) you are installing.)
* You can set up the installation directory to support ".pth" files by
  using one of the approaches described here:
  
https://pythonhosted.org/setuptools/easy_install.html#custom-installation-locations

But there is no rados.py under /usr/local/lib/python2.7/site-packages/
[ceph@ceph-test ceph]$ ll /usr/local/lib/python2.7/site-packages/
total 132
-rw-r--r-- 1 root root 43675 Jun  8 00:21 ceph_argparse.py
-rw-r--r-- 1 root root 14242 Jun  8 00:21 ceph_daemon.py
-rw-r--r-- 1 root root 17426 Jun  8 00:21 ceph_rest_api.py
-rw-r--r-- 1 root root 51076 Jun  8 00:21 ceph_volume_client.py

Would someone help me please?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cannot add new OSDs in mimic

2018-06-10 Thread Michael Kuriger
Oh boy! Thankfully I upgraded our sandbox cluster so I’m not in a sticky 
situation right now :-D

Mike Kuriger
Sr. Unix Systems Engineer


From: Sergey Malinin [mailto:h...@newmail.com]
Sent: Friday, June 08, 2018 4:22 PM
To: Michael Kuriger; Paul Emmerich
Cc: ceph-users
Subject: Re: [ceph-users] cannot add new OSDs in mimic

Lack of developer response (I reported the issue on Jun 4) leads me to
believe that it’s not a trivial problem and we should all be getting prepared
for a hard time playing with osdmaptool...
On Jun 9, 2018, 02:10 +0300, Paul Emmerich , wrote:

Hi,

we are also seeing this (I've also posted to the issue tracker). It only 
affects clusters upgraded from Luminous, not new ones.
Also, it's not about re-using OSDs. Deleting any OSD seems to trigger this bug 
for all new OSDs on upgraded clusters.

We are still using the pre-Luminous way to remove OSDs, i.e.:

* ceph osd down/stop service
* ceph osd crush remove
* ceph osd auth del
* ceph osd rm
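For comparison, the Luminous+ shorthand collapses most of that into one call (a sketch only, with <id> as a placeholder; whether it sidesteps the ID-reuse bug discussed here is a separate question):

ceph osd out <id>
systemctl stop ceph-osd@<id>
ceph osd purge <id> --yes-i-really-mean-it   # crush remove + auth del + osd rm in one step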

Paul


2018-06-08 22:14 GMT+02:00 Michael Kuriger 
mailto:mk7...@dexyp.com>>:
Hi everyone,
I appreciate the suggestions. However, this is still an issue. I've tried 
adding the OSD using ceph-deploy, and manually from the OSD host. I'm not able 
to start newly added OSDs at all, even if I use a new ID. It seems the OSD is 
added to CEPH but I cannot start it. OSDs that existed prior to the upgrade to 
mimic are working fine. Here is a copy of an OSD log entry.

osd.58 0 failed to load OSD map for epoch 378084, got 0 bytes

fsid 1ce494ac-a218-4141-9d4f-295e6fa12f2a
last_changed 2018-06-05 15:40:50.179880
created 0.00
0: 10.3.71.36:6789/0 mon.ceph-mon3
1: 10.3.74.109:6789/0 mon.ceph-mon2
2: 10.3.74.214:6789/0 mon.ceph-mon1

   -91> 2018-06-08 12:48:20.697 7fada058e700  1 -- 10.3.56.69:6800/1807239 <== mon.0 10.3.71.36:6789/0 7  auth_reply(proto 2 0 (0) Success) v1  194+0+0 (645793352 0 0) 0x559f7a3dafc0 con 0x559f7994ec00
   -90> 2018-06-08 12:48:20.697 7fada058e700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2018-06-08 12:47:50.699337)
   -89> 2018-06-08 12:48:20.698 7fadbc9d7140 10 monclient: wait_auth_rotating done
   -88> 2018-06-08 12:48:20.698 7fadbc9d7140 10 monclient: _send_command 1 [{"prefix": "osd crush set-device-class", "class": "hdd", "ids": ["58"]}]
   -87> 2018-06-08 12:48:20.698 7fadbc9d7140 10 monclient: _send_mon_message to mon.ceph-mon3 at 10.3.71.36:6789/0
   -86> 2018-06-08 12:48:20.698 7fadbc9d7140  1 -- 10.3.56.69:6800/1807239 --> 10.3.71.36:6789/0 -- mon_command({"prefix": "osd crush set-device-class", "class": "hdd", "ids": ["58"]} v 0) v1 -- 0x559f793e73c0 con 0
   -85> 2018-06-08 12:48:20.700 7fadabaa4700  5 -- 10.3.56.69:6800/1807239 >> 10.3.71.36:6789/0

Re: [ceph-users] mimic: failed to load OSD map for epoch X, got 0 bytes

2018-06-08 Thread Michael Sudnick
I'm getting the same issue.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cannot add new OSDs in mimic

2018-06-08 Thread Michael Kuriger
Hi everyone,
I appreciate the suggestions. However, this is still an issue. I've tried 
adding the OSD using ceph-deploy, and manually from the OSD host. I'm not able 
to start newly added OSDs at all, even if I use a new ID. It seems the OSD is 
added to CEPH but I cannot start it. OSDs that existed prior to the upgrade to 
mimic are working fine. Here is a copy of an OSD log entry. 

osd.58 0 failed to load OSD map for epoch 378084, got 0 bytes

fsid 1ce494ac-a218-4141-9d4f-295e6fa12f2a
last_changed 2018-06-05 15:40:50.179880
created 0.00
0: 10.3.71.36:6789/0 mon.ceph-mon3
1: 10.3.74.109:6789/0 mon.ceph-mon2
2: 10.3.74.214:6789/0 mon.ceph-mon1

   -91> 2018-06-08 12:48:20.697 7fada058e700  1 -- 10.3.56.69:6800/1807239 <== 
mon.0 10.3.71.36:6789/0 7  auth_reply(proto 2 0 (0) Success) v1  
194+0+0 (645793352 0 0) 0x559f7a3dafc0 con 0x559f7994ec00
   -90> 2018-06-08 12:48:20.697 7fada058e700 10 monclient: _check_auth_rotating 
have uptodate secrets (they expire after 2018-06-08 12:47:50.699337)
   -89> 2018-06-08 12:48:20.698 7fadbc9d7140 10 monclient: wait_auth_rotating 
done
   -88> 2018-06-08 12:48:20.698 7fadbc9d7140 10 monclient: _send_command 1 
[{"prefix": "osd crush set-device-class", "class": "hdd", "ids": ["58"]}]
   -87> 2018-06-08 12:48:20.698 7fadbc9d7140 10 monclient: _send_mon_message to 
mon.ceph-mon3 at 10.3.71.36:6789/0
   -86> 2018-06-08 12:48:20.698 7fadbc9d7140  1 -- 10.3.56.69:6800/1807239 --> 
10.3.71.36:6789/0 -- mon_command({"prefix": "osd crush set-device-class", 
"class": "hdd", "ids": ["58"]} v 0) v1 -- 0x559f793e73c0 con 0
   -85> 2018-06-08 12:48:20.700 7fadabaa4700  5 -- 10.3.56.69:6800/1807239 >> 
10.3.71.36:6789/0 conn(0x559f7994ec00 :-1 
s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=25741 cs=1 l=1). rx mon.0 seq 
8 0x559f793e73c0 mon_command_ack([{"prefix": "osd crush set-device-class", 
"class": "hdd", "ids": ["58"]}]=0 osd.58 already set to class hdd. 
set-device-class item id 58 name 'osd.58' device_class 'hdd': no change.  
v378738) v1
   -84> 2018-06-08 12:48:20.701 7fada058e700  1 -- 10.3.56.69:6800/1807239 <== 
mon.0 10.3.71.36:6789/0 8  mon_command_ack([{"prefix": "osd crush 
set-device-class", "class": "hdd", "ids": ["58"]}]=0 osd.58 already set to 
class hdd. set-device-class item id 58 name 'osd.58' device_class 'hdd': no 
change.  v378738) v1  211+0+0 (4063854475 0 0) 0x559f793e73c0 con 
0x559f7994ec00
   -83> 2018-06-08 12:48:20.701 7fada058e700 10 monclient: 
handle_mon_command_ack 1 [{"prefix": "osd crush set-device-class", "class": 
"hdd", "ids": ["58"]}]
   -82> 2018-06-08 12:48:20.701 7fada058e700 10 monclient: _finish_command 1 = 
0 osd.58 already set to class hdd. set-device-class item id 58 name 'osd.58' 
device_class 'hdd': no change.
   -81> 2018-06-08 12:48:20.701 7fadbc9d7140 10 monclient: _send_command 2 
[{"prefix": "osd crush create-or-move", "id": 58, "weight":0.5240, "args": 
["host=sacephnode12", "root=default"]}]
   -80> 2018-06-08 12:48:20.701 7fadbc9d7140 10 monclient: _send_mon_message to 
mon.ceph-mon3 at 10.3.71.36:6789/0
   -79> 2018-06-08 12:48:20.701 7fadbc9d7140  1 -- 10.3.56.69:6800/1807239 --> 
10.3.71.36:6789/0 -- mon_command({"prefix": "osd crush create-or-move", "id": 
58, "weight":0.5240, "args": ["host=sacephnode12", "root=default"]} v 0) v1 -- 
0x559f793e7600 con 0
   -78> 2018-06-08 12:48:20.703 7fadabaa4700  5 -- 10.3.56.69:6800/1807239 >> 
10.3.71.36:6789/0 conn(0x559f7994ec00 :-1 
s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=25741 cs=1 l=1). rx mon.0 seq 
9 0x559f793e7600 mon_command_ack([{"prefix": "osd crush create-or-move", "id": 
58, "weight":0.5240, "args": ["host=sacephnode12", "root=default"]}]=0 
create-or-move updated item name 'osd.58' weight 0.524 at location 
{host=sacephnode12,root=default} to crush map v378738) v1
   -77> 2018-06-08 12:48:20.703 7fada058e700  1 -- 10.3.56.69:6800/1807239 <== 
mon.0 10.3.71.36:6789/0 9  mon_command_ack([{"prefix": "osd crush 
create-or-move", "id": 58, "weight":0.5240, "args": ["host=sacephnode12", 
"root=default"]}]=0 create-or-move updated item name 'osd.58' weight 0.524 at 
location {host=sacephnode12,root=default} to crush map v378738) v1  258+0+0 
(1998484028 0 0) 0x559f793e7600 con 0x559f7994ec00
   -76> 2018-06-08 12:48:20.703 7fada058e700 10 monclient: 
handle_mon_command_ack 2 [{"prefix": "osd crush create-or-move", "id": 58, 
"weight":0.5240, "args": ["host=sacephnode12", "root=default"]}]
   -75> 2018-06-08 12:48:20.703 7fada058e700 10 monclient: _finish_command 2 = 
0 create-or-move updated item name 'osd.58' weight 0.524 at location 
{host=sacephnode12,root=default} to crush map
   -74> 2018-06-08 12:48:20.703 7fadbc9d7140  0 osd.58 0 done with init, 
starting boot process
   -73> 2018-06-08 12:48:20.703 7fadbc9d7140 10 monclient: _renew_subs
   -72> 2018-06-08 12:48:20.703 7fadbc9d7140 10 monclient: _send_mon_message to 
mon.ceph-mon3 at 10.3.71.36:6789/0
   -71> 2018-06-08 12:48:20.703 7fadbc9d7140  1 -- 

Re: [ceph-users] cannot add new OSDs in mimic

2018-06-07 Thread Michael Kuriger
Yes, I followed the procedure. Also, I'm not able to create new OSD's at all in 
mimic, even on a newly deployed osd server. Same error. Even if I pass the --id 
{1d} parameter to the ceph-volume command, it still uses the first available ID 
and not the one I specify.


Mike Kuriger 
Sr. Unix Systems Engineer 
T: 818-649-7235 M: 818-434-6195 



-Original Message-
From: Vasu Kulkarni [mailto:vakul...@redhat.com] 
Sent: Thursday, June 07, 2018 1:53 PM
To: Michael Kuriger
Cc: ceph-users
Subject: Re: [ceph-users] cannot add new OSDs in mimic

It is actually documented in replacing osd case,
https://urldefense.proofpoint.com/v2/url?u=http-3A__docs.ceph.com_docs_master_rados_operations_add-2Dor-2Drm-2Dosds_-23replacing-2Dan-2Dosd=DwIFaQ=5m9CfXHY6NXqkS7nN5n23w=5r9bhr1JAPRaUcJcU-FfGg=aq6X3Wv3kt3ORFoya83IqZqUQY0UzkWP_E09S0RuOk8=8qLqOnvmldGsBQFSfdTyP9q4tPrD5oViYvgvybXJDm8=,
I hope you followed that procedure?
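Roughly, the documented replace flow keeps the ID and looks like this (a sketch; osd.58 and /dev/sdX stand in for the real values):

ceph osd destroy 58 --yes-i-really-mean-it
ceph-volume lvm zap /dev/sdX --destroy
ceph-volume lvm create --osd-id 58 --data /dev/sdX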

On Thu, Jun 7, 2018 at 1:11 PM, Michael Kuriger  wrote:
> Do you mean:
> ceph osd destroy {ID}  --yes-i-really-mean-it
>
> Mike Kuriger
>
>
>
> -Original Message-
> From: Vasu Kulkarni [mailto:vakul...@redhat.com]
> Sent: Thursday, June 07, 2018 12:28 PM
> To: Michael Kuriger
> Cc: ceph-users
> Subject: Re: [ceph-users] cannot add new OSDs in mimic
>
> There is a osd destroy command but not documented, did you run that as well?
>
> On Thu, Jun 7, 2018 at 12:21 PM, Michael Kuriger  wrote:
>> CEPH team,
>> Is there a solution yet for adding OSDs in mimic - specifically re-using old 
>> IDs?  I was looking over this BUG report - 
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__tracker.ceph.com_issues_24423=DwIFaQ=5m9CfXHY6NXqkS7nN5n23w=5r9bhr1JAPRaUcJcU-FfGg=0PCKiecm216R95S_krqboYMskCBoolGysrvgHZo8LEM=hfI2uudTfY0lGtBI6iIXvZWvNpme4xwBJe2SWx0_N3I=
>>  and my issue is similar.  I removed a bunch of OSD's after upgrading to 
>> mimic and I'm not able to re-add them using the new volume format.  I 
>> haven't tried manually adding them using 'never used' IDs.  I'll try that 
>> now but was hoping there would be a fix.
>>
>> Thanks!
>>
>> Mike Kuriger
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.ceph.com_listinfo.cgi_ceph-2Dusers-2Dceph.com=DwIFaQ=5m9CfXHY6NXqkS7nN5n23w=5r9bhr1JAPRaUcJcU-FfGg=0PCKiecm216R95S_krqboYMskCBoolGysrvgHZo8LEM=2aoWc5hTz041_26Stz6zPtLiB5zGFw2GbX3TPjsvieI=
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cannot add new OSDs in mimic

2018-06-07 Thread Michael Kuriger
Do you mean:
ceph osd destroy {ID}  --yes-i-really-mean-it

Mike Kuriger 



-Original Message-
From: Vasu Kulkarni [mailto:vakul...@redhat.com] 
Sent: Thursday, June 07, 2018 12:28 PM
To: Michael Kuriger
Cc: ceph-users
Subject: Re: [ceph-users] cannot add new OSDs in mimic

There is a osd destroy command but not documented, did you run that as well?

On Thu, Jun 7, 2018 at 12:21 PM, Michael Kuriger  wrote:
> CEPH team,
> Is there a solution yet for adding OSDs in mimic - specifically re-using old 
> IDs?  I was looking over this BUG report - 
> https://urldefense.proofpoint.com/v2/url?u=https-3A__tracker.ceph.com_issues_24423=DwIFaQ=5m9CfXHY6NXqkS7nN5n23w=5r9bhr1JAPRaUcJcU-FfGg=0PCKiecm216R95S_krqboYMskCBoolGysrvgHZo8LEM=hfI2uudTfY0lGtBI6iIXvZWvNpme4xwBJe2SWx0_N3I=
>  and my issue is similar.  I removed a bunch of OSD's after upgrading to 
> mimic and I'm not able to re-add them using the new volume format.  I haven't 
> tried manually adding them using 'never used' IDs.  I'll try that now but was 
> hoping there would be a fix.
>
> Thanks!
>
> Mike Kuriger
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.ceph.com_listinfo.cgi_ceph-2Dusers-2Dceph.com=DwIFaQ=5m9CfXHY6NXqkS7nN5n23w=5r9bhr1JAPRaUcJcU-FfGg=0PCKiecm216R95S_krqboYMskCBoolGysrvgHZo8LEM=2aoWc5hTz041_26Stz6zPtLiB5zGFw2GbX3TPjsvieI=
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] cannot add new OSDs in mimic

2018-06-07 Thread Michael Kuriger
CEPH team,
Is there a solution yet for adding OSDs in mimic - specifically re-using old 
IDs?  I was looking over this BUG report - 
https://tracker.ceph.com/issues/24423 and my issue is similar.  I removed a 
bunch of OSD's after upgrading to mimic and I'm not able to re-add them using 
the new volume format.  I haven't tried manually adding them using 'never used' 
IDs.  I'll try that now but was hoping there would be a fix.

Thanks!

Mike Kuriger 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-osd@ service keeps restarting after removing osd

2018-06-04 Thread Michael Burk
On Thu, May 31, 2018 at 4:40 PM Gregory Farnum  wrote:

> On Thu, May 24, 2018 at 9:15 AM Michael Burk 
> wrote:
>
>> Hello,
>>
>> I'm trying to replace my OSDs with higher capacity drives. I went through
>> the steps to remove the OSD on the OSD node:
>> # ceph osd out osd.2
>> # ceph osd down osd.2
>> # ceph osd rm osd.2
>> Error EBUSY: osd.2 is still up; must be down before removal.
>> # systemctl stop ceph-osd@2
>> # ceph osd rm osd.2
>> removed osd.2
>> # ceph osd crush rm osd.2
>> removed item id 2 name 'osd.2' from crush map
>> # ceph auth del osd.2
>> updated
>>
>> umount /var/lib/ceph/osd/ceph-2
>>
>> It no longer shows in the crush map, and I am ready to remove the drive.
>> However, the ceph-osd@ service keeps restarting and mounting the disk in
>> /var/lib/ceph/osd. I do "systemctl stop ceph-osd@2" and umount the disk,
>> but then the service starts again and mounts the drive.
>>
>> # systemctl stop ceph-osd@2
>> # umount /var/lib/ceph/osd/ceph-2
>>
>> /dev/sdb1 on /var/lib/ceph/osd/ceph-2 type xfs
>> (rw,noatime,seclabel,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota)
>>
>> ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous
>> (stable)
>>
>> What am I missing?
>>
>
> Obviously this is undesired!
> In general, when using ceph-disk (as you presumably are) the OSD is
> designed to turn on automatically when a formatted disk gets mounted. I'd
> imagine that something (quite possibly included with ceph) is auto-mounting
> the disk after you umount this. We have a ceph-disk@.service which is
> supposed to get fired once, but perhaps there's something else I'm missing
> so that udev fires an event, it gets captured by one of the ceph tools that
> sees there's an available drive tagged for Ceph, and then it auto-mounts?
> I'm not sure why this would be happening for you and not others, though.
>
​I'm guessing it's because I'm replacing a batch of disks at once. The time
between stopping ceph-osd@ and seeing it start again is at least several
seconds, so if you just do one disk and remove it right away you probably
wouldn't have this problem. But since I do several at a time and watch
"ceph -w" after each one, it can be several minutes before I get to the
point of removing the volumes from the array controller.

>
> All this changes with ceph-volume, which will be the default in Mimic, by
> the way.
>
> Hmm, just poking
> ​​
> at things a little more, I think maybe you wanted to put a "ceph-disk
> deactivate" invocation in there. Try playing around with that?
>
​Ahh, good catch. I will test this. Thank you!
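A minimal sketch of what I'll try (the device path is my osd.2 data partition from the quoted output above):

systemctl stop ceph-osd@2
ceph-disk deactivate /dev/sdb1   # intended to stop and unmount the OSD so it is not re-activated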

-Greg
>
>
>
>>
>> Thanks,
>> Michael
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> <https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.ceph.com_listinfo.cgi_ceph-2Dusers-2Dceph.com=DwMFaQ=IGDlg0lD0b-nebmJJ0Kp8A=qU6xgLFc0snK-gHqOeYRpxxSm8JB8HqWbxrxGdpZx2E=sAWnMQ02cvRKmc43pHH7v9N24R9-z5aJh2CBcNg0TH0=ABqD5_qw0NG3jokSJgWNmAFO82cgFvAIQtbofvuiiUQ=>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-osd@ service keeps restarting after removing osd

2018-05-24 Thread Michael Burk
Hello,

I'm trying to replace my OSDs with higher capacity drives. I went through
the steps to remove the OSD on the OSD node:
# ceph osd out osd.2
# ceph osd down osd.2
# ceph osd rm osd.2
Error EBUSY: osd.2 is still up; must be down before removal.
# systemctl stop ceph-osd@2
# ceph osd rm osd.2
removed osd.2
# ceph osd crush rm osd.2
removed item id 2 name 'osd.2' from crush map
# ceph auth del osd.2
updated

umount /var/lib/ceph/osd/ceph-2

It no longer shows in the crush map, and I am ready to remove the drive.
However, the ceph-osd@ service keeps restarting and mounting the disk in
/var/lib/ceph/osd. I do "systemctl stop ceph-osd@2" and umount the disk,
but then the service starts again and mounts the drive.

# systemctl stop ceph-osd@2
# umount /var/lib/ceph/osd/ceph-2

/dev/sdb1 on /var/lib/ceph/osd/ceph-2 type xfs
(rw,noatime,seclabel,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota)

ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous
(stable)

What am I missing?

Thanks,
Michael
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] a big cluster or several small

2018-05-14 Thread Michael Kuriger
The more servers you have in your cluster, the less impact a single failure has
on the cluster. Monitor your systems and keep them up to date.  You can also
isolate data with clever crush rules and by creating multiple zones.
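For example, a rough sketch of pinning one pool to its own zone (the bucket, rule, and pool names here are made up):

ceph osd crush add-bucket zone-mail root
ceph osd crush move rack1 root=zone-mail
ceph osd crush rule create-replicated mail-rule zone-mail host
ceph osd pool set mail-pool crush_rule mail-rule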

Mike Kuriger


From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Marc 
Boisis
Sent: Monday, May 14, 2018 9:50 AM
To: ceph-users
Subject: [ceph-users] a big cluster or several small


Hi,

Hello,
Currently we have a 294 OSD (21 hosts/3 racks) cluster with RBD clients only and a
single pool (size=3).

We want to divide this cluster into several smaller ones to minimize the risk in
case of failure/crash.
For example, a cluster for the mail, another for the file servers, a test 
cluster ...
Do you think it's a good idea ?

Do you have experience feedback on multiple clusters in production on the same 
hardware:
- containers (LXD or Docker)
- multiple cluster on the same host without virtualization (with ceph-deploy 
... --cluster ...)
- multilple pools
...


Do you have any advice?





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Have an inconsistent PG, repair not working

2018-04-30 Thread Michael Sudnick
Mine repaired themselves after a regular deep scrub. Weird that I couldn't
trigger one manually.
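For the record, the manual sequence I was trying looked roughly like this (using David's pg 145.2e3 as the example id):

rados list-inconsistent-obj 145.2e3 --format=json-pretty
ceph pg deep-scrub 145.2e3
ceph pg repair 145.2e3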

On 30 April 2018 at 14:23, David Turner <drakonst...@gmail.com> wrote:

> My 3 inconsistent PGs finally decided to run automatic scrubs and now 2 of
> the 3 will allow me to run deep-scrubs and repairs on them.  The deep-scrub
> did not show any new information about the objects other than that they
> were missing in one of the copies.  Running a repair fixed the
> inconsistency.
>
> On Tue, Apr 24, 2018 at 4:53 PM David Turner <drakonst...@gmail.com>
> wrote:
>
>> Neither the issue I created nor Michael's [1] ticket that it was rolled
>> into are getting any traction.  How are y'all fairing with your clusters?
>> I've had 3 PGs inconsistent with 5 scrub errors for a few weeks now.  I
>> assumed that the third PG was just like the first 2 in that it couldn't be
>> scrubbed, but I just checked the last scrub timestamp of the 3 PGs and the
>> third one is able to run scrubs.  I'm going to increase the logging on it
>> after I finish a round of maintenance we're performing on some OSDs.
>> Hopefully I'll find something more about these objects.
>>
>>
>> [1] http://tracker.ceph.com/issues/23576
>>
>> On Fri, Apr 6, 2018 at 12:30 PM David Turner <drakonst...@gmail.com>
>> wrote:
>>
>>> I'm using filestore.  I think the root cause is something getting stuck
>>> in the code.  As such I went ahead and created a [1] bug tracker for this.
>>> Hopefully it gets some traction as I'm not particularly looking forward to
>>> messing with deleting PGs with the ceph-objectstore-tool in production.
>>>
>>> [1] http://tracker.ceph.com/issues/23577
>>>
>>> On Fri, Apr 6, 2018 at 11:40 AM Michael Sudnick <
>>> michael.sudn...@gmail.com> wrote:
>>>
>>>> I've tried a few more things to get a deep-scrub going on my PG. I
>>>> tried instructing the involved osds to scrub all their PGs and it looks
>>>> like that didn't do it.
>>>>
>>>> Do you have any documentation on the object-store-tool? What I've found
>>>> online talks about filestore and not bluestore.
>>>>
>>>> On 6 April 2018 at 09:27, David Turner <drakonst...@gmail.com> wrote:
>>>>
>>>>> I'm running into this exact same situation.  I'm running 12.2.2 and I
>>>>> have an EC PG with a scrub error.  It has the same output for [1] rados
>>>>> list-inconsistent-obj as mentioned before.  This is the [2] full health
>>>>> detail.  This is the [3] excerpt from the log from the deep-scrub that
>>>>> marked the PG inconsistent.  The scrub happened when the PG was starting 
>>>>> up
>>>>> after using ceph-objectstore-tool to split its filestore subfolders.  This
>>>>> is using a script that I've used for months without any side effects.
>>>>>
>>>>> I have tried quite a few things to get this PG to deep-scrub or
>>>>> repair, but to no avail.  It will not do anything.  I have set every osd's
>>>>> osd_max_scrubs to 0 in the cluster, waited for all scrubbing and deep
>>>>> scrubbing to finish, then increased the 11 OSDs for this PG to 1 before
>>>>> issuing a deep-scrub.  And it will sit there for over an hour without
>>>>> deep-scrubbing.  My current testing of this is to set all osds to 1,
>>>>> increase all of the osds for this PG to 4, and then issue the repair... 
>>>>> but
>>>>> similarly nothing happens.  Each time I issue the deep-scrub or repair, 
>>>>> the
>>>>> output correctly says 'instructing pg 145.2e3 on osd.234 to repair', but
>>>>> nothing shows up in the log for the OSD and the PG state stays
>>>>> 'active+clean+inconsistent'.
>>>>>
>>>>> My next step, unless anyone has a better idea, is to find the exact
>>>>> copy of the PG with the missing object, use object-store-tool to back up
>>>>> that copy of the PG and remove it.  Then starting the OSD back up should
>>>>> backfill the full copy of the PG and be healthy again.
>>>>>
>>>>>
>>>>>
>>>>> [1] $ rados list-inconsistent-obj 145.2e3
>>>>> No scrub information available for pg 145.2e3
>>>>> error 2: (2) No such file or directory
>>>>>
>>>>> [2] $ ceph health detail
>>>>> HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
>>&

Re: [ceph-users] Have an inconsistent PG, repair not working

2018-04-06 Thread Michael Sudnick
I've tried a few more things to get a deep-scrub going on my PG. I tried
instructing the involved osds to scrub all their PGs and it looks like that
didn't do it.

Do you have any documentation on the object-store-tool? What I've found
online talks about filestore and not bluestore.
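
For reference, a hedged sketch of the export/remove step described in the
quoted message below; the OSD id and PG shard are taken from that message,
the backup path is made up, and exact flags vary a bit between releases:

systemctl stop ceph-osd@234

# back up the suspect PG shard first
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-234 \
    --pgid 145.2e3s0 --op export --file /root/145.2e3s0.export

# then remove it so it gets backfilled from the remaining copies
# (some releases also require --force on the remove)
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-234 \
    --pgid 145.2e3s0 --op remove

systemctl start ceph-osd@234

Filestore OSDs may additionally need --journal-path; in either case the tool
is run against a stopped OSD.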

On 6 April 2018 at 09:27, David Turner <drakonst...@gmail.com> wrote:

> I'm running into this exact same situation.  I'm running 12.2.2 and I have
> an EC PG with a scrub error.  It has the same output for [1] rados
> list-inconsistent-obj as mentioned before.  This is the [2] full health
> detail.  This is the [3] excerpt from the log from the deep-scrub that
> marked the PG inconsistent.  The scrub happened when the PG was starting up
> after using ceph-objectstore-tool to split its filestore subfolders.  This
> is using a script that I've used for months without any side effects.
>
> I have tried quite a few things to get this PG to deep-scrub or repair,
> but to no avail.  It will not do anything.  I have set every osd's
> osd_max_scrubs to 0 in the cluster, waited for all scrubbing and deep
> scrubbing to finish, then increased the 11 OSDs for this PG to 1 before
> issuing a deep-scrub.  And it will sit there for over an hour without
> deep-scrubbing.  My current testing of this is to set all osds to 1,
> increase all of the osds for this PG to 4, and then issue the repair... but
> similarly nothing happens.  Each time I issue the deep-scrub or repair, the
> output correctly says 'instructing pg 145.2e3 on osd.234 to repair', but
> nothing shows up in the log for the OSD and the PG state stays
> 'active+clean+inconsistent'.
>
> My next step, unless anyone has a better idea, is to find the exact copy
> of the PG with the missing object, use object-store-tool to back up that
> copy of the PG and remove it.  Then starting the OSD back up should
> backfill the full copy of the PG and be healthy again.
>
>
>
> [1] $ rados list-inconsistent-obj 145.2e3
> No scrub information available for pg 145.2e3
> error 2: (2) No such file or directory
>
> [2] $ ceph health detail
> HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
> OSD_SCRUB_ERRORS 1 scrub errors
> PG_DAMAGED Possible data damage: 1 pg inconsistent
> pg 145.2e3 is active+clean+inconsistent, acting
> [234,132,33,331,278,217,55,358,79,3,24]
>
> [3] 2018-04-04 15:24:53.603380 7f54d1820700  0 log_channel(cluster) log
> [DBG] : 145.2e3 deep-scrub starts
> 2018-04-04 17:32:37.916853 7f54d1820700 -1 log_channel(cluster) log [ERR]
> : 145.2e3s0 deep-scrub 1 missing, 0 inconsistent objects
> 2018-04-04 17:32:37.916865 7f54d1820700 -1 log_channel(cluster) log [ERR]
> : 145.2e3 deep-scrub 1 errors
>
> On Mon, Apr 2, 2018 at 4:51 PM Michael Sudnick <michael.sudn...@gmail.com>
> wrote:
>
>> Hi Kjetil,
>>
>> I've tried to get the pg scrubbing/deep scrubbing and nothing seems to be
>> happening. I've tried it a few times over the last few days. My cluster is
>> recovering from a failed disk (which was probably the reason for the
>> inconsistency), do I need to wait for the cluster to heal before
>> repair/deep scrub works?
>>
>> -Michael
>>
>> On 2 April 2018 at 14:13, Kjetil Joergensen <kje...@medallia.com> wrote:
>>
>>> Hi,
>>>
>>> scrub or deep-scrub the pg, that should in theory get you back to
>>> list-inconsistent-obj spitting out what's wrong, then mail that info to the
>>> list.
>>>
>>> -KJ
>>>
>>> On Sun, Apr 1, 2018 at 9:17 AM, Michael Sudnick <
>>> michael.sudn...@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I have a small cluster with an inconsistent pg. I've tried ceph pg
>>>> repair multiple times to no luck. rados list-inconsistent-obj 49.11c
>>>> returns:
>>>>
>>>> # rados list-inconsistent-obj 49.11c
>>>> No scrub information available for pg 49.11c
>>>> error 2: (2) No such file or directory
>>>>
>>>> I'm a bit at a loss here as what to do to recover. That pg is part of a
>>>> cephfs_data pool with compression set to force/snappy.
>>>>
>>>> Does anyone have any suggestions?
>>>>
>>>> -Michael
>>>>
>>>> ___
>>>> ceph-users mailing list
>>>> ceph-users@lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>>>
>>>
>>>
>>> --
>>> Kjetil Joergensen <kje...@medallia.com>
>>> SRE, Medallia Inc
>>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Have an inconsistent PG, repair not working

2018-04-02 Thread Michael Sudnick
Hi Kjetil,

I've tried to get the pg scrubbing/deep scrubbing and nothing seems to be
happening. I've tried it a few times over the last few days. My cluster is
recovering from a failed disk (which was probably the reason for the
inconsistency), do I need to wait for the cluster to heal before
repair/deep scrub works?

-Michael

On 2 April 2018 at 14:13, Kjetil Joergensen <kje...@medallia.com> wrote:

> Hi,
>
> scrub or deep-scrub the pg, that should in theory get you back to
> list-inconsistent-obj spitting out what's wrong, then mail that info to the
> list.
>
> -KJ
>
> On Sun, Apr 1, 2018 at 9:17 AM, Michael Sudnick <michael.sudn...@gmail.com
> > wrote:
>
>> Hello,
>>
>> I have a small cluster with an inconsistent pg. I've tried ceph pg repair
>> multiple times to no luck. rados list-inconsistent-obj 49.11c returns:
>>
>> # rados list-inconsistent-obj 49.11c
>> No scrub information available for pg 49.11c
>> error 2: (2) No such file or directory
>>
>> I'm a bit at a loss here as what to do to recover. That pg is part of a
>> cephfs_data pool with compression set to force/snappy.
>>
>> Does anyone have any suggestions?
>>
>> -Michael
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
>
> --
> Kjetil Joergensen <kje...@medallia.com>
> SRE, Medallia Inc
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Have an inconsistent PG, repair not working

2018-04-01 Thread Michael Sudnick
Hello,

I have a small cluster with an inconsistent pg. I've tried ceph pg repair
multiple times to no luck. rados list-inconsistent-obj 49.11c returns:

# rados list-inconsistent-obj 49.11c
No scrub information available for pg 49.11c
error 2: (2) No such file or directory

I'm a bit at a loss here as what to do to recover. That pg is part of a
cephfs_data pool with compression set to force/snappy.

Does anyone have any suggestions?

-Michael
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock

2018-03-14 Thread Michael Christie
On 03/14/2018 01:27 PM, Michael Christie wrote:
> On 03/14/2018 01:24 PM, Maxim Patlasov wrote:
>> On Wed, Mar 14, 2018 at 11:13 AM, Jason Dillaman <jdill...@redhat.com
>> <mailto:jdill...@redhat.com>> wrote:
>>
>> Maxim, can you provide steps for a reproducer?
>>
>>
>> Yes, but it involves adding two artificial delays: one in tcmu-runner
>> and another in kernel iscsi. If you're willing to take pains of
> 
> Send the patches for the changes.
> 
>> recompiling kernel and tcmu-runner on one of gateway nodes, I'll help to
>> reproduce.
>>
>> Generally, the idea of reproducer is simple: let's model a situation
>> when two stale requests got stuck in kernel mailbox waiting to be
>> consumed by tcmu-runner, and another one got stuck in iscsi layer --
>> immediately after reading iscsi request from the socket. If we unblock
>> tcmu-runner after newer data went through another gateway, the first
>> stale request will switch tcmu-runner state from LOCKED to UNLOCKED
>> state, then the second stale request will trigger alua_thread to
>> re-acquire the lock, so when the third request comes to tcmu-runner, the
When you send the patches that add your delays, could you also send the
target side /var/log/tcmu-runner.log with log_level = 4?

For this test above you should see the second request will be sent to
rbd's tcmu_rbd_aio_write function. That command should fail in
rbd_finish_aio_generic and tcmu_rbd_handle_blacklisted_cmd will be
called. We should then be blocking until IO in that iscsi connection is
flushed in tgt_port_grp_recovery_thread_fn. That function will not
return from the enable=0 until the iscsi connection is stopped and the
commands in it have completed.

Other commands you had in flight should eventually hit
tcmur_cmd_handler's tcmu_dev_in_recovery check and be failed there or if
they had already passed that check then the cmd would be sent to
tcmu_rbd_aio_write and they should be getting the blacklisted error like
above.


>> lock is already reacquired and it goes to OSD smoothly overwriting newer
>> data.
>>
>>  
>>
>>
>> On Wed, Mar 14, 2018 at 2:06 PM, Maxim Patlasov
>> <mpatla...@skytap.com <mailto:mpatla...@skytap.com>> wrote:
>> > On Sun, Mar 11, 2018 at 5:10 PM, Mike Christie
>> <mchri...@redhat.com <mailto:mchri...@redhat.com>> wrote:
>> >>
>> >> On 03/11/2018 08:54 AM, shadow_lin wrote:
>> >> > Hi Jason,
>> >> > How the old target gateway is blacklisted? Is it a feature of
>> the target
>> >> > gateway(which can support active/passive multipath) should
>> provide or is
>> >> > it only by rbd excusive lock?
>> >> > I think excusive lock only let one client can write to rbd at
>> the same
>> >> > time,but another client can obtain the lock later when the lock is
>> >> > released.
>> >>
>> >> For the case where we had the lock and it got taken:
>> >>
>> >> If IO was blocked, then unjammed and it has already passed the target
>> >> level checks then the IO will be failed by the OSD due to the
>> >> blacklisting. When we get IO errors from ceph indicating we are
>> >> blacklisted the tcmu rbd layer will fail the IO indicating the state
>> >> change and that the IO can be retried. We will also tell the target
>> >> layer rbd does not have the lock anymore and to just stop the iscsi
>> >> connection while we clean up the blacklisting, running commands and
>> >> update our state.
>> >
>> >
>> > Mike, can you please give more details on how you tell the target
>> layer rbd
>> > does not have the lock and to stop iscsi connection. Which
>> > tcmu-runner/kernel-target functions are used for that?
>> >
>> > In fact, I performed an experiment with three stale write requests
>> stuck on
>> > blacklisted gateway, and one of them managed to overwrite newer
>> data. I
>> > followed all instructions from
>> >
>> http://docs.ceph.com/docs/master/rbd/iscsi-target-cli-manual-install/ 
>> <http://docs.ceph.com/docs/master/rbd/iscsi-target-cli-manual-install/>
>> and
>> > http://docs.ceph.com/docs/master/rbd/iscsi-target-cli/
>> <http://docs.ceph.com/docs/master/rbd/iscsi-target-cli/>, so I'm
>> interested
>> > what I'm missing...
>> >
>> > Thanks,
>> > Maxim
>> >
>> > Thanks,
>> > Maxim
>> >
>> >>
>> >>
>> >
>>
>>
>>
>> --
>> Jason
>>
>>
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock

2018-03-14 Thread Michael Christie
On 03/14/2018 01:26 PM, Michael Christie wrote:
> On 03/14/2018 01:06 PM, Maxim Patlasov wrote:
>> On Sun, Mar 11, 2018 at 5:10 PM, Mike Christie <mchri...@redhat.com
>> <mailto:mchri...@redhat.com>> wrote:
>>
>> On 03/11/2018 08:54 AM, shadow_lin wrote:
>> > Hi Jason,
>> > How the old target gateway is blacklisted? Is it a feature of the 
>> target
>> > gateway(which can support active/passive multipath) should provide or 
>> is
>> > it only by rbd excusive lock?
>> > I think excusive lock only let one client can write to rbd at the same
>> > time,but another client can obtain the lock later when the lock is 
>> released.
>>
>> For the case where we had the lock and it got taken:
>>
>> If IO was blocked, then unjammed and it has already passed the target
>> level checks then the IO will be failed by the OSD due to the
>> blacklisting. When we get IO errors from ceph indicating we are
>> blacklisted the tcmu rbd layer will fail the IO indicating the state
>> change and that the IO can be retried. We will also tell the target
>> layer rbd does not have the lock anymore and to just stop the iscsi
>> connection while we clean up the blacklisting, running commands and
>> update our state.
>>
>>
>> Mike, can you please give more details on how you tell the target layer
>> rbd does not have the lock and to stop iscsi connection. Which
>> tcmu-runner/kernel-target functions are used for that?
> 
> For this case it would be tcmu_rbd_handle_blacklisted_cmd. Note for
> failback type of test, we might not hit that error if the initiator does
> a RTPG before it sends IO. In that case we would see
> tcmu_update_dev_lock_state get run first and the iscsi connection would
> not be dropped.
> 
>>
>> In fact, I performed an experiment with three stale write requests stuck
>> on blacklisted gateway, and one of them managed to overwrite newer data.
> 
> What is the test exactly? What OS for the initiator?
> 
> What kernel were you using and are you using the upstream tools/libs or
> the RHCS ones?
> 
> Can you run your tests and send the initiator side kernel logs and on
> the iscsi targets send the /var/log/tcmu-runner.log with debugging in
> enabled. To do that open
> 
> /etc/tcmu/tcmu.conf
> 
> on the iscsi target nodes and set
> 
> log_level = 5
> 
> If that is too much output drop it to level 4.
> 
> 
>> I followed all instructions from
>> http://docs.ceph.com/docs/master/rbd/iscsi-target-cli-manual-install/
>> and http://docs.ceph.com/docs/master/rbd/iscsi-target-cli/, so I'm
>> interested what I'm missing...
> 
> You used the initiator settings in
> http://docs.ceph.com/docs/master/rbd/iscsi-initiators/
> too right?
> 

Ignore all these questions.  I'm pretty sure I know the issue.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock

2018-03-14 Thread Michael Christie
On 03/14/2018 01:24 PM, Maxim Patlasov wrote:
> On Wed, Mar 14, 2018 at 11:13 AM, Jason Dillaman wrote:
> 
> Maxim, can you provide steps for a reproducer?
> 
> 
> Yes, but it involves adding two artificial delays: one in tcmu-runner
> and another in kernel iscsi. If you're willing to take pains of

Send the patches for the changes.

> recompiling kernel and tcmu-runner on one of gateway nodes, I'll help to
> reproduce.
> 
> Generally, the idea of reproducer is simple: let's model a situation
> when two stale requests got stuck in kernel mailbox waiting to be
> consumed by tcmu-runner, and another one got stuck in iscsi layer --
> immediately after reading iscsi request from the socket. If we unblock
> tcmu-runner after newer data went through another gateway, the first
> stale request will switch tcmu-runner state from LOCKED to UNLOCKED
> state, then the second stale request will trigger alua_thread to
> re-acquire the lock, so when the third request comes to tcmu-runner, the
> lock is already reacquired and it goes to OSD smoothly overwriting newer
> data.
> 
>  
> 
> 
> On Wed, Mar 14, 2018 at 2:06 PM, Maxim Patlasov wrote:
> > On Sun, Mar 11, 2018 at 5:10 PM, Mike Christie wrote:
> >>
> >> On 03/11/2018 08:54 AM, shadow_lin wrote:
> >> > Hi Jason,
> >> > How the old target gateway is blacklisted? Is it a feature of
> the target
> >> > gateway(which can support active/passive multipath) should
> provide or is
> >> > it only by rbd excusive lock?
> >> > I think excusive lock only let one client can write to rbd at
> the same
> >> > time,but another client can obtain the lock later when the lock is
> >> > released.
> >>
> >> For the case where we had the lock and it got taken:
> >>
> >> If IO was blocked, then unjammed and it has already passed the target
> >> level checks then the IO will be failed by the OSD due to the
> >> blacklisting. When we get IO errors from ceph indicating we are
> >> blacklisted the tcmu rbd layer will fail the IO indicating the state
> >> change and that the IO can be retried. We will also tell the target
> >> layer rbd does not have the lock anymore and to just stop the iscsi
> >> connection while we clean up the blacklisting, running commands and
> >> update our state.
> >
> >
> > Mike, can you please give more details on how you tell the target
> layer rbd
> > does not have the lock and to stop iscsi connection. Which
> > tcmu-runner/kernel-target functions are used for that?
> >
> > In fact, I performed an experiment with three stale write requests
> stuck on
> > blacklisted gateway, and one of them managed to overwrite newer
> data. I
> > followed all instructions from
> >
> http://docs.ceph.com/docs/master/rbd/iscsi-target-cli-manual-install/ 
> 
> and
> > http://docs.ceph.com/docs/master/rbd/iscsi-target-cli/
> , so I'm
> interested
> > what I'm missing...
> >
> > Thanks,
> > Maxim
> >
> > Thanks,
> > Maxim
> >
> >>
> >>
> >
> 
> 
> 
> --
> Jason
> 
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock

2018-03-14 Thread Michael Christie
On 03/14/2018 01:06 PM, Maxim Patlasov wrote:
> On Sun, Mar 11, 2018 at 5:10 PM, Mike Christie wrote:
> 
> On 03/11/2018 08:54 AM, shadow_lin wrote:
> > Hi Jason,
> > How the old target gateway is blacklisted? Is it a feature of the target
> > gateway(which can support active/passive multipath) should provide or is
> > it only by rbd excusive lock?
> > I think excusive lock only let one client can write to rbd at the same
> > time,but another client can obtain the lock later when the lock is 
> released.
> 
> For the case where we had the lock and it got taken:
> 
> If IO was blocked, then unjammed and it has already passed the target
> level checks then the IO will be failed by the OSD due to the
> blacklisting. When we get IO errors from ceph indicating we are
> blacklisted the tcmu rbd layer will fail the IO indicating the state
> change and that the IO can be retried. We will also tell the target
> layer rbd does not have the lock anymore and to just stop the iscsi
> connection while we clean up the blacklisting, running commands and
> update our state.
> 
> 
> Mike, can you please give more details on how you tell the target layer
> rbd does not have the lock and to stop iscsi connection. Which
> tcmu-runner/kernel-target functions are used for that?

For this case it would be tcmu_rbd_handle_blacklisted_cmd. Note for
failback type of test, we might not hit that error if the initiator does
a RTPG before it sends IO. In that case we would see
tcmu_update_dev_lock_state get run first and the iscsi connection would
not be dropped.

> 
> In fact, I performed an experiment with three stale write requests stuck
> on blacklisted gateway, and one of them managed to overwrite newer data.

What is the test exactly? What OS for the initiator?

What kernel were you using and are you using the upstream tools/libs or
the RHCS ones?

Can you run your tests and send the initiator side kernel logs and on
the iscsi targets send the /var/log/tcmu-runner.log with debugging in
enabled. To do that open

/etc/tcmu/tcmu.conf

on the iscsi target nodes and set

log_level = 5

If that is too much output drop it to level 4.


> I followed all instructions from
> http://docs.ceph.com/docs/master/rbd/iscsi-target-cli-manual-install/
> and http://docs.ceph.com/docs/master/rbd/iscsi-target-cli/, so I'm
> interested what I'm missing...

You used the initiator settings in
http://docs.ceph.com/docs/master/rbd/iscsi-initiators/
too right?

> 
> Thanks,
> Maxim
> 
> Thanks,
> Maxim
>  
> 
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] iSCSI over RBD

2018-01-04 Thread Michael Christie
On 01/04/2018 03:50 AM, Joshua Chen wrote:
> Dear all,
>   Although I managed to run gwcli and create some iqns and luns,
> I do need some working config example so that my initiator could
> connect and get the lun.
> 
>   I am familiar with targetcli and I used to do the following ACL style
> connection rather than password, 
> the targetcli setting tree is here:

What docs have you been using? Did you check out the gwcli man page and
upstream ceph doc:

http://docs.ceph.com/docs/master/rbd/iscsi-target-cli/

Let me know what is not clear in there.

There is a bug in the upstream doc and instead of doing
> cd /iscsi-target/iqn.2003-01.com.redhat.iscsi-gw:/disks/

you do

> cd /disks

in step 3. Is that the issue you are hitting?


For gwcli, a client is the initiator. It only supports one-way CHAP, so
there are just the 3 commands in those docs above.

1. create client/initiator-name. This is the same as creating the ACL in
targetcli.

> create  iqn.1994-05.com.redhat:15dbed23be9e

2. set CHAP username and password for that initiator. You have to do
this with gwcli right now due to a bug, or maybe feature :), in the
code. This is similar to doing the set auth command in targetcli.

auth chap=<username>/<password>

3. export an image as a lun. This is equivalent to creating the lun in
targetcli.

disk add rbd.some-image
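
Put together, a session might look roughly like the sketch below. The pool,
image and CHAP values are made up, and the exact paths ("/disks", the hosts
directory under the target IQN) differ between ceph-iscsi-cli versions, so
take it as an outline rather than literal syntax:

# gwcli
/> cd /disks
/disks> create pool=rbd image=some-image size=10G
/disks> cd /iscsi-target/iqn.2003-01.com.redhat.iscsi-gw:ceph-igw/hosts
> create iqn.1994-05.com.redhat:15dbed23be9e
> auth chap=myiscsiuser/myiscsipassword
> disk add rbd.some-image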


> 
> (or see this page )
> 
> #targetcli ls
> o- / ... [...]
>   o- backstores ... [...]
>   | o- block ... [Storage Objects: 1]
>   | | o- vmware_5t ... [/dev/rbd/rbd/vmware_5t (5.0TiB) write-thru activated]
>   | |   o- alua ... [ALUA Groups: 1]
>   | | o- default_tg_pt_gp ... [ALUA state: Active/optimized]
>   | o- fileio ... [Storage Objects: 0]
>   | o- pscsi ... [Storage Objects: 0]
>   | o- ramdisk ... [Storage Objects: 0]
>   | o- user:rbd ... [Storage Objects: 0]
>   o- iscsi ... [Targets: 1]
>   | o- iqn.2017-12.asiaa.cephosd1:vmware5t ... [TPGs: 1]
>   |   o- tpg1 ... [gen-acls, no-auth]
>   | o- acls ... [ACLs: 12]
>   | | o- iqn.1994-05.com.redhat:15dbed23be9e ... [Mapped LUNs: 1]
>   | | | o- mapped_lun0 ... [lun0 block/vmware_5t (rw)]
>   | | o- iqn.1994-05.com.redhat:15dbed23be9e-ovirt1 ... [Mapped LUNs: 1]
>   | | | o- mapped_lun0 ... [lun0 block/vmware_5t (rw)]
>   | | o- iqn.1994-05.com.redhat:2af344ba6ae5-ceph-admin-test ... [Mapped LUNs: 1]
>   | | | o- mapped_lun0 ... [lun0 block/vmware_5t (rw)]
>   | | o- iqn.1994-05.com.redhat:67669afedddf ... [Mapped LUNs: 1]
>   | | | o- mapped_lun0 ... [lun0 block/vmware_5t (rw)]
>   | | o- iqn.1994-05.com.redhat:67669afedddf-ovirt3 ... [Mapped LUNs: 1]
>   | | | o- mapped_lun0 ... [lun0 block/vmware_5t (rw)]
>   | | o- iqn.1994-05.com.redhat:a7c1ec3c43f7 ... [Mapped LUNs: 1]
>   | | | o- mapped_lun0

Re: [ceph-users] rbd and cephfs (data) in one pool?

2017-12-27 Thread Michael Kuriger
Making the filesystem might blow away all the rbd images though.

Mike Kuriger
Sr. Unix Systems Engineer
T: 818-649-7235 M: 818-434-6195

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of David 
Turner
Sent: Wednesday, December 27, 2017 1:44 PM
To: Chad William Seys
Cc: ceph-users
Subject: Re: [ceph-users] rbd and cephfs (data) in one pool?


Afaik, I don't think anything will stop you from doing it, but it is not a 
documented or supported use-case.

On Wed, Dec 27, 2017, 3:52 PM Chad William Seys wrote:
Hello,
   Is it possible to place rbd and cephfs data in the same pool?

Thanks!
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Replaced a disk, first time. Quick question

2017-12-04 Thread Michael Kuriger
I've seen that before (over 100%) but I forget the cause.  At any rate, the way 
I replace disks is to first set the osd weight to 0, wait for data to 
rebalance, then down / out the osd.  I don't think ceph does any reads from a 
disk once you've marked it out so hopefully there are other copies.
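
Roughly, and borrowing osd.17 from the message below, that sequence looks
like this (the flags are the usual ones, adjust to taste):

# drain the OSD first, so nothing has to be read from it later
ceph osd crush reweight osd.17 0

# wait until "ceph -s" shows the rebalance finished, then take it out
ceph osd out osd.17
systemctl stop ceph-osd@17
ceph osd purge osd.17 --yes-i-really-mean-it

On Luminous, purge does the crush remove, auth del and osd rm in one step.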

Mike Kuriger
Sr. Unix Systems Engineer
T: 818-649-7235 M: 818-434-6195

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Drew 
Weaver
Sent: Monday, December 04, 2017 8:39 AM
To: 'ceph-us...@ceph.com'
Subject: [ceph-users] Replaced a disk, first time. Quick question

Howdy,

I replaced a disk today because it was marked as Predicted failure. These were 
the steps I took

ceph osd out osd17
ceph -w #waited for it to get done
systemctl stop ceph-osd@osd17
ceph osd purge osd17 --yes-i-really-mean-it
umount /var/lib/ceph/osd/ceph-osdX

I noticed that after I ran the 'osd out' command that it started moving data 
around.

19446/16764 objects degraded (115.999%) <-- I noticed that number seems odd

So then I replaced the disk
Created a new label on it
Ceph-deploy osd prepare OSD5:sdd

THIS time, it started rebuilding

40795/16764 objects degraded (243.349%) <-- Now I'm really concerned.

Perhaps I don't quite understand what the numbers are telling me but is it 
normal for it to rebuilding more objects than exist?

Thanks,
-Drew


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HW Raid vs. Multiple OSD

2017-11-13 Thread Michael

Oscar Segarra wrote:

I'd like to hear your opinion about theese two configurations:

1.- RAID5 with 8 disks (I will have 7TB but for me it is enough) + 1 
OSD daemon

2.- 8 OSD daemons
You mean 1 OSD daemon on top of RAID5? I don't think I'd do that. You'll 
probably want redundancy at Ceph's level anyhow, and then where is the 
point...?
I'm a little bit worried that 8 osd daemons can affect performance 
because all jobs running and scrubbing.
If you ran RAID instead of Ceph, RAID might still perform better. But I 
don't believe anything much changes for the better if you run Ceph 
on top of RAID rather than on top of individual OSDs, unless your 
configuration is bad. I generally don't think you have to worry that 
much that a reasonably modern machine can't handle running a few extra 
jobs, either.


But you could certainly do some tests on your hardware to be sure.

Another question is the procedure of a replacement of a failed disk. 
In case of a big RAID, replacement is direct. In case of many OSDs, 
the procedure is a little bit tricky.


http://ceph.com/geen-categorie/admin-guide-replacing-a-failed-disk-in-a-ceph-cluster/ 

I wasn't using Ceph in 2014, but at least in my limited experience, 
today the most important step is done when you add the new drive and 
activate an OSD on it.


You probably still want to remove the leftovers of the old failed OSD 
for it to not clutter your list, but as far as I can tell replication 
and so on will trigger *before* you remove it. (There is a configurable 
timeout for how long an OSD can be down, after which the OSD is 
essentially treated as dead already, at which point replication and 
rebalancing starts).



-Michael


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rocksdb: Corruption: missing start of fragmented record

2017-11-13 Thread Michael

Konstantin Shalygin wrote:
> I think Christian talks about version 12.2.2, not 12.2.*

Which isn't released yet, yes. I could try building the development 
repository if you think that has a chance of resolving the issue?


Although I'd still like to know how I could theoretically get my hands 
at these rocksdb files manually, if anyone knows how to do that? I still 
have no idea how.


I also reported this as a bug last week, in case anyone has information 
or the same issue:

http://tracker.ceph.com/issues/22044
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] FAILED assert(p.same_interval_since) and unusable cluster

2017-11-04 Thread Michael

Jon Light wrote:
I followed the instructions in the Github repo for cloning and setting 
up the build environment, checked out the 12.2.0 tag, modified OSD.cc 
with the fix, and then tried to build with dpkg-buildpackage. I got 
the following error:
"ceph/src/kv/RocksDBStore.cc:593:22: error: ‘perf_context’ is not a 
member of ‘rocksdb’"

I guess some changes have been made to RocksDB since 12.2.0?

Am I going about this the right way? Should I just simply recompile 
the OSD binary with the fix and then copy it to the nodes in my 
cluster? What's the best way to get this fix applied to my current 
installation?


Thanks
It's probably only an indirect help because you might just have the 
issue that you aren't using 12.2.1, but the way in which I got the patch 
applied on Ubuntu's bionic 12.2.1 is this: apt-get source ceph, wget 
patch file from github, cd to the ceph sources, quilt import <patchfile>, quilt push, pdebuild and then install the ceph-osd .deb.
At least roughly - you still may have to perform related configuration 
tasks like enabling deb-src entries in the apt sources file, setting up 
the standard pbuilderrc for bionic, and such.
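
Spelled out as commands the flow is roughly the following; the patch URL and
version strings are placeholders, and the result directory depends on your
pbuilder configuration:

# deb-src entries must be enabled in /etc/apt/sources.list first
apt-get source ceph
cd ceph-12.2.1*/
wget -O ../fix.patch https://github.com/ceph/ceph/commit/<commit>.patch
quilt import ../fix.patch
quilt push
pdebuild    # builds in a pbuilder chroot; results land in /var/cache/pbuilder/result by default
dpkg -i /var/cache/pbuilder/result/ceph-osd_*.deb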


As you might see on the bug tracker, the patch did apparently avoid the 
immediate error for me, but Ceph then ran into another error.


- Michael
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rocksdb: Corruption: missing start of fragmented record

2017-11-01 Thread Michael

Christian Balzer wrote:

Your exact system configuration (HW, drives, controller, settings, etc)
would be interesting as I can think of plenty scenarios on how to corrupt
things that normally shouldn't be affected by such actions
Oh, the hardware in question is consumer grade and not new. Some old i7 
machine. But my current guess is that the specific hardware is 
semi-unrelated.


I think there is probably just a WAL log, or an entry in it, that wasn't 
finished writing to disk or got corrupted. But a WAL isn't something that 
should be assumed to be complete and correct in the case of some 
failure, right? That's the nature of a WAL.


If this can't be fixed automatically with some command, I would simply 
like to have a look at & tinker trivially with these DB files if 
possible (which I still haven't figured out a way to do).



Christian Balzer wrote:

Now that bit is quite disconcerting, though you're one release behind the
curve and from what I read .2 has plenty more bug fixes coming.

Fair point. I just tried with 12.2.1 (on pre-release Ubuntu bionic now).

Doesn't change anything - fsck doesn't fix rocksdb, the bluestore won't 
mount, the OSD won't activate and the error is the same.


Is there any fix in .2 that might address this, or do you just mean that 
in general there will be bug fixes?



Thanks for your response!

- Michael
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rocksdb: Corruption: missing start of fragmented record

2017-11-01 Thread Michael

Hello everyone,

I've conducted some crash tests (unplugging drives, the machine, 
terminating and restarting ceph systemd services) with Ceph 12.2.0 on 
Ubuntu and quite easily managed to corrupt what appears to be rocksdb's 
log replay on a bluestore OSD:


# ceph-bluestore-tool fsck  --path /var/lib/ceph/osd/ceph-2/
[...]
4 rocksdb: 
[/build/ceph-pKGC1D/ceph-12.2.0/src/rocksdb/db/version_set.cc:2859] 
Recovered from manifest file:db/MANIFEST-000975 
succeeded,manifest_file_number is 975, next_file_number is 1008, 
last_sequence is 51965907, log_number is 0,prev_log_number is 
0,max_column_family is 0
4 rocksdb: 
[/build/ceph-pKGC1D/ceph-12.2.0/src/rocksdb/db/version_set.cc:2867] 
Column family [default] (ID 0), log number is 1005
4 rocksdb: EVENT_LOG_v1 {"time_micros": 1509298585082794, "job": 1, 
"event": "recovery_started", "log_files": [1003, 1005]}
4 rocksdb: 
[/build/ceph-pKGC1D/ceph-12.2.0/src/rocksdb/db/db_impl_open.cc:482] 
Recovering log #1003 mode 0
4 rocksdb: 
[/build/ceph-pKGC1D/ceph-12.2.0/src/rocksdb/db/db_impl_open.cc:482] 
Recovering log #1005 mode 0
3 rocksdb: 
[/build/ceph-pKGC1D/ceph-12.2.0/src/rocksdb/db/db_impl_open.cc:424] 
db/001005.log: dropping 3225 bytes; Corruption: missing start of 
fragmented record(2)
4 rocksdb: 
[/build/ceph-pKGC1D/ceph-12.2.0/src/rocksdb/db/db_impl.cc:217] Shutdown: 
canceling all background work
4 rocksdb: 
[/build/ceph-pKGC1D/ceph-12.2.0/src/rocksdb/db/db_impl.cc:343] Shutdown 
complete

-1 rocksdb: Corruption: missing start of fragmented record(2)
-1 bluestore(/var/lib/ceph/osd/ceph-2/) _open_db erroring opening db:
1 bluefs umount
1 bdev(0x557f5b6a4240 /var/lib/ceph/osd/ceph-2//block) close

If I understand this right, rocksdb is just trying to replay WAL-type 
logs, of which presumably "001005.log" is corrupted. It then throws an 
error that stops everything.


I did try to mount the bluestore, as I was assuming that would probably be 
where I'd find rocksdb's files somewhere, but that also doesn't seem 
possible:


#ceph-objectstore-tool --op fsck --data-path /var/lib/ceph/osd/ceph-2/ 
--mountpoint /mnt/bluestore-repair/

fsck failed: (5) Input/output error
# ceph-objectstore-tool --op fuse --data-path /var/lib/ceph/osd/ceph-2 
--mountpoint /mnt/bluestore-repair/

Mount failed with '(5) Input/output error'
# ceph-objectstore-tool --op fuse --force --skip-journal-replay 
--data-path /var/lib/ceph/osd/ceph-2 --mountpoint /mnt/bluestore-repair/

Mount failed with '(5) Input/output error'

Adding --debug shows the ultimate culprit is just the above rocksdb 
error again.


Q: Is there some way in which I can tell rocksdb to truncate or delete / 
skip the respective log entries? Or can I get access to rocksdb('s 
files) in some other way to just manipulate it or delete corrupted WAL 
files manually?


-Michael

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [Luminous]How to choose the proper ec profile?

2017-10-30 Thread Michael

On 10/30/2017 02:33 PM, shadow_lin wrote:

Hi all,
I am wondering how to choose the proper ec profile for new luminous ec 
rbd image.

If I set k too high, what would the drawback be?
Is it a good idea to set k=10 m=2? It sounds tempting: the overhead 
in storage capacity is low and the redundancy is good.


Hi,

That is a consideration between how important your data is, how quickly 
you think you will notice [even if Ceph somehow doesn't or can't deal 
with it because of imperfect configuration or such] and how quickly you 
ultimately are done rebuilding redundancy.


I'd suggest to imagine a bad scenario for how long it will take and just 
consider how nervous it'd make you to be down to x OSDs/hosts of 
redundancy for that duration.


Apart from that, Ceph isn't a super stable super easy to comprehend type 
of software yet (no offense to anyone intended - you're overall doing a 
good job!). As far as I can guess, there could be the occasional moment 
when one extra copy allows you to use a simpler, more reliable "tabula 
rasa" approach to host / OSD problem fixing, while staying safe enough.



shadow_lin wrote:
What is the difference in storage safety (redundancy) between k=10 
m=2 and k=4 m=2?


3 OSDs failing before you / Ceph restores them is a less likely event in 
a pool of 6 than one of 12. If you had k=10 million m=2 with 12TB HDD, 
it'd probably take just some seconds to minutes until you have data loss.


shadow_lin wrote:
What would be a good ec profile for archive purposes (decent write 
performance and just OK read performance)?


I don't actually know that - but the default is not bad if you ask me 
(not that it features writes faster than reads). Plus it lets you pick m.
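
For reference, a hedged example of how a profile gets defined and used; the
names and the k/m choice here are arbitrary:

# k=4, m=2: survives two simultaneous failures at 1.5x raw overhead
ceph osd erasure-code-profile set archive-profile \
    k=4 m=2 crush-failure-domain=host
ceph osd erasure-code-profile get archive-profile
ceph osd pool create archive-data 256 256 erasure archive-profile
# needed on Luminous before RBD can write to an EC data pool (bluestore only)
ceph osd pool set archive-data allow_ec_overwrites true

The profile of an existing pool cannot be changed afterwards, so it is worth
settling on k and m before filling it.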



- Michael
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD daemons active in nodes after removal

2017-10-25 Thread Michael Kuriger
When I do this, I reweight all of the OSDs I want to remove to 0 first, wait 
for the rebalance, then proceed to remove the OSDs.  Doing it your way, you 
have to wait for the rebalance after removing each OSD one by one.

Mike Kuriger
Sr. Unix Systems Engineer
818-434-6195

From: ceph-users  on behalf of Karun Josy 

Date: Wednesday, October 25, 2017 at 10:15 AM
To: "ceph-users@lists.ceph.com" 
Subject: [ceph-users] OSD daemons active in nodes after removal

Hello everyone! :)

I have an interesting problem. For a few weeks, we've been testing Luminous in 
a cluster made up of 8 servers and with about 20 SSD disks almost evenly 
distributed. It is running erasure coding.

Yesterday, we decided to bring the cluster to a minimum of 8 servers and 1 disk 
per server.

So, we went ahead and removed the additional disks from the ceph cluster, by 
executing commands like this from the admin server:

---
$ ceph osd out osd.20
osd.20 is already out.
$ ceph osd down osd.20
marked down osd.20.
$ ceph osd purge osd.20 --yes-i-really-mean-it
Error EBUSY: osd.20 is not `down`.

So I logged in to the host it resides on and killed it: systemctl stop 
ceph-osd@26
$ ceph osd purge osd.20 --yes-i-really-mean-it
purged osd.20


We waited for the cluster to be healthy once again and I physically removed the 
disks (hot swap, connected to an LSI 3008 controller). A few minutes after 
that, I needed to turn off one of the OSD servers to swap out a piece of 
hardware inside. So, I issued:

ceph osd set noout

And proceeded to turn off that 1 OSD server.

But the interesting thing happened then. Once that 1 server came back up, the 
cluster all of a sudden showed that out of the 8 nodes, only 2 were up!

8 (2 up, 5 in)

Even more interesting is that it seems Ceph, in each OSD server, still thinks 
the missing disks are there!

When I start ceph on each OSD server with "systemctl start ceph-osd.target", 
/var/logs/ceph gets filled with logs for disks that are not supposed to exist 
anymore.

The contents of the logs show something like:

# cat /var/log/ceph/ceph-osd.7.log
2017-10-20 08:45:16.389432 7f8ee6e36d00  0 set uid:gid to 167:167 (ceph:ceph)
2017-10-20 08:45:16.389449 7f8ee6e36d00  0 ceph version 12.2.1 
(3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable), process 
(unknown), pid 2591
2017-10-20 08:45:16.389639 7f8ee6e36d00 -1  ** ERROR: unable to open OSD 
superblock on /var/lib/ceph/osd/ceph-7: (2) No such file or directory
2017-10-20 08:45:36.639439 7fb389277d00  0 set uid:gid to 167:167 (ceph:ceph)

The actual Ceph cluster sees only 8 disks, as you can see here:

$ ceph osd tree
ID  CLASS WEIGHT  TYPE NAME STATUS REWEIGHT PRI-AFF
 -1   7.97388 root default
 -3   1.86469 host ceph-las1-a1-osd
  1   ssd 1.86469 osd.1   down0 1.0
 -5   0.87320 host ceph-las1-a2-osd
  2   ssd 0.87320 osd.2   down0 1.0
 -7   0.87320 host ceph-las1-a3-osd
  4   ssd 0.87320 osd.4   down  1.0 1.0
 -9   0.87320 host ceph-las1-a4-osd
  8   ssd 0.87320 osd.8 up  1.0 1.0
-11   0.87320 host ceph-las1-a5-osd
 12   ssd 0.87320 osd.12  down  1.0 1.0
-13   0.87320 host ceph-las1-a6-osd
 17   ssd 0.87320 osd.17up  1.0 1.0
-15   0.87320 host ceph-las1-a7-osd
 21   ssd 0.87320 osd.21  down  1.0 1.0
-17   0.87000 host ceph-las1-a8-osd
 28   ssd 0.87000 osd.28  down0 1.0


Linux, in the OSD servers, seems to also think the disks are in:

# df -h
Filesystem  Size  Used Avail Use% Mounted on
/dev/sde2   976M  183M  727M  21% /boot
/dev/sdd197M  5.4M   92M   6% /var/lib/ceph/osd/ceph-7
/dev/sdc197M  5.4M   92M   6% /var/lib/ceph/osd/ceph-6
/dev/sda197M  5.4M   92M   6% /var/lib/ceph/osd/ceph-4
/dev/sdb197M  5.4M   92M   6% /var/lib/ceph/osd/ceph-5
tmpfs   6.3G 0  6.3G   0% /run/user/0

It should show only one disk, not 4.

I tried to issue again the commands to remove the disks, this time, in the OSD 
server itself:

$ ceph osd out osd.X
osd.X does not exist.

$ ceph osd purge osd.X --yes-i-really-mean-it
osd.X does not exist

Yet, if I again issue "systemctl start ceph-osd.target", /var/log/ceph again 
shows logs for a disk that does not exist (to make sure, I deleted all logs 
prior).

So, it seems, somewhere, Ceph in the OSD still thinks there should be more 
disks?

The Ceph cluster is unusable though. We've tried everything to bring it back 
again. But as Dr. Bones would say, it's dead Jim.



___
ceph-users mailing list
ceph-users@lists.ceph.com

[ceph-users] Bluestore compression and existing CephFS filesystem

2017-10-19 Thread Michael Sudnick
Hello, I recently migrated to Bluestore on Luminous and have enabled
aggressive snappy compression on my CephFS data pool. I was wondering if
there was a way to see how much space was being saved. Also, are existing
files compressed at all, or do I have a bunch of resyncing ahead of me?
Sorry if this is in the documentation somewhere - I searched and haven't
been able to find anything.
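
One hedged way to at least see the effect, assuming the bluestore_compressed*
perf counters are present on this release (they are per OSD, not per pool):

# on an OSD host
ceph daemon osd.3 perf dump | grep bluestore_compressed
#   bluestore_compressed           - bytes stored after compression
#   bluestore_compressed_original  - bytes as originally written
#   bluestore_compressed_allocated - space actually allocated for them

Comparing "original" with "allocated" across the OSDs gives a rough idea of
the savings.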

Thank you,

-Michael
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Brand new cluster -- pg is stuck inactive

2017-10-13 Thread Michael Kuriger
You may not have enough OSDs to satisfy the crush ruleset.  
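
A hedged way to check that, assuming the default pool and rule names:

ceph osd tree                  # how many OSDs are actually up and in
ceph osd pool get rbd size     # replicas the pool wants
ceph osd crush rule dump       # failure domain the rule spreads copies across

With fewer OSDs (or hosts, if that is the failure domain) than the pool's
size, the PGs can never map and will sit in "creating".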

 
Mike Kuriger 
Sr. Unix Systems Engineer
818-434-6195 
 

On 10/13/17, 9:53 AM, "ceph-users on behalf of dE" wrote:

Hi,

 I'm running ceph 10.2.5 on Debian (official package).

It cant seem to create any functional pools --

ceph health detail
HEALTH_ERR 64 pgs are stuck inactive for more than 300 seconds; 64 pgs 
stuck inactive; too few PGs per OSD (21 < min 30)
pg 0.39 is stuck inactive for 652.741684, current state creating, last 
acting []
pg 0.38 is stuck inactive for 652.741688, current state creating, last 
acting []
pg 0.37 is stuck inactive for 652.741690, current state creating, last 
acting []
pg 0.36 is stuck inactive for 652.741692, current state creating, last 
acting []
pg 0.35 is stuck inactive for 652.741694, current state creating, last 
acting []
pg 0.34 is stuck inactive for 652.741696, current state creating, last 
acting []
pg 0.33 is stuck inactive for 652.741698, current state creating, last 
acting []
pg 0.32 is stuck inactive for 652.741701, current state creating, last 
acting []
pg 0.3 is stuck inactive for 652.741762, current state creating, last 
acting []
pg 0.2e is stuck inactive for 652.741715, current state creating, last 
acting []
pg 0.2d is stuck inactive for 652.741719, current state creating, last 
acting []
pg 0.2c is stuck inactive for 652.741721, current state creating, last 
acting []
pg 0.2b is stuck inactive for 652.741723, current state creating, last 
acting []
pg 0.2a is stuck inactive for 652.741725, current state creating, last 
acting []
pg 0.29 is stuck inactive for 652.741727, current state creating, last 
acting []
pg 0.28 is stuck inactive for 652.741730, current state creating, last 
acting []
pg 0.27 is stuck inactive for 652.741732, current state creating, last 
acting []
pg 0.26 is stuck inactive for 652.741734, current state creating, last 
acting []
pg 0.3e is stuck inactive for 652.741707, current state creating, last 
acting []
pg 0.f is stuck inactive for 652.741761, current state creating, last 
acting []
pg 0.3f is stuck inactive for 652.741708, current state creating, last 
acting []
pg 0.10 is stuck inactive for 652.741763, current state creating, last 
acting []
pg 0.4 is stuck inactive for 652.741773, current state creating, last 
acting []
pg 0.5 is stuck inactive for 652.741774, current state creating, last 
acting []
pg 0.3a is stuck inactive for 652.741717, current state creating, last 
acting []
pg 0.b is stuck inactive for 652.741771, current state creating, last 
acting []
pg 0.c is stuck inactive for 652.741772, current state creating, last 
acting []
pg 0.3b is stuck inactive for 652.741721, current state creating, last 
acting []
pg 0.d is stuck inactive for 652.741774, current state creating, last 
acting []
pg 0.3c is stuck inactive for 652.741722, current state creating, last 
acting []
pg 0.e is stuck inactive for 652.741776, current state creating, last 
acting []
pg 0.3d is stuck inactive for 652.741724, current state creating, last 
acting []
pg 0.22 is stuck inactive for 652.741756, current state creating, last 
acting []
pg 0.21 is stuck inactive for 652.741758, current state creating, last 
acting []
pg 0.a is stuck inactive for 652.741783, current state creating, last 
acting []
pg 0.20 is stuck inactive for 652.741761, current state creating, last 
acting []
pg 0.9 is stuck inactive for 652.741787, current state creating, last 
acting []
pg 0.1f is stuck inactive for 652.741764, current state creating, last 
acting []
pg 0.8 is stuck inactive for 652.741790, current state creating, last 
acting []
pg 0.7 is stuck inactive for 652.741792, current state creating, last 
acting []
pg 0.6 is stuck inactive for 652.741794, current state creating, last 
acting []
pg 0.1e is stuck inactive for 652.741770, current state creating, last 
acting []
pg 0.1d is stuck inactive for 652.741772, current state creating, last 
acting []
pg 0.1c is stuck inactive for 652.741774, current state creating, last 
acting []
pg 0.1b is stuck inactive for 652.741777, current state creating, last 
acting []
pg 0.1a is stuck inactive for 652.741784, current state creating, last 
acting []
pg 0.2 is stuck inactive for 652.741812, current state creating, last 
acting []
pg 0.31 is stuck inactive for 652.741762, current state creating, last 
acting []
pg 0.19 is stuck inactive for 652.741789, current state creating, last 
acting []
pg 0.11 is stuck inactive for 

Re: [ceph-users] can't figure out why I have HEALTH_WARN in luminous

2017-09-25 Thread Michael Kuriger
Thanks!!  I did see that warning, but it never occurred to me that I needed 
to disable it.
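
For anyone else landing here, a hedged example of turning it off, assuming
the option John names below keeps the same spelling in ceph.conf:

# ceph.conf on the monitor hosts, then restart the mons
[mon]
    mon health preluminous compat warning = false

After that the json overall_status should line up with status again.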

 
Mike Kuriger 
Sr. Unix Systems Engineer 
T: 818-649-7235 M: 818-434-6195 
 

On 9/23/17, 5:52 AM, "John Spray" <jsp...@redhat.com> wrote:

On Fri, Sep 22, 2017 at 6:48 PM, Michael Kuriger <mk7...@dexyp.com> wrote:
> I have a few running ceph clusters.  I built a new cluster using luminous,
> and I also upgraded a cluster running hammer to luminous.  In both cases, 
I
> have a HEALTH_WARN that I can't figure out.  The cluster appears healthy
> except for the HEALTH_WARN in overall status.  For now, I’m monitoring
> health from the “status” instead of “overall_status” until I can find out
> what the issue is.
>
>
>
> Any ideas?  Thanks!

There is a setting called mon_health_preluminous_compat_warning (true
by default), that forces the old overall_status field to WARN, to
create the awareness that your script is using the old health output.

If you do a "ceph health detail -f json" you'll see an explanatory message.

We should probably have made that visible in "status" too (or wherever
we output the overall_status as warning like this) -

https://github.com/ceph/ceph/pull/17930
 

John

>
>
> # ceph health detail
>
> HEALTH_OK
>
>
>
> # ceph -s
>
>   cluster:
>
> id: 11d436c2-1ae3-4ea4-9f11-97343e5c673b
>
> health: HEALTH_OK
>
>
>
> # ceph -s --format json-pretty
>
>
>
> {
>
> "fsid": "11d436c2-1ae3-4ea4-9f11-97343e5c673b",
>
> "health": {
>
> "checks": {},
>
> "status": "HEALTH_OK",
>
> "overall_status": "HEALTH_WARN"
>
>
>
> 
>
>
>
>
>
>
>
> Mike Kuriger
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
>


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] can't figure out why I have HEALTH_WARN in luminous

2017-09-22 Thread Michael Kuriger
I have a few running ceph clusters.  I built a new cluster using luminous, and 
I also upgraded a cluster running hammer to luminous.  In both cases, I have a 
HEALTH_WARN that I can't figure out.  The cluster appears healthy except for 
the HEALTH_WARN in overall status.  For now, I’m monitoring health from the 
“status” instead of “overall_status” until I can find out what the issue is.

Any ideas?  Thanks!

# ceph health detail
HEALTH_OK

# ceph -s
  cluster:
id: 11d436c2-1ae3-4ea4-9f11-97343e5c673b
health: HEALTH_OK

# ceph -s --format json-pretty

{
"fsid": "11d436c2-1ae3-4ea4-9f11-97343e5c673b",
"health": {
"checks": {},
"status": "HEALTH_OK",
"overall_status": "HEALTH_WARN"





Mike Kuriger

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS billions of files and inline_data?

2017-08-16 Thread Michael Metz-Martini | SpeedPartner GmbH
Hi,

Am 16.08.2017 um 19:31 schrieb Henrik Korkuc:
> On 17-08-16 19:40, John Spray wrote:
>> On Wed, Aug 16, 2017 at 3:27 PM, Henrik Korkuc <li...@kirneh.eu> wrote:
> maybe you can suggest any recommendations how to scale Ceph for billions
> of objects? More PGs per OSD, more OSDs, more pools? Somewhere in the
> list it was mentioned that OSDs need to keep object list in memory, is
> it still valid for bluestore?
We started using cephfs in 2014 and scaled to 4 billion small files in a
separate pool plus 500 million in a second pool - "only" 225 TB of data.

Unfortunately every object creates another object in the data pool so
(due to size with a replication of 2, which is a real pain in the a*)
we're now at about 16 billion inodes distributed over 136 spinning
disks. XFS performed very badly with such a huge number of files so we
switched all osd's to ext4 one by one, which helped a lot (but keep an
eye on your total number of inodes).

I'm quite sure we made many configuration mistakes (replication of 2;
too few pg's in the beginning) and had to learn a lot the hard way while
keeping the site up & running.

As our disks are filling up and we would have to expand our storage -
which needs a rebalance that takes several months(!) - we decided to leave the
ceph-train and migrate to a more filesystem-like setup. We don't really
need objectstores and it seems cephfs can't manage such a huge number of
files (or we're unable to optimize it for that use-case). We will give
glusterfs with Raid6 underneath and nfs a try - more "basic" and
hopefully more robust.

-- 
Kind regards
 Michael Metz-Martini
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Systemd dependency cycle in Luminous

2017-07-17 Thread Michael Andersen
Thanks for pointing me towards that! You saved me a lot of stress

On Jul 17, 2017 4:39 PM, "Tim Serong" <tser...@suse.com> wrote:

> On 07/17/2017 11:22 AM, Michael Andersen wrote:
> > Hi all
> >
> > I recently upgraded two separate ceph clusters from Jewel to Luminous.
> > (OS is Ubuntu xenial) Everything went smoothly except on one of the
> > monitors in each cluster I had a problem shutting down/starting up. It
> > seems the systemd dependencies are messed up. I get:
> >
> > systemd[1]: ceph-osd.target: Found ordering cycle on
> ceph-osd.target/start
> > systemd[1]: ceph-osd.target: Found dependency on ceph-osd@16.service
> /start
> > systemd[1]: ceph-osd.target: Found dependency on ceph-mon.target/start
> > systemd[1]: ceph-osd.target: Found dependency on ceph.target/start
> > systemd[1]: ceph-osd.target: Found dependency on ceph-osd.target/start
> >
> > Has anyone seen this? I ignored the first time this happened (and fixed
> > it by uninstalling, purging and reinstalling ceph on that one node) but
> > now it has happened while upgrading a completely different cluster and
> > this one would be quite a pain to uninstall/reinstall ceph on. Any ideas?
>
> I hit the same thing on SUSE Linux, but it should be fixed now, by
> https://github.com/ceph/ceph/pull/15835/commits/357dfa5954.  This went
> into the Luminous branch on July 3, so if your Luminous build is older
> than that, you won't have this fix yet.  See the above commit message
> for the full description, but TL;DR: having MONs colocated with OSDs
> will sometimes (but not every time) confuse systemd, due to the various
> target files specifying dependencies between each other, without
> specifying explicit ordering.
>
> Regards,
>
> Tim
> --
> Tim Serong
> Senior Clustering Engineer
> SUSE
> tser...@suse.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Systemd dependency cycle in Luminous

2017-07-16 Thread Michael Andersen
Hi all

I recently upgraded two separate ceph clusters from Jewel to Luminous. (OS
is Ubuntu xenial) Everything went smoothly except on one of the monitors in
each cluster I had a problem shutting down/starting up. It seems the
systemd dependencies are messed up. I get:

systemd[1]: ceph-osd.target: Found ordering cycle on ceph-osd.target/start
systemd[1]: ceph-osd.target: Found dependency on ceph-osd@16.service/start
systemd[1]: ceph-osd.target: Found dependency on ceph-mon.target/start
systemd[1]: ceph-osd.target: Found dependency on ceph.target/start
systemd[1]: ceph-osd.target: Found dependency on ceph-osd.target/start

Has anyone seen this? I ignored the first time this happened (and fixed it
by uninstalling, purging and reinstalling ceph on that one node) but now it
has happened while upgrading a completely different cluster and this one
would be quite a pain to uninstall/reinstall ceph on. Any ideas?

Thanks
Michael
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Changing pg_num on cache pool

2017-05-27 Thread Michael Shuey
I don't recall finding a definitive answer - though it was some time ago.
IIRC, it did work but made the pool fragile; I remember having to rebuild
the pools for my test rig soon after.  Don't quite recall the root cause,
though - could have been newbie operator error on my part.  May have also
had something to do with my cache pool settings; at the time I was doing
heavy benchmarking with a limited-size pool, so it's possible I filled the
cache pool with data while the pg_num change was going on, causing subtle
breakage (despite being explicitly warned to NOT do that).
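
Only as a hedged sketch of what "followed by scrubs" can look like in practice
(pool name taken from the quoted command below; do this in a quiet window, and
remember the --yes-i-really-mean-it flag exists for a reason):

  ceph osd pool set cephfs_data_cache pg_num 256 --yes-i-really-mean-it
  ceph osd pool set cephfs_data_cache pgp_num 256
  # then scrub every PG of the cache pool, as the warning demands
  for pg in $(ceph pg ls-by-pool cephfs_data_cache | awk '/^[0-9]+\./{print $1}'); do
      ceph pg deep-scrub "$pg"
  done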


--
Mike Shuey

On Sat, May 27, 2017 at 8:52 AM, Konstantin Shalygin  wrote:

> # ceph osd pool set cephfs_data_cache pg_num 256
>>
>> Error EPERM: splits in cache pools must be followed by scrubs and
>> leave sufficient free space to avoid overfilling.  use
>> --yes-i-really-mean-it to force.
>>
>>
>> Is there something I need to do, before increasing PGs on a cache
>> pool?  Can this be (safely) done live?
>>
>
> Hello.
> You found answer on this question? I can't google anything about this
> warning.
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph-disk prepare not properly preparing disks on one of my OSD nodes, running 11.2.0-0 on CentOS7

2017-04-16 Thread Michael Sudnick
Hello, I have a few ceph OSD hosting nodes. All, as far as I can tell, have
identical ceph versions. The problem I am hitting is as follows. I enter
the following for example:

ceph3# ceph-disk prepare --cluster ceph --cluster-uuid
aca834ef-5617-47fd-be18-283faba1f0b1 --fs-type xfs /dev/sdb

The disk seems to get formatted correctly, however when it attempts to
activate the disk the logs mentioned it was unable to detect the
superblock. There are no new osd entries in the crush map either. Manually
mounting the disk shows that it is missing a file with its uuid, and the
whoami file, sorry for not being more descriptive, I worked around the
problem by preparing the disk on a working box and substituting it back in.

Any suggestions on where to look to start debugging this? SELinux is off on
all nodes.
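
A hedged checklist of places to look, assuming /dev/sdb1 ended up as the data
partition:

  ceph-disk list                    # does the disk show up as "prepared"?
  mount /dev/sdb1 /mnt && ls /mnt   # prepare should leave ceph_fsid, fsid and magic;
                                    # whoami only appears once activation succeeds
  cat /mnt/ceph_fsid                # must match the fsid in /etc/ceph/ceph.conf
  umount /mnt
  ceph-disk zap /dev/sdb            # last resort: wipe and re-prepare the disk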

Thanks for any help you are able to provide.

-Michael
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG calculator improvement

2017-04-13 Thread Michael Kidd
Hello Frédéric,
  Thank you very much for the input.  I would like to ask for some feedback
from you, as well as the ceph-users list at large.

The PGCalc tool was created to help steer new Ceph users in the right
direction, but it's certainly difficult to account for every possible
scenario.  I'm struggling to find a way to implement something that would
work better for the scenario that you (Frédéric) describe, while still
being a useful starting point for the novice / more mainstream use cases.
I've also gotten complaints at the other end of the spectrum, that the tool
expects the user to know too much already, so accounting for the number of
objects is bound to add to this sentiment.

As the Ceph user base expands and the use cases diverge, we are definitely
finding more edge cases that are causing pain.  I'd love to make something
to help prevent these types of issues, but again, I worry about the
complexity introduced.

With this, I see a few possible ways forward:
* Simply re-wording the %data to be % object count -- but this seems more
abstract, again leading to more confusion of new users.
* Increase complexity of the PG Calc tool, at the risk of further
alienating novice/mainstream users
* Add a disclaimer about the tool being a base for decision making, but
that certain edge cases require adjustments to the recommended PG count
and/or ceph.conf & sysctl values.
* Add a disclaimer urging the end user to secure storage consulting if
their use case falls into certain categories or they are new to Ceph to
ensure the cluster will meet their needs.

Having been on the storage consulting team and knowing the expertise they
have, I strongly believe that newcomers to Ceph (or new use cases inside of
established customers) should secure consulting before final decisions are
made on hardware... let alone the cluster is deployed.  I know it seems a
bit self-serving to make this suggestion as I work at Red Hat, but there is
a lot on the line when any establishment is storing potentially business
critical data.

I suspect the answer lies in a combination of the above or in something
I've not thought of.  Please do weigh in as any and all suggestions are
more than welcome.
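
For readers following along, the rule of thumb the calculator starts from,
before any weighting by %data (numbers taken from Frédéric's 144-OSD, EC 5+4
pool quoted below):

  # total_pgs ~= (num_osds * 100) / replica_size, rounded up to a power of two
  echo $(( 144 * 100 / 9 ))   # -> 1600, so round up to 2048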

Thanks,
Michael J. Kidd
Principal Software Maintenance Engineer
Red Hat Ceph Storage
+1 919-442-8878


On Wed, Apr 12, 2017 at 6:35 AM, Frédéric Nass <
frederic.n...@univ-lorraine.fr> wrote:

>
> Hi,
>
> I wanted to share a bad experience we had due to how the PG calculator
> works.
>
> When we set our production cluster months ago, we had to decide on the
> number of PGs to give to each pool in the cluster.
> As you know, the PG calc would recommended to give a lot of PGs to heavy
> pools in size, regardless the number of objects in the pools. How bad...
>
> We essentially had 3 pools to set on 144 OSDs :
>
> 1. a EC5+4 pool for the radosGW (.rgw.buckets) that would hold 80% of all
> datas in the cluster. PG calc recommended 2048 PGs.
> 2. a EC5+4 pool for zimbra's data (emails) that would hold 20% of all
> datas. PG calc recommended 512 PGs.
> 3. a replicated pool for zimbra's metadata (null size objects holding
> xattrs - used for deduplication) that would hold 0% of all datas. PG calc
> recommended 128 PGs, but we decided on 256.
>
> With 120M of objects in pool #3, as soon as we upgraded to Jewel, we hit
> the Jewel scrubbing bug (OSDs flapping).
> Before we could upgrade to patched Jewel, scrub all the cluster again
> prior to increasing the number of PGs on this pool, we had to take more
> than a hundred of snapshots (for backup/restoration purposes), with the
> number of objects still increasing in the pool. Then when a snapshot was
> removed, we hit the current Jewel snap trimming bug affecting pools with
> too many objects for the number of PGs. The only way we could stop the
> trimming was to stop OSDs resulting in PGs being degraded and not trimming
> anymore (snap trimming only happens on active+clean PGs).
>
> We're now just getting out of this hole, thanks to Nick's post regarding
> osd_snap_trim_sleep and RHCS support expertise.
>
> If the PG calc had considered not only the pools weight but also the
> number of expected objects in the pool (which we knew by that time), we
> wouldn't have it these 2 bugs.
> We hope this will help improving the ceph.com and RHCS PG calculators.
>
> Regards,
>
> Frédéric.
>
> --
>
> Frédéric Nass
>
> Sous-direction Infrastructures
> Direction du Numérique
> Université de Lorraine
>
> Tél : +33 3 72 74 11 35
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cannot shutdown monitors

2017-02-10 Thread Michael Andersen
I definitely had all the rbd volumes unmounted. I am not sure if they were
unmapped. I can try that.
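
A hedged sketch of that step, before stopping the services and unloading
rbd/libceph:

  # unmount and unmap anything krbd still has mapped
  for dev in $(rbd showmapped | awk 'NR>1 {print $5}'); do
      umount "$dev" 2>/dev/null
      rbd unmap "$dev"
  done
  rbd showmapped   # should now print nothing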

On Fri, Feb 10, 2017 at 9:10 PM, Brad Hubbard <bhubb...@redhat.com> wrote:

> On Sat, Feb 11, 2017 at 2:58 PM, Brad Hubbard <bhubb...@redhat.com> wrote:
> > Just making sure the list sees this for those that are following.
> >
> > On Sat, Feb 11, 2017 at 2:49 PM, Michael Andersen <mich...@steelcode.com>
> wrote:
> >> Right, so yes libceph is loaded
> >>
> >> root@compound-7:~# lsmod | egrep "ceph|rbd"
> >> rbd69632  0
> >> libceph   245760  1 rbd
> >> libcrc32c  16384  3 xfs,raid456,libceph
> >>
> >> I stopped all the services and unloaded the modules
> >>
> >> root@compound-7:~# systemctl stop ceph\*.service ceph\*.target
> >> root@compound-7:~# modprobe -r rbd
> >> root@compound-7:~# modprobe -r libceph
> >> root@compound-7:~# lsmod | egrep "ceph|rbd"
> >>
> >> Then rebooted
> >> root@compound-7:~# reboot
> >>
> >> And sure enough the reboot happened OK.
> >>
> >> So that solves my immediate problem, I now know how to work around it
> >> (thanks!), but I would love to work out how to not need this step. Any
>
> Can you double-check that all rbd volumes are unmounted on this host
> when shutting down? Maybe unmap them just for good measure.
>
> I don't believe the libceph module should need to talk to the cluster
> unless it has active connections at the time of shutdown.
>
> >> further info I can give to help?
> >>
> >>
> >>
> >> On Fri, Feb 10, 2017 at 8:42 PM, Michael Andersen <
> mich...@steelcode.com>
> >> wrote:
> >>>
> >>> Sorry this email arrived out of order. I will do the modprobe -r test
> >>>
> >>> On Fri, Feb 10, 2017 at 8:20 PM, Brad Hubbard <bhubb...@redhat.com>
> wrote:
> >>>>
> >>>> On Sat, Feb 11, 2017 at 2:08 PM, Michael Andersen <
> mich...@steelcode.com>
> >>>> wrote:
> >>>> > I believe I did shutdown mon process. Is that not done by the
> >>>> >
> >>>> > sudo systemctl stop ceph\*.service ceph\*.target
> >>>> >
> >>>> > command? Also, as I noted, the mon process does not show up in ps
> after
> >>>> > I do
> >>>> > that, but I still get the shutdown halting.
> >>>> >
> >>>> > The libceph kernel module may be installed. I did not do so
> >>>> > deliberately but
> >>>> > I used ceph-deploy so if it installs that then that is why it's
> there.
> >>>> > I
> >>>> > also run some kubernetes pods with rbd persistent volumes on these
> >>>> > machines,
> >>>> > although no rbd volumes are in use or mounted when I try shut down.
> In
> >>>> > fact
> >>>> > I unmapped all rbd volumes across the whole cluster to make sure. Is
> >>>> > libceph
> >>>> > required for rbd?
> >>>>
> >>>> For kernel rbd (/dev/rbd0, etc.) yes, for librbd, no.
> >>>>
> >>>> As a test try modprobe -r on both the libceph and rbd modules before
> >>>> shutdown and see if that helps ("modprobe -r rbd" should unload
> >>>> libceph as well but verify that).
> >>>>
> >>>> >
> >>>> > But even so, is it normal for the libceph kernel module to prevent
> >>>> > shutdown?
> >>>> > Is there another stage in the shutdown procedure that I am missing?
> >>>> >
> >>>> >
> >>>> > On Feb 10, 2017 7:49 PM, "Brad Hubbard" <bhubb...@redhat.com>
> wrote:
> >>>> >
> >>>> > That looks like dmesg output from the libceph kernel module. Do you
> >>>> > have the libceph kernel module loaded?
> >>>> >
> >>>> > If the answer to that question is "yes" the follow-up question is
> >>>> > "Why?" as it is not required for a MON or OSD host.
> >>>> >
> >>>> > On Sat, Feb 11, 2017 at 1:18 PM, Michael Andersen
> >>>> > <mich...@steelcode.com>
> >>>> > wrote:
> >>>> >> Yeah, all three mons have OSDs on the same machines.
> >>>> >&g

Re: [ceph-users] Cannot shutdown monitors

2017-02-10 Thread Michael Andersen
I believe I did shutdown mon process. Is that not done by the

sudo systemctl stop ceph\*.service ceph\*.target

command? Also, as I noted, the mon process does not show up in ps after I
do that, but I still get the shutdown halting.

The libceph kernel module may be installed. I did not do so deliberately
but I used ceph-deploy so if it installs that then that is why it's there.
I also run some kubernetes pods with rbd persistent volumes on these
machines, although no rbd volumes are in use or mounted when I try shut
down. In fact I unmapped all rbd volumes across the whole cluster to make
sure. Is libceph required for rbd?

But even so, is it normal for the libceph kernel module to prevent
shutdown? Is there another stage in the shutdown procedure that I am
missing?

On Feb 10, 2017 7:49 PM, "Brad Hubbard" <bhubb...@redhat.com> wrote:

That looks like dmesg output from the libceph kernel module. Do you
have the libceph kernel module loaded?

If the answer to that question is "yes" the follow-up question is
"Why?" as it is not required for a MON or OSD host.

On Sat, Feb 11, 2017 at 1:18 PM, Michael Andersen <mich...@steelcode.com>
wrote:
> Yeah, all three mons have OSDs on the same machines.
>
> On Feb 10, 2017 7:13 PM, "Shinobu Kinjo" <ski...@redhat.com> wrote:
>>
>> Is your primary MON running on the host which some OSDs are running on?
>>
>> On Sat, Feb 11, 2017 at 11:53 AM, Michael Andersen
>> <mich...@steelcode.com> wrote:
>> > Hi
>> >
>> > I am running a small cluster of 8 machines (80 osds), with three
>> > monitors on
>> > Ubuntu 16.04. Ceph version 10.2.5.
>> >
>> > I cannot reboot the monitors without physically going into the
>> > datacenter
>> > and power cycling them. What happens is that while shutting down, ceph
>> > gets
>> > stuck trying to contact the other monitors but networking has already
>> > shut
>> > down or something like that. I get an endless stream of:
>> >
>> > libceph: connect 10.20.0.10:6789 error -101
>> > libceph: connect 10.20.0.13:6789 error -101
>> > libceph: connect 10.20.0.17:6789 error -101
>> >
>> > where in this case 10.20.0.10 is the machine I am trying to shut down
>> > and
>> > all three IPs are the MONs.
>> >
>> > At this stage of the shutdown, the machine doesn't respond to pings,
and
>> > I
>> > cannot even log in on any of the virtual terminals. Nothing to do but
>> > poweroff at the server.
>> >
>> > The other non-mon servers shut down just fine, and the cluster was
>> > healthy
>> > at the time I was rebooting the mon (I only reboot one machine at a
>> > time,
>> > waiting for it to come up before I do the next one).
>> >
>> > Also worth mentioning that if I execute
>> >
>> > sudo systemctl stop ceph\*.service ceph\*.target
>> >
>> > on the server, the only things I see are:
>> >
>> > root 11143 2  0 18:40 ?00:00:00 [ceph-msgr]
>> > root 11162 2  0 18:40 ?00:00:00 [ceph-watch-noti]
>> >
>> > and even then, when no ceph daemons are left running, doing a reboot
>> > goes
>> > into the same loop.
>> >
>> > I can't really find any mention of this online, but I feel someone must
>> > have
>> > hit this. Any idea how to fix it? It's really annoying because its hard
>> > for
>> > me to get access to the datacenter.
>> >
>> > Thanks
>> > Michael
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



--
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cannot shutdown monitors

2017-02-10 Thread Michael Andersen
Yeah, all three mons have OSDs on the same machines.

On Feb 10, 2017 7:13 PM, "Shinobu Kinjo" <ski...@redhat.com> wrote:

> Is your primary MON running on the host which some OSDs are running on?
>
> On Sat, Feb 11, 2017 at 11:53 AM, Michael Andersen
> <mich...@steelcode.com> wrote:
> > Hi
> >
> > I am running a small cluster of 8 machines (80 osds), with three
> monitors on
> > Ubuntu 16.04. Ceph version 10.2.5.
> >
> > I cannot reboot the monitors without physically going into the datacenter
> > and power cycling them. What happens is that while shutting down, ceph
> gets
> > stuck trying to contact the other monitors but networking has already
> shut
> > down or something like that. I get an endless stream of:
> >
> > libceph: connect 10.20.0.10:6789 error -101
> > libceph: connect 10.20.0.13:6789 error -101
> > libceph: connect 10.20.0.17:6789 error -101
> >
> > where in this case 10.20.0.10 is the machine I am trying to shut down and
> > all three IPs are the MONs.
> >
> > At this stage of the shutdown, the machine doesn't respond to pings, and
> I
> > cannot even log in on any of the virtual terminals. Nothing to do but
> > poweroff at the server.
> >
> > The other non-mon servers shut down just fine, and the cluster was
> healthy
> > at the time I was rebooting the mon (I only reboot one machine at a time,
> > waiting for it to come up before I do the next one).
> >
> > Also worth mentioning that if I execute
> >
> > sudo systemctl stop ceph\*.service ceph\*.target
> >
> > on the server, the only things I see are:
> >
> > root 11143 2  0 18:40 ?00:00:00 [ceph-msgr]
> > root 11162 2  0 18:40 ?00:00:00 [ceph-watch-noti]
> >
> > and even then, when no ceph daemons are left running, doing a reboot goes
> > into the same loop.
> >
> > I can't really find any mention of this online, but I feel someone must
> have
> > hit this. Any idea how to fix it? It's really annoying because its hard
> for
> > me to get access to the datacenter.
> >
> > Thanks
> > Michael
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Cannot shutdown monitors

2017-02-10 Thread Michael Andersen
Hi

I am running a small cluster of 8 machines (80 osds), with three monitors
on Ubuntu 16.04. Ceph version 10.2.5.

I cannot reboot the monitors without physically going into the datacenter
and power cycling them. What happens is that while shutting down, ceph gets
stuck trying to contact the other monitors but networking has already shut
down or something like that. I get an endless stream of:

libceph: connect 10.20.0.10:6789 error -101
libceph: connect 10.20.0.13:6789 error -101
libceph: connect 10.20.0.17:6789 error -101

where in this case 10.20.0.10 is the machine I am trying to shut down and
all three IPs are the MONs.

At this stage of the shutdown, the machine doesn't respond to pings, and I
cannot even log in on any of the virtual terminals. Nothing to do but
poweroff at the server.

The other non-mon servers shut down just fine, and the cluster was healthy
at the time I was rebooting the mon (I only reboot one machine at a time,
waiting for it to come up before I do the next one).

Also worth mentioning that if I execute

sudo systemctl stop ceph\*.service ceph\*.target

on the server, the only things I see are:

root 11143 2  0 18:40 ?00:00:00 [ceph-msgr]
root 11162 2  0 18:40 ?00:00:00 [ceph-watch-noti]

and even then, when no ceph daemons are left running, doing a reboot goes
into the same loop.

I can't really find any mention of this online, but I feel someone must
have hit this. Any idea how to fix it? It's really annoying because its
hard for me to get access to the datacenter.

Thanks
Michael
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Running 'ceph health' as non-root user

2017-02-01 Thread Michael Hartz
Actually it wasn't. Light-headed, I failed to recognize that 
/etc/ceph/ceph.conf is only a symlink to the Proxmox FUSE location below 
/etc/pve, whose permissions aren't easily changed.

I have now resorted to a workaround, dumping the output of 'ceph health' as a 
cron job and reading that periodically. That is fully sufficient for my 
situation.
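
For anyone who wants more than the cron workaround: a hedged alternative is a
read-only cephx identity for the monitoring account, so the admin keyring can
stay root-only (names are just examples, and the user still needs a readable
ceph.conf, e.g. a plain copy outside /etc/pve):

  ceph auth get-or-create client.nagios mon 'allow r' > /etc/ceph/ceph.client.nagios.keyring
  chown nagios /etc/ceph/ceph.client.nagios.keyring
  chmod 600 /etc/ceph/ceph.client.nagios.keyring
  # the monitoring user then runs:
  ceph --id nagios health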

Many thanks



01.02.2017, 09:58, "Henrik Korkuc" <li...@kirneh.eu>:
> On 17-02-01 10:55, Michael Hartz wrote:
>>  I am running ceph as part of a Proxmox Virtualization cluster, which is 
>> doing great.
>>
>>  However for monitoring purpose I would like to periodically check with 
>> 'ceph health' as a non-root user.
>>  This fails with the following message:
>>>  su -c 'ceph health' -s /bin/bash nagios
>>  Error initializing cluster client: PermissionDeniedError('error calling 
>> conf_read_file',)
>>
>>  Please note: running the command as root user works as intended.
>>
>>  Someone else suggested to allow group permissions on the admin keyring, 
>> i.e. chmod 660 /etc/ceph/ceph.client.admin.keyring
>>  Link: https://github.com/thelan/ceph-zabbix/issues/12
>>  This didn't work.
>>
>>  Has anyone hints on this?
>
> is /etc/ceph/ceph.conf readable for that user?
>
>>  ___
>>  ceph-users mailing list
>>  ceph-users@lists.ceph.com
>>  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Running 'ceph health' as non-root user

2017-02-01 Thread Michael Hartz
I am running ceph as part of a Proxmox Virtualization cluster, which is doing 
great.

However for monitoring purpose I would like to periodically check with 'ceph 
health' as a non-root user.
This fails with the following message:
> su -c 'ceph health' -s /bin/bash nagios
Error initializing cluster client: PermissionDeniedError('error calling 
conf_read_file',)

Please note: running the command as root user works as intended.

Someone else suggested to allow group permissions on the admin keyring, i.e. 
chmod 660 /etc/ceph/ceph.client.admin.keyring
Link: https://github.com/thelan/ceph-zabbix/issues/12
This didn't work.

Has anyone hints on this?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Javascript error at http://ceph.com/pgcalc/

2017-01-11 Thread Michael Kidd
Hello John,
  Thanks for the bug report.  Unfortunately, I'm not able to reproduce the
error.  I tested from both Firefox and Chrome on linux.  Can you let me
know what os/browser you're using?  Also, I've not tested any non 'en-US'
characters, so I can't attest to how it will behave with other alphabets.

Thanks,

Michael J. Kidd
Sr. Software Maintenance Engineer
Red Hat Ceph Storage
+1 919-442-8878 <(919)%20442-8878>

On Wed, Jan 11, 2017 at 6:01 PM, 林自均 <johnl...@gmail.com> wrote:

> Hi Michael,
>
> Thanks for your link!
>
> However, when I am using your clone of pgcalc, the newly created pool
> didn't follow my values in the "Add Pool" dialog. For example, no matter
> what I fill in "Pool Name", I always get "newPool" as the name.
>
> By the way, where can I find the git repository of pgcalc? I can't find it
> at https://github.com/ceph/. Thanks!
>
> Best,
> John Lin
>
Michael Kidd <linuxk...@redhat.com> wrote on Thursday, January 12, 2017 at 7:02 AM:
>
>> Hello John,
>>   Apologies for the error.  We will be working to correct it, but in the
>> interim, you can use http://linuxkidd.com/ceph/pgcalc.html
>>
>> Thanks,
>>
>> Michael J. Kidd
>> Sr. Software Maintenance Engineer
>> Red Hat Ceph Storage
>> +1 919-442-8878 <(919)%20442-8878>
>>
>> On Wed, Jan 11, 2017 at 12:03 AM, 林自均 <johnl...@gmail.com> wrote:
>>
>> Hi all,
>>
>> I am not sure if this is the correct mailing list. Correct me if I am
>> wrong.
>>
>> I failed to add a pool at http://ceph.com/pgcalc/ because of a
>> Javascript error:
>>
>> (index):345 Uncaught TypeError: $(...).dialog is not a function
>> at addPool (http://ceph.com/pgcalc/:345:31)
>> at HTMLButtonElement.onclick (http://ceph.com/pgcalc/:534:191)
>>
>> Where can I report this error? Thanks.
>>
>> Best,
>> John Lin
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH - best books and learning sites

2016-12-29 Thread Michael Hackett
Hello Andre,

The Ceph site would be the best place to get the information you are
looking for, specifically the docs section:
http://docs.ceph.com/docs/master/.

Karan Singh actually wrote two books which can be useful as initial
resources as well

Learning Ceph:

https://www.amazon.com/Learning-Ceph-Karan-Singh/dp/1783985623/

Ceph Cookbook:

https://www.amazon.com/Ceph-Cookbook-Karan-Singh-ebook/dp/B0171UHJGY/

These resources as well as the community are a good place to start.

Thanks,

Mike Hackett
Sr Software Maintenance Engineer
Red Hat Ceph Storage Product Lead


On Thu, Dec 29, 2016 at 6:20 AM, Andre Forigato 
wrote:

> Hello,
>
> I'm starting to study Ceph for implementation in our company.
>
> I need the help of the community.
> I'm looking for Ceph's best books and learning sites.
>
> Are the people using Suse or Redhat distribution?
> My question is what best Linux distribution should I use?
>
>
> Thanks to the Ceph community.
>
> Andre
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] fixing zones

2016-09-28 Thread Michael Parson

On Wed, 28 Sep 2016, Orit Wasserman wrote:

see blow

On Tue, Sep 27, 2016 at 8:31 PM, Michael Parson <mpar...@bl.org> wrote:





We googled around a bit and found the fix-zone script:

https://raw.githubusercontent.com/yehudasa/ceph/wip-fix-default-zone/src/fix-zone

Which ran fine until the last command, which errors out with:

+ radosgw-admin zone default --rgw-zone=default
WARNING: failed to initialize zonegroup



That is a known issue (default zone is a realm propety) it should not
effect you because radosgw uses the "default" zone
if it doesn't find any zone.


the 'default' rgw-zone seems OK:

$ sudo radosgw-admin zone get --zone-id=default
{
"id": "default",
"name": "default",
"domain_root": ".rgw_",


the underscore doesn't look good here and in the other pools
are you sure this are the pools you used before?


The underscores were done by the script referenced above, but you're
right, I don't see the underscores in my osd pool list:

$ sudo ceph osd pool ls | grep rgw | sort
default.rgw.buckets.data
.rgw
.rgw.buckets
.rgw.buckets.extra
.rgw.buckets.index
.rgw.control
.rgw.gc
.rgw.meta
.rgw.root
.rgw.root.backup
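
One hedged way forward might be to point the zone at the pools that actually
exist, i.e. pull the zone JSON, drop the trailing underscores, and push it
back (back everything up first; whether the period commit is needed depends on
your realm setup):

  radosgw-admin zone get --rgw-zone=default > zone.json
  # edit zone.json so e.g. "domain_root" is ".rgw", the data pool is ".rgw.buckets",
  # "user_uid_pool" is ".users.uid", and so on, matching the list above
  radosgw-admin zone set --rgw-zone=default --infile zone.json
  radosgw-admin period update --commit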

--
Michael Parson
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] fixing zones

2016-09-27 Thread Michael Parson

(I tried to start this discussion on irc, but I wound up with the wrong
paste buffer and wound up getting kicked off for a paste flood, sorry,
that was on me :(  )

We were having some weirdness with our Ceph and did an upgrade up to
10.2.3, which fixed some, but not all of our problems.

It looked like our users pool might have been corrupt, so we moved it
aside and created a new set:

$ sudo ceph osd pool rename .users old.users
$ sudo ceph osd pool rename .users.email old.users.email
$ sudo ceph osd pool rename .users.swift old.users.swift
$ sudo ceph osd pool rename .users.uid old.users.uid


$ sudo ceph osd pool create .users 16 16
$ sudo ceph osd pool create .users.email 16 16
$ sudo ceph osd pool create .users.swift 16 16
$ sudo ceph osd pool create .users.uid 16 16

This allowed me to create new users and swift subusers under them, but
only the first one is allowing auth, all others are getting 403s when
attempting to auth.

We googled around a bit and found the fix-zone script:

https://raw.githubusercontent.com/yehudasa/ceph/wip-fix-default-zone/src/fix-zone

Which ran fine until the last command, which errors out with:

+ radosgw-admin zone default --rgw-zone=default
WARNING: failed to initialize zonegroup

the 'default' rgw-zone seems OK:

$ sudo radosgw-admin zone get --zone-id=default
{
"id": "default",
"name": "default",
"domain_root": ".rgw_",
"control_pool": ".rgw.control_",
"gc_pool": ".rgw.gc_",
"log_pool": ".log_",
"intent_log_pool": ".intent-log_",
"usage_log_pool": ".usage_",
"user_keys_pool": ".users_",
"user_email_pool": ".users.email_",
"user_swift_pool": ".users.swift_",
"user_uid_pool": ".users.uid_",
"system_key": {
"access_key": "",
"secret_key": ""
},
"placement_pools": [
{
"key": "default-placement",
"val": {
"index_pool": ".rgw.buckets.index_",
"data_pool": ".rgw.buckets_",
"data_extra_pool": ".rgw.buckets.extra_",
"index_type": 0
}
}
],
"metadata_heap": ".rgw.meta",
"realm_id": "a113de3d-c506-4112-b419-0d5c94ded7af"
}

Not really sure where to go from here, any help would be appreciated.

--
Michael Parson
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] PGs lost from cephfs data pool, how to determine which files to restore from backup?

2016-09-07 Thread Michael Sudnick
I've had to force recreate some PGs on my cephfs data pool due to some
cascading disk failures in my homelab cluster. Is there a way to easily
determine which files I need to restore from backup? My metadata pool is
completely intact.
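
One hedged approach, since the metadata is intact: each file's first data
object is named <inode-in-hex>.00000000 in the data pool, so missing objects
can be mapped back to paths. The pool name "cephfs_data" and the mount point
are assumptions, and large files would need every object checked, not just the
first one:

  cd /mnt/cephfs
  find . -type f | while read -r f; do
      obj=$(printf '%x.00000000' "$(stat -c %i "$f")")
      rados -p cephfs_data stat "$obj" >/dev/null 2>&1 || echo "restore: $f"
  done > /tmp/files-to-restore.txt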

Thanks for any help and suggestions.

Sincerely,
  Michael
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] PG down, primary OSD no longer exists

2016-09-06 Thread Michael Sudnick
I was wondering if someone could help me recover a PG, a few days ago I had
a bunch of disks die in a small home-lab cluster. I removed the disks from
their hosts, and rm'ed the OSDs. Now I have a PG stuck down, that will not
peer whose acting OSDs (and the primary) are one of the OSDs I had rm'ed
earlier. I believe generally one would mark the OSD lost to try to get the
PG to recovery, however as I rm'ed the OSD earlier I cannot mark it as lost.

ceph health details gives:
pg 33.34e is stuck inactive since forever, current state
stale+down+remapped+peering, last acting [24,10]
pg 33.34e is stuck unclean since forever, current state
stale+down+remapped+peering, last acting [24,10]
pg 33.34e is stale+down+remapped+peering, acting [24,10]

OSD.24 no longer exists and the disk is fried. So recovery is not possible.
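
A hedged last resort, accepting that whatever was in 33.34e is gone for good:
the PG can be recreated empty on whatever OSDs CRUSH now maps it to, after
which the data has to come back from backups:

  ceph pg force_create_pg 33.34e
  ceph pg 33.34e query   # afterwards, check that it eventually goes active+clean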

Thanks for any suggestions,

Sincerely,
  Michael Sudnick
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Repairing a broken leveldb

2016-07-11 Thread Michael Metz-Martini | SpeedPartner GmbH
Hi,

While rebalancing, a drive experienced read errors, so I think the leveldb was
corrupted. Unfortunately there's currently no up-to-date second copy that would
let me simply forget this pg. Only one pg is affected (I moved all
other pg's away as they had active copies on another osd).

In "daily business" this osd is still running, but crashes when starting
backfilling [1]. This pg holds meta-data for our cephfs, so losing data
would be painful.

Any ideas how to recover/repair leveldb or at least skip the broken
part? Thanks in advance.

  "up": [
34,
105],
  "acting": [
9],
  "backfill_targets": [
"34",
"105"],
  "actingbackfill": [
"9",
"34",
    "105"],

[1] http://www.michael-metz.de/ceph-osd.9.log.gz
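
One hedged idea before anything destructive: with osd.9 stopped and a
block-level backup of the disk taken first, try an offline export of just that
PG with ceph-objectstore-tool (paths assume the default layout, and <pgid> is a
placeholder):

  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-9 \
      --journal-path /var/lib/ceph/osd/ceph-9/journal \
      --op export --pgid <pgid> --file /root/pg-export.bin
  # if the export succeeds it can, in principle, be imported into another
  # (stopped) OSD with --op import, so backfill never has to read the bad leveldb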

-- 
Kind regards
 Michael Metz-Martini

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs mount /etc/fstab

2016-06-27 Thread Michael Hanscho
On 2016-06-27 11:40, John Spray wrote:
> On Sun, Jun 26, 2016 at 10:51 AM, Michael Hanscho <rese...@gmx.net> wrote:
>> On 2016-06-26 10:30, Christian Balzer wrote:
>>>
>>> Hello,
>>>
>>> On Sun, 26 Jun 2016 09:33:10 +0200 Willi Fehler wrote:
>>>
>>>> Hello,
>>>>
>>>> I found an issue. I've added a ceph mount to my /etc/fstab. But when I
>>>> boot my system it hangs:
>>>>
>>>> libceph: connect 192.168.0.5:6789 error -101
>>>>
>>>> After the system is booted I can successfully run mount -a.
>>>>
>>>
>>> So what does that tell you?
>>> That Ceph can't connect during boot, because... there's no network yet.
>>>
>>> This is what the "_netdev" mount option is for.
>>>
>>
>> http://docs.ceph.com/docs/master/cephfs/fstab/
>>
>> No hint in the documentation - although a full page on cephfs and fstab?!
> 
> Yeah, that is kind of an oversight!  https://github.com/ceph/ceph/pull/9942

Thanks a lot!!

Gruesse
Michael
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] unsubscribe

2016-06-26 Thread Michael Ferguson
Thanks

 

 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs mount /etc/fstab

2016-06-26 Thread Michael Hanscho
On 2016-06-26 10:30, Christian Balzer wrote:
> 
> Hello,
> 
> On Sun, 26 Jun 2016 09:33:10 +0200 Willi Fehler wrote:
> 
>> Hello,
>>
>> I found an issue. I've added a ceph mount to my /etc/fstab. But when I 
>> boot my system it hangs:
>>
>> libceph: connect 192.168.0.5:6789 error -101
>>
>> After the system is booted I can successfully run mount -a.
>>
> 
> So what does that tell you?
> That Ceph can't connect during boot, because... there's no network yet.
> 
> This is what the "_netdev" mount option is for.
> 

http://docs.ceph.com/docs/master/cephfs/fstab/

No hint in the documentation - although a full page on cephfs and fstab?!
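
A hedged example of such a line for the kernel client (monitor address from
this thread; user name and secretfile path are placeholders):

  192.168.0.5:6789:/  /mnt/cephfs  ceph  name=admin,secretfile=/etc/ceph/admin.secret,noatime,_netdev  0  2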

Gruesse
Michael
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Criteria for Ceph journal sizing

2016-06-20 Thread Michael Hanscho
Hi!
On 2016-06-20 14:32, Daleep Singh Bais wrote:
> Dear All,
> 
> Is there some criteria for deciding on Ceph journal size to be used,
> whether in respect to Data partition size etc? I have noticed that if
> not specified, it takes the journal size to be 5GB.
> 
> Any insight in this regard will be helpful for my understanding.

See documentation:
http://docs.ceph.com/docs/master/rados/configuration/osd-config-ref/

osd journal size = {2 * (expected throughput * filestore max sync interval)}

http://comments.gmane.org/gmane.comp.file-systems.ceph.user/28433
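
A worked example with assumed numbers: a spinner that sustains ~120 MB/s and
the default filestore max sync interval of 5 s gives 2 * 120 MB/s * 5 s =
1200 MB, so the 5 GB default already leaves comfortable headroom.

  echo $(( 2 * 120 * 5 ))   # -> 1200 (MB)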

Gruesse
Michael


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] which CentOS 7 kernel is compatible with jewel?

2016-06-15 Thread Michael Kuriger
Hmm, if I only enable layering features I can get it to work.  But I’m puzzled 
why all the (default) features are not working with my system fully up to date.

Any ideas?  Is this not yet supported?


[root@test ~]# rbd create `hostname` --size 102400 --image-feature layering
[root@test ~]# rbd map `hostname`
/dev/rbd0

[root@test ~]# rbd info `hostname`
rbd image 'test.np.wc1.example.com':
size 102400 MB in 25600 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.13582ae8944a
format: 2
features: layering
flags: 
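
The 0x3c in the dmesg lines quoted further down decodes to exclusive-lock
(0x4), object-map (0x8), fast-diff (0x10) and deep-flatten (0x20), i.e.
jewel's new image defaults, which krbd in these kernels does not implement.
A hedged sketch of the two usual ways around it (image name test1 is from the
output below):

  # strip the unsupported features from an existing image before mapping it
  rbd feature disable test1 deep-flatten fast-diff object-map exclusive-lock
  # or make layering-only the default for newly created images, in ceph.conf
  # on the clients ([client] or [global] section):
  #   rbd default features = 1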




 

 
Michael Kuriger
Sr. Unix Systems Engineer
* mk7...@yp.com |( 818-649-7235








On 6/15/16, 9:56 AM, "ceph-users on behalf of Michael Kuriger" 
<ceph-users-boun...@lists.ceph.com on behalf of mk7...@yp.com> wrote:

>Still not working with newer client.  But I get a different error now.
>
>
>
>[root@test ~]# rbd ls
>
>test1
>
>
>
>[root@test ~]# rbd showmapped
>
>
>
>[root@test ~]# rbd map test1
>
>rbd: sysfs write failed
>
>RBD image feature set mismatch. You can disable features unsupported by the 
>kernel with "rbd feature disable".
>
>In some cases useful info is found in syslog - try "dmesg | tail" or so.
>
>rbd: map failed: (6) No such device or address
>
>
>
>[root@test ~]# dmesg | tail
>
>[52056.980880] rbd: loaded (major 251)
>
>[52056.990399] libceph: mon0 10.1.77.165:6789 session established
>
>[52056.992567] libceph: client4966 fsid f1466aaa-b08b-4103-ba7f-69165d675ba1
>
>[52057.024913] rbd: image mk7193.np.wc1.yellowpages.com: image uses 
>unsupported features: 0x3c
>
>[52085.856605] libceph: mon0 10.1.77.165:6789 session established
>
>[52085.858696] libceph: client4969 fsid f1466aaa-b08b-4103-ba7f-69165d675ba1
>
>[52085.883350] rbd: image test1: image uses unsupported features: 0x3c
>
>[52167.683868] libceph: mon1 10.1.78.75:6789 session established
>
>[52167.685990] libceph: client4937 fsid f1466aaa-b08b-4103-ba7f-69165d675ba1
>
>[52167.709796] rbd: image test1: image uses unsupported features: 0x3c
>
>
>
>[root@test ~]# uname -a
>
>Linux test.np.4.6.2-1.el7.elrepo.x86_64 #1 SMP Wed Jun 8 14:49:20 EDT 2016 
>x86_64 x86_64 x86_64 GNU/Linux
>
>
>
>[root@test ~]# ceph --version
>
>ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269)
>
>
>
>
>
>
>
>
>
> 
>
>
>
> 
>
>Michael Kuriger
>
>Sr. Unix Systems Engineer
>
>* mk7...@yp.com |( 818-649-7235
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>On 6/14/16, 12:28 PM, "Ilya Dryomov" <idryo...@gmail.com> wrote:
>
>
>
>>On Mon, Jun 13, 2016 at 8:37 PM, Michael Kuriger <mk7...@yp.com> wrote:
>
>>> I just realized that this issue is probably because I’m running jewel 
>>> 10.2.1 on the servers side, but accessing from a client running hammer 
>>> 0.94.7 or infernalis 9.2.1
>
>>>
>
>>> Here is what happens if I run rbd ls from a client on infernalis.  I was 
>>> testing this access since we weren’t planning on building rpms for Jewel on 
>>> CentOS 6
>
>>>
>
>>> $ rbd ls
>
>>> 2016-06-13 11:24:06.881591 7fe61e568700  0 -- :/3877046932 >> 
>>> 10.1.77.165:6789/0 pipe(0x562ed3ea7550 sd=3 :0 s=1 pgs=0 cs=0 l=1 
>>> c=0x562ed3ea0ac0).fault
>
>>> 2016-06-13 11:24:09.882051 7fe61137f700  0 -- :/3877046932 >> 
>>> 10.1.78.75:6789/0 pipe(0x7fe608000c00 sd=3 :0 s=1 pgs=0 cs=0 l=1 
>>> c=0x7fe608004ef0).fault
>
>>> 2016-06-13 11:24:12.882389 7fe61e568700  0 -- :/3877046932 >> 
>>> 10.1.77.165:6789/0 pipe(0x7fe608008350 sd=4 :0 s=1 pgs=0 cs=0 l=1 
>>> c=0x7fe60800c5f0).fault
>
>>> 2016-06-13 11:24:18.883642 7fe61e568700  0 -- :/3877046932 >> 
>>> 10.1.77.165:6789/0 pipe(0x7fe608008350 sd=3 :0 s=1 pgs=0 cs=0 l=1 
>>> c=0x7fe6080078e0).fault
>
>>> 2016-06-13 11:24:21.884259 7fe61137f700  0 -- :/3877046932 >> 
>>> 10.1.78.75:6789/0 pipe(0x7fe608000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1 
>>> c=0x7fe608007110).fault
>
>>
>
>>Accessing jewel with older clients should work as long as you don't
>
>>enable jewel tunables and such; the same goes for older kernels.  Can
>
>>you do
>
>>
>
>>rbd --debug-ms=20 ls
>
>>
>
>>and attach the output?
>
>>
>
>>Thanks,
>
>>
>
>>Ilya
>
>___
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] which CentOS 7 kernel is compatible with jewel?

2016-06-15 Thread Michael Kuriger
Still not working with newer client.  But I get a different error now.

[root@test ~]# rbd ls
test1

[root@test ~]# rbd showmapped

[root@test ~]# rbd map test1
rbd: sysfs write failed
RBD image feature set mismatch. You can disable features unsupported by the 
kernel with "rbd feature disable".
In some cases useful info is found in syslog - try "dmesg | tail" or so.
rbd: map failed: (6) No such device or address

[root@test ~]# dmesg | tail
[52056.980880] rbd: loaded (major 251)
[52056.990399] libceph: mon0 10.1.77.165:6789 session established
[52056.992567] libceph: client4966 fsid f1466aaa-b08b-4103-ba7f-69165d675ba1
[52057.024913] rbd: image mk7193.np.wc1.yellowpages.com: image uses unsupported 
features: 0x3c
[52085.856605] libceph: mon0 10.1.77.165:6789 session established
[52085.858696] libceph: client4969 fsid f1466aaa-b08b-4103-ba7f-69165d675ba1
[52085.883350] rbd: image test1: image uses unsupported features: 0x3c
[52167.683868] libceph: mon1 10.1.78.75:6789 session established
[52167.685990] libceph: client4937 fsid f1466aaa-b08b-4103-ba7f-69165d675ba1
[52167.709796] rbd: image test1: image uses unsupported features: 0x3c

[root@test ~]# uname -a
Linux test.np.4.6.2-1.el7.elrepo.x86_64 #1 SMP Wed Jun 8 14:49:20 EDT 2016 
x86_64 x86_64 x86_64 GNU/Linux

[root@test ~]# ceph --version
ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269)




 

 
Michael Kuriger
Sr. Unix Systems Engineer
* mk7...@yp.com |( 818-649-7235








On 6/14/16, 12:28 PM, "Ilya Dryomov" <idryo...@gmail.com> wrote:

>On Mon, Jun 13, 2016 at 8:37 PM, Michael Kuriger <mk7...@yp.com> wrote:
>> I just realized that this issue is probably because I’m running jewel 10.2.1 
>> on the servers side, but accessing from a client running hammer 0.94.7 or 
>> infernalis 9.2.1
>>
>> Here is what happens if I run rbd ls from a client on infernalis.  I was 
>> testing this access since we weren’t planning on building rpms for Jewel on 
>> CentOS 6
>>
>> $ rbd ls
>> 2016-06-13 11:24:06.881591 7fe61e568700  0 -- :/3877046932 >> 
>> 10.1.77.165:6789/0 pipe(0x562ed3ea7550 sd=3 :0 s=1 pgs=0 cs=0 l=1 
>> c=0x562ed3ea0ac0).fault
>> 2016-06-13 11:24:09.882051 7fe61137f700  0 -- :/3877046932 >> 
>> 10.1.78.75:6789/0 pipe(0x7fe608000c00 sd=3 :0 s=1 pgs=0 cs=0 l=1 
>> c=0x7fe608004ef0).fault
>> 2016-06-13 11:24:12.882389 7fe61e568700  0 -- :/3877046932 >> 
>> 10.1.77.165:6789/0 pipe(0x7fe608008350 sd=4 :0 s=1 pgs=0 cs=0 l=1 
>> c=0x7fe60800c5f0).fault
>> 2016-06-13 11:24:18.883642 7fe61e568700  0 -- :/3877046932 >> 
>> 10.1.77.165:6789/0 pipe(0x7fe608008350 sd=3 :0 s=1 pgs=0 cs=0 l=1 
>> c=0x7fe6080078e0).fault
>> 2016-06-13 11:24:21.884259 7fe61137f700  0 -- :/3877046932 >> 
>> 10.1.78.75:6789/0 pipe(0x7fe608000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1 
>> c=0x7fe608007110).fault
>
>Accessing jewel with older clients should work as long as you don't
>enable jewel tunables and such; the same goes for older kernels.  Can
>you do
>
>rbd --debug-ms=20 ls
>
>and attach the output?
>
>Thanks,
>
>Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] which CentOS 7 kernel is compatible with jewel?

2016-06-13 Thread Michael Kuriger
I just realized that this issue is probably because I’m running jewel 10.2.1 on 
the servers side, but accessing from a client running hammer 0.94.7 or 
infernalis 9.2.1

Here is what happens if I run rbd ls from a client on infernalis.  I was 
testing this access since we weren’t planning on building rpms for Jewel on 
CentOS 6

$ rbd ls
2016-06-13 11:24:06.881591 7fe61e568700  0 -- :/3877046932 >> 
10.1.77.165:6789/0 pipe(0x562ed3ea7550 sd=3 :0 s=1 pgs=0 cs=0 l=1 
c=0x562ed3ea0ac0).fault
2016-06-13 11:24:09.882051 7fe61137f700  0 -- :/3877046932 >> 10.1.78.75:6789/0 
pipe(0x7fe608000c00 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7fe608004ef0).fault
2016-06-13 11:24:12.882389 7fe61e568700  0 -- :/3877046932 >> 
10.1.77.165:6789/0 pipe(0x7fe608008350 sd=4 :0 s=1 pgs=0 cs=0 l=1 
c=0x7fe60800c5f0).fault
2016-06-13 11:24:18.883642 7fe61e568700  0 -- :/3877046932 >> 
10.1.77.165:6789/0 pipe(0x7fe608008350 sd=3 :0 s=1 pgs=0 cs=0 l=1 
c=0x7fe6080078e0).fault
2016-06-13 11:24:21.884259 7fe61137f700  0 -- :/3877046932 >> 10.1.78.75:6789/0 
pipe(0x7fe608000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7fe608007110).fault



 

 
Michael Kuriger
Sr. Unix Systems Engineer
* mk7...@yp.com |( 818-649-7235







On 6/13/16, 2:10 AM, "Ilya Dryomov" <idryo...@gmail.com> wrote:

>On Fri, Jun 10, 2016 at 9:29 PM, Michael Kuriger <mk7...@yp.com> wrote:
>> Hi Everyone,
>> I’ve been running jewel for a while now, with tunables set to hammer.  
>> However, I want to test the new features but cannot find a fully compatible 
>> Kernel for CentOS 7.  I’ve tried a few of the elrepo kernels - elrepo-kernel 
>> 4.6 works perfectly in CentOS 6, but not CentOS 7.  I’ve tried 3.10, 4.3, 
>> 4.5, and 4.6.
>>
>> What does seem to work with the 4.6 kernel is mounting, read/write to a 
>> cephfs, and rbd map / mounting works also.  I just can’t do 'rbd ls'
>>
>> 'rbd ls' does not work with 4.6 kernel but it does work with the stock 3.10 
>> kernel.
>
>"rbd ls" operation doesn't depend on the kernel.  What do you mean by
>"can't do" - no output at all?
>
>Something similar was reported here [1].  What's the output of "rados
>-p  stat rbd_directory"?
>
>> 'rbd mount' does not work with the stock 3.10 kernel, but works with the 4.6 
>> kernel.
>
>Anything in dmesg on 3.10?
>
>[1] https://www.mail-archive.com/ceph-users@lists.ceph.com/msg29515.html
> 
>
>Thanks,
>
>Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] which CentOS 7 kernel is compatible with jewel?

2016-06-10 Thread Michael Kuriger
Hi Everyone,
I’ve been running jewel for a while now, with tunables set to hammer.  However, 
I want to test the new features but cannot find a fully compatible Kernel for 
CentOS 7.  I’ve tried a few of the elrepo kernels - elrepo-kernel 4.6 works 
perfectly in CentOS 6, but not CentOS 7.  I’ve tried 3.10, 4.3, 4.5, and 4.6.

What does seem to work with the 4.6 kernel is mounting, read/write to a cephfs, 
and rbd map / mounting works also.  I just can’t do 'rbd ls'  

'rbd ls' does not work with 4.6 kernel but it does work with the stock 3.10 
kernel.
'rbd mount' does not work with the stock 3.10 kernel, but works with the 4.6 
kernel.  

Very odd.  Any advice? 

Thanks!

 

 
Michael Kuriger
Sr. Unix Systems Engineer
* mk7...@yp.com |( 818-649-7235


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Migrating from one Ceph cluster to another

2016-06-09 Thread Michael Kuriger
This is how I did it.  I upgraded my old cluster first (live one by one) .  
Then I added my new OSD servers to my running cluster.  Once they were all 
added I set the weight to 0 on all my original osd's.  This causes a lot of IO 
but all data will be migrated to the new servers.  Then you can remove the old 
OSD servers from the cluster.  
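
A hedged sketch of that reweight step (osd ids 0-79 stand in for the original
servers' OSDs):

  for id in $(seq 0 79); do
      ceph osd crush reweight osd.$id 0.0
  done
  ceph -s   # expect a long stretch of backfill while data drains off the old OSDs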



 
Michael Kuriger
Sr. Unix Systems Engineer

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Wido 
den Hollander
Sent: Thursday, June 09, 2016 12:47 AM
To: Marek Dohojda; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Migrating from one Ceph cluster to another


> Op 8 juni 2016 om 22:49 schreef Marek Dohojda <mdoho...@altitudedigital.com>:
> 
> 
> I have a ceph cluster (Hammer) and I just built a new cluster 
> (Infernalis).  This cluster contains VM boxes based on KVM.
> 
> What I would like to do is move all the data from one ceph cluster to 
> another.  However the only way I could find from my google searches 
> would be to move each image to local disk, copy this image across to 
> new cluster, and import it.
> 
> I am hoping that there is a way to just synch the data (and I do 
> realize that KVMs will have to be down for the full migration) from 
> one cluster to another.
> 

You can do this with the rbd command using export and import.

Something like:

$ rbd export image1 -|rbd import image1 -

Where you have both RBD commands connect to a different Ceph cluster. See 
--help on how to do that.

You can run this in a loop with the output of 'rbd ls'.

But that's about the only way.

Wido
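
Spelled out as a hedged sketch only: the --cluster names below assume matching
/etc/ceph/old.conf and /etc/ceph/new.conf plus keyrings on the machine running
the copy, and the VMs must be shut down for the duration:

  for img in $(rbd --cluster old ls); do
      rbd --cluster old export "$img" - | rbd --cluster new import - "$img"
  done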

> Thank you
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CoreOS Cluster of 7 machines and Ceph

2016-06-03 Thread Michael Shuey
Sorry for the late reply - been traveling.

I'm doing exactly that right now, using the ceph-docker container.
It's just in my test rack for now, but hardware arrived this week to
seed the production version.

I'm using separate containers for each daemon, including a container
for each OSD.  I've got a bit of cloudinit logic to loop over all
disks in a machine, fire off a "prepare" container if the disk isn't
partitioned, then start an "activate" container to bring the OSD up.
Works pretty well; I can power on a new machine, and get a stack of
new OSDs about 5 minutes later.
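
A rough, hedged sketch of such a loop; the entrypoint scenario names and the
image tag differ between ceph-docker releases, so treat these as placeholders:

  for disk in /dev/sd[b-z]; do
      [ -b "$disk" ] || continue
      if ! sgdisk --print "$disk" | grep -q 'ceph data'; then
          docker run --rm --privileged --net=host -v /dev:/dev \
              -v /etc/ceph:/etc/ceph -v /var/lib/ceph:/var/lib/ceph \
              -e OSD_DEVICE="$disk" ceph/daemon osd_ceph_disk_prepare
      fi
      docker run -d --privileged --net=host -v /dev:/dev \
          -v /etc/ceph:/etc/ceph -v /var/lib/ceph:/var/lib/ceph \
          -e OSD_DEVICE="$disk" ceph/daemon osd_ceph_disk_activate
  done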

I've opted to not allow ANY containers to run on local disk, and we're
setting up appropriate volume plugins (for NFS, CephFS, and Ceph RBDs)
now.  IN THEORY (so far, so good) volumes will be dynamically mapped
into the container at startup.  This should let us orchestrate
containers with Swarm or Kubernetes, and give us the same volumes
wherever they land.  In a few weeks we'll start experimenting with a
vxlan network plugin as well, to allow similar flexibility with IPs
and subnets.  Once that's done, our registry will become just another
container (though we'll need a master registry, with storage on local
disk SOMEWHERE, to be able to handle a cold-boot of the Ceph
containers).

I'm curious where you're going with your environment.  To me,
Ceph+Docker seems like a nice match; if others are doing this, we
should definitely pool experiences.

--
Mike Shuey


On Thu, May 26, 2016 at 7:00 AM, EnDSgUy EnDSgUy  wrote:
> Hello All,
>
> I am looking for some help to design the Ceph for the cluster of 7
> machines running on CoreOS with fleet and docker. I am still thinking
> what's the best way for the moment.
>
> Has anybody done something similair and could advise on his
> experiences?
>
> The primary purpose is
> - be able to store "docker data containers" and only them in a
> redundant way (so let a specific directory be mounted to specific
> container). So Ceph should be availabe in a container.
> - ideally other types of containers should be running without
> redundancy just on the hard drive
> - docker images (registry) should also be stored in a redundant way
>
> Has anybody done something similair?
>
>
> Dmitry
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problems with Calamari setup

2016-06-02 Thread Michael Kuriger
For me, this same issue was caused by having too new a version of salt.  I’m 
running salt-2014.1.5-1 in centos 7.2, so yours will probably be different.  
But I thought it was worth mentioning.





Michael Kuriger
Sr. Unix Systems Engineer
mk7...@yp.com | 818-649-7235



From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
fridifree
Sent: Wednesday, June 01, 2016 6:00 AM
To: Ceph Users
Subject: [ceph-users] Problems with Calamari setup

Hello, Everyone.

I'm trying to install a Calamari server in my organisation and I'm encountering 
some problems.

I have a small dev environment, just 4 OSD nodes and 5 monitors (one of them is 
also the RADOS GW). We chose to use Ubuntu 14.04 LTS for all our servers. The 
Calamari server is provisioned by VMware for now, the rest of the servers are 
physical.

The packages' versions are as follows:
- calamari-server - 1.3.1.1-1trusty
- calamari-client - 1.3.1.1-1trusty
- salt - 0.7.15
- diamond - 3.4.67

I used the Calamari Survival Guide <http://ceph.com/planet/ceph-calamari-the-survival-guide/> but without the 'build' part.

The problem is I've managed to install the server and the web page, but the 
Calamari server doesn't recognize the cluster. It does manage to OSD nodes 
connected to it, but without a cluster (that exists).

Also, the output of the "salt '*' ceph.get_heartbeats" command seems to look 
fine, as does the cthulhu log (but maybe I'm looking for the wrong thing). 
Re-installing the cluster is not an option, we want to connect the Calamari as 
it is, without hurting the Ceph cluster.

Thanks so much!

Jacob Goldenberg,
Israel.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph pg status problem

2016-05-31 Thread Michael Hackett
Hello,

Check your CRUSH map and verify what the failure domain for your CRUSH rule
is set to (for example OSD or host). You need to verify that your failure
domain can satisfy your pool replication value. You may need to decrease
your pool replication value or modify your CRUSH map.
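
A hedged example of what to compare (pool and rule names are assumptions):

  ceph osd pool get rbd size                    # the pool's replication factor
  ceph osd crush rule dump replicated_ruleset   # look at the chooseleaf "type"
  ceph osd tree                                 # how many hosts do you actually have?
  # single-host test clusters often need the failure domain dropped to "osd",
  # i.e. a rule step of:  step chooseleaf firstn 0 type osd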

Thank you,

Mike Hackett
Sr Software Maintenance Engineer
Red Hat Ceph Storage


On Tue, May 31, 2016 at 7:43 AM, Patrick McGarry 
wrote:

> Moving this to ceph-user list where it belongs.
> On May 31, 2016 6:03 AM, "Liu Lan(上海_技术部_基础平台_运维部_刘览)" 
> wrote:
>
>> Hi team,
>>
>>
>>
>>    I'm new to using Ceph. Now I have encountered a ceph health problem
>> about pg status, as follows:
>>
>>
>>
>> I created 256 pgs and the status is stuck at undersized, and I don't know
>> what that means or how to resolve it. Please help me to check. Thanks.
>>
>>
>>
>> # ceph health detail
>>
>> HEALTH_ERR 256 pgs are stuck inactive for more than 300 seconds; 255 pgs
>> degraded; 256 pgs stuck inactive; 255 pgs undersized
>>
>> pg 0.c5 is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [9]
>>
>> pg 0.c4 is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [13]
>>
>> pg 0.c3 is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [11]
>>
>> pg 0.c2 is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [11]
>>
>> pg 0.c1 is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [11]
>>
>> pg 0.c0 is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [13]
>>
>> pg 0.bf is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [9]
>>
>> pg 0.be is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [9]
>>
>> pg 0.bd is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [8]
>>
>> pg 0.bc is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [7]
>>
>> pg 0.bb is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [11]
>>
>> pg 0.ba is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [11]
>>
>> pg 0.b9 is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [7]
>>
>> pg 0.b8 is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [11]
>>
>> pg 0.b7 is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [9]
>>
>> pg 0.b6 is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [12]
>>
>> pg 0.b5 is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [9]
>>
>> pg 0.b4 is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [12]
>>
>> pg 0.b3 is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [13]
>>
>> pg 0.b2 is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [12]
>>
>> pg 0.b1 is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [8]
>>
>> pg 0.b0 is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [13]
>>
>> pg 0.af is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [12]
>>
>> pg 0.ae is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [11]
>>
>> pg 0.ad is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [13]
>>
>> pg 0.ac is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [7]
>>
>> pg 0.ab is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [13]
>>
>> pg 0.aa is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [9]
>>
>> pg 0.a9 is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [12]
>>
>> pg 0.a8 is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [8]
>>
>> pg 0.a7 is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [12]
>>
>> pg 0.a6 is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [9]
>>
>> pg 0.a5 is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [8]
>>
>> pg 0.a4 is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [8]
>>
>> pg 0.a3 is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [8]
>>
>> pg 0.a2 is stuck inactive since forever, current state
>> undersized+degraded+peered, last acting [7]
>>
>> pg 0.a1 is stuck inactive since forever, 

Re: [ceph-users] ceph-disk: Error: No cluster conf found in /etc/ceph with fsid

2016-05-26 Thread Michael Kuriger
Are you using an old ceph.conf with the original FSID from your first attempt 
(in your deploy directory)?
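If it helps, a quick way to compare the two values (paths taken from your own
output; nothing here is destructive) is:

  grep fsid /etc/ceph/ceph.conf                 # fsid of the conf ceph-disk is reading
  grep fsid /home/albert/my-cluster/ceph.conf   # fsid in your deploy directory
  cat /home/albert/my-cluster/cephd2/ceph_fsid  # fsid the OSD data dir was prepared with

If /etc/ceph/ceph.conf carries a different (older) fsid, pushing the current
conf with "ceph-deploy --overwrite-conf config push admin-node" and re-running
the activate step should get you past this error.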





Michael Kuriger
Sr. Unix Systems Engineer
mk7...@yp.com | 818-649-7235



From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Albert.K.Chong (git.usca07.Newegg) 22201
Sent: Thursday, May 26, 2016 8:49 AM
To: Albert.K.Chong (git.usca07.Newegg) 22201; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph-disk: Error: No cluster conf found in /etc/ceph 
with fsid

Hi,

Can anyone help on this topic?


Albert

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Albert.K.Chong (git.usca07.Newegg) 22201
Sent: Wednesday, May 25, 2016 3:04 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] ceph-disk: Error: No cluster conf found in /etc/ceph with 
fsid

Hi,

I followed the Storage Cluster Quick Start guide with my CentOS 7 VM. I have
failed at the same step more than 10 times, including after complete cleaning
and reinstallation. On the last try I just created the OSD on the local drive
to avoid some permission warnings, ran "ceph-deploy osd prepare .." and

[albert@admin-node my-cluster]$ ceph-deploy osd activate 
admin-node:/home/albert/my-cluster/cephd2
[ceph_deploy.conf][DEBUG ] found configuration file at: 
/home/albert/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.33): /usr/bin/ceph-deploy osd activate 
admin-node:/home/albert/my-cluster/cephd2
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username  : None
[ceph_deploy.cli][INFO  ]  verbose   : False
[ceph_deploy.cli][INFO  ]  overwrite_conf: False
[ceph_deploy.cli][INFO  ]  subcommand: activate
[ceph_deploy.cli][INFO  ]  quiet : False
[ceph_deploy.cli][INFO  ]  cd_conf   : 

[ceph_deploy.cli][INFO  ]  cluster   : ceph
[ceph_deploy.cli][INFO  ]  func  : 
[ceph_deploy.cli][INFO  ]  ceph_conf : None
[ceph_deploy.cli][INFO  ]  default_release   : False
[ceph_deploy.cli][INFO  ]  disk  : [('admin-node', 
'/home/albert/my-cluster/cephd2', None)]
[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks 
admin-node:/home/albert/my-cluster/cephd2:
[admin-node][DEBUG ] connection detected need for sudo
[admin-node][DEBUG ] connected to host: admin-node
[admin-node][DEBUG ] detect platform information from remote host
[admin-node][DEBUG ] detect machine type
[admin-node][DEBUG ] find the location of an executable
[ceph_deploy.osd][INFO  ] Distro info: CentOS Linux 7.2.1511 Core
[ceph_deploy.osd][DEBUG ] activating host admin-node disk 
/home/albert/my-cluster/cephd2
[ceph_deploy.osd][DEBUG ] will use init type: systemd
[admin-node][DEBUG ] find the location of an executable
[admin-node][INFO  ] Running command: sudo /usr/sbin/ceph-disk -v activate 
--mark-init systemd --mount /home/albert/my-cluster/cephd2
[admin-node][WARNIN] main_activate: path = /home/albert/my-cluster/cephd2
[admin-node][WARNIN] activate: Cluster uuid is 
8f9bf207-6c6a-4764-8b9e-63f70810837b
[admin-node][WARNIN] command: Running command: /usr/bin/ceph-osd --cluster=ceph 
--show-config-value=fsid
[admin-node][WARNIN] Traceback (most recent call last):
[admin-node][WARNIN]   File "/usr/sbin/ceph-disk", line 9, in <module>
[admin-node][WARNIN] load_entry_point('ceph-disk==1.0.0', 
'console_scripts', 'ceph-disk')()
[admin-node][WARNIN]   File 
"/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 4964, in run
[admin-node][WARNIN] main(sys.argv[1:])
[admin-node][WARNIN]   File 
"/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 4915, in main
[admin-node][WARNIN] args.func(args)
[admin-node][WARNIN]   File 
"/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3277, in 
main_activate
[admin-node][WARNIN] init=args.mark_init,
[admin-node][WARNIN]   File 
"/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3097, in activate_dir
[admin-node][WARNIN] (osd_id, cluster) = activate(path, 
activate_key_template, init)
[admin-node][WARNIN]   File 
"/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3173, in activate
[admin-node][WARNIN] ' with fsid %s' % ceph_fsid)
[admin-node][WARNIN] ceph_disk.main.Error: Error: No cluster conf found in 
/etc/ceph with fsid 8f9bf207-6c6a-4764-8b9e-63f70810837b
[admin-node][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: 
/usr/sbin/ceph-disk -v activate --mark-init systemd --mount 
/home/albert/my-cluster/cephd2


Need some help.  Really appreciated.


Albert
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can't Start / Stop Ceph jewel under Centos 7.2

2016-05-26 Thread Michael Kuriger
Did you update to ceph version 10.2.1 
(3a66dd4f30852819c1bdaa8ec23c795d4ad77269)?  This issue should have been 
resolved with the last update.  (It was for us)
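To double check, something like the following should confirm the installed
version and let you stop the daemons on one node (unit names are the stock
jewel ones; adjust if your packaging differs):

  ceph --version                  # or: ceph-osd --version on the node itself
  systemctl stop ceph.target      # stops all ceph daemons on this host
  systemctl stop ceph-osd.target  # or stop only the OSDs
  systemctl status ceph-osd@0     # check a single daemon, e.g. osd.0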



Michael Kuriger
Sr. Unix Systems Engineer
mk7...@yp.com | 818-649-7235



-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Hauke 
Homburg
Sent: Thursday, May 26, 2016 2:42 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Can't Start / Stop Ceph jewel under Centos 7.2



Hello,

I need to test my Icinga monitoring for Ceph, so I want to shut down the Ceph
services on one server. But systemctl start ceph.target doesn't run.

How can I stop the services?


Thanks for Help

Hauke

- --
www.w3-creative.de

www.westchat.de

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to remove a placement group?

2016-05-15 Thread Michael Kuriger
I would try:
ceph pg repair 15.3b3
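If the repair doesn't get it unstuck and losing the data in that PG really is
acceptable, the usual escalation (hammer-era commands, and they are
destructive, so double check them against the 0.94 docs before running; the
OSD must already be down before marking it lost) looks roughly like this:

  ceph pg map 15.3b3                        # which OSDs the PG currently maps to
  ceph osd lost 130 --yes-i-really-mean-it  # declare the blocking OSD's copy gone for good
  ceph pg force_create_pg 15.3b3            # last resort: recreate the PG empty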





Michael Kuriger
Sr. Unix Systems Engineer
mk7...@yp.com | 818-649-7235



From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Romero 
Junior
Sent: Saturday, May 14, 2016 11:46 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] How to remove a placement group?

Hi all,

I’m currently having trouble with an incomplete pg.
Our cluster has a replication factor of 3, however somehow I found this pg to 
be present in 9 different OSDs (being active only in 3 of them, of course).
Since I don’t really care about data loss, I was wondering if it’s possible to 
get rid of this by simply removing the pg. Is that even possible?

I’m running ceph version 0.94.3.

Here are some quick details:

1 pgs incomplete
1 pgs stuck inactive
100 requests are blocked > 32 sec

6 ops are blocked > 67108.9 sec on osd.130
94 ops are blocked > 33554.4 sec on osd.130

pg 15.3b3 is stuck inactive since forever, current state incomplete, last 
acting [130,210,148]
pg 15.3b3 is stuck unclean since forever, current state incomplete, last acting 
[130,210,148]
pg 15.3b3 is incomplete, acting [130,210,148]

Running a: “ceph pg 15.3b3 query” hangs without response.

I’ve tried setting OSD 130 as down, but then OSD 210 becomes the one keeping 
things stuck (query hangs), same for OSD 148.

Any ideas?

Kind regards,
Romero Junior
DevOps Infra Engineer
LeaseWeb Global Services B.V.

T: +31 20 316 0230
M: +31 6 2115 9310
E: r.jun...@global.leaseweb.com
W: www.leaseweb.com



Luttenbergweg 8,

1101 EC Amsterdam,

Netherlands




LeaseWeb is the brand name under which the various independent LeaseWeb 
companies operate. Each company is a separate and distinct entity that provides 
services in a particular geographic area. LeaseWeb Global Services B.V. does 
not provide third-party services. Please see www.leaseweb.com/en/legal for
more information.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Small cluster PG question

2016-05-05 Thread Michael Shuey
Most likely, they'd hash well across the OSDs - but if I'm correct in my own
research (still a bit of a Ceph noob myself), you'd still want more PGs.

An OSD can process several things at once - and should, for peak
throughput.  I wouldn't be surprised if some operations serialize within a
PG; if so, one PG per OSD isn't going to be able to max out the potential
of your OSDs.  I suspect there's reasoning along this line behind Ceph's
recommendation for PGs per OSD - I believe best practice is somewhere
between 10-30.
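As a rough sketch for the 3-OSD example in this thread (the pool name "rbd"
and the target of 32 are only illustrations; 3 OSDs x ~30 PGs per OSD divided
by 3 replicas lands around 32):

  ceph osd pool get rbd pg_num      # current PG count
  ceph osd pool set rbd pg_num 32   # raise the number of placement groups
  ceph osd pool set rbd pgp_num 32  # and the placement count used for mapping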



--
Mike Shuey

On Thu, May 5, 2016 at 7:58 PM, Roland Mechler <rmech...@opendns.com> wrote:

> Thanks for your response. So... if I configured 3 PGs for the pool, would
> they necessarily each have their primary on a different OSD, thus spreading
> the load? Or, would it be better to have more PGs to ensure an even
> distribution?
>
> I was also wondering about the pros and cons performance wise of having a
> pool size of 3 vs 2. It seems there would be a benefit for reads (1.5 times
> the bandwidth) but a penalty for writes because the primary has to forward
> to 2 nodes instead of 1. Does that make sense?
>
> -Roland
>
> On Thu, May 5, 2016 at 4:13 PM, Michael Shuey <sh...@fmepnet.org> wrote:
>
>> Reads will be limited to 1/3 of the total bandwidth.  Each PG has a
>> "primary" OSD - that's the first one (and the only one, if it's up & in)
>> consulted on a read.  The other replicas still exist, but they only
>> take writes (and only after the primary forwards the data along).  If
>> you have multiple PGs, reads (and write-mastering duties) will be
>> spread across all 3 servers.
>>
>>
>> --
>> Mike Shuey
>>
>>
>> On Thu, May 5, 2016 at 5:36 PM, Roland Mechler <rmech...@opendns.com>
>> wrote:
>> > Let's say I have a small cluster (3 nodes) with 1 OSD per node. If I
>> create
>> > a pool with size 3, such that each object in the pool will be
>> replicated to
>> > each OSD/node, is there any reason to create the pool with more than 1
>> PG?
>> > It seems that increasing the number of PGs beyond 1 would not provide
>> any
>> > additional benefit in terms of data balancing or durability, and would
>> have
>> > a cost in terms of resource usage. But when I try this, I get a "pool
>> 
>> > has many more objects per pg than average (too few pgs?)" warning from
>> ceph
>> > health. Is there a cost to having a large number of objects per PG?
>> >
>> > -Roland
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>
>
>
> --
> Roland Mechler
> Site Reliability Engineer
> OpenDNS <http://opendns.com/>
> Mobile:
> 604-727-5257
> Email:
> rmech...@cisco.com
> *OpenDNS Vancouver* <http://opendns.com/>
> 675 West Hastings St, Suite 500
> Vancouver, BC V6B 1N2
> Canada
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Small cluster PG question

2016-05-05 Thread Michael Shuey
Reads will be limited to 1/3 of the total bandwidth.  Each PG has a
"primary" OSD - that's the first one (and the only one, if it's up & in)
consulted on a read.  The other replicas still exist, but they only
take writes (and only after the primary forwards the data along).  If
you have multiple PGs, reads (and write-mastering duties) will be
spread across all 3 servers.
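If you want to see where the primaries actually land on your OSDs, a quick
check (the pool and object names are just examples):

  ceph pg dump pgs_brief        # the up_primary / acting_primary columns show which OSD serves reads
  ceph osd map rbd some-object  # where a given object name would be mapped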


--
Mike Shuey


On Thu, May 5, 2016 at 5:36 PM, Roland Mechler  wrote:
> Let's say I have a small cluster (3 nodes) with 1 OSD per node. If I create
> a pool with size 3, such that each object in the pool will be replicated to
> each OSD/node, is there any reason to create the pool with more than 1 PG?
> It seems that increasing the number of PGs beyond 1 would not provide any
> additional benefit in terms of data balancing or durability, and would have
> a cost in terms of resource usage. But when I try this, I get a "pool 
> has many more objects per pg than average (too few pgs?)" warning from ceph
> health. Is there a cost to having a large number of objects per PG?
>
> -Roland
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ubuntu or CentOS for my first lab. Please recommend. Thanks

2016-05-05 Thread Michael Ferguson
Thank you also Oliver.

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Oliver Dzombic
Sent: Thursday, May 05, 2016 9:33 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Ubuntu or CentOS for my first lab. Please
recommend. Thanks

Hi,

for me, centos is the choice of OS for enterprise usage for two reasons:

1. You receive longer software/repository updates.
2. rpm / yum updates/upgrades can be taken back, so you can roll back to a
previous software version you used (a quick sketch is below).

Neither is available in Ubuntu/Debian/apt-based OSes, afaik.
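For the rollback point, a minimal sketch (the transaction ID comes from your
own history output, 42 is just an example):

  yum history list      # find the transaction that pulled in the update
  yum history info 42   # inspect what it changed
  yum history undo 42   # roll that transaction back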

In both, ubuntu and centos, in combination with ceph, you should anyway use
a (more) recent kernel with your custom optimizations for your individual
hardware / software requirements.

And for ceph, it does not matter if the software in the repository is not
up to date. You won't install anything else but ceph on ceph nodes anyway.

Same goes for the community support. Since you are just using ceph, you won't
need community support (except from the ceph community).

So all in all, I don't see a single reason to pick ubuntu right now,
especially for ceph-only environments.

The only thing that bites a bit is when the ceph-specific systemd files
are not working, not complete, or for whatever reason not behaving as
expected. That has already happened in ubuntu as well as in centos.



BUT in the very end: the only smart choice is to use the OS YOU are most
familiar / comfortable with. In case of emergency/troubleshooting, it's ALWAYS
best if you work with a known system (and don't need to rely too much on
search-engine input / 3rd-party support).

So, in the very end, take what you can work with - and like most! :)

--
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:i...@ip-interactive.de

Address:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402, registered at the district court of Hanau
Managing director: Oliver Dzombic

Tax no.: 35 236 3622 1
VAT ID: DE274086107


Am 05.05.2016 um 14:26 schrieb Jan Schermer:
> This is always a topic that starts a flamewar. My POV:
> 
> Ubuntu:
> + generally newer versions of software, packages are closer to vanilla
> versions
> + more community packages
> + several versions of HWE (kernels) to choose from over the lifetime of the
> distro
> - not much support from vendors (e.g. firmware upgrades, BIOS, binary
> packages)
> 
> CentOS:
> + more "stable" versions
> + more enterprisey (unchanging) landscape, with better compatibility
> + generally compatible with RHEL, which means that binaries and support are
> usually provided by vendors
> - frankenpackages of ancient versions patched ad nauseam with backported
> features
> - documentation lacking on "specialities" that are not present in
> vanilla versions (kernel is the worst offender)
> 
> My experience is that Ubuntu is much faster overall, can be better 
> "googled" or subverted to your needs, LTS versions seldom break during 
> upgrades but I've seen it.
> CentOS is more suitable for running software like SAP or application 
> servers like JBoss if you need support. I've never seen breakage 
> during upgrades, but those upgrades mostly aren't even worth it :)
> 
> Usually, this choice is up to organisational preference; CentOS will
> be much easier to use in environments heavy with vendors and
> certifications...
> 
> Jan
> 
> 
>> On 05 May 2016, at 14:09, Michael Ferguson
>> <fergu...@eastsidemiami.com> wrote:
>>
>>
>> Michael E. Ferguson,
>> "First, your place, and then, the world's"
>> "Good work ain't cheap, and cheap work ain't good"
>> PHONE: 305-333-2185 | FAX: 305-533-1582 | fergu...@eastsidemiami.com
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] Ubuntu or CentOS for my first lab. Please recommend. Thanks

2016-05-05 Thread Michael Ferguson
Jan,

Thank you very much.

No flamewar please.

Many greetings to all

 

Mike

 

From: Jan Schermer [mailto:j...@schermer.cz] 
Sent: Thursday, May 05, 2016 8:26 AM
To: Michael Ferguson <fergu...@eastsidemiami.com>
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Ubuntu or CentOS for my first lab. Please recommend. 
Thanks

 

This is always a topic that starts a flamewar. My POV:

Ubuntu:
+ generally newer versions of software, packages are closer to vanilla versions
+ more community packages
+ several versions of HWE (kernels) to choose from over the lifetime of the distro
- not much support from vendors (e.g. firmware upgrades, BIOS, binary packages)

CentOS:
+ more "stable" versions
+ more enterprisey (unchanging) landscape, with better compatibility
+ generally compatible with RHEL, which means that binaries and support are usually provided by vendors
- frankenpackages of ancient versions patched ad nauseam with backported features
- documentation lacking on "specialities" that are not present in vanilla versions (kernel is the worst offender)

 

My experience is that Ubuntu is much faster overall, can be better "googled" or 
subverted to your needs, LTS versions seldom break during upgrades but I've 
seen it.

CentOS is more suitable for running software like SAP or application servers 
like JBoss if you need support. I've never seen breakage during upgrades, but 
those upgrades mostly aren't even worth it :)

 

Usually, this choice is up to organisational preference; CentOS will be much
easier to use in environments heavy with vendors and certifications...

 

Jan

 

 

On 05 May 2016, at 14:09, Michael Ferguson <fergu...@eastsidemiami.com>
wrote:

 

 

 

Michael E. Ferguson, 

“First, your place, and then, the world’s”

“Good work ain’t cheap, and cheap work ain’t good”
PHONE: 305-333-2185 | FAX: 305-533-1582 | fergu...@eastsidemiami.com

 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ubuntu or CentOS for my first lab. Please recommend. Thanks

2016-05-05 Thread Michael Ferguson
 

 

Michael E. Ferguson, 

"First, your place, and then, the world's"

"Good work ain't cheap, and cheap work ain't good"
PHONE: 305-333-2185 | FAX: 305-533-1582 | fergu...@eastsidemiami.com

 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

