Re: [ceph-users] Ceph replication factor of 2

2018-05-25 Thread Donny Davis
Nobody cares about their data until they don't have it anymore. Using
replica 3 is the same logic as RAID6: it's likely that if one drive has crapped
out, more will meet their maker soon. If you care about your data, then do
what you can to keep it around. If it's a lab like mine, who cares; it's
all ephemeral to me. The decision is about your use case and workload.

If it were my production data, I would spend the money.
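
If you do go with 3 copies, the pool-level change is quick; here is a rough
sketch (the pool name "rbd" is just an example, adjust per pool):

ceph osd pool set rbd size 3
ceph osd pool set rbd min_size 2

min_size 2 keeps I/O blocked whenever fewer than two copies are available, so
writes are never acknowledged against a single surviving replica.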

On Fri, May 25, 2018 at 3:48 AM, Janne Johansson 
wrote:

>
>
> Den fre 25 maj 2018 kl 00:20 skrev Jack :
>
>> On 05/24/2018 11:40 PM, Stefan Kooman wrote:
>> >> What are your thoughts, would you run 2x replication factor in
>> >> Production and in what scenarios?
>> Me neither, mostly because I have yet to read a technical point of view
>> from someone who has read and understands the code.
>>
>> I do not buy Janne's "trust me, I am an engineer", who btw confirmed
>> that the "replica 3" stuff is subject to probability and is a function of
>> cluster size, and thus is not a generic "always-true" rule
>>
>
> I did not call for trust in _my_ experience or value, but in the ones
> posting the first "everyone should probably use 3 replicas", over which you
> showed doubt.
> I agree with them, but did not intend to claim that my post had extra
> value because it was written by me.
>
> Also, the last part of my post was very much intended to add "not
> everything in 3x is true for everyone",
> but if you value your data, it would be very prudent to listen to
> experienced people who took risks and lost data before.
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] split brain case

2018-04-02 Thread Donny Davis
The only reason to stray from what is defined in the docs is if you have a
very specific use case for RAID or something else the docs don't cover.

You are in good shape. Just follow the guidance in the docs and you
shouldn't have any problems.

Ceph is powerful and intelligent technology.
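
For reference, the one-OSD-per-disk layout the docs describe is just a matter
of standing up one OSD per device; a minimal sketch on a Luminous-or-later
install (the device names are placeholders for whatever disks you have):

ceph-volume lvm create --data /dev/sdb
ceph-volume lvm create --data /dev/sdc

Each invocation prepares and activates a standalone OSD on one disk, which is
the layout CRUSH assumes by default.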

On Mon, Apr 2, 2018 at 7:51 PM, ST Wong (ITSC) <s...@itsc.cuhk.edu.hk> wrote:

> I’m a newbie to Ceph. :)  When I go through the docs and some discussions,
> it seems one OSD per disk will give better performance than one OSD per
> server on RAID.   Is that correct?
>
>
>
> Thanks again.
>
>
>
>
>
> *From:* Donny Davis [mailto:do...@fortnebula.com]
> *Sent:* Tuesday, April 03, 2018 10:19 AM
>
> *To:* ST Wong (ITSC)
> *Cc:* Ronny Aasen; ceph-users@lists.ceph.com
> *Subject:* Re: [ceph-users] split brain case
>
>
>
> It would work fine either way. I was just curious how people are setting
> up Ceph in their environments.
>
>
>
> Usually when people say they have one OSD per server, they are using
> RAID for one reason or another.
>
>
>
> It's not really relevant to the question at hand, but thank you for
> satisfying my curiosity.
>
>
>
> On Mon, Apr 2, 2018 at 7:13 PM, ST Wong (ITSC) <s...@itsc.cuhk.edu.hk>
> wrote:
>
> There are multiple disks per server, and will have one OSD for each disk.
> Is that okay?
>
>
>
> Thanks again.
>
>
>
> *From:* Donny Davis [mailto:do...@fortnebula.com]
> *Sent:* Tuesday, April 03, 2018 10:12 AM
> *To:* ST Wong (ITSC)
> *Cc:* Ronny Aasen; ceph-users@lists.ceph.com
>
>
> *Subject:* Re: [ceph-users] split brain case
>
>
>
> Do you only have one OSD per server? Not that it really matters... because
> all of the above is true in any case.
>
> Just curious
>
>
>
> On Mon, Apr 2, 2018 at 6:40 PM, ST Wong (ITSC) <s...@itsc.cuhk.edu.hk>
> wrote:
>
> Hi,
>
>
>
> >how many servers are your osds split over? keep in mind that ceph's
> default picks one osd from each host. so you would need a minimum of 4 osd
> hosts in total to be able to use 4+2 pools, and with only 4 hosts you have no
> failure domain.  but 4 hosts is the minimum sane starting point for a
> regular small cluster with 3+2 pools  (you can lose a node and ceph
> self-heals as long as there is enough free space).
>
>
>
> We’ll have 8 servers to split over (4 in each room).  Thanks.
>
>
>
> Best Rgds,
>
> /st wong
>
>
>
> *From:* Ronny Aasen [mailto:ronny+ceph-us...@aasen.cx]
> *Sent:* Friday, March 30, 2018 3:18 AM
> *To:* ST Wong (ITSC); ceph-users@lists.ceph.com
> *Subject:* Re: [ceph-users] split brain case
>
>
>
> On 29.03.2018 11:13, ST Wong (ITSC) wrote:
>
> Hi,
>
>
>
> Thanks.
>
>
>
> > of course the 4 osds left working now want to self-heal by recreating
> all objects stored on the 4 split-off osds, and have a huge recovery job.
> and you may risk that the osds go into a too_full error, unless you have
> free space in your osds to recreate all the data in the defective part of
> the cluster. or they will be stuck in recovery mode until you get the
> second room running; this depends on your crush map.
>
>
>
> Means we’ve to give the 4 OSD machines sufficient space to hold all data and
> thus the usable space will be halved?
>
>
> yes, if you want to be able to operate one room as if it was
> the whole cluster (HA), then you need this.
> also, if you want to have 4+2 instead of 3+2 pool size to avoid the
> blocking during recovery, that would take a whole lot of extra space.
> you can optionally let the cluster run degraded with 4+2 while one room is
> down, or temporarily set pools to 2+2 while the other room is down, to reduce
> the space requirements.
>
>
>
> > point in that splitting the cluster hurts. and if HA is the most
> important then you may want to check out rbd mirror.
>
>
>
> Will consider when there is budget to set up another ceph cluster for rbd
> mirror.
>
>
> i do not know your needs or applications, but while you only have 2 rooms
> you may just think of it as a single cluster that just happens to occupy 2
> rooms.  but with that few osds you should perhaps just put the cluster in
> a single room.
> the pain of splitting a cluster down the middle is quite significant, and
> i would perhaps use resources to improve the redundancy of the networks
> between the buildings instead. have multiple paths between the buildings to
> prevent service disruption in the building that does not house the cluster.
>
> having 5 mons is quite a lot. i think most clusters have 3 mons up into
> several hundred osd hosts

Re: [ceph-users] ceph-fuse segfaults

2018-04-02 Thread Donny Davis
The kernel client in my experience was much better all around. I had a
file corruption issue (quite a long time ago) that could have
been prevented if I had had a proper UPS and been using the kernel client.
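
If anyone wants to make the same switch, a minimal sketch of a kernel-client
mount (the monitor address, user name and secret file below are placeholders,
not values from this thread):

mount -t ceph 192.168.1.10:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret

The matching fstab entry uses the same device string and options.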

On Mon, Apr 2, 2018 at 7:12 PM, Zhang Qiang  wrote:

> Thanks Patrick,
> I should have checked the tracker first.
> I'll try the kernel client and an upgrade to see if that resolves it.
>
> On 2 April 2018 at 22:29, Patrick Donnelly  wrote:
> > Probably fixed by this: http://tracker.ceph.com/issues/17206
> >
> > You need to upgrade your version of ceph-fuse.
> >
> > On Mon, Apr 2, 2018 at 12:56 AM, Zhang Qiang 
> wrote:
> >> Hi,
> >>
> >> I'm using ceph-fuse 10.2.3 on CentOS 7.3.1611. ceph-fuse always
> >> segfaults after running for some time.
> >>
> >> *** Caught signal (Segmentation fault) **
> >>  in thread 7f455d832700 thread_name:ceph-fuse
> >>  ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
> >>  1: (()+0x2a442a) [0x7f457208e42a]
> >>  2: (()+0xf5e0) [0x7f4570b895e0]
> >>  3: (Client::get_root_ino()+0x10) [0x7f4571f86a20]
> >>  4: (CephFuse::Handle::make_fake_ino(inodeno_t, snapid_t)+0x18d)
> >> [0x7f4571f844bd]
> >>  5: (()+0x19ae21) [0x7f4571f84e21]
> >>  6: (()+0x164b5) [0x7f457199e4b5]
> >>  7: (()+0x16bdb) [0x7f457199ebdb]
> >>  8: (()+0x13471) [0x7f457199b471]
> >>  9: (()+0x7e25) [0x7f4570b81e25]
> >>  10: (clone()+0x6d) [0x7f456fa6934d]
> >>
> >> Detailed events dump:
> >> https://drive.google.com/file/d/0B_4ESJRu7BZFcHZmdkYtVG5CTGQ3UVFod0NxQloxS0ZCZmQ0/view?usp=sharing
> >> Let me know if more info is needed.
> >>
> >> Thanks.
> >>
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> >
> >
> >
> > --
> > Patrick Donnelly
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] split brain case

2018-04-02 Thread Donny Davis
It would work fine either way. I was just curious how people are setting up
Ceph in their environments.

Usually when people say they have one OSD per server, they are using
RAID for one reason or another.

It's not really relevant to the question at hand, but thank you for
satisfying my curiosity.

On Mon, Apr 2, 2018 at 7:13 PM, ST Wong (ITSC) <s...@itsc.cuhk.edu.hk> wrote:

> There are multiple disks per server, and will have one OSD for each disk.
> Is that okay?
>
>
>
> Thanks again.
>
>
>
> *From:* Donny Davis [mailto:do...@fortnebula.com]
> *Sent:* Tuesday, April 03, 2018 10:12 AM
> *To:* ST Wong (ITSC)
> *Cc:* Ronny Aasen; ceph-users@lists.ceph.com
>
> *Subject:* Re: [ceph-users] split brain case
>
>
>
> Do you only have one OSD per server? Not that it really matters... because
> all of the above is true in any case.
>
> Just curious
>
>
>
> On Mon, Apr 2, 2018 at 6:40 PM, ST Wong (ITSC) <s...@itsc.cuhk.edu.hk>
> wrote:
>
> Hi,
>
>
>
> >how many servers are your osds split over? keep in mind that ceph's
> default picks one osd from each host. so you would need a minimum of 4 osd
> hosts in total to be able to use 4+2 pools, and with only 4 hosts you have no
> failure domain.  but 4 hosts is the minimum sane starting point for a
> regular small cluster with 3+2 pools  (you can lose a node and ceph
> self-heals as long as there is enough free space).
>
>
>
> We’ll have 8 servers to split over (4 in each room).  Thanks.
>
>
>
> Best Rgds,
>
> /st wong
>
>
>
> *From:* Ronny Aasen [mailto:ronny+ceph-us...@aasen.cx]
> *Sent:* Friday, March 30, 2018 3:18 AM
> *To:* ST Wong (ITSC); ceph-users@lists.ceph.com
> *Subject:* Re: [ceph-users] split brain case
>
>
>
> On 29.03.2018 11:13, ST Wong (ITSC) wrote:
>
> Hi,
>
>
>
> Thanks.
>
>
>
> > of course the 4 osds left working now want to self-heal by recreating
> all objects stored on the 4 split-off osds, and have a huge recovery job.
> and you may risk that the osds go into a too_full error, unless you have
> free space in your osds to recreate all the data in the defective part of
> the cluster. or they will be stuck in recovery mode until you get the
> second room running; this depends on your crush map.
>
>
>
> Means we’ve to give the 4 OSD machines sufficient space to hold all data and
> thus the usable space will be halved?
>
>
> yes, if you want to be able to operate one room as if it was
> the whole cluster (HA), then you need this.
> also, if you want to have 4+2 instead of 3+2 pool size to avoid the
> blocking during recovery, that would take a whole lot of extra space.
> you can optionally let the cluster run degraded with 4+2 while one room is
> down, or temporarily set pools to 2+2 while the other room is down, to reduce
> the space requirements.
>
>
>
>
> > point in that splitting the cluster hurts. and if HA is the most
> important then you may want to check out rbd mirror.
>
>
>
> Will consider when there is budget to set up another ceph cluster for rbd
> mirror.
>
>
> i do not know your needs or applications, but while you only have 2 rooms
> you may just think of it as a single cluster that just happens to occupy 2
> rooms.  but with that few osds you should perhaps just put the cluster in
> a single room.
> the pain of splitting a cluster down the middle is quite significant, and
> i would perhaps use resources to improve the redundancy of the networks
> between the buildings instead. have multiple paths between the buildings to
> prevent service disruption in the building that does not house the cluster.
>
> having 5 mons is quite a lot. i think most clusters have 3 mons up into
> several hundred osd hosts
>
> how many servers are your osds split over? keep in mind that ceph's
> default picks one osd from each host. so you would need a minimum of 4 osd
> hosts in total to be able to use 4+2 pools, and with only 4 hosts you have no
> failure domain.  but 4 hosts is the minimum sane starting point for a
> regular small cluster with 3+2 pools  (you can lose a node and ceph
> self-heals as long as there is enough free space).
>
> kind regards
> Ronny Aasen
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] split brain case

2018-04-02 Thread Donny Davis
Do you only have one OSD per server? Not that it really matters... because
all of the above is true in any case.
Just curious.

On Mon, Apr 2, 2018 at 6:40 PM, ST Wong (ITSC)  wrote:

> Hi,
>
>
>
> >how many servers are your osds split over? keep in mind that ceph's
> default picks one osd from each host. so you would need a minimum of 4 osd
> hosts in total to be able to use 4+2 pools, and with only 4 hosts you have no
> failure domain.  but 4 hosts is the minimum sane starting point for a
> regular small cluster with 3+2 pools  (you can lose a node and ceph
> self-heals as long as there is enough free space).
>
>
>
> We’ll have 8 servers to split over (4 in each room).  Thanks.
>
>
>
> Best Rgds,
>
> /st wong
>
>
>
> *From:* Ronny Aasen [mailto:ronny+ceph-us...@aasen.cx]
> *Sent:* Friday, March 30, 2018 3:18 AM
> *To:* ST Wong (ITSC); ceph-users@lists.ceph.com
> *Subject:* Re: [ceph-users] split brain case
>
>
>
> On 29.03.2018 11:13, ST Wong (ITSC) wrote:
>
> Hi,
>
>
>
> Thanks.
>
>
>
> > of course the 4 osds left working now want to self-heal by recreating
> all objects stored on the 4 split-off osds, and have a huge recovery job.
> and you may risk that the osds go into a too_full error, unless you have
> free space in your osds to recreate all the data in the defective part of
> the cluster. or they will be stuck in recovery mode until you get the
> second room running; this depends on your crush map.
>
>
>
> Means we’ve to give the 4 OSD machines sufficient space to hold all data and
> thus the usable space will be halved?
>
>
> yes, if you want to be able to operate one room as if it was
> the whole cluster (HA), then you need this.
> also, if you want to have 4+2 instead of 3+2 pool size to avoid the
> blocking during recovery, that would take a whole lot of extra space.
> you can optionally let the cluster run degraded with 4+2 while one room is
> down, or temporarily set pools to 2+2 while the other room is down, to reduce
> the space requirements.
>
>
>
>
>
> > point in that splitting the cluster hurts. and if HA is the most
> important then you may want to check out rbd mirror.
>
>
>
> Will consider when there is budget to set up another ceph cluster for rbd
> mirror.
>
>
> i do not know your needs or applications, but while you only have 2 rooms
> you may just think of it as a single cluster that just happens to occupy 2
> rooms.  but with that few osds you should perhaps just put the cluster in
> a single room.
> the pain of splitting a cluster down the middle is quite significant, and
> i would perhaps use resources to improve the redundancy of the networks
> between the buildings instead. have multiple paths between the buildings to
> prevent service disruption in the building that does not house the cluster.
>
> having 5 mons is quite a lot. i think most clusters have 3 mons up into
> several hundred osd hosts
>
> how many servers are your osds split over? keep in mind that ceph's
> default picks one osd from each host. so you would need a minimum of 4 osd
> hosts in total to be able to use 4+2 pools, and with only 4 hosts you have no
> failure domain.  but 4 hosts is the minimum sane starting point for a
> regular small cluster with 3+2 pools  (you can lose a node and ceph
> self-heals as long as there is enough free space).
>
> kind regards
> Ronny Aasen
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph iSCSI is a prank?

2018-03-01 Thread Donny Davis
I wonder when EMC/NetApp are going to start giving away production-ready
bits that fit into your architecture.

At least support for this feature is coming in the near term.

I say keep on keepin on. Kudos to the ceph team (and maybe more teams) for
taking care of the hard stuff for us.




On Thu, Mar 1, 2018 at 9:42 AM, Samuel Soulard 
wrote:

> Hi Jason,
>
> That's awesome.  Keep up the good work guys, we all love the work you are
> doing with that software!!
>
> Sam
>
> On Mar 1, 2018 09:11, "Jason Dillaman"  wrote:
>
>> It's very high on our priority list to get a solution merged in the
>> upstream kernel. There was a proposal to use DLM to distribute the PGR
>> state between target gateways (a la the SCST target) and it's quite
>> possible that would have the least amount of upstream resistance since
>> it would work for all backends and not just RBD. We, of course, would
>> love to just use the Ceph cluster to distribute the state information
>> instead of requiring a bolt-on DLM (with its STONITH error handling),
>> but we'll take what we can get (merged).
>>
>> I believe SUSE uses a custom downstream kernel that stores the PGR
>> state in the Ceph cluster but requires two round-trips to the cluster
>> for each IO (first to verify the PGR state and the second to perform
>> the IO). The PetaSAN project is built on top of these custom kernel
>> patches as well, I believe.
>>
>> On Thu, Mar 1, 2018 at 8:50 AM, Samuel Soulard 
>> wrote:
>> > On another note, is there any work being done for persistent group
>> > reservations support for Ceph/LIO compatibility? Or just a rough
>> estimate :)
>> >
>> > Would love to see Redhat/Ceph support this type of setup.  I know Suse
>> > supports it as of late.
>> >
>> > Sam
>> >
>> > On Mar 1, 2018 07:33, "Kai Wagner"  wrote:
>> >>
>> >> I totally understand and see your frustration here, but you have to keep
>> >> in mind that this is an Open Source project with a lot of volunteers.
>> >> If you have a really urgent need, you have the option to develop
>> >> such a feature on your own, or you have to pay someone who could do the
>> >> work for you.
>> >>
>> >> It's a long journey but it seems like it finally comes to an end.
>> >>
>> >>
>> >> On 03/01/2018 01:26 PM, Max Cuttins wrote:
>> >> > It's obvious that Citrix is not believable anymore.
>> >> > However, at least Ceph should have added iSCSI to its platform during
>> >> > all these years.
>> >> > Ceph is awesome, so why not just kill all the competitors and make it
>> >> > compatible even with a washing machine?
>> >>
>> >> --
>> >> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
>> HRB
>> >> 21284 (AG Nürnberg)
>> >>
>> >>
>> >>
>> >> ___
>> >> ceph-users mailing list
>> >> ceph-users@lists.ceph.com
>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >>
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>>
>>
>> --
>> Jason
>>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph.conf not found

2018-01-04 Thread Donny Davis
Change the name of the cluster to ceph and create /etc/ceph/ceph.conf
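
If you would rather keep the cluster name "home", the tools can also be
pointed at a non-default config explicitly; a rough sketch, assuming the file
you created is /etc/ceph/home.conf:

ceph --cluster home -s
export CEPH_ARGS='--cluster home'    # saves repeating the flag for every command

Renaming /etc/ceph/home.conf to /etc/ceph/ceph.conf gets you the same result
if you stick with the default cluster name.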

On Thu, Jan 4, 2018 at 6:31 PM, Nathan Dehnel  wrote:

> Hey, I get this error:
>
> gentooserver ~ # ceph -s
> 2018-01-04 14:38:35.390154 7f0a6bae8700 -1 Errors while parsing config
> file!
> 2018-01-04 14:38:35.390157 7f0a6bae8700 -1 parse_file: cannot open
> /etc/ceph/ceph.conf: (2) No such file or directory
> 2018-01-04 14:38:35.390158 7f0a6bae8700 -1 parse_file: cannot open
> ~/.ceph/ceph.conf: (2) No such file or directory
> 2018-01-04 14:38:35.390158 7f0a6bae8700 -1 parse_file: cannot open
> ceph.conf: (2) No such file or directory
> Error initializing cluster client: ObjectNotFound('error calling
> conf_read_file',)
>
> I don't have a ceph.conf because my cluster name is "home" and the guide
> at http://docs.ceph.com/docs/master/install/manual-deployment/ says to
> name the configuration file after the cluster name. What should I do to
> resolve this?
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD+LVM -> iSCSI -> VMWare

2017-12-09 Thread Donny Davis
What I am getting at is that instead of sinking a bunch of time into this
band-aid, why not sink that time into a hypervisor migration? It seems well
timed if you ask me.

There are even tools to make that migration easier

http://libguestfs.org/virt-v2v.1.html
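
As a very rough sketch of what such a conversion looks like (the vCenter URI,
guest name and libvirt storage pool below are placeholders, not details from
this thread):

virt-v2v -ic vpx://administrator@vcenter.example.com/Datacenter/esxi01 my-vmware-guest -o libvirt -os default

virt-v2v pulls the guest out of VMware, converts the drivers, and writes the
converted disks into the given libvirt storage pool.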

You should ultimately move your hypervisor instead of building a one-off
case for ceph. Ceph works really well if you stay inside the box. So does
KVM. They work like gangbusters together.

I know that doesn't really answer your OP, but this is what I would do.

~D

On Sat, Dec 9, 2017 at 7:56 PM Brady Deetz <bde...@gmail.com> wrote:

> We have over 150 VMs running in vmware. We also have 2PB of Ceph for
> filesystem. With our vmware storage aging and not providing the IOPs we
> need, we are considering and hoping to use ceph. Ultimately, yes we will
> move to KVM, but in the short term, we probably need to stay on VMware.
> On Dec 9, 2017 6:26 PM, "Donny Davis" <do...@fortnebula.com> wrote:
>
>> Just curious but why not just use a hypervisor with rbd support? Are
>> there VMware specific features you are reliant on?
>>
>> On Fri, Dec 8, 2017 at 4:08 PM Brady Deetz <bde...@gmail.com> wrote:
>>
>>> I'm testing using RBD as VMWare datastores. I'm currently testing with
>>> krbd+LVM on a tgt target hosted on a hypervisor.
>>>
>>> My Ceph cluster is HDD backed.
>>>
>>> In order to help with write latency, I added an SSD drive to my
>>> hypervisor and made it a writeback cache for the rbd via LVM. So far I've
>>> managed to smooth out my 4k write latency and have some pleasing results.
>>>
>>> Architecturally, my current plan is to deploy an iSCSI gateway on each
>>> hypervisor hosting that hypervisor's own datastore.
>>>
>>> Does anybody have any experience with this kind of configuration,
>>> especially with regard to LVM writeback caching combined with RBD?
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD+LVM -> iSCSI -> VMWare

2017-12-09 Thread Donny Davis
Just curious but why not just use a hypervisor with rbd support? Are there
VMware specific features you are reliant on?

On Fri, Dec 8, 2017 at 4:08 PM Brady Deetz  wrote:

> I'm testing using RBD as VMWare datastores. I'm currently testing with
> krbd+LVM on a tgt target hosted on a hypervisor.
>
> My Ceph cluster is HDD backed.
>
> In order to help with write latency, I added an SSD drive to my hypervisor
> and made it a writeback cache for the rbd via LVM. So far I've managed to
> smooth out my 4k write latency and have some pleasing results.
>
> Architecturally, my current plan is to deploy an iSCSI gateway on each
> hypervisor hosting that hypervisor's own datastore.
>
> Does anybody have any experience with this kind of configuration,
> especially with regard to LVM writeback caching combined with RBD?
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cephfs Hadoop Plugin and CEPH integration

2017-11-27 Thread Donny Davis
Why not use Swift? The integration has been around for a while, and may be
a better fit.

https://hadoop.apache.org/docs/stable/hadoop-openstack/index.html

On Mon, Nov 27, 2017 at 12:55 PM, Aristeu Gil Alves Jr  wrote:

> Hi.
>
> It's my first post on the list. First of all, I have to say I'm new to
> Hadoop.
>
> We are a small lab and we have been running cephfs for almost two
> years, loading it with large files (4GB to 4TB in size). Our cluster has
> approximately 400TB with ~75% usage, and we are planning to
> grow a lot.
>
> Until now, we have processed most of the files the "serial reading" way. But
> now we will try to implement parallel processing of these files, and we are
> looking at the hadoop plugin as a solution for using mapreduce, or
> something like that.
>
> Does the hadoop plugin access cephfs over the network as a normal cluster,
> or can I install the hadoop processes on every ceph node and process the
> data locally?
>
>
> Thanks and regards,
>
> --
> Aristeu
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph all-nvme mysql performance tuning

2017-11-27 Thread Donny Davis
Also what tuned profile are you using? There is something to be gained by
using a matching tuned profile for your workload.
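
To check or switch it, something along these lines (the profile choice is a
judgment call; throughput-performance or latency-performance are the usual
starting points for all-flash OSD nodes):

tuned-adm active
tuned-adm profile latency-performance

latency-performance also keeps the CPUs out of deep C-states, which lines up
with the advice further down this thread.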

On Mon, Nov 27, 2017 at 11:16 AM, Donny Davis <do...@fortnebula.com> wrote:

> Why not ask Red Hat? All the rest of the storage vendors you are looking
> at are not free.
>
> Full disclosure, I am an employee at Red Hat.
>
> On Mon, Nov 27, 2017 at 10:16 AM, Nick Fisk <n...@fisk.me.uk> wrote:
>
>> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
>> Of *German Anders
>> *Sent:* 27 November 2017 14:44
>> *To:* Maged Mokhtar <mmokh...@petasan.org>
>> *Cc:* ceph-users <ceph-users@lists.ceph.com>
>> *Subject:* Re: [ceph-users] ceph all-nvme mysql performance tuning
>>
>>
>>
>> Hi Maged,
>>
>>
>>
>> Thanks a lot for the response. We tried with different numbers of threads
>> and we're getting almost the same kind of difference between the storage
>> types. Going to try with different rbd stripe size and object size values and
>> see if we get more competitive numbers. Will get back with more tests and
>> param changes to see if we get better :)
>>
>>
>>
>>
>>
>> Just to echo a couple of comments. Ceph will always struggle to match the
>> performance of a traditional array for mainly 2 reasons.
>>
>>
>>
>>1. You are replacing some sort of dual ported SAS or internally RDMA
>>connected device with a network for Ceph replication traffic. This will
>>instantly have a large impact on write latency
>>2. Ceph locks at the PG level and a PG will most likely cover at
>>least one 4MB object, so lots of small accesses to the same blocks (on a
>>block device) will wait on each other and go effectively at a single
>>threaded rate.
>>
>>
>>
>> The best thing you can do to mitigate these, is to run the fastest
>> journal/WAL devices you can, fastest network connections (ie 25Gb/s) and
>> run your CPU’s at max C and P states.
>>
>>
>>
>> You stated that you are running the performance profile on the CPU’s.
>> Could you also just double check that the C-states are being held at C1(e)?
>> There are a few utilities that can show this in realtime.
>>
>>
>>
>> Other than that, although there could be some minor tweaks, you are
>> probably nearing the limit of what you can hope to achieve.
>>
>>
>>
>> Nick
>>
>>
>>
>>
>>
>> Thanks,
>>
>>
>>
>> Best,
>>
>>
>> *German*
>>
>>
>>
>> 2017-11-27 11:36 GMT-03:00 Maged Mokhtar <mmokh...@petasan.org>:
>>
>> On 2017-11-27 15:02, German Anders wrote:
>>
>> Hi All,
>>
>>
>>
>> I've a performance question, we recently install a brand new Ceph cluster
>> with all-nvme disks, using ceph version 12.2.0 with bluestore configured.
>> The back-end of the cluster is using a bond IPoIB (active/passive) , and
>> for the front-end we are using a bonding config with active/active (20GbE)
>> to communicate with the clients.
>>
>>
>>
>> The cluster configuration is the following:
>>
>>
>>
>> *MON Nodes:*
>>
>> OS: Ubuntu 16.04.3 LTS | kernel 4.12.14
>>
>> 3x 1U servers:
>>
>>   2x Intel Xeon E5-2630v4 @2.2Ghz
>>
>>   128G RAM
>>
>>   2x Intel SSD DC S3520 150G (in RAID-1 for OS)
>>
>>   2x 82599ES 10-Gigabit SFI/SFP+ Network Connection
>>
>>
>>
>> *OSD Nodes:*
>>
>> OS: Ubuntu 16.04.3 LTS | kernel 4.12.14
>>
>> 4x 2U servers:
>>
>>   2x Intel Xeon E5-2640v4 @2.4Ghz
>>
>>   128G RAM
>>
>>   2x Intel SSD DC S3520 150G (in RAID-1 for OS)
>>
>>   1x Ethernet Controller 10G X550T
>>
>>   1x 82599ES 10-Gigabit SFI/SFP+ Network Connection
>>
>>   12x Intel SSD DC P3520 1.2T (NVMe) for OSD daemons
>>
>>   1x Mellanox ConnectX-3 InfiniBand FDR 56Gb/s Adapter (dual port)
>>
>>
>>
>>
>>
>> Here's the tree:
>>
>>
>>
>> ID CLASS WEIGHT   TYPE NAME  STATUS REWEIGHT PRI-AFF
>>
>> -7   48.0 root root
>>
>> -5   24.0 rack rack1
>>
>> -1   12.0 node cpn01
>>
>>  0  nvme  1.0 osd.0  up  1.0 1.0
>>
>>  1  nvme  1.0 osd.1  up  1.0 1.0
>>
>>  2  nvme  1.0 osd.2  up  1.0 1.0
>>
>>  3  nvme  1.0

Re: [ceph-users] ceph all-nvme mysql performance tuning

2017-11-27 Thread Donny Davis
Why not ask Red Hat? All the rest of the storage vendors you are looking at
are not free.

Full disclosure, I am an employee at Red Hat.

On Mon, Nov 27, 2017 at 10:16 AM, Nick Fisk  wrote:

> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
> Of *German Anders
> *Sent:* 27 November 2017 14:44
> *To:* Maged Mokhtar 
> *Cc:* ceph-users 
> *Subject:* Re: [ceph-users] ceph all-nvme mysql performance tuning
>
>
>
> Hi Maged,
>
>
>
> Thanks a lot for the response. We tried with different numbers of threads and
> we're getting almost the same kind of difference between the storage types.
> Going to try with different rbd stripe size and object size values and see if
> we get more competitive numbers. Will get back with more tests and param
> changes to see if we get better :)
>
>
>
>
>
> Just to echo a couple of comments. Ceph will always struggle to match the
> performance of a traditional array for mainly 2 reasons.
>
>
>
>1. You are replacing some sort of dual ported SAS or internally RDMA
>connected device with a network for Ceph replication traffic. This will
>instantly have a large impact on write latency
>2. Ceph locks at the PG level and a PG will most likely cover at least
>one 4MB object, so lots of small accesses to the same blocks (on a block
>device) will wait on each other and go effectively at a single threaded
>rate.
>
>
>
> The best thing you can do to mitigate these, is to run the fastest
> journal/WAL devices you can, fastest network connections (ie 25Gb/s) and
> run your CPU’s at max C and P states.
>
>
>
> You stated that you are running the performance profile on the CPU’s.
> Could you also just double check that the C-states are being held at C1(e)?
> There are a few utilities that can show this in realtime.
>
>
>
> Other than that, although there could be some minor tweaks, you are
> probably nearing the limit of what you can hope to achieve.
>
>
>
> Nick
>
>
>
>
>
> Thanks,
>
>
>
> Best,
>
>
> *German*
>
>
>
> 2017-11-27 11:36 GMT-03:00 Maged Mokhtar :
>
> On 2017-11-27 15:02, German Anders wrote:
>
> Hi All,
>
>
>
> I've a performance question, we recently install a brand new Ceph cluster
> with all-nvme disks, using ceph version 12.2.0 with bluestore configured.
> The back-end of the cluster is using a bond IPoIB (active/passive) , and
> for the front-end we are using a bonding config with active/active (20GbE)
> to communicate with the clients.
>
>
>
> The cluster configuration is the following:
>
>
>
> *MON Nodes:*
>
> OS: Ubuntu 16.04.3 LTS | kernel 4.12.14
>
> 3x 1U servers:
>
>   2x Intel Xeon E5-2630v4 @2.2Ghz
>
>   128G RAM
>
>   2x Intel SSD DC S3520 150G (in RAID-1 for OS)
>
>   2x 82599ES 10-Gigabit SFI/SFP+ Network Connection
>
>
>
> *OSD Nodes:*
>
> OS: Ubuntu 16.04.3 LTS | kernel 4.12.14
>
> 4x 2U servers:
>
>   2x Intel Xeon E5-2640v4 @2.4Ghz
>
>   128G RAM
>
>   2x Intel SSD DC S3520 150G (in RAID-1 for OS)
>
>   1x Ethernet Controller 10G X550T
>
>   1x 82599ES 10-Gigabit SFI/SFP+ Network Connection
>
>   12x Intel SSD DC P3520 1.2T (NVMe) for OSD daemons
>
>   1x Mellanox ConnectX-3 InfiniBand FDR 56Gb/s Adapter (dual port)
>
>
>
>
>
> Here's the tree:
>
>
>
> ID CLASS WEIGHT   TYPE NAME  STATUS REWEIGHT PRI-AFF
>
> -7   48.0 root root
>
> -5   24.0 rack rack1
>
> -1   12.0 node cpn01
>
>  0  nvme  1.0 osd.0  up  1.0 1.0
>
>  1  nvme  1.0 osd.1  up  1.0 1.0
>
>  2  nvme  1.0 osd.2  up  1.0 1.0
>
>  3  nvme  1.0 osd.3  up  1.0 1.0
>
>  4  nvme  1.0 osd.4  up  1.0 1.0
>
>  5  nvme  1.0 osd.5  up  1.0 1.0
>
>  6  nvme  1.0 osd.6  up  1.0 1.0
>
>  7  nvme  1.0 osd.7  up  1.0 1.0
>
>  8  nvme  1.0 osd.8  up  1.0 1.0
>
>  9  nvme  1.0 osd.9  up  1.0 1.0
>
> 10  nvme  1.0 osd.10 up  1.0 1.0
>
> 11  nvme  1.0 osd.11 up  1.0 1.0
>
> -3   12.0 node cpn03
>
> 24  nvme  1.0 osd.24 up  1.0 1.0
>
> 25  nvme  1.0 osd.25 up  1.0 1.0
>
> 26  nvme  1.0 osd.26 up  1.0 1.0
>
> 27  nvme  1.0 osd.27 up  1.0 1.0
>
> 28  nvme  1.0 osd.28 up  1.0 1.0
>
> 29  nvme  1.0 osd.29 up  1.0 1.0
>
> 30  nvme  1.0 osd.30 up  1.0 1.0
>
> 31  nvme  1.0 osd.31 up  1.0 1.0
>
> 32  nvme  1.0 osd.32 up  1.0 1.0
>
> 33  nvme  1.0 osd.33 up  1.0 1.0
>
> 34  nvme  1.0 osd.34 up  1.0 1.0
>

Re: [ceph-users] How's cephfs going?

2017-07-19 Thread Donny Davis
I had a corruption issue with the FUSE client on Jewel. I use CephFS for a
samba share with a light load, and I was using the FUSE client. I had a
power flap and didn't realize my UPS batteries had gone bad, so the MDS
servers were cycled a couple of times and somehow the file system became
corrupted. I moved to the kernel client, and after the FUSE experience I put
it through horrible things.

I had every client connected start copying over their user profiles, and
then I started pulling and restarting MDS servers. I saw very few errors,
and only blips in the copy processes. My experience with the kernel client
has been very positive and I would say stable. Nothing replaces a solid
backup copy of your data if you care about it.

I am still currently on Jewel, my CephFS is daily driven, and I can
barely notice any difference between it and the past setups I have had.



On Wed, Jul 19, 2017 at 7:02 AM, Дмитрий Глушенок  wrote:

> Unfortunately no. Using FUSE was discarded due to poor performance.
>
> On 19 July 2017, at 13:45, Blair Bethwaite
> wrote:
>
> Interesting. Any FUSE client data-points?
>
> On 19 July 2017 at 20:21, Дмитрий Глушенок  wrote:
>
> RBD (via krbd) was in action at the same time - no problems.
>
> On 19 July 2017, at 12:54, Blair Bethwaite
> wrote:
>
> It would be worthwhile repeating the first test (crashing/killing an
> OSD host) again with just plain rados clients (e.g. rados bench)
> and/or rbd. It's not clear whether your issue is specifically related
> to CephFS or actually something else.
>
> Cheers,
>
> On 19 July 2017 at 19:32, Дмитрий Глушенок  wrote:
>
> Hi,
>
> I can share negative test results (on Jewel 10.2.6). All tests were
> performed while actively writing to CephFS from single client (about 1300
> MB/sec). Cluster consists of 8 nodes, 8 OSD each (2 SSD for journals and
> metadata, 6 HDD RAID6 for data), MON/MDS are on dedicated nodes. 2 MDS at
> all, active/standby.
> - Crashing one node resulted in write hangs for 17 minutes. Repeating the
> test resulted in CephFS hangs forever.
> - Restarting active MDS resulted in successful failover to standby. Then,
> after standby became active and the restarted MDS became standby the new
> active was restarted. CephFS hanged for 12 minutes.
>
> P.S. Planning to repeat the tests again on 10.2.7 or higher
>
> On 19 July 2017, at 6:47, 许雪寒 wrote:
>
> Is there anyone else willing to share some usage information of cephfs?
> Could developers tell whether cephfs is a major effort in the whole ceph
> development?
>
> From: 许雪寒
> Sent: 17 July 2017 11:00
> To: ceph-users@lists.ceph.com
> Subject: How's cephfs going?
>
> Hi, everyone.
>
> We intend to use the Jewel version of cephfs; however, we don’t know its
> status.
> Is it production ready in Jewel? Does it still have lots of bugs? Is it a
> major effort of the current ceph development? And who is using cephfs now?
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> --
> Dmitry Glushenok
> Jet Infosystems
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
> --
> Cheers,
> ~Blairo
>
>
> --
> Dmitry Glushenok
> Jet Infosystems
>
>
>
>
> --
> Cheers,
> ~Blairo
>
>
> --
> Dmitry Glushenok
> Jet Infosystems
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW Swift public links

2017-06-30 Thread Donny Davis
I am trying to get radosgw to act like swift does when it's provisioned with
OpenStack. I did end up figuring out how to make it work properly for my
use case, and it would be helpful if it was documented somewhere.

For anyone curious how to make radosgw function more like the way
swift does, it's perfectly capable... but not well documented. Here is my
config, and it works as expected.

[client.rgw.gateway]
host = gateway
rgw_keystone_url = http://127.0.0.1:5000/
rgw_keystone_api_version = 3
rgw_keystone_admin_user = ceph
rgw_keystone_admin_password = supersecretpassword
rgw_keystone_admin_project = services
rgw_keystone_admin_domain = default
rgw_keystone_accepted_roles = admin, _member_
rgw_keystone_token_cache_size = 100
rgw_keystone_revocation_interval = 300
rgw keystone implicit tenants = false  ### IMPORTANT
rgw_keystone_make_new_tenants = true
rgw_s3_auth_use_keystone = true
rgw_keystone_verify_ssl = false
rgw_dns_name = cloud.example.com
rgw_swift_account_in_url = true ###IMPORTANT
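
With that in place, making a container world-readable and fetching an object
over plain HTTP looks roughly like this (host, tenant id and names below are
placeholders; the /swift/v1 prefix and port 7480 are the radosgw defaults):

swift post -r '.r:*' mycontainer
curl http://cloud.example.com:7480/swift/v1/AUTH_<tenant_id>/mycontainer/myobject

Because rgw_swift_account_in_url is true, the AUTH_<tenant_id> part has to be
in the public URL, which is likely why a URL without it returns NoSuchBucket.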



On Thu, Jun 29, 2017 at 9:34 PM, David Turner <drakonst...@gmail.com> wrote:

> Are you accessing the bucket from a URL that is not configured as an
> endpoint in your zone?  I bet if you looked at the log you would see that
> the bucket that doesn't exist is the URL that you are using to access it.
>
> On Thu, Jun 29, 2017, 9:07 PM Donny Davis <do...@fortnebula.com> wrote:
>
>> I have swift working well with keystone authentication, and I can upload
>> and download files. However, when I make a link public, I get NoSuchBucket,
>> and I have no idea what URL the buckets can be found at.
>>
>> When I list the buckets with radosgw-admin bucket list, I get back some
>> tenant URL plus the bucket.
>>
>> I am lost on where to start with this. I just want to share files with
>> rgw and OpenStack.
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RadosGW Swift public links

2017-06-29 Thread Donny Davis
I have swift working well with keystone authentication, and I can upload
and download files. However, when I make a link public, I get NoSuchBucket,
and I have no idea what URL the buckets can be found at.

When I list the buckets with radosgw-admin bucket list, I get back some
tenant URL plus the bucket.

I am lost on where to start with this. I just want to share files with rgw
and OpenStack.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] chooseleaf updates

2017-04-22 Thread Donny Davis
Just in case anyone was curious as to how amazing ceph actually is, I did
the migration to ceph seamlessly. I was able to bring the other two nodes
into the cluster, and then turn on replication between them without a
hitch. And with zero downtime.  Just incredible software.
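
For anyone making the same move, the rough shape of the follow-up change is a
new CRUSH rule with a host failure domain and repointing the pools at it (rule
and pool names below are placeholders; on pre-Luminous releases the pool
option is crush_ruleset with the rule id instead of crush_rule):

ceph osd crush rule create-simple replicated_host default host
ceph osd pool set rbd crush_rule replicated_host

Data then rebalances so each replica lands on a different host rather than
just a different OSD.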

On Thu, Apr 20, 2017 at 3:50 AM, Loic Dachary <l...@dachary.org> wrote:

>
>
> On 04/20/2017 02:25 AM, Donny Davis wrote:
> > In reading the docs, I am curious if I can change the chooseleaf
> parameter as my cluster expands. I currently only have one node and used
> this parameter in ceph.conf
> >
> > osd crush chooseleaf type = 0
> >
> > Can this be changed after I expand nodes. The other two nodes are
> currently on gluster, but moving to ceph this weekend.
>
> Yes, it can be changed :-)
>
> Cheers
>
> --
> Loïc Dachary, Artisan Logiciel Libre
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] chooseleaf updates

2017-04-19 Thread Donny Davis
In reading the docs, I am curious if I can change the chooseleaf parameter
as my cluster expands. I currently only have one node and used this
parameter in ceph.conf

osd crush chooseleaf type = 0

Can this be changed after I expand nodes. The other two nodes are currently
on gluster, but moving to ceph this weekend.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] High CPU usage by ceph-mgr on idle Ceph cluster

2017-02-13 Thread Donny Davis
I am having the same issue. When I looked at my idle cluster this morning,
one of the nodes had 400% CPU utilization, and ceph-mgr was 300% of that.
I have 3 AIO nodes, and only one of them seemed to be affected.
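
Following the profiling suggestion below, a quick-and-dirty way to grab a few
samples is something like this (purely a sketch; the paths and counts are
arbitrary):

for i in 1 2 3 4 5; do gstack $(pidof ceph-mgr) > /tmp/ceph-mgr.stack.$i; sleep 5; done

Comparing which threads sit in the same frames across samples usually points
at where the CPU time is going.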

On Sat, Jan 14, 2017 at 12:18 AM, Brad Hubbard  wrote:

> Want to install debuginfo packages and use something like this to try
> and find out where it is spending most of its time?
>
> https://poormansprofiler.org/
>
> Note that you may need to do multiple runs to get a "feel" for where
> it is spending most of its time. Also note that likely only one or two
> threads will be using the CPU (you can see this in ps output using a
> command like the following); the rest will likely be idle or waiting
> for something.
>
> # ps axHo %cpu,stat,pid,tid,pgid,ppid,comm,wchan
>
> Observation of these two and maybe a couple of manual gstack dumps
> like this to compare thread ids to ps output (LWP is the thread id
> (tid) in gdb output) should give us some idea of where it is spinning.
>
> # gstack $(pidof ceph-mgr)
>
>
> On Sat, Jan 14, 2017 at 9:54 AM, Robert Longstaff
>  wrote:
> > FYI, I'm seeing this as well on the latest Kraken 11.1.1 RPMs on CentOS
> 7 w/
> > elrepo kernel 4.8.10. ceph-mgr is currently tearing through CPU and has
> > allocated ~11GB of RAM after a single day of usage. Only the active
> manager
> > is performing this way. The growth is linear and reproducible.
> >
> > The cluster is mostly idle; 3 mons (4 CPU, 16GB), 20 heads with 45x8TB
> OSDs
> > each.
> >
> >
> > top - 23:45:47 up 1 day,  1:32,  1 user,  load average: 3.56, 3.94, 4.21
> >
> > Tasks: 178 total,   1 running, 177 sleeping,   0 stopped,   0 zombie
> >
> > %Cpu(s): 33.9 us, 28.1 sy,  0.0 ni, 37.3 id,  0.0 wa,  0.0 hi,  0.7 si,
> 0.0
> > st
> >
> > KiB Mem : 16423844 total,  3980500 free, 11556532 used,   886812
> buff/cache
> >
> > KiB Swap:  2097148 total,  2097148 free,0 used.  4836772 avail
> Mem
> >
> >
> >   PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+
> COMMAND
> >
> >  2351 ceph  20   0 12.160g 0.010t  17380 S 203.7 64.8   2094:27
> ceph-mgr
> >
> >  2302 ceph  20   0  620316 267992 157620 S   2.3  1.6  65:11.50
> ceph-mon
> >
> >
> > On Wed, Jan 11, 2017 at 12:00 PM, Stillwell, Bryan J
> >  wrote:
> >>
> >> John,
> >>
> >> This morning I compared the logs from yesterday and I show a noticeable
> >> increase in messages like these:
> >>
> >> 2017-01-11 09:00:03.032521 7f70f15c1700 10 mgr handle_mgr_digest 575
> >> 2017-01-11 09:00:03.032523 7f70f15c1700 10 mgr handle_mgr_digest 441
> >> 2017-01-11 09:00:03.032529 7f70f15c1700 10 mgr notify_all notify_all:
> >> notify_all mon_status
> >> 2017-01-11 09:00:03.032532 7f70f15c1700 10 mgr notify_all notify_all:
> >> notify_all health
> >> 2017-01-11 09:00:03.032534 7f70f15c1700 10 mgr notify_all notify_all:
> >> notify_all pg_summary
> >> 2017-01-11 09:00:03.033613 7f70f15c1700  4 mgr ms_dispatch active
> >> mgrdigest v1
> >> 2017-01-11 09:00:03.033618 7f70f15c1700 -1 mgr ms_dispatch mgrdigest v1
> >> 2017-01-11 09:00:03.033620 7f70f15c1700 10 mgr handle_mgr_digest 575
> >> 2017-01-11 09:00:03.033622 7f70f15c1700 10 mgr handle_mgr_digest 441
> >> 2017-01-11 09:00:03.033628 7f70f15c1700 10 mgr notify_all notify_all:
> >> notify_all mon_status
> >> 2017-01-11 09:00:03.033631 7f70f15c1700 10 mgr notify_all notify_all:
> >> notify_all health
> >> 2017-01-11 09:00:03.033633 7f70f15c1700 10 mgr notify_all notify_all:
> >> notify_all pg_summary
> >> 2017-01-11 09:00:03.532898 7f70f15c1700  4 mgr ms_dispatch active
> >> mgrdigest v1
> >> 2017-01-11 09:00:03.532945 7f70f15c1700 -1 mgr ms_dispatch mgrdigest v1
> >>
> >>
> >> In a 1 minute period yesterday I saw 84 times this group of messages
> >> showed up.  Today that same group of messages showed up 156 times.
> >>
> >> Other than that I did see an increase in this messages from 9 times a
> >> minute to 14 times a minute:
> >>
> >> 2017-01-11 09:00:00.402000 7f70f3d61700  0 -- 172.24.88.207:6800/4104
> >> -
> >> conn(0x563c9ee89000 :6800 s=STATE_ACCEPTING_WAIT_BANNER_ADDR pgs=0 cs=0
> >> l=0).fault with nothing to send and in the half  accept state just
> closed
> >>
> >> Let me know if you need anything else.
> >>
> >> Bryan
> >>
> >>
> >> On 1/10/17, 10:00 AM, "ceph-users on behalf of Stillwell, Bryan J"
> >>  >> bryan.stillw...@charter.com> wrote:
> >>
> >> >On 1/10/17, 5:35 AM, "John Spray"  wrote:
> >> >
> >> >>On Mon, Jan 9, 2017 at 11:46 PM, Stillwell, Bryan J
> >> >> wrote:
> >> >>> Last week I decided to play around with Kraken (11.1.1-1xenial) on a
> >> >>> single node, two OSD cluster, and after a while I noticed that the
> new
> >> >>> ceph-mgr daemon is frequently using a lot of the CPU:
> >> >>>
> >> >>> 17519 ceph  20   0  850044 168104208 S 102.7  4.3   1278:27
> >> >>> ceph-mgr
> >> >>>
> >> 

Re: [ceph-users] How to fix: HEALTH_ERR 45 pgs are stuck inactive for more than 300 seconds; 19 pgs degraded; 45 pgs stuck inactive; 19 pgs stuck unclean; 19 pgs undersized; recovery 2514/5028 objects

2017-01-16 Thread Donny Davis
Give this a try:

ceph osd set noout
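
For context, noout only stops down OSDs from being marked out and triggering
rebalancing; it will not bring an already-out OSD back by itself. The usual
pattern around a restart is a sketch like this (the OSD id is assumed):

ceph osd set noout
systemctl restart ceph-osd@1
ceph osd unset noout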

On Jan 16, 2017 9:08 AM, "Stéphane Klein" 
wrote:

> I see my mistake:
>
> ```
>  osdmap e57: 2 osds: 1 up, 1 in; 64 remapped pgs
> flags sortbitwise,require_jewel_osds
> ```
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com