Re: [ceph-users] Random checksum errors (bluestore on Luminous)

2017-12-10 Thread Shinobu Kinjo
Can you open a ticket with the exact version of your ceph cluster?

http://tracker.ceph.com
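A quick way to capture the exact versions for the ticket (assuming a Luminous-era cluster):

  ceph -v                   # version of the local binaries
  ceph versions             # per-daemon versions across the cluster (Luminous and later)
  ceph tell osd.* version   # ask each OSD directly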

Thanks,

On Sun, Dec 10, 2017 at 10:34 PM, Martin Preuss  wrote:
> Hi,
>
> I'm new to Ceph. I started a ceph cluster from scratch on Debian 9,
> consisting of 3 hosts, each host has 3-4 OSDs (using 4TB hdds, currently
> totalling 10 hdds).
>
> Right from the start I always received random scrub errors telling me
> that some checksums didn't match the expected value, fixable with "ceph
> pg repair".
>
> I looked at the ceph-osd logfiles on each of the hosts and compared with
> the corresponding syslogs. I never found any hardware error, so there
> was no problem reading or writing a sector hardware-wise. Also there was
> never any other suspicious syslog entry around the time of checksum
> error reporting.
>
> When I looked at the checksum error entries I found that the reported
> bad checksum always was "0x6706be76".
>
> Could someone please tell me where to look further for the source of the
> problem?
>
> I appended an excerpt of the osd logs.
>
>
> Kind regards
> Martin
>
>
> --
> "Things are only impossible until they're not"
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephx

2017-10-13 Thread Shinobu Kinjo
On Fri, Oct 13, 2017 at 3:29 PM, Ashley Merrick <ash...@amerrick.co.uk> wrote:
> Hello,
>
>
> Is it possible to limit a cephx user to one image?
>
>
> I have looked and it seems it's possible per pool, but I can't find a per-image
> option.

What did you look at?

Best regards,
Shinobu Kinjo
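As a point of reference (not from the thread): cephx caps can scope a client to a single pool; a minimal sketch with made-up names, assuming a Luminous-era cluster, is

  ceph auth get-or-create client.guest \
      mon 'profile rbd' \
      osd 'profile rbd pool=rbd'

Scoping to a single image is less direct; it generally relies on object_prefix caps tied to that image's object prefix, or on RBD namespaces in later releases.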

>
>
> ,Ashley
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] "ceph fs" commands hang forever and kill monitors

2017-09-28 Thread Shinobu Kinjo
So the problem you faced has been completely solved?

On Thu, Sep 28, 2017 at 7:51 PM, Richard Hesketh
 wrote:
> On 27/09/17 19:35, John Spray wrote:
>> On Wed, Sep 27, 2017 at 1:18 PM, Richard Hesketh
>>  wrote:
>>> On 27/09/17 12:32, John Spray wrote:
 On Wed, Sep 27, 2017 at 12:15 PM, Richard Hesketh
  wrote:
> As the subject says... any ceph fs administrative command I try to run 
> hangs forever and kills monitors in the background - sometimes they come 
> back, on a couple of occasions I had to manually stop/restart a suffering 
> mon. Trying to load the filesystem tab in the ceph-mgr dashboard dumps an 
> error and can also kill a monitor. However, clients can mount the 
> filesystem and read/write data without issue.
>
> Relevant excerpt from logs on an affected monitor, just trying to run 
> 'ceph fs ls':
>
> 2017-09-26 13:20:50.716087 7fc85fdd9700  0 mon.vm-ds-01@0(leader) e19 
> handle_command mon_command({"prefix": "fs ls"} v 0) v1
> 2017-09-26 13:20:50.727612 7fc85fdd9700  0 log_channel(audit) log [DBG] : 
> from='client.? 10.10.10.1:0/2771553898' entity='client.admin' 
> cmd=[{"prefix": "fs ls"}]: dispatch
> 2017-09-26 13:20:50.950373 7fc85fdd9700 -1 
> /build/ceph-12.2.0/src/osd/OSDMap.h: In function 'const string& 
> OSDMap::get_pool_name(int64_t) const' thread 7fc85fdd9700 time 2017-09-26 
> 13:20:50.727676
> /build/ceph-12.2.0/src/osd/OSDMap.h: 1176: FAILED assert(i != 
> pool_name.end())
>
>  ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous 
> (rc)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
> const*)+0x102) [0x55a8ca0bb642]
>  2: (()+0x48165f) [0x55a8c9f4165f]
>  3: 
> (MDSMonitor::preprocess_command(boost::intrusive_ptr)+0x1d18)
>  [0x55a8ca047688]
>  4: 
> (MDSMonitor::preprocess_query(boost::intrusive_ptr)+0x2a8) 
> [0x55a8ca048008]
>  5: (PaxosService::dispatch(boost::intrusive_ptr)+0x700) 
> [0x55a8c9f9d1b0]
>  6: (Monitor::handle_command(boost::intrusive_ptr)+0x1f93) 
> [0x55a8c9e63193]
>  7: (Monitor::dispatch_op(boost::intrusive_ptr)+0xa0e) 
> [0x55a8c9e6a52e]
>  8: (Monitor::_ms_dispatch(Message*)+0x6db) [0x55a8c9e6b57b]
>  9: (Monitor::ms_dispatch(Message*)+0x23) [0x55a8c9e9a053]
>  10: (DispatchQueue::entry()+0xf4a) [0x55a8ca3b5f7a]
>  11: (DispatchQueue::DispatchThread::entry()+0xd) [0x55a8ca16bc1d]
>  12: (()+0x76ba) [0x7fc86b3ac6ba]
>  13: (clone()+0x6d) [0x7fc869bd63dd]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed 
> to interpret this.
>
> I'm running Luminous. The cluster and FS have been in service since 
> Hammer and have default data/metadata pool names. I discovered the issue 
> after attempting to enable directory sharding.

 Well that's not good...

 The assertion is because your FSMap is referring to a pool that
 apparently no longer exists in the OSDMap.  This should be impossible
 in current Ceph (we forbid removing pools if they're in use), but
 could perhaps have been caused in an earlier version of Ceph when it
 was possible to remove a pool even if CephFS was referring to it?

 Alternatively, perhaps something more severe is going on that's
 causing your mons to see a wrong/inconsistent view of the world.  Has
 the cluster ever been through any traumatic disaster recovery type
 activity involving hand-editing any of the cluster maps?  What
 intermediate versions has it passed through on the way from Hammer to
 Luminous?

 Opened a ticket here: http://tracker.ceph.com/issues/21568

 John
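A hedged way to check for the FSMap/OSDMap mismatch described above (it is the pool IDs, not the names, that have to line up):

  ceph fs dump | grep -E 'metadata_pool|data_pools'   # pool IDs the filesystem map references
  ceph osd pool ls detail                             # pools the OSDMap actually contains, with IDs
  # an ID present in the first list but missing from the second reproduces the assert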
>>>
>>> I've reviewed my notes (i.e. I've grepped my IRC logs); I actually 
>>> inherited this cluster from a colleague who left shortly after I joined, so 
>>> unfortunately there is some of its history I cannot fill in.
>>>
>>> Turns out the cluster actually predates Firefly. Looking at dates my 
>>> suspicion is that it went Emperor -> Firefly -> Giant -> Hammer. I 
>>> inherited it at Hammer, and took it Hammer -> Infernalis -> Jewel -> 
>>> Luminous myself. I know I did make sure to do the tmap_upgrade step on 
>>> cephfs but can't remember if I did it at Infernalis or Jewel.
>>>
>>> Infernalis was a tricky upgrade; the attempt was aborted once after the 
>>> first set of OSDs didn't come back up after upgrade (had to 
>>> remove/downgrade and readd), and setting sortbitwise as the documentation 
>>> suggested after a successful second attempt caused everything to break and 
>>> degrade slowly until it was unset and recovered. Never had disaster 
>>> recovery involve mucking around with the pools while I was administrating 
>>> it, but unfortunately I cannot speak for the cluster's pre-Hammer history. 
>>> The only pools I have 

Re: [ceph-users] Ceph Developers Monthly - October

2017-09-28 Thread Shinobu Kinjo
Are we going to have the next CDM in an APAC-friendly time slot again?



On Thu, Sep 28, 2017 at 12:08 PM, Leonardo Vaz  wrote:
> Hey Cephers,
>
> This is just a friendly reminder that the next Ceph Developer Monthly
> meeting is coming up:
>
>  http://wiki.ceph.com/Planning
>
> If you have work that you're doing that is feature work, significant
> backports, or anything you would like to discuss with the core team,
> please add it to the following page:
>
>  http://wiki.ceph.com/CDM_04-OCT-2017
>
> If you have questions or comments, please let us know.
>
> Kindest regards,
>
> Leo
>
> --
> Leonardo Vaz
> Ceph Community Manager
> Open Source and Standards Team
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] "ceph fs" commands hang forever and kill monitors

2017-09-27 Thread Shinobu Kinjo
Just for clarification.
Did you upgrade your cluster from Hammer to Luminous, then hit an assertion?

On Wed, Sep 27, 2017 at 8:15 PM, Richard Hesketh
 wrote:
> As the subject says... any ceph fs administrative command I try to run hangs 
> forever and kills monitors in the background - sometimes they come back, on a 
> couple of occasions I had to manually stop/restart a suffering mon. Trying to 
> load the filesystem tab in the ceph-mgr dashboard dumps an error and can also 
> kill a monitor. However, clients can mount the filesystem and read/write data 
> without issue.
>
> Relevant excerpt from logs on an affected monitor, just trying to run 'ceph 
> fs ls':
>
> 2017-09-26 13:20:50.716087 7fc85fdd9700  0 mon.vm-ds-01@0(leader) e19 
> handle_command mon_command({"prefix": "fs ls"} v 0) v1
> 2017-09-26 13:20:50.727612 7fc85fdd9700  0 log_channel(audit) log [DBG] : 
> from='client.? 10.10.10.1:0/2771553898' entity='client.admin' cmd=[{"prefix": 
> "fs ls"}]: dispatch
> 2017-09-26 13:20:50.950373 7fc85fdd9700 -1 
> /build/ceph-12.2.0/src/osd/OSDMap.h: In function 'const string& 
> OSDMap::get_pool_name(int64_t) const' thread 7fc85fdd9700 time 2017-09-26 
> 13:20:50.727676
> /build/ceph-12.2.0/src/osd/OSDMap.h: 1176: FAILED assert(i != pool_name.end())
>
>  ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
> const*)+0x102) [0x55a8ca0bb642]
>  2: (()+0x48165f) [0x55a8c9f4165f]
>  3: 
> (MDSMonitor::preprocess_command(boost::intrusive_ptr)+0x1d18) 
> [0x55a8ca047688]
>  4: (MDSMonitor::preprocess_query(boost::intrusive_ptr)+0x2a8) 
> [0x55a8ca048008]
>  5: (PaxosService::dispatch(boost::intrusive_ptr)+0x700) 
> [0x55a8c9f9d1b0]
>  6: (Monitor::handle_command(boost::intrusive_ptr)+0x1f93) 
> [0x55a8c9e63193]
>  7: (Monitor::dispatch_op(boost::intrusive_ptr)+0xa0e) 
> [0x55a8c9e6a52e]
>  8: (Monitor::_ms_dispatch(Message*)+0x6db) [0x55a8c9e6b57b]
>  9: (Monitor::ms_dispatch(Message*)+0x23) [0x55a8c9e9a053]
>  10: (DispatchQueue::entry()+0xf4a) [0x55a8ca3b5f7a]
>  11: (DispatchQueue::DispatchThread::entry()+0xd) [0x55a8ca16bc1d]
>  12: (()+0x76ba) [0x7fc86b3ac6ba]
>  13: (clone()+0x6d) [0x7fc869bd63dd]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
> interpret this.
>
> I'm running Luminous. The cluster and FS have been in service since Hammer 
> and have default data/metadata pool names. I discovered the issue after 
> attempting to enable directory sharding.
>
> Rich
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] which kernel version support object-map feature from rbd kernel client

2017-08-15 Thread Shinobu Kinjo
It would be much better to explain why, as of today, the object-map feature
is not supported by the kernel client, or to document it.

On Tue, Aug 15, 2017 at 8:08 PM, Ilya Dryomov  wrote:
> On Tue, Aug 15, 2017 at 11:34 AM, moftah moftah  wrote:
>> Hi All,
>>
>> I have searched everywhere for some sort of table that shows which rbd image
>> features are supported by which kernel version and didn't find any.
>>
>> Basically I am looking at the latest kernels from kernel.org, and I am thinking
>> of upgrading to 4.12 since it is stable, but I want to make sure I can get
>> rbd images with the object-map feature working with rbd.ko.
>>
>> If anyone knows, please let me know what kernel version I have to upgrade to
>> to get that feature supported by the kernel client.
>
> As of today, object-map feature is not supported by the kernel client.
>
> Thanks,
>
> Ilya
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] chooseleaf updates

2017-04-22 Thread Shinobu Kinjo
On Sun, Apr 23, 2017 at 4:09 AM, Donny Davis  wrote:
> Just in case anyone was curious as to how amazing ceph actually is, I did
> the migration to ceph seamlessly. I was able to bring the other two nodes
> into the cluster, and then turn on replication between them without a hitch.
> And with zero downtime.  Just incredible software.

That's true.

>
> On Thu, Apr 20, 2017 at 3:50 AM, Loic Dachary  wrote:
>>
>>
>>
>> On 04/20/2017 02:25 AM, Donny Davis wrote:
>> > In reading the docs, I am curious if I can change the chooseleaf
>> > parameter as my cluster expands. I currently only have one node and used
>> > this parameter in ceph.conf
>> >
>> > osd crush chooseleaf type = 0
>> >
>> > Can this be changed after I expand nodes. The other two nodes are
>> > currently on gluster, but moving to ceph this weekend.
>>
>> Yes, it can be changed :-)
>>
>> Cheers
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
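For anyone following along: once the extra hosts are in, the failure domain is changed by editing the CRUSH rule, along these lines (a sketch; file names are placeholders, and data movement should be expected):

  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt
  # in crushmap.txt change the rule's step, e.g.
  #   step chooseleaf firstn 0 type osd   ->   step chooseleaf firstn 0 type host
  crushtool -c crushmap.txt -o crushmap.new
  ceph osd setcrushmap -i crushmap.new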
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Running the Ceph Erasure Code Benchmark

2017-04-07 Thread Shinobu Kinjo
You don't need to recompile that tool. Please see
``ceph_erasure_code_benchmark -h``.

Some examples are:
https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/README#L31-L48
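A typical invocation, loosely following the README examples linked above (flag names as printed by -h; k/m are just an example profile):

  ceph_erasure_code_benchmark \
      --plugin jerasure \
      --workload encode \
      --iterations 100 \
      --size 1048576 \
      --erasures 1 \
      --parameter k=4 \
      --parameter m=2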

On Sat, Apr 8, 2017 at 8:21 AM, Henry Ngo  wrote:
> Hello,
>
> I have a 6 node cluster and I have installed Ceph on the admin node from
> source. I want to run the benchmark test on my cluster. How do I do this? If
> I type ceph_erasure_code_benchmark on the command line it gives a "
> parameter k is 0. But k needs to be > 0 ". What elese do I need to set up
> before running the command? How do I customize the size of file to be
> encoded/decoded and the iteration?
>
> Best,
> Henry N.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS: ceph-fuse segfaults

2017-04-07 Thread Shinobu Kinjo
Please open a ticket so that we can track it.

http://tracker.ceph.com/

Regards,

On Sat, Apr 8, 2017 at 1:40 AM, Patrick Donnelly 
wrote:

> Hello Andras,
>
> On Wed, Mar 29, 2017 at 11:07 AM, Andras Pataki
>  wrote:
> > Below is a crash we had on a few machines with the ceph-fuse client on
> the
> > latest Jewel release 10.2.6.  A total of 5 ceph-fuse processes crashed
> more
> > or less the same way at different times.  The full logs are at
> > http://voms.simonsfoundation.org:50013/9SXnEpflYPmE6UhM9EgOR3us341eqy
> m/ceph-20170328
>
> This is a reference count bug. I'm afraid it won't be possible to
> debug it without a higher debug setting (probably "debug client =
> 0/20"). Be aware that will slow down your client.
>
> --
> Patrick Donnelly
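A minimal sketch of enabling the client-side debugging Patrick mentions (socket path is an example; expect a noticeable slowdown):

  # in ceph.conf on the client node, under [client]:
  #   debug client = 0/20
  # or at runtime through the ceph-fuse admin socket:
  ceph daemon /var/run/ceph/ceph-client.admin.asok config set debug_client 0/20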
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Apply for an official mirror at CN

2017-04-05 Thread Shinobu Kinjo
Adding Patrick who might be the best person.

Regards,

On Wed, Apr 5, 2017 at 6:16 PM, Wido den Hollander  wrote:
>
>> Op 5 april 2017 om 8:14 schreef SJ Zhu :
>>
>>
>> Wido, ping?
>>
>
> This might take a while! Has to go through a few hops for this to get fixed.
>
> It's on my radar!
>
> Wido
>
>> On Sat, Apr 1, 2017 at 8:40 PM, SJ Zhu  wrote:
>> > On Sat, Apr 1, 2017 at 8:10 PM, Wido den Hollander  wrote:
>> >> Great! Very good to hear. We can CNAME cn.ceph.com to that location?
>> >
>> >
>> > Yes, please CNAME to mirrors.ustc.edu.cn, and I will set vhost in our
>> > nginx for the
>> > ceph directory.
>> >
>> > Thanks
>> >
>> > --
>> > Regards,
>> > Shengjing Zhu
>>
>>
>>
>> --
>> Regards,
>> Shengjing Zhu
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] What's the actual justification for min_size? (was: Re: I/O hangs with 2 node failure even if one node isn't involved in I/O)

2017-03-21 Thread Shinobu Kinjo
> I am sure I remember having to reduce min_size to 1 temporarily in the past 
> to allow recovery from having two drives irrecoverably die at the same time 
> in one of my clusters.

What was the situation that you had to do that?
Thanks for sharing your experience in advance.

Regards,
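For context, the temporary change being discussed is usually just (pool name is a placeholder; remember to revert):

  ceph osd pool get rbd min_size
  ceph osd pool set rbd min_size 1   # only while the affected PGs recover
  ceph osd pool set rbd min_size 2   # restore afterwards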
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph-deploy and git.ceph.com

2017-03-15 Thread Shinobu Kinjo
So the description of Jewel is wrong?

http://docs.ceph.com/docs/master/releases/

On Thu, Mar 16, 2017 at 2:27 AM, John Spray <jsp...@redhat.com> wrote:
> On Wed, Mar 15, 2017 at 5:04 PM, Shinobu Kinjo <ski...@redhat.com> wrote:
>> It may well be a bit of a challenge, but please consider Kraken (or
>> later) because Jewel will be retired:
>>
>> http://docs.ceph.com/docs/master/releases/
>
> Nope, Jewel is LTS, Kraken is not.
>
> Kraken will only receive updates until the next stable release.  Jewel
> will receive updates for longer.
>
> John
>
>>
>> On Thu, Mar 16, 2017 at 1:48 AM, Shain Miley <smi...@npr.org> wrote:
>>> No, this is a production cluster that I have not had a chance to upgrade yet.
>>>
>>> We had an issue with the OS on a node, so I am just trying to reinstall ceph and
>>> hope that the osd data is still intact.
>>>
>>> Once I get things stable again I was planning on upgrading…but the upgrade
>>> is a bit intensive by the looks of it so I need to set aside a decent amount
>>> of time.
>>>
>>> Thanks all!
>>>
>>> Shain
>>>
>>> On Mar 15, 2017, at 12:38 PM, Vasu Kulkarni <vakul...@redhat.com> wrote:
>>>
>>> Just curious, why do you still want to deploy new hammer instead of stable
>>> jewel? Is this a test environment? The last .10 release was basically for
>>> bug fixes for 0.94.9.
>>>
>>>
>>>
>>> On Wed, Mar 15, 2017 at 9:16 AM, Shinobu Kinjo <ski...@redhat.com> wrote:
>>>>
>>>> FYI:
>>>> https://plus.google.com/+Cephstorage/posts/HuCaTi7Egg3
>>>>
>>>> On Thu, Mar 16, 2017 at 1:05 AM, Shain Miley <smi...@npr.org> wrote:
>>>> > Hello,
>>>> > I am trying to deploy ceph to a new server using ceph-deploy which I have
>>>> > done in the past many times without issue.
>>>> >
>>>> > Right now I am seeing a timeout trying to connect to git.ceph.com:
>>>> >
>>>> >
>>>> > [hqosd6][INFO  ] Running command: env DEBIAN_FRONTEND=noninteractive
>>>> > apt-get
>>>> > -q install --assume-yes ca-certificates
>>>> > [hqosd6][DEBUG ] Reading package lists...
>>>> > [hqosd6][DEBUG ] Building dependency tree...
>>>> > [hqosd6][DEBUG ] Reading state information...
>>>> > [hqosd6][DEBUG ] ca-certificates is already the newest version.
>>>> > [hqosd6][DEBUG ] 0 upgraded, 0 newly installed, 0 to remove and 3 not
>>>> > upgraded.
>>>> > [hqosd6][INFO  ] Running command: wget -O release.asc
>>>> > https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
>>>> > [hqosd6][WARNIN] --2017-03-15 11:49:16--
>>>> > https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
>>>> > [hqosd6][WARNIN] Resolving ceph.com (ceph.com)... 158.69.68.141
>>>> > [hqosd6][WARNIN] Connecting to ceph.com (ceph.com)|158.69.68.141|:443...
>>>> > connected.
>>>> > [hqosd6][WARNIN] HTTP request sent, awaiting response... 301 Moved
>>>> > Permanently
>>>> > [hqosd6][WARNIN] Location:
>>>> > https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
>>>> > [following]
>>>> > [hqosd6][WARNIN] --2017-03-15 11:49:17--
>>>> > https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
>>>> > [hqosd6][WARNIN] Resolving git.ceph.com (git.ceph.com)... 8.43.84.132
>>>> > [hqosd6][WARNIN] Connecting to git.ceph.com
>>>> > (git.ceph.com)|8.43.84.132|:443... failed: Connection timed out.
>>>> > [hqosd6][WARNIN] Retrying.
>>>> > [hqosd6][WARNIN]
>>>> > [hqosd6][WARNIN] --2017-03-15 11:51:25--  (try: 2)
>>>> > https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
>>>> > [hqosd6][WARNIN] Connecting to git.ceph.com
>>>> > (git.ceph.com)|8.43.84.132|:443... failed: Connection timed out.
>>>> > [hqosd6][WARNIN] Retrying.
>>>> > [hqosd6][WARNIN]
>>>> > [hqosd6][WARNIN] --2017-03-15 11:53:34--  (try: 3)
>>>> > https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
>>>> > [hqosd6][WARNIN] Connecting to git.ceph.com
>>>> > (git.ceph.com)|8.43.84.132|:443... failed: Connection timed out.
>>>> > [hqosd6][WARNIN] Retrying.
>>>> >
>>>> >
>>>> > I am wondering if this is a known issue.
>>>> >
>>>> > Just an fyi...I am using an older version of ceph-deploy (1.5.36) because
>>>> > in
>>>> > the past upgrading to a newer version I was not able to install hammer
>>>> > on
>>>> > the cluster…so the workaround was to use a slightly older version.
>>>> >
>>>> > Thanks in advance for any help you may be able to provide.
>>>> >
>>>> > Shain
>>>> >
>>>> >
>>>> > ___
>>>> > ceph-users mailing list
>>>> > ceph-users@lists.ceph.com
>>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>> >
>>>> ___
>>>> ceph-users mailing list
>>>> ceph-users@lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph-deploy and git.ceph.com

2017-03-15 Thread Shinobu Kinjo
Would you file this as a doc bug, so we can discuss it properly with tracking?

http://tracker.ceph.com

On Thu, Mar 16, 2017 at 2:17 AM, Deepak Naidu <dna...@nvidia.com> wrote:
>>> because Jewel will be retired:
> Hmm.  Isn't Jewel LTS ?
>
> Every other stable releases is a LTS (Long Term Stable) and will receive 
> updates until two LTS are published.
>
> --
> Deepak
>
>> On Mar 15, 2017, at 10:09 AM, Shinobu Kinjo <ski...@redhat.com> wrote:
>>
>> It may well be a bit of a challenge, but please consider Kraken (or
>> later) because Jewel will be retired:
>>
>> http://docs.ceph.com/docs/master/releases/
>>
>>> On Thu, Mar 16, 2017 at 1:48 AM, Shain Miley <smi...@npr.org> wrote:
>>> No, this is a production cluster that I have not had a chance to upgrade yet.
>>>
>>> We had an issue with the OS on a node, so I am just trying to reinstall ceph and
>>> hope that the osd data is still intact.
>>>
>>> Once I get things stable again I was planning on upgrading…but the upgrade
>>> is a bit intensive by the looks of it so I need to set aside a decent amount
>>> of time.
>>>
>>> Thanks all!
>>>
>>> Shain
>>>
>>> On Mar 15, 2017, at 12:38 PM, Vasu Kulkarni <vakul...@redhat.com> wrote:
>>>
>>> Just curious, why do you still want to deploy new hammer instead of stable
>>> jewel? Is this a test environment? The last .10 release was basically for
>>> bug fixes for 0.94.9.
>>>
>>>
>>>
>>>> On Wed, Mar 15, 2017 at 9:16 AM, Shinobu Kinjo <ski...@redhat.com> wrote:
>>>>
>>>> FYI:
>>>> https://plus.google.com/+Cephstorage/posts/HuCaTi7Egg3
>>>>
>>>>> On Thu, Mar 16, 2017 at 1:05 AM, Shain Miley <smi...@npr.org> wrote:
>>>>> Hello,
>>>>> I am trying to deploy ceph to a new server using ceph-deploy which I have
>>>>> done in the past many times without issue.
>>>>>
>>>>> Right now I am seeing a timeout trying to connect to git.ceph.com:
>>>>>
>>>>>
>>>>> [hqosd6][INFO  ] Running command: env DEBIAN_FRONTEND=noninteractive
>>>>> apt-get
>>>>> -q install --assume-yes ca-certificates
>>>>> [hqosd6][DEBUG ] Reading package lists...
>>>>> [hqosd6][DEBUG ] Building dependency tree...
>>>>> [hqosd6][DEBUG ] Reading state information...
>>>>> [hqosd6][DEBUG ] ca-certificates is already the newest version.
>>>>> [hqosd6][DEBUG ] 0 upgraded, 0 newly installed, 0 to remove and 3 not
>>>>> upgraded.
>>>>> [hqosd6][INFO  ] Running command: wget -O release.asc
>>>>> https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
>>>>> [hqosd6][WARNIN] --2017-03-15 11:49:16--
>>>>> https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
>>>>> [hqosd6][WARNIN] Resolving ceph.com (ceph.com)... 158.69.68.141
>>>>> [hqosd6][WARNIN] Connecting to ceph.com (ceph.com)|158.69.68.141|:443...
>>>>> connected.
>>>>> [hqosd6][WARNIN] HTTP request sent, awaiting response... 301 Moved
>>>>> Permanently
>>>>> [hqosd6][WARNIN] Location:
>>>>> https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
>>>>> [following]
>>>>> [hqosd6][WARNIN] --2017-03-15 11:49:17--
>>>>> https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
>>>>> [hqosd6][WARNIN] Resolving git.ceph.com (git.ceph.com)... 8.43.84.132
>>>>> [hqosd6][WARNIN] Connecting to git.ceph.com
>>>>> (git.ceph.com)|8.43.84.132|:443... failed: Connection timed out.
>>>>> [hqosd6][WARNIN] Retrying.
>>>>> [hqosd6][WARNIN]
>>>>> [hqosd6][WARNIN] --2017-03-15 11:51:25--  (try: 2)
>>>>> https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
>>>>> [hqosd6][WARNIN] Connecting to git.ceph.com
>>>>> (git.ceph.com)|8.43.84.132|:443... failed: Connection timed out.
>>>>> [hqosd6][WARNIN] Retrying.
>>>>> [hqosd6][WARNIN]
>>>>> [hqosd6][WARNIN] --2017-03-15 11:53:34--  (try: 3)
>>>>> https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
>>>>> [hqosd6][WARNIN] Connecting to git.ceph.com
>>>>> (git.ceph.com)|8.43.84.132|:443... failed: Connection timed out.
>>>>> [hqosd6][WARNIN] Retrying.
>>>>>

Re: [ceph-users] Ceph-deploy and git.ceph.com

2017-03-15 Thread Shinobu Kinjo
It may well be a bit of a challenge, but please consider Kraken (or
later) because Jewel will be retired:

http://docs.ceph.com/docs/master/releases/

On Thu, Mar 16, 2017 at 1:48 AM, Shain Miley <smi...@npr.org> wrote:
> No, this is a production cluster that I have not had a chance to upgrade yet.
>
> We had an issue with the OS on a node, so I am just trying to reinstall ceph and
> hope that the osd data is still intact.
>
> Once I get things stable again I was planning on upgrading…but the upgrade
> is a bit intensive by the looks of it so I need to set aside a decent amount
> of time.
>
> Thanks all!
>
> Shain
>
> On Mar 15, 2017, at 12:38 PM, Vasu Kulkarni <vakul...@redhat.com> wrote:
>
> Just curious, why do you still want to deploy new hammer instead of stable
> jewel? Is this a test environment? The last .10 release was basically for
> bug fixes for 0.94.9.
>
>
>
> On Wed, Mar 15, 2017 at 9:16 AM, Shinobu Kinjo <ski...@redhat.com> wrote:
>>
>> FYI:
>> https://plus.google.com/+Cephstorage/posts/HuCaTi7Egg3
>>
>> On Thu, Mar 16, 2017 at 1:05 AM, Shain Miley <smi...@npr.org> wrote:
>> > Hello,
>> > I am trying to deploy ceph to a new server using ceph-deploy which I have
>> > done in the past many times without issue.
>> >
>> > Right now I am seeing a timeout trying to connect to git.ceph.com:
>> >
>> >
>> > [hqosd6][INFO  ] Running command: env DEBIAN_FRONTEND=noninteractive
>> > apt-get
>> > -q install --assume-yes ca-certificates
>> > [hqosd6][DEBUG ] Reading package lists...
>> > [hqosd6][DEBUG ] Building dependency tree...
>> > [hqosd6][DEBUG ] Reading state information...
>> > [hqosd6][DEBUG ] ca-certificates is already the newest version.
>> > [hqosd6][DEBUG ] 0 upgraded, 0 newly installed, 0 to remove and 3 not
>> > upgraded.
>> > [hqosd6][INFO  ] Running command: wget -O release.asc
>> > https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
>> > [hqosd6][WARNIN] --2017-03-15 11:49:16--
>> > https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
>> > [hqosd6][WARNIN] Resolving ceph.com (ceph.com)... 158.69.68.141
>> > [hqosd6][WARNIN] Connecting to ceph.com (ceph.com)|158.69.68.141|:443...
>> > connected.
>> > [hqosd6][WARNIN] HTTP request sent, awaiting response... 301 Moved
>> > Permanently
>> > [hqosd6][WARNIN] Location:
>> > https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
>> > [following]
>> > [hqosd6][WARNIN] --2017-03-15 11:49:17--
>> > https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
>> > [hqosd6][WARNIN] Resolving git.ceph.com (git.ceph.com)... 8.43.84.132
>> > [hqosd6][WARNIN] Connecting to git.ceph.com
>> > (git.ceph.com)|8.43.84.132|:443... failed: Connection timed out.
>> > [hqosd6][WARNIN] Retrying.
>> > [hqosd6][WARNIN]
>> > [hqosd6][WARNIN] --2017-03-15 11:51:25--  (try: 2)
>> > https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
>> > [hqosd6][WARNIN] Connecting to git.ceph.com
>> > (git.ceph.com)|8.43.84.132|:443... failed: Connection timed out.
>> > [hqosd6][WARNIN] Retrying.
>> > [hqosd6][WARNIN]
>> > [hqosd6][WARNIN] --2017-03-15 11:53:34--  (try: 3)
>> > https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
>> > [hqosd6][WARNIN] Connecting to git.ceph.com
>> > (git.ceph.com)|8.43.84.132|:443... failed: Connection timed out.
>> > [hqosd6][WARNIN] Retrying.
>> >
>> >
>> > I am wondering if this is a known issue.
>> >
>> > Just an fyi...I am using an older version of ceph-deploy (1.5.36) because
>> > in
>> > the past upgrading to a newer version I was not able to install hammer
>> > on
>> > the cluster…so the workaround was to use a slightly older version.
>> >
>> > Thanks in advance for any help you may be able to provide.
>> >
>> > Shain
>> >
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph-deploy and git.ceph.com

2017-03-15 Thread Shinobu Kinjo
FYI:
https://plus.google.com/+Cephstorage/posts/HuCaTi7Egg3
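If the git.ceph.com timeouts quoted below persist, one possible workaround (assuming ceph-deploy 1.5.x still accepts these flags) is to point it at download.ceph.com explicitly:

  ceph-deploy install \
      --repo-url https://download.ceph.com/debian-hammer \
      --gpg-url https://download.ceph.com/keys/release.asc \
      hqosd6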

On Thu, Mar 16, 2017 at 1:05 AM, Shain Miley  wrote:
> Hello,
> I am trying to deploy ceph to a new server using ceph-deploy which I have
> done in the past many times without issue.
>
> Right now I am seeing a timeout trying to connect to git.ceph.com:
>
>
> [hqosd6][INFO  ] Running command: env DEBIAN_FRONTEND=noninteractive apt-get
> -q install --assume-yes ca-certificates
> [hqosd6][DEBUG ] Reading package lists...
> [hqosd6][DEBUG ] Building dependency tree...
> [hqosd6][DEBUG ] Reading state information...
> [hqosd6][DEBUG ] ca-certificates is already the newest version.
> [hqosd6][DEBUG ] 0 upgraded, 0 newly installed, 0 to remove and 3 not
> upgraded.
> [hqosd6][INFO  ] Running command: wget -O release.asc
> https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
> [hqosd6][WARNIN] --2017-03-15 11:49:16--
> https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
> [hqosd6][WARNIN] Resolving ceph.com (ceph.com)... 158.69.68.141
> [hqosd6][WARNIN] Connecting to ceph.com (ceph.com)|158.69.68.141|:443...
> connected.
> [hqosd6][WARNIN] HTTP request sent, awaiting response... 301 Moved
> Permanently
> [hqosd6][WARNIN] Location:
> https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc [following]
> [hqosd6][WARNIN] --2017-03-15 11:49:17--
> https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
> [hqosd6][WARNIN] Resolving git.ceph.com (git.ceph.com)... 8.43.84.132
> [hqosd6][WARNIN] Connecting to git.ceph.com
> (git.ceph.com)|8.43.84.132|:443... failed: Connection timed out.
> [hqosd6][WARNIN] Retrying.
> [hqosd6][WARNIN]
> [hqosd6][WARNIN] --2017-03-15 11:51:25--  (try: 2)
> https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
> [hqosd6][WARNIN] Connecting to git.ceph.com
> (git.ceph.com)|8.43.84.132|:443... failed: Connection timed out.
> [hqosd6][WARNIN] Retrying.
> [hqosd6][WARNIN]
> [hqosd6][WARNIN] --2017-03-15 11:53:34--  (try: 3)
> https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
> [hqosd6][WARNIN] Connecting to git.ceph.com
> (git.ceph.com)|8.43.84.132|:443... failed: Connection timed out.
> [hqosd6][WARNIN] Retrying.
>
>
> I am wondering if this is a known issue.
>
> Just an fyi...I am using an older version of ceph-deploy (1.5.36) because in
> the past upgrading to a newer version I was not able to install hammer on
> the cluster…so the workaround was to use a slightly older version.
>
> Thanks in advance for any help you may be able to provide.
>
> Shain
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] krbd and kernel feature mismatches

2017-02-27 Thread Shinobu Kinjo
We already discussed this:

https://www.spinics.net/lists/ceph-devel/msg34559.html

What do you think of the comment posted in that ML thread?
Would that make sense to you as well?


On Tue, Feb 28, 2017 at 2:41 AM, Vasu Kulkarni  wrote:
> Ilya,
>
> Many folks hit this and it's quite difficult since the error is not properly
> printed out (unless one scans syslogs). Is it possible to default the features
> to the ones the kernel supports, or is it not possible to handle that case?
>
> Thanks
>
> On Mon, Feb 27, 2017 at 5:59 AM, Ilya Dryomov  wrote:
>>
>> On Mon, Feb 27, 2017 at 2:37 PM, Simon Weald  wrote:
>> > I'm currently having some issues making some Jessie-based Xen hosts
>> > talk to a Trusty-based cluster due to feature mismatch errors. Our
>> > Trusty hosts are using 3.19.0-80 (the Vivid LTS kernel), and our Jessie
>> > hosts were using the standard Jessie kernel (3.16). Volumes wouldn't
>> > map, so I tried the kernel from jessie-backports (4.9.2-2~bpo8+1); still
>> > no joy. I then tried compiling the latest kernel in the 4.9 branch
>> > (4.9.12) from source with the Debian kernel config - still no joy. As I
>> > understand it there have been a lot of changes in krbd which I should
>> > have pulled in when building from source - am I missing something? Some
>> > info about the Xen hosts:
>> >
>> > root@xen-host:~# uname -r
>> > 4.9.12-internal
>> >
>> > root@xen-host:~# ceph -v
>> > ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
>> >
>> > root@xen-host:~# rbd map -p cinder
>> > volume-88188973-0f40-48a3-8a88-302d1cb5e093
>> > rbd: sysfs write failed
>> > RBD image feature set mismatch. You can disable features unsupported by
>> > the kernel with "rbd feature disable".
>> > In some cases useful info is found in syslog - try "dmesg | tail" or so.
>> > rbd: map failed: (6) No such device or address
>> >
>> > root@xen-host:~# dmesg | grep 'unsupported'
>> > [252723.885948] rbd: image volume-88188973-0f40-48a3-8a88-302d1cb5e093:
>> > image uses unsupported features: 0x38
>> >
>> > root@xen-host:~# rbd info -p cinder
>> > volume-88188973-0f40-48a3-8a88-302d1cb5e093
>> > rbd image 'volume-88188973-0f40-48a3-8a88-302d1cb5e093':
>> > size 1024 MB in 256 objects
>> > order 22 (4096 kB objects)
>> > block_name_prefix: rbd_data.c6bd3c5f705426
>> > format: 2
>> > features: layering, exclusive-lock, object-map, fast-diff,
>> > deep-flatten
>> > flags:
>>
>> object-map, fast-diff, deep-flatten are still unsupported.
>>
>> > Do
>> >
>> > $ rbd feature disable <image>
>> > deep-flatten,fast-diff,object-map,exclusive-lock
>> >
>> > to disable features unsupported by the kernel client.  If you are using
>> > the
>> > kernel client, you should create your images with
>> >
>> > $ rbd create --size <size> --image-feature layering <image>
>> >
>> > or add
>> >
>> > rbd default features = 3
>> >
>> > to ceph.conf on the client side.  (Setting rbd default features on the
>> > OSDs will have no effect.)
>>
>> exclusive-lock is supported starting with 4.9.  The above becomes
>>
>> > $ rbd feature disable <image> deep-flatten,fast-diff,object-map
>> > $ rbd create --size <size> --image-feature layering,exclusive-lock
>> > <image>
>> > rbd default features = 5
>>
>> if you want it.
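For reference, the "unsupported features: 0x38" message above decodes with the librbd feature bits:

  # layering=0x1, striping=0x2, exclusive-lock=0x4, object-map=0x8,
  # fast-diff=0x10, deep-flatten=0x20, journaling=0x40
  # 0x38 = 0x8 + 0x10 + 0x20  ->  object-map, fast-diff, deep-flatten
  rbd info cinder/volume-88188973-0f40-48a3-8a88-302d1cb5e093 | grep features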
>>
>> Thanks,
>>
>> Ilya
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] kraken-bluestore 11.2.0 memory leak issue

2017-02-19 Thread Shinobu Kinjo
Please open a ticket at http://tracker.ceph.com, if you haven't yet.
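For anyone hitting the same growth, the workaround discussed further down is a single OSD option plus a way to watch the allocations (a sketch; the small cache trades read performance for memory):

  # in ceph.conf on the OSD hosts, then restart the OSDs:
  #   [osd]
  #   bluestore_cache_size = 104857600
  ceph daemon osd.0 dump_mempools   # osd.0 is an example; watch the bluestore pools over time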

On Thu, Feb 16, 2017 at 6:07 PM, Muthusamy Muthiah
 wrote:
> Hi Wido,
>
> Thanks for the information and let us know if this is a bug.
> As workaround we will go with small bluestore_cache_size to 100MB.
>
> Thanks,
> Muthu
>
> On 16 February 2017 at 14:04, Wido den Hollander  wrote:
>>
>>
>> > Op 16 februari 2017 om 7:19 schreef Muthusamy Muthiah
>> > :
>> >
>> >
>> > Thanks Ilya Letkowski for the information, we will change this value
>> > accordingly.
>> >
>>
>> What I understand from yesterday's performance meeting is that this seems
>> like a bug. Lowering this buffer reduces memory, but the root-cause seems to
>> be memory not being freed. A few bytes of a larger allocation are still
>> allocated, causing this buffer not to be freed.
>>
>> Tried:
>>
>> debug_mempools = true
>>
>> $ ceph daemon osd.X dump_mempools
>>
>> Might want to view the YouTube video of yesterday when it's online:
>> https://www.youtube.com/channel/UCno-Fry25FJ7B4RycCxOtfw/videos
>>
>> Wido
>>
>> > Thanks,
>> > Muthu
>> >
>> > On 15 February 2017 at 17:03, Ilya Letkowski 
>> > wrote:
>> >
>> > > Hi, Muthusamy Muthiah
>> > >
>> > > I'm not totally sure that this is a memory leak.
>> > > We had same problems with bluestore on ceph v11.2.0.
>> > > Reduce bluestore cache helped us to solve it and stabilize OSD memory
>> > > consumption on the 3GB level.
>> > >
>> > > Perhaps this will help you:
>> > >
>> > > bluestore_cache_size = 104857600
>> > >
>> > >
>> > >
>> > > On Tue, Feb 14, 2017 at 11:52 AM, Muthusamy Muthiah <
>> > > muthiah.muthus...@gmail.com> wrote:
>> > >
>> > >> Hi All,
>> > >>
>> > >> On all our 5 node cluster with ceph 11.2.0 we encounter memory leak
>> > >> issues.
>> > >>
>> > >> Cluster details : 5 node with 24/68 disk per node , EC : 4+1 , RHEL
>> > >> 7.2
>> > >>
>> > >> Some traces using sar are below and attached the memory utilisation
>> > >> graph
>> > >> .
>> > >>
>> > >> (16:54:42)[cn2.c1 sa] # sar -r
>> > >> 07:50:01 kbmemfree kbmemused %memused kbbuffers kbcached kbcommit
>> > >> %commit
>> > >> kbactive kbinact kbdirty
>> > >> 10:20:01 32077264 132754368 80.54 16176 3040244 77767024 47.18
>> > >> 51991692
>> > >> 2676468 260
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> *10:30:01 32208384 132623248 80.46 16176 3048536 77832312 47.22
>> > >> 51851512
>> > >> 2684552 1210:40:01 32067244 132764388 80.55 16176 3059076 77832316
>> > >> 47.22
>> > >> 51983332 2694708 26410:50:01 30626144 134205488 81.42 16176 3064340
>> > >> 78177232 47.43 53414144 2693712 411:00:01 28927656 135903976 82.45
>> > >> 16176
>> > >> 3074064 78958568 47.90 55114284 2702892 1211:10:01 27158548 137673084
>> > >> 83.52
>> > >> 16176 3080600 80553936 48.87 56873664 2708904 1211:20:01 2646
>> > >> 138376076
>> > >> 83.95 16176 3080436 81991036 49.74 57570280 2708500 811:30:01
>> > >> 26002252
>> > >> 138829380 84.22 16176 3090556 82223840 49.88 58015048 2718036
>> > >> 1611:40:01
>> > >> 25965924 138865708 84.25 16176 3089708 83734584 50.80 58049980
>> > >> 2716740
>> > >> 1211:50:01 26142888 138688744 84.14 16176 3089544 83800100 50.84
>> > >> 57869628
>> > >> 2715400 16*
>> > >>
>> > >> ...
>> > >> ...
>> > >>
>> > >> In the attached graph, there is increase in memory utilisation by
>> > >> ceph-osd during soak test. And when it reaches the system limit of
>> > >> 128GB
>> > >> RAM , we could able to see the below dmesg logs related to memory out
>> > >> when
>> > >> the system reaches close to 128GB RAM. OSD.3 killed due to Out of
>> > >> memory
>> > >> and started again.
>> > >>
>> > >> [Tue Feb 14 03:51:02 2017] *tp_osd_tp invoked oom-killer:
>> > >> gfp_mask=0x280da, order=0, oom_score_adj=0*
>> > >> [Tue Feb 14 03:51:02 2017] tp_osd_tp cpuset=/ mems_allowed=0-1
>> > >> [Tue Feb 14 03:51:02 2017] CPU: 20 PID: 11864 Comm: tp_osd_tp Not
>> > >> tainted
>> > >> 3.10.0-327.13.1.el7.x86_64 #1
>> > >> [Tue Feb 14 03:51:02 2017] Hardware name: HP ProLiant XL420
>> > >> Gen9/ProLiant
>> > >> XL420 Gen9, BIOS U19 09/12/2016
>> > >> [Tue Feb 14 03:51:02 2017]  8819ccd7a280 30e84036
>> > >> 881fa58f7528 816356f4
>> > >> [Tue Feb 14 03:51:02 2017]  881fa58f75b8 8163068f
>> > >> 881fa3478360 881fa3478378
>> > >> [Tue Feb 14 03:51:02 2017]  881fa58f75e8 8819ccd7a280
>> > >> 0001 0001f65f
>> > >> [Tue Feb 14 03:51:02 2017] Call Trace:
>> > >> [Tue Feb 14 03:51:02 2017]  [] dump_stack+0x19/0x1b
>> > >> [Tue Feb 14 03:51:02 2017]  []
>> > >> dump_header+0x8e/0x214
>> > >> [Tue Feb 14 03:51:02 2017]  []
>> > >> oom_kill_process+0x24e/0x3b0
>> > >> [Tue Feb 14 03:51:02 2017]  [] ?
>> > >> find_lock_task_mm+0x56/0xc0
>> > >> [Tue Feb 14 03:51:02 2017]  []
>> > >> *out_of_memory+0x4b6/0x4f0*
>> > >> [Tue Feb 14 03:51:02 2017]  []
>> > >> __alloc_pages_nodemask+0xa95/0xb90
>> > >> [Tue Feb 

Re: [ceph-users] How safe is ceph pg repair these days?

2017-02-17 Thread Shinobu Kinjo
if ``ceph pg deep-scrub <pgid>`` does not work
then
  do
``ceph pg repair <pgid>``
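Before repairing, the inconsistency can be inspected so the repair is not blind (Jewel and later; <pgid> is a placeholder):

  ceph health detail                                        # lists the inconsistent PGs
  rados list-inconsistent-obj <pgid> --format=json-pretty   # shows which shard disagrees and why
  ceph pg repair <pgid>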


On Sat, Feb 18, 2017 at 10:02 AM, Tracy Reed  wrote:
> I have a 3 replica cluster. A couple times I have run into inconsistent
> PGs. I googled it and ceph docs and various blogs say run a repair
> first. But a couple people on IRC and a mailing list thread from 2015
> say that ceph blindly copies the primary over the secondaries and calls
> it good.
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-May/001370.html
>
> I sure hope that isn't the case. If so it would seem highly
> irresponsible to implement such a naive command called "repair". I have
> recently learned how to properly analyze the OSD logs and manually fix
> these things but not before having run repair on a dozen inconsistent
> PGs. Now I'm worried about what sort of corruption I may have
> introduced. Repairing things by hand is a simple heuristic based on
> comparing the size or checksum (as indicated by the logs) for each of
> the 3 copies and figuring out which is correct. Presumably matching two
> out of three should win and the odd object out should be deleted since
> having the exact same kind of error on two different OSDs is highly
> improbable. I don't understand why ceph repair wouldn't have done this
> all along.
>
> What is the current best practice in the use of ceph repair?
>
> Thanks!
>
> --
> Tracy Reed
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pgs stuck unclean

2017-02-17 Thread Shinobu Kinjo
On Sat, Feb 18, 2017 at 9:03 AM, Matyas Koszik  wrote:
>
>
> Looks like you've provided me with the solution, thanks!

:)

> I've set the tunables to firefly, and now I only see the normal states
> associated with a recovering cluster, there're no more stale pgs.
> I hope it'll stay like this when it's done, but that'll take quite a
> while.
>
> Matyas
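For reference, the tunables switch described above is a single command, though it triggers significant data movement on an older map:

  ceph osd crush tunables firefly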
>
>
> On Fri, 17 Feb 2017, Gregory Farnum wrote:
>
>> Situations that are stable with lots of undersized PGs like this generally
>> mean that the CRUSH map is failing to allocate enough OSDs for certain
>> PGs. The log you have says the OSD is trying to NOTIFY the new primary
>> that the PG exists here on this replica.
>>
>> I'd guess you only have 3 hosts and are trying to place all your
>> replicas on independent boxes. Bobtail tunables have trouble with that
>> and you're going to need to pay the cost of moving to more modern
>> ones.
>> -Greg
>>
>> On Fri, Feb 17, 2017 at 5:30 AM, Matyas Koszik  wrote:
>> >
>> >
>> > I'm not sure what variable I should be looking at exactly, but after
>> > reading through all of them I don't see anything suspicious; all values are
>> > 0. I'm attaching it anyway, in case I missed something:
>> > https://atw.hu/~koszik/ceph/osd26-perf
>> >
>> >
>> > I tried debugging the ceph pg query a bit more, and it seems that it
>> > gets stuck communicating with the mon - it doesn't even try to connect to
>> > the osd. This is the end of the log:
>> >
>> > 13:36:07.006224 sendmsg(3, {msg_name(0)=NULL, msg_iov(4)=[{"\7", 1}, 
>> > {"\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\17\0\177\0\2\0\27\0\0\0\0\0\0\0\0\0"...,
>> >  53}, {"\1\0\0\0\6\0\0\0osdmap9\4\1\0\0\0\0\0\1", 23}, 
>> > {"\255UC\211\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1", 21}], msg_controllen=0, 
>> > msg_flags=0}, MSG_NOSIGNAL) = 98
>> > 13:36:07.207010 recvfrom(3, "\10\6\0\0\0\0\0\0\0", 4096, MSG_DONTWAIT, 
>> > NULL, NULL) = 9
>> > 13:36:09.963843 sendmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"\16", 1}, 
>> > {"9\356\246X\245\330r9", 8}], msg_controllen=0, msg_flags=0}, 
>> > MSG_NOSIGNAL) = 9
>> > 13:36:09.964340 recvfrom(3, "\0179\356\246X\245\330r9", 4096, 
>> > MSG_DONTWAIT, NULL, NULL) = 9
>> > 13:36:19.964154 sendmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"\16", 1}, 
>> > {"C\356\246X\24\226w9", 8}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) 
>> > = 9
>> > 13:36:19.964573 recvfrom(3, "\17C\356\246X\24\226w9", 4096, MSG_DONTWAIT, 
>> > NULL, NULL) = 9
>> > 13:36:29.964439 sendmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"\16", 1}, 
>> > {"M\356\246X|\353{9", 8}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 
>> > 9
>> > 13:36:29.964938 recvfrom(3, "\17M\356\246X|\353{9", 4096, MSG_DONTWAIT, 
>> > NULL, NULL) = 9
>> >
>> > ... and this goes on for as long as I let it. When I kill it, I get this:
>> > RuntimeError: "None": exception "['{"prefix": "get_command_descriptions", 
>> > "pgid": "6.245"}']": exception 'int' object is not iterable
>> >
>> > I restarted (again) osd26 with max debugging; after grepping for 6.245,
>> > this is the log I get:
>> > https://atw.hu/~koszik/ceph/ceph-osd.26.log.6245
>> >
>> > Matyas
>> >
>> >
>> > On Fri, 17 Feb 2017, Tomasz Kuzemko wrote:
>> >
>> >> If the PG cannot be queried I would bet on OSD message throttler. Check 
>> >> with "ceph --admin-daemon PATH_TO_ADMIN_SOCK perf dump" on each OSD which 
>> >> is holding this PG  if message throttler current value is not equal max. 
>> >> If it is, increase the max value in ceph.conf and restart OSD.
>> >>
>> >> --
>> >> Tomasz Kuzemko
>> >> tomasz.kuze...@corp.ovh.com
>> >>
>> >> On 17.02.2017 at 01:59, Matyas Koszik wrote:
>> >>
>> >> >
>> >> > Hi,
>> >> >
>> >> > It seems that my ceph cluster is in an erroneous state of which I cannot
>> >> > see right now how to get out of.
>> >> >
>> >> > The status is the following:
>> >> >
>> >> > health HEALTH_WARN
>> >> >   25 pgs degraded
>> >> >   1 pgs stale
>> >> >   26 pgs stuck unclean
>> >> >   25 pgs undersized
>> >> >   recovery 23578/9450442 objects degraded (0.249%)
>> >> >   recovery 45/9450442 objects misplaced (0.000%)
>> >> >   crush map has legacy tunables (require bobtail, min is firefly)
>> >> > monmap e17: 3 mons at x
>> >> >   election epoch 8550, quorum 0,1,2 store1,store3,store2
>> >> > osdmap e66602: 68 osds: 68 up, 68 in; 1 remapped pgs
>> >> >   flags require_jewel_osds
>> >> > pgmap v31433805: 4388 pgs, 8 pools, 18329 GB data, 4614 kobjects
>> >> >   36750 GB used, 61947 GB / 98697 GB avail
>> >> >   23578/9450442 objects degraded (0.249%)
>> >> >   45/9450442 objects misplaced (0.000%)
>> >> >   4362 active+clean
>> >> > 24 active+undersized+degraded
>> >> >  1 stale+active+undersized+degraded+remapped
>> >> >  1 active+remapped
>> >> >
>> >> >
>> >> > I tried restarting all OSDs, to no avail, it actually made things a bit
>> >> > worse.
>> >> > From a user point of view the cluster works 

Re: [ceph-users] pgs stuck unclean

2017-02-17 Thread Shinobu Kinjo
You may need to increase ``choose_total_tries`` from the default of 50 to
something higher, up to 100.

 - 
http://docs.ceph.com/docs/master/rados/operations/crush-map/#editing-a-crush-map

 - https://github.com/ceph/ceph/blob/master/doc/man/8/crushtool.rst
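A hedged sketch of raising that tunable without hand-editing the decompiled map (file names are placeholders):

  ceph osd getcrushmap -o crushmap.bin
  crushtool -i crushmap.bin --set-choose-total-tries 100 -o crushmap.new
  crushtool -i crushmap.new --test --show-bad-mappings --rule 0 --num-rep 2   # optional sanity check
  ceph osd setcrushmap -i crushmap.new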

On Sat, Feb 18, 2017 at 5:25 AM, Matyas Koszik  wrote:
>
> I have size=2 and 3 independent nodes. I'm happy to try firefly tunables,
> but a bit scared that it would make things even worse.
>
>
> On Fri, 17 Feb 2017, Gregory Farnum wrote:
>
>> Situations that are stable with lots of undersized PGs like this generally
>> mean that the CRUSH map is failing to allocate enough OSDs for certain
>> PGs. The log you have says the OSD is trying to NOTIFY the new primary
>> that the PG exists here on this replica.
>>
>> I'd guess you only have 3 hosts and are trying to place all your
>> replicas on independent boxes. Bobtail tunables have trouble with that
>> and you're going to need to pay the cost of moving to more modern
>> ones.
>> -Greg
>>
>> On Fri, Feb 17, 2017 at 5:30 AM, Matyas Koszik  wrote:
>> >
>> >
>> > I'm not sure what variable I should be looking at exactly, but after
>> > reading through all of them I don't see anything suspicious; all values are
>> > 0. I'm attaching it anyway, in case I missed something:
>> > https://atw.hu/~koszik/ceph/osd26-perf
>> >
>> >
>> > I tried debugging the ceph pg query a bit more, and it seems that it
>> > gets stuck communicating with the mon - it doesn't even try to connect to
>> > the osd. This is the end of the log:
>> >
>> > 13:36:07.006224 sendmsg(3, {msg_name(0)=NULL, msg_iov(4)=[{"\7", 1}, 
>> > {"\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\17\0\177\0\2\0\27\0\0\0\0\0\0\0\0\0"...,
>> >  53}, {"\1\0\0\0\6\0\0\0osdmap9\4\1\0\0\0\0\0\1", 23}, 
>> > {"\255UC\211\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1", 21}], msg_controllen=0, 
>> > msg_flags=0}, MSG_NOSIGNAL) = 98
>> > 13:36:07.207010 recvfrom(3, "\10\6\0\0\0\0\0\0\0", 4096, MSG_DONTWAIT, 
>> > NULL, NULL) = 9
>> > 13:36:09.963843 sendmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"\16", 1}, 
>> > {"9\356\246X\245\330r9", 8}], msg_controllen=0, msg_flags=0}, 
>> > MSG_NOSIGNAL) = 9
>> > 13:36:09.964340 recvfrom(3, "\0179\356\246X\245\330r9", 4096, 
>> > MSG_DONTWAIT, NULL, NULL) = 9
>> > 13:36:19.964154 sendmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"\16", 1}, 
>> > {"C\356\246X\24\226w9", 8}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) 
>> > = 9
>> > 13:36:19.964573 recvfrom(3, "\17C\356\246X\24\226w9", 4096, MSG_DONTWAIT, 
>> > NULL, NULL) = 9
>> > 13:36:29.964439 sendmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"\16", 1}, 
>> > {"M\356\246X|\353{9", 8}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 
>> > 9
>> > 13:36:29.964938 recvfrom(3, "\17M\356\246X|\353{9", 4096, MSG_DONTWAIT, 
>> > NULL, NULL) = 9
>> >
>> > ... and this goes on for as long as I let it. When I kill it, I get this:
>> > RuntimeError: "None": exception "['{"prefix": "get_command_descriptions", 
>> > "pgid": "6.245"}']": exception 'int' object is not iterable
>> >
>> > I restarted (again) osd26 with max debugging; after grepping for 6.245,
>> > this is the log I get:
>> > https://atw.hu/~koszik/ceph/ceph-osd.26.log.6245
>> >
>> > Matyas
>> >
>> >
>> > On Fri, 17 Feb 2017, Tomasz Kuzemko wrote:
>> >
>> >> If the PG cannot be queried I would bet on OSD message throttler. Check 
>> >> with "ceph --admin-daemon PATH_TO_ADMIN_SOCK perf dump" on each OSD which 
>> >> is holding this PG  if message throttler current value is not equal max. 
>> >> If it is, increase the max value in ceph.conf and restart OSD.
>> >>
>> >> --
>> >> Tomasz Kuzemko
>> >> tomasz.kuze...@corp.ovh.com
>> >>
>> >> On 17.02.2017 at 01:59, Matyas Koszik wrote:
>> >>
>> >> >
>> >> > Hi,
>> >> >
>> >> > It seems that my ceph cluster is in an erroneous state of which I cannot
>> >> > see right now how to get out of.
>> >> >
>> >> > The status is the following:
>> >> >
>> >> > health HEALTH_WARN
>> >> >   25 pgs degraded
>> >> >   1 pgs stale
>> >> >   26 pgs stuck unclean
>> >> >   25 pgs undersized
>> >> >   recovery 23578/9450442 objects degraded (0.249%)
>> >> >   recovery 45/9450442 objects misplaced (0.000%)
>> >> >   crush map has legacy tunables (require bobtail, min is firefly)
>> >> > monmap e17: 3 mons at x
>> >> >   election epoch 8550, quorum 0,1,2 store1,store3,store2
>> >> > osdmap e66602: 68 osds: 68 up, 68 in; 1 remapped pgs
>> >> >   flags require_jewel_osds
>> >> > pgmap v31433805: 4388 pgs, 8 pools, 18329 GB data, 4614 kobjects
>> >> >   36750 GB used, 61947 GB / 98697 GB avail
>> >> >   23578/9450442 objects degraded (0.249%)
>> >> >   45/9450442 objects misplaced (0.000%)
>> >> >   4362 active+clean
>> >> > 24 active+undersized+degraded
>> >> >  1 stale+active+undersized+degraded+remapped
>> >> >  1 active+remapped
>> >> >
>> >> >
>> >> > I tried restarting all OSDs, to no avail, it actually made things a 

Re: [ceph-users] pgs stuck unclean

2017-02-17 Thread Shinobu Kinjo
Can you do?

 * ceph osd getcrushmap -o ./crushmap.o; crushtool -d ./crushmap.o -o
./crushmap.txt

On Sat, Feb 18, 2017 at 3:52 AM, Gregory Farnum  wrote:
> Situations that are stable with lots of undersized PGs like this generally
> mean that the CRUSH map is failing to allocate enough OSDs for certain
> PGs. The log you have says the OSD is trying to NOTIFY the new primary
> that the PG exists here on this replica.
>
> I'd guess you only have 3 hosts and are trying to place all your
> replicas on independent boxes. Bobtail tunables have trouble with that
> and you're going to need to pay the cost of moving to more modern
> ones.
> -Greg
>
> On Fri, Feb 17, 2017 at 5:30 AM, Matyas Koszik  wrote:
>>
>>
>> I'm not sure what variable I should be looking at exactly, but after
>> reading through all of them I don't see anything suspicious; all values are
>> 0. I'm attaching it anyway, in case I missed something:
>> https://atw.hu/~koszik/ceph/osd26-perf
>>
>>
>> I tried debugging the ceph pg query a bit more, and it seems that it
>> gets stuck communicating with the mon - it doesn't even try to connect to
>> the osd. This is the end of the log:
>>
>> 13:36:07.006224 sendmsg(3, {msg_name(0)=NULL, msg_iov(4)=[{"\7", 1}, 
>> {"\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\17\0\177\0\2\0\27\0\0\0\0\0\0\0\0\0"..., 
>> 53}, {"\1\0\0\0\6\0\0\0osdmap9\4\1\0\0\0\0\0\1", 23}, 
>> {"\255UC\211\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1", 21}], msg_controllen=0, 
>> msg_flags=0}, MSG_NOSIGNAL) = 98
>> 13:36:07.207010 recvfrom(3, "\10\6\0\0\0\0\0\0\0", 4096, MSG_DONTWAIT, NULL, 
>> NULL) = 9
>> 13:36:09.963843 sendmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"\16", 1}, 
>> {"9\356\246X\245\330r9", 8}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) 
>> = 9
>> 13:36:09.964340 recvfrom(3, "\0179\356\246X\245\330r9", 4096, MSG_DONTWAIT, 
>> NULL, NULL) = 9
>> 13:36:19.964154 sendmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"\16", 1}, 
>> {"C\356\246X\24\226w9", 8}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 
>> 9
>> 13:36:19.964573 recvfrom(3, "\17C\356\246X\24\226w9", 4096, MSG_DONTWAIT, 
>> NULL, NULL) = 9
>> 13:36:29.964439 sendmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"\16", 1}, 
>> {"M\356\246X|\353{9", 8}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 9
>> 13:36:29.964938 recvfrom(3, "\17M\356\246X|\353{9", 4096, MSG_DONTWAIT, 
>> NULL, NULL) = 9
>>
>> ... and this goes on for as long as I let it. When I kill it, I get this:
>> RuntimeError: "None": exception "['{"prefix": "get_command_descriptions", 
>> "pgid": "6.245"}']": exception 'int' object is not iterable
>>
>> I restarted (again) osd26 with max debugging; after grepping for 6.245,
>> this is the log I get:
>> https://atw.hu/~koszik/ceph/ceph-osd.26.log.6245
>>
>> Matyas
>>
>>
>> On Fri, 17 Feb 2017, Tomasz Kuzemko wrote:
>>
>>> If the PG cannot be queried I would bet on OSD message throttler. Check 
>>> with "ceph --admin-daemon PATH_TO_ADMIN_SOCK perf dump" on each OSD which 
>>> is holding this PG  if message throttler current value is not equal max. If 
>>> it is, increase the max value in ceph.conf and restart OSD.
>>>
>>> --
>>> Tomasz Kuzemko
>>> tomasz.kuze...@corp.ovh.com
>>>
>>> On 17.02.2017 at 01:59, Matyas Koszik wrote:
>>>
>>> >
>>> > Hi,
>>> >
>>> > It seems that my ceph cluster is in an erroneous state of which I cannot
>>> > see right now how to get out of.
>>> >
>>> > The status is the following:
>>> >
>>> > health HEALTH_WARN
>>> >   25 pgs degraded
>>> >   1 pgs stale
>>> >   26 pgs stuck unclean
>>> >   25 pgs undersized
>>> >   recovery 23578/9450442 objects degraded (0.249%)
>>> >   recovery 45/9450442 objects misplaced (0.000%)
>>> >   crush map has legacy tunables (require bobtail, min is firefly)
>>> > monmap e17: 3 mons at x
>>> >   election epoch 8550, quorum 0,1,2 store1,store3,store2
>>> > osdmap e66602: 68 osds: 68 up, 68 in; 1 remapped pgs
>>> >   flags require_jewel_osds
>>> > pgmap v31433805: 4388 pgs, 8 pools, 18329 GB data, 4614 kobjects
>>> >   36750 GB used, 61947 GB / 98697 GB avail
>>> >   23578/9450442 objects degraded (0.249%)
>>> >   45/9450442 objects misplaced (0.000%)
>>> >   4362 active+clean
>>> > 24 active+undersized+degraded
>>> >  1 stale+active+undersized+degraded+remapped
>>> >  1 active+remapped
>>> >
>>> >
>>> > I tried restarting all OSDs, to no avail, it actually made things a bit
>>> > worse.
>>> > From a user point of view the cluster works perfectly (apart from that
>>> > stale pg, which fortunately hit the pool on which I keep swap images
>>> > only).
>>> >
>>> > A little background: I made the mistake of creating the cluster with
>>> > size=2 pools, which I'm now in the process of rectifying, but that
>>> > requires some fiddling around. I also tried moving to more optimal
>>> > tunables (firefly), but the documentation is a bit optimistic
>>> > with the 'up to 10%' data movement - 

Re: [ceph-users] Jewel to Kraken OSD upgrade issues

2017-02-16 Thread Shinobu Kinjo
Would you simply run the following?

 * ceph -s

On Fri, Feb 17, 2017 at 6:26 AM, Benjeman Meekhof  wrote:
> As I'm looking at logs on the OSD mentioned in previous email at this
> point, I mostly see this message repeating...is this normal or
> indicating a problem?  This osd is marked up in the cluster.
>
> 2017-02-16 16:23:35.550102 7fc66fce3700 20 osd.564 152609
> share_map_peer 0x7fc6887a3000 already has epoch 152609
> 2017-02-16 16:23:35.556208 7fc66f4e2700 20 osd.564 152609
> share_map_peer 0x7fc689e35000 already has epoch 152609
> 2017-02-16 16:23:35.556233 7fc66f4e2700 20 osd.564 152609
> share_map_peer 0x7fc689e35000 already has epoch 152609
> 2017-02-16 16:23:35.577324 7fc66fce3700 20 osd.564 152609
> share_map_peer 0x7fc68f4c1000 already has epoch 152609
> 2017-02-16 16:23:35.577356 7fc6704e4700 20 osd.564 152609
> share_map_peer 0x7fc68f4c1000 already has epoch 152609
>
> thanks,
> Ben
>
> On Thu, Feb 16, 2017 at 12:19 PM, Benjeman Meekhof  wrote:
>> I tried starting up just a couple OSD with debug_osd = 20 and
>> debug_filestore = 20.
>>
>> I pasted a sample of the ongoing log here.  To my eyes it doesn't look
>> unusual but maybe someone else sees something in here that is a
>> problem:  http://pastebin.com/uy8S7hps
>>
>> As this log is rolling on, our OSD has still not been marked up and is
>> occupying 100% of a CPU core.  I've done this a couple times and in a
>> matter of some hours it will be marked up and CPU will drop.  If more
>> kraken OSD on another host are brought up the existing kraken OSD go
>> back into max CPU usage again while pg recover.  The trend scales
>> upward as OSD are started until the system is completely saturated.
>>
>> I was reading the docs on async messenger settings at
>> http://docs.ceph.com/docs/master/rados/configuration/ms-ref/ and saw
>> that under 'ms async max op threads' there is a note about one or more
>> CPUs constantly on 100% load.  As an experiment I set max op threads
>> to 20 and that is the setting during the period of the pasted log.  It
>> seems to make no difference.
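>>
For reference, a minimal ceph.conf sketch of that experiment (the option name
comes from the ms-ref page linked above; the daemons need a restart to pick it
up, and 20 is just the value that was tried here):

  [osd]
  ms async max op threads = 20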
>>
>> Appreciate any thoughts on troubleshooting this.  For the time being
>> I've aborted our kraken update and will probably re-initialize any
>> already updated OSD to revert to Jewel except perhaps one host to
>> continue testing.
>>
>> thanks,
>> Ben
>>
>> On Tue, Feb 14, 2017 at 3:55 PM, Gregory Farnum  wrote:
>>> On Tue, Feb 14, 2017 at 11:38 AM, Benjeman Meekhof  
>>> wrote:
 Hi all,

 We encountered an issue updating our OSD from Jewel (10.2.5) to Kraken
 (11.2.0).  OS was RHEL derivative.  Prior to this we updated all the
 mons to Kraken.

 After updating ceph packages I restarted the 60 OSD on the box with
 'systemctl restart ceph-osd.target'.  Very soon after the system cpu
 load flat-lines at 100% with top showing all of that being system load
 from ceph-osd processes.  Not long after we get OSD flapping due to
 the load on the system (noout was set to start this, but perhaps
 too-quickly unset post restart).

 This is causing problems in the cluster, and we reboot the box.  The
 OSD don't start up/mount automatically - not a new problem on this
 setup.  We run 'ceph-disk activate $disk' on a list of all the
 /dev/dm-X devices as output by ceph-disk list.  Everything activates
 and the CPU gradually climbs to once again be a solid 100%.  No OSD
 have joined cluster so it isn't causing issues.

 I leave the box overnight...by the time I leave I see that 1-2 OSD on
 this box are marked up/in.   By morning all are in, CPU is fine,
 cluster is still fine.

 This is not a show-stopping issue now that I know what happens though
 it means upgrades are a several hour or overnight affair.  Next box I
 will just mark all the OSD out before updating and restarting them or
 try leaving them up but being sure to set noout to avoid flapping
 while they churn.
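
A rough sketch of that second variant, assuming the systemd units used above:

  ceph osd set noout
  # upgrade the ceph packages on the box, then:
  systemctl restart ceph-osd.target
  ceph -s                  # wait until all OSDs on the box are back up/in
  ceph osd unset noout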

 Here's a log snippet from one currently spinning in the startup
 process since 11am.  This is the second box we did, the first
 experience being as detailed above.  Could this have anything to do
 with the 'PGs are upgrading' message?
>>>
>>> It doesn't seem likely — there's a fixed per-PG overhead that doesn't
>>> scale with the object count. I could be missing something but I don't
>>> see anything in the upgrade notes that should be doing this either.
>>> Try running an upgrade with "debug osd = 20" and "debug filestore =
>>> 20" set and see what the log spits out.
>>> -Greg
>>>

 2017-02-14 11:04:07.028311 7fd7a0372940  0 _get_class not permitted to 
 load lua
 2017-02-14 11:04:07.077304 7fd7a0372940  0 osd.585 135493 crush map
 has features 288514119978713088, adjusting msgr requires for clients
 2017-02-14 11:04:07.077318 7fd7a0372940  0 osd.585 135493 crush map
 has features 

Re: [ceph-users] RBD client newer than cluster

2017-02-14 Thread Shinobu Kinjo
On Wed, Feb 15, 2017 at 2:18 AM, Lukáš Kubín  wrote:
> Hi,
> I'm most probably hitting bug http://tracker.ceph.com/issues/13755 - when
> libvirt mounted RBD disks suspend I/O during snapshot creation until hard
> reboot.
>
> My Ceph cluster (monitors and OSDs) is running v0.94.3, while clients
> (OpenStack/KVM computes) run v0.94.5. Can I still update the client packages
> (librbd1 and dependencies) to a patched release 0.94.7, while keeping the
> cluster on v0.94.3?

The latest hammer release is v0.94.9, and hammer will be EOL this spring.
Why do you want to keep v0.94.3? Is it because you just want to avoid
any risks related to upgrading the packages?

>
> I realize it's not ideal but does it present any risk? Can I assume that
> patching the client is sufficient to resolve the mentioned bug?
>
> Ceph cluster nodes can't receive updates currently and this will stay so for
> some time still, but I need to resolve the snapshot bug urgently.
>
> Greetings,
>
> Lukáš
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Re: Re: mon is stuck in leveldb and costs nearly 100% cpu

2017-02-13 Thread Shinobu Kinjo
> 2 active+clean+scrubbing+deep

 * Set noscrub and nodeep-scrub
  # ceph osd set noscrub
  # ceph osd set nodeep-scrub

 * Wait for scrubbing+deep to complete

 * Do `ceph -s`

If you are still seeing high CPU usage, please identify which
process(es) are eating the CPU resources.

 * ps aux | sort -rk 3,4 | head -n 20

And let us know.
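
If it is the ceph-mon process again, a per-thread view narrows it down
further, and the scrub flags should be removed once scrubbing has finished
(a minimal sketch, assuming a single monitor per host):

  top -Hp $(pidof ceph-mon)    # per-thread CPU usage of the monitor
  ceph osd unset noscrub
  ceph osd unset nodeep-scrub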


On Mon, Feb 13, 2017 at 9:39 PM, Chenyehua <chen.ye...@h3c.com> wrote:
> Thanks for the response, Shinobu
> The warning disappears thanks to your suggested solution; however, the nearly
> 100% CPU cost still exists and concerns me a lot.
> So, do you know why the cpu cost is so high?
> Are there any solutions or suggestions to this problem?
>
> Cheers
>
> -Original Message-
> From: Shinobu Kinjo [mailto:ski...@redhat.com]
> Sent: 13 February 2017 10:54
> To: chenyehua 11692 (RD)
> Cc: kc...@redhat.com; ceph-users@lists.ceph.com
> Subject: Re: Re: [ceph-users] mon is stuck in leveldb and costs nearly 100% cpu
>
> O.k, that's reasonable answer. Would you do on all hosts which the MON are 
> running on:
>
>  #* ceph --admin-daemon /var/run/ceph/ceph-mon.`hostname -s`.asok config show 
> | grep leveldb_log
>
> Anyway you can compact leveldb size with at runtime:
>
>  #* ceph tell mon.`hostname -s` compact
>
> And you should set in ceph.conf to prevent same issue from the next:
>
>  #* [mon]
>  #* mon compact on start = true
>
>
> On Mon, Feb 13, 2017 at 11:37 AM, Chenyehua <chen.ye...@h3c.com> wrote:
>> Sorry, I made a mistake, the ceph version is actually 0.94.5
>>
>> -Original Message-
>> From: chenyehua 11692 (RD)
>> Sent: 13 February 2017 9:40
>> To: 'Shinobu Kinjo'
>> Cc: kc...@redhat.com; ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] mon is stuck in leveldb and costs nearly 100% cpu
>>
>> My ceph version is 10.2.5
>>
>> -Original Message-
>> From: Shinobu Kinjo [mailto:ski...@redhat.com]
>> Sent: 12 February 2017 13:12
>> To: chenyehua 11692 (RD)
>> Cc: kc...@redhat.com; ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] mon is stuck in leveldb and costs nearly 100% cpu
>>
>> Which Ceph version are you using?
>>
>> On Sat, Feb 11, 2017 at 5:02 PM, Chenyehua <chen.ye...@h3c.com> wrote:
>>> Dear Mr Kefu Chai
>>>
>>> Sorry to disturb you.
>>>
>>> I meet a problem recently. In my ceph cluster ,health status has
>>> warning “store is getting too big!” for several days; and  ceph-mon
>>> costs nearly 100% cpu;
>>>
>>> Have you ever met this situation?
>>>
>>> Some detailed information are attached below:
>>>
>>>
>>>
>>> root@cvknode17:~# ceph -s
>>>
>>> cluster 04afba60-3a77-496c-b616-2ecb5e47e141
>>>
>>>  health HEALTH_WARN
>>>
>>> mon.cvknode17 store is getting too big! 34104 MB >= 15360
>>> MB
>>>
>>>  monmap e1: 3 mons at
>>> {cvknode15=172.16.51.15:6789/0,cvknode16=172.16.51.16:6789/0,cvknode1
>>> 7
>>> =172.16.51.17:6789/0}
>>>
>>> election epoch 862, quorum 0,1,2
>>> cvknode15,cvknode16,cvknode17
>>>
>>>  osdmap e196279: 347 osds: 347 up, 347 in
>>>
>>>   pgmap v5891025: 33272 pgs, 16 pools, 26944 GB data, 6822
>>> kobjects
>>>
>>> 65966 GB used, 579 TB / 644 TB avail
>>>
>>>33270 active+clean
>>>
>>>2 active+clean+scrubbing+deep
>>>
>>>   client io 840 kB/s rd, 739 kB/s wr, 35 op/s rd, 184 op/s wr
>>>
>>>
>>>
>>> root@cvknode17:~# top
>>>
>>> top - 15:19:28 up 23 days, 23:58,  6 users,  load average: 1.08,
>>> 1.40,
>>> 1.77
>>>
>>> Tasks: 346 total,   2 running, 342 sleeping,   0 stopped,   2 zombie
>>>
>>> Cpu(s):  8.1%us, 10.8%sy,  0.0%ni, 69.0%id,  9.5%wa,  0.0%hi,
>>> 2.5%si, 0.0%st
>>>
>>> Mem:  65384424k total, 58102880k used,  7281544k free,   240720k buffers
>>>
>>> Swap: 2100k total,   344944k used, 29654156k free, 24274272k cached
>>>
>>>
>>>
>>> PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
>>>
>>>   24407 root  20   0 17.3g  12g  10m S   98 20.2   8420:11 ceph-mon
>>>
>>>
>>>
>>> root@cvknode17:~# top -Hp 24407
>>>
>>> top - 15:19:49 up 23 days, 23:59,  6 users,  load average: 1.12,
>>> 1.39,
>>> 1.76
>>>
>>> Tasks:  17 total,   1 running,  16 sleeping,   0 st

Re: [ceph-users] Re: mon is stuck in leveldb and costs nearly 100% cpu

2017-02-12 Thread Shinobu Kinjo
OK, that's a reasonable answer. Would you run the following on all hosts
which the MONs are running on:

 #* ceph --admin-daemon /var/run/ceph/ceph-mon.`hostname -s`.asok
config show | grep leveldb_log

Anyway, you can compact the leveldb store at runtime:

 #* ceph tell mon.`hostname -s` compact

And you should set the following in ceph.conf to prevent the same issue next time:

 #* [mon]
 #* mon compact on start = true
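
To confirm the compaction actually helps, the store size can be compared
before and after (a minimal sketch, assuming the default mon data path and
cluster name):

  du -sh /var/lib/ceph/mon/ceph-$(hostname -s)/store.db
  ceph tell mon.$(hostname -s) compact
  du -sh /var/lib/ceph/mon/ceph-$(hostname -s)/store.db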


On Mon, Feb 13, 2017 at 11:37 AM, Chenyehua <chen.ye...@h3c.com> wrote:
> Sorry, I made a mistake, the ceph version is actually 0.94.5
>
> -Original Message-
> From: chenyehua 11692 (RD)
> Sent: 13 February 2017 9:40
> To: 'Shinobu Kinjo'
> Cc: kc...@redhat.com; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] mon is stuck in leveldb and costs nearly 100% cpu
>
> My ceph version is 10.2.5
>
> -Original Message-
> From: Shinobu Kinjo [mailto:ski...@redhat.com]
> Sent: 12 February 2017 13:12
> To: chenyehua 11692 (RD)
> Cc: kc...@redhat.com; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] mon is stuck in leveldb and costs nearly 100% cpu
>
> Which Ceph version are you using?
>
> On Sat, Feb 11, 2017 at 5:02 PM, Chenyehua <chen.ye...@h3c.com> wrote:
>> Dear Mr Kefu Chai
>>
>> Sorry to disturb you.
>>
>> I meet a problem recently. In my ceph cluster ,health status has
>> warning “store is getting too big!” for several days; and  ceph-mon
>> costs nearly 100% cpu;
>>
>> Have you ever met this situation?
>>
>> Some detailed information are attached below:
>>
>>
>>
>> root@cvknode17:~# ceph -s
>>
>> cluster 04afba60-3a77-496c-b616-2ecb5e47e141
>>
>>  health HEALTH_WARN
>>
>> mon.cvknode17 store is getting too big! 34104 MB >= 15360
>> MB
>>
>>  monmap e1: 3 mons at
>> {cvknode15=172.16.51.15:6789/0,cvknode16=172.16.51.16:6789/0,cvknode17
>> =172.16.51.17:6789/0}
>>
>> election epoch 862, quorum 0,1,2
>> cvknode15,cvknode16,cvknode17
>>
>>  osdmap e196279: 347 osds: 347 up, 347 in
>>
>>   pgmap v5891025: 33272 pgs, 16 pools, 26944 GB data, 6822
>> kobjects
>>
>> 65966 GB used, 579 TB / 644 TB avail
>>
>>33270 active+clean
>>
>>2 active+clean+scrubbing+deep
>>
>>   client io 840 kB/s rd, 739 kB/s wr, 35 op/s rd, 184 op/s wr
>>
>>
>>
>> root@cvknode17:~# top
>>
>> top - 15:19:28 up 23 days, 23:58,  6 users,  load average: 1.08, 1.40,
>> 1.77
>>
>> Tasks: 346 total,   2 running, 342 sleeping,   0 stopped,   2 zombie
>>
>> Cpu(s):  8.1%us, 10.8%sy,  0.0%ni, 69.0%id,  9.5%wa,  0.0%hi,  2.5%si,
>> 0.0%st
>>
>> Mem:  65384424k total, 58102880k used,  7281544k free,   240720k buffers
>>
>> Swap: 2100k total,   344944k used, 29654156k free, 24274272k cached
>>
>>
>>
>> PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
>>
>>   24407 root  20   0 17.3g  12g  10m S   98 20.2   8420:11 ceph-mon
>>
>>
>>
>> root@cvknode17:~# top -Hp 24407
>>
>> top - 15:19:49 up 23 days, 23:59,  6 users,  load average: 1.12, 1.39,
>> 1.76
>>
>> Tasks:  17 total,   1 running,  16 sleeping,   0 stopped,   0 zombie
>>
>> Cpu(s):  8.1%us, 10.8%sy,  0.0%ni, 69.0%id,  9.5%wa,  0.0%hi,  2.5%si,
>> 0.0%st
>>
>> Mem:  65384424k total, 58104868k used,  7279556k free,   240744k buffers
>>
>> Swap: 2100k total,   344944k used, 29654156k free, 24271188k cached
>>
>>
>>
>> PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
>>
>>   25931 root  20   0 17.3g  12g   9m R   98 20.2   7957:37 ceph-mon
>>
>>   24514 root  20   0 17.3g  12g   9m S2 20.2   3:06.75 ceph-mon
>>
>>   25932 root  20   0 17.3g  12g   9m S2 20.2   1:07.82 ceph-mon
>>
>>   24407 root  20   0 17.3g  12g   9m S0 20.2   0:00.67 ceph-mon
>>
>>   24508 root  20   0 17.3g  12g   9m S0 20.2  15:50.24 ceph-mon
>>
>>   24513 root  20   0 17.3g  12g   9m S0 20.2   0:07.88 ceph-mon
>>
>>   24534 root  20   0 17.3g  12g   9m S0 20.2 196:33.85 ceph-mon
>>
>>   24535 root  20   0 17.3g  12g   9m S0 20.2   0:00.01 ceph-mon
>>
>>   25929 root  20   0 17.3g  12g   9m S0 20.2   3:06.09 ceph-mon
>>
>>   25930 root  20   0 17.3g  12g   9m S0 20.2   8:12.58 ceph-mon
>>
>>   25933 root  20   0 17.3g  12g   9m S0 20.2   4:42.22 ceph-mon
>>
>>   25934 root  20   0 17.3g  12g   9m S0 20.2  40:53.27

Re: [ceph-users] mon is stuck in leveldb and costs nearly 100% cpu

2017-02-11 Thread Shinobu Kinjo
Which Ceph version are you using?

On Sat, Feb 11, 2017 at 5:02 PM, Chenyehua  wrote:
> Dear Mr Kefu Chai
>
> Sorry to disturb you.
>
> I meet a problem recently. In my ceph cluster ,health status has warning
> “store is getting too big!” for several days; and  ceph-mon costs nearly
> 100% cpu;
>
> Have you ever met this situation?
>
> Some detailed information are attached below:
>
>
>
> root@cvknode17:~# ceph -s
>
> cluster 04afba60-3a77-496c-b616-2ecb5e47e141
>
>  health HEALTH_WARN
>
> mon.cvknode17 store is getting too big! 34104 MB >= 15360 MB
>
>  monmap e1: 3 mons at
> {cvknode15=172.16.51.15:6789/0,cvknode16=172.16.51.16:6789/0,cvknode17=172.16.51.17:6789/0}
>
> election epoch 862, quorum 0,1,2 cvknode15,cvknode16,cvknode17
>
>  osdmap e196279: 347 osds: 347 up, 347 in
>
>   pgmap v5891025: 33272 pgs, 16 pools, 26944 GB data, 6822 kobjects
>
> 65966 GB used, 579 TB / 644 TB avail
>
>33270 active+clean
>
>2 active+clean+scrubbing+deep
>
>   client io 840 kB/s rd, 739 kB/s wr, 35 op/s rd, 184 op/s wr
>
>
>
> root@cvknode17:~# top
>
> top - 15:19:28 up 23 days, 23:58,  6 users,  load average: 1.08, 1.40, 1.77
>
> Tasks: 346 total,   2 running, 342 sleeping,   0 stopped,   2 zombie
>
> Cpu(s):  8.1%us, 10.8%sy,  0.0%ni, 69.0%id,  9.5%wa,  0.0%hi,  2.5%si,
> 0.0%st
>
> Mem:  65384424k total, 58102880k used,  7281544k free,   240720k buffers
>
> Swap: 2100k total,   344944k used, 29654156k free, 24274272k cached
>
>
>
> PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
>
>   24407 root  20   0 17.3g  12g  10m S   98 20.2   8420:11 ceph-mon
>
>
>
> root@cvknode17:~# top -Hp 24407
>
> top - 15:19:49 up 23 days, 23:59,  6 users,  load average: 1.12, 1.39, 1.76
>
> Tasks:  17 total,   1 running,  16 sleeping,   0 stopped,   0 zombie
>
> Cpu(s):  8.1%us, 10.8%sy,  0.0%ni, 69.0%id,  9.5%wa,  0.0%hi,  2.5%si,
> 0.0%st
>
> Mem:  65384424k total, 58104868k used,  7279556k free,   240744k buffers
>
> Swap: 2100k total,   344944k used, 29654156k free, 24271188k cached
>
>
>
> PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
>
>   25931 root  20   0 17.3g  12g   9m R   98 20.2   7957:37 ceph-mon
>
>   24514 root  20   0 17.3g  12g   9m S2 20.2   3:06.75 ceph-mon
>
>   25932 root  20   0 17.3g  12g   9m S2 20.2   1:07.82 ceph-mon
>
>   24407 root  20   0 17.3g  12g   9m S0 20.2   0:00.67 ceph-mon
>
>   24508 root  20   0 17.3g  12g   9m S0 20.2  15:50.24 ceph-mon
>
>   24513 root  20   0 17.3g  12g   9m S0 20.2   0:07.88 ceph-mon
>
>   24534 root  20   0 17.3g  12g   9m S0 20.2 196:33.85 ceph-mon
>
>   24535 root  20   0 17.3g  12g   9m S0 20.2   0:00.01 ceph-mon
>
>   25929 root  20   0 17.3g  12g   9m S0 20.2   3:06.09 ceph-mon
>
>   25930 root  20   0 17.3g  12g   9m S0 20.2   8:12.58 ceph-mon
>
>   25933 root  20   0 17.3g  12g   9m S0 20.2   4:42.22 ceph-mon
>
>   25934 root  20   0 17.3g  12g   9m S0 20.2  40:53.27 ceph-mon
>
>   25935 root  20   0 17.3g  12g   9m S0 20.2   0:04.84 ceph-mon
>
>   25936 root  20   0 17.3g  12g   9m S0 20.2   0:00.01 ceph-mon
>
>   25980 root  20   0 17.3g  12g   9m S0 20.2   0:06.65 ceph-mon
>
>   25986 root  20   0 17.3g  12g   9m S0 20.2  48:26.77 ceph-mon
>
>   55738 root  20   0 17.3g  12g   9m S0 20.2   0:09.06 ceph-mon
>
>
>
>
>
> Thread 20 (Thread 0x7f3e77e80700 (LWP 25931)):
>
> #0  0x7f3e7e83a653 in pread64 () from
> /lib/x86_64-linux-gnu/libpthread.so.0
>
> #1  0x009286cf in ?? ()
>
> #2  0x0092c187 in leveldb::ReadBlock(leveldb::RandomAccessFile*,
> leveldb::ReadOptions const&, leveldb::BlockHandle const&, leveldb::Block**)
> ()
>
> #3  0x00922f41 in leveldb::Table::BlockReader(void*,
> leveldb::ReadOptions const&, leveldb::Slice const&) ()
>
> #4  0x00924840 in ?? ()
>
> #5  0x00924b39 in ?? ()
>
> #6  0x00924a7a in ?? ()
>
> #7  0x009227d0 in ?? ()
>
> #8  0x009140b6 in ?? ()
>
> #9  0x009143dd in ?? ()
>
> #10 0x0088d399 in
> LevelDBStore::LevelDBWholeSpaceIteratorImpl::lower_bound(std::string const&,
> std::string const&) ()
>
> #11 0x0088bf00 in LevelDBStore::get(std::string const&,
> std::set
> const&, std::map std::allocator >*) ()
>
> #12 0x0056a7a2 in MonitorDBStore::get(std::string const&,
> std::string const&) ()
>
> ---Type  to continue, or q  to quit---
>
> #13 0x005dcf61 in PaxosService::refresh(bool*) ()
>
> #14 0x0058a76b in Monitor::refresh_from_paxos(bool*) ()
>
> #15 0x005c55ac in Paxos::do_refresh() ()
>
> #16 0x005cc093 in Paxos::handle_commit(MMonPaxos*) ()
>
> #17 0x005d4d8b in Paxos::dispatch(PaxosServiceMessage*) 

Re: [ceph-users] Cannot shutdown monitors

2017-02-10 Thread Shinobu Kinjo
On Sat, Feb 11, 2017 at 1:08 PM, Michael Andersen <mich...@steelcode.com> wrote:
> I believe I did shutdown mon process. Is that not done by the
>
> sudo systemctl stop ceph\*.service ceph\*.target

Oh, that's what I missed.

>
> command? Also, as I noted, the mon process does not show up in ps after I do
> that, but I still get the shutdown halting.
>
> The libceph kernel module may be installed. I did not do so deliberately but
> I used ceph-deploy so if it installs that then that is why it's there. I
> also run some kubernetes pods with rbd persistent volumes on these machines,
> although no rbd volumes are in use or mounted when I try to shut down. In fact
> I unmapped all rbd volumes across the whole cluster to make sure. Is libceph
> required for rbd?
>
> But even so, is it normal for the libceph kernel module to prevent shutdown?
> Is there another stage in the shutdown procedure that I am missing?
>
>
> On Feb 10, 2017 7:49 PM, "Brad Hubbard" <bhubb...@redhat.com> wrote:
>
> That looks like dmesg output from the libceph kernel module. Do you
> have the libceph kernel module loaded?
>
> If the answer to that question is "yes" the follow-up question is
> "Why?" as it is not required for a MON or OSD host.
>
> On Sat, Feb 11, 2017 at 1:18 PM, Michael Andersen <mich...@steelcode.com>
> wrote:
>> Yeah, all three mons have OSDs on the same machines.
>>
>> On Feb 10, 2017 7:13 PM, "Shinobu Kinjo" <ski...@redhat.com> wrote:
>>>
>>> Is your primary MON running on the host which some OSDs are running on?
>>>
>>> On Sat, Feb 11, 2017 at 11:53 AM, Michael Andersen
>>> <mich...@steelcode.com> wrote:
>>> > Hi
>>> >
>>> > I am running a small cluster of 8 machines (80 osds), with three
>>> > monitors on
>>> > Ubuntu 16.04. Ceph version 10.2.5.
>>> >
>>> > I cannot reboot the monitors without physically going into the
>>> > datacenter
>>> > and power cycling them. What happens is that while shutting down, ceph
>>> > gets
>>> > stuck trying to contact the other monitors but networking has already
>>> > shut
>>> > down or something like that. I get an endless stream of:
>>> >
>>> > libceph: connect 10.20.0.10:6789 error -101
>>> > libceph: connect 10.20.0.13:6789 error -101
>>> > libceph: connect 10.20.0.17:6789 error -101
>>> >
>>> > where in this case 10.20.0.10 is the machine I am trying to shut down
>>> > and
>>> > all three IPs are the MONs.
>>> >
>>> > At this stage of the shutdown, the machine doesn't respond to pings,
>>> > and
>>> > I
>>> > cannot even log in on any of the virtual terminals. Nothing to do but
>>> > poweroff at the server.
>>> >
>>> > The other non-mon servers shut down just fine, and the cluster was
>>> > healthy
>>> > at the time I was rebooting the mon (I only reboot one machine at a
>>> > time,
>>> > waiting for it to come up before I do the next one).
>>> >
>>> > Also worth mentioning that if I execute
>>> >
>>> > sudo systemctl stop ceph\*.service ceph\*.target
>>> >
>>> > on the server, the only things I see are:
>>> >
>>> > root 11143 2  0 18:40 ?00:00:00 [ceph-msgr]
>>> > root 11162 2  0 18:40 ?00:00:00 [ceph-watch-noti]
>>> >
>>> > and even then, when no ceph daemons are left running, doing a reboot
>>> > goes
>>> > into the same loop.
>>> >
>>> > I can't really find any mention of this online, but I feel someone must
>>> > have
>>> > hit this. Any idea how to fix it? It's really annoying because its hard
>>> > for
>>> > me to get access to the datacenter.
>>> >
>>> > Thanks
>>> > Michael
>>> >
>>> > ___
>>> > ceph-users mailing list
>>> > ceph-users@lists.ceph.com
>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> >
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
>
> --
> Cheers,
> Brad
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cannot shutdown monitors

2017-02-10 Thread Shinobu Kinjo
You may need to stop the MON process (if it's acting as the primary).

And once you make sure that all OSD sessions have moved to another
MON, you should be able to shut down the physical host.

Have you tried that?
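
A minimal sketch of that check, run from any other node before powering the
host off:

  ceph quorum_status -f json-pretty   # which mons are in quorum, who leads
  ceph mon stat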



On Sat, Feb 11, 2017 at 12:18 PM, Michael Andersen
<mich...@steelcode.com> wrote:
> Yeah, all three mons have OSDs on the same machines.
>
> On Feb 10, 2017 7:13 PM, "Shinobu Kinjo" <ski...@redhat.com> wrote:
>>
>> Is your primary MON running on the host which some OSDs are running on?
>>
>> On Sat, Feb 11, 2017 at 11:53 AM, Michael Andersen
>> <mich...@steelcode.com> wrote:
>> > Hi
>> >
>> > I am running a small cluster of 8 machines (80 osds), with three
>> > monitors on
>> > Ubuntu 16.04. Ceph version 10.2.5.
>> >
>> > I cannot reboot the monitors without physically going into the
>> > datacenter
>> > and power cycling them. What happens is that while shutting down, ceph
>> > gets
>> > stuck trying to contact the other monitors but networking has already
>> > shut
>> > down or something like that. I get an endless stream of:
>> >
>> > libceph: connect 10.20.0.10:6789 error -101
>> > libceph: connect 10.20.0.13:6789 error -101
>> > libceph: connect 10.20.0.17:6789 error -101
>> >
>> > where in this case 10.20.0.10 is the machine I am trying to shut down
>> > and
>> > all three IPs are the MONs.
>> >
>> > At this stage of the shutdown, the machine doesn't respond to pings, and
>> > I
>> > cannot even log in on any of the virtual terminals. Nothing to do but
>> > poweroff at the server.
>> >
>> > The other non-mon servers shut down just fine, and the cluster was
>> > healthy
>> > at the time I was rebooting the mon (I only reboot one machine at a
>> > time,
>> > waiting for it to come up before I do the next one).
>> >
>> > Also worth mentioning that if I execute
>> >
>> > sudo systemctl stop ceph\*.service ceph\*.target
>> >
>> > on the server, the only things I see are:
>> >
>> > root 11143 2  0 18:40 ?00:00:00 [ceph-msgr]
>> > root 11162 2  0 18:40 ?00:00:00 [ceph-watch-noti]
>> >
>> > and even then, when no ceph daemons are left running, doing a reboot
>> > goes
>> > into the same loop.
>> >
>> > I can't really find any mention of this online, but I feel someone must
>> > have
>> > hit this. Any idea how to fix it? It's really annoying because its hard
>> > for
>> > me to get access to the datacenter.
>> >
>> > Thanks
>> > Michael
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cannot shutdown monitors

2017-02-10 Thread Shinobu Kinjo
Is your primary MON running on a host that some OSDs are also running on?

On Sat, Feb 11, 2017 at 11:53 AM, Michael Andersen
 wrote:
> Hi
>
> I am running a small cluster of 8 machines (80 osds), with three monitors on
> Ubuntu 16.04. Ceph version 10.2.5.
>
> I cannot reboot the monitors without physically going into the datacenter
> and power cycling them. What happens is that while shutting down, ceph gets
> stuck trying to contact the other monitors but networking has already shut
> down or something like that. I get an endless stream of:
>
> libceph: connect 10.20.0.10:6789 error -101
> libceph: connect 10.20.0.13:6789 error -101
> libceph: connect 10.20.0.17:6789 error -101
>
> where in this case 10.20.0.10 is the machine I am trying to shut down and
> all three IPs are the MONs.
>
> At this stage of the shutdown, the machine doesn't respond to pings, and I
> cannot even log in on any of the virtual terminals. Nothing to do but
> poweroff at the server.
>
> The other non-mon servers shut down just fine, and the cluster was healthy
> at the time I was rebooting the mon (I only reboot one machine at a time,
> waiting for it to come up before I do the next one).
>
> Also worth mentioning that if I execute
>
> sudo systemctl stop ceph\*.service ceph\*.target
>
> on the server, the only things I see are:
>
> root 11143 2  0 18:40 ?00:00:00 [ceph-msgr]
> root 11162 2  0 18:40 ?00:00:00 [ceph-watch-noti]
>
> and even then, when no ceph daemons are left running, doing a reboot goes
> into the same loop.
>
> I can't really find any mention of this online, but I feel someone must have
> hit this. Any idea how to fix it? It's really annoying because its hard for
> me to get access to the datacenter.
>
> Thanks
> Michael
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] I can't create new pool in my cluster.

2017-02-09 Thread Shinobu Kinjo
What exactly did you do?
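
Since the message says the monitor failed while running the crush map through
crushtool, it can help to run the same validation by hand on the current map
(a minimal sketch):

  ceph osd getcrushmap -o crushmap.bin
  crushtool -i crushmap.bin --test --show-statistics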

On Fri, Feb 10, 2017 at 11:48 AM, 周威  wrote:
> The version I'm using is 0.94.9
>
> And when I want to create a pool, It shows:
>
> Error EINVAL: error running crushmap through crushtool: (1) Operation
> not permitted
>
> What's wrong about this?
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSDs stuck unclean

2017-02-09 Thread Shinobu Kinjo
Please provide us with your crushmap:

 * sudo ceph osd getcrushmap -o crushmap.`date +%Y%m%d%H`
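
To make the dump readable, decompile it with crushtool (<timestamp> being
whatever suffix the command above produced):

  crushtool -d crushmap.<timestamp> -o crushmap.txt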

On Fri, Feb 10, 2017 at 5:46 AM, Craig Read <cr...@litewiredata.com> wrote:
> Sorry, 2 nodes, 6 daemons (forgot I added 2 daemons to see if it made a 
> difference)
>
> On CentOS7
>
> Ceph -v:
>
> 10.2.5
>
> Ceph -s:
>
> Health HEALTH_WARN
> 64 pgs stuck unclean
> Too few PGs per OSD (21 < min 30)
> Monmap e1: 1 mons at {=:6789/0}
> Election epoch 3, quorum 0 
> Osdmap e89: 6 osds: 6 up, 6 in; 64 remapped pgs
> Flags sortbitwise,require_jewel_osds
> Pgmap v263: 64pgs, 1 pools, 0 bytes data, 0 objects
> 209 MB used, 121GB / 121GB avail
> 32 active+remapped
> 32 active
>
> Ceph osd tree:
>
> -1 0.11899 root default
> -2 0.05949  Host 1:
>  0 0.00490  Osd.0   up 1.0  1.0
>  3 0.01070  Osd.3   up 1.0  1.0
>  4 0.04390  Osd.4   up.10   1.0
>
> -3 0.05949  Host 2:
>  1 0.00490  Osd.1   up 1.0  1.0
>  2 0.01070  Osd.2   up 1.0  1.0
>  5 0.04390  Osd.5   up1.0   1.0
>
>
> Appreciate your help
>
> Craig
>
> -Original Message-
> From: Shinobu Kinjo [mailto:ski...@redhat.com]
> Sent: Thursday, February 9, 2017 2:34 PM
> To: Craig Read <cr...@litewiredata.com>
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] OSDs stuck unclean
>
> 4 OSD nodes or daemons?
>
> please:
>
>  * ceph -v
>  * ceph -s
>  * ceph osd tree
>
>
> On Fri, Feb 10, 2017 at 5:26 AM, Craig Read <cr...@litewiredata.com> wrote:
>> We have 4 OSDs in test environment that are all stuck unclean
>>
>>
>>
>> I’ve tried rebuilding the whole environment with the same result.
>>
>>
>>
>> OSDs are running on XFS disk, partition 1 is OSD, partition 2 is journal
>>
>>
>>
>> Also seeing degraded despite having 4 OSDs and a default osd pool of 2
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSDs stuck unclean

2017-02-09 Thread Shinobu Kinjo
4 OSD nodes or daemons?

please:

 * ceph -v
 * ceph -s
 * ceph osd tree


On Fri, Feb 10, 2017 at 5:26 AM, Craig Read  wrote:
> We have 4 OSDs in test environment that are all stuck unclean
>
>
>
> I’ve tried rebuilding the whole environment with the same result.
>
>
>
> OSDs are running on XFS disk, partition 1 is OSD, partition 2 is journal
>
>
>
> Also seeing degraded despite having 4 OSDs and a default osd pool of 2
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Workaround for XFS lockup resulting in down OSDs

2017-02-08 Thread Shinobu Kinjo
On Wed, Feb 8, 2017 at 8:07 PM, Dan van der Ster  wrote:
> Hi,
>
> This is interesting. Do you have a bit more info about how to identify
> a server which is suffering from this problem? Is there some process
> (xfs* or kswapd?) we'll see as busy in top or iotop.

That's my question as well. If you are able to reproduce the
issue intentionally, that would be very helpful.

It would also help if you could describe your cluster environment in a
bit more detail.

>
> Also, which kernel are you using?
>
> Cheers, Dan
>
>
> On Tue, Feb 7, 2017 at 6:59 PM, Thorvald Natvig  wrote:
>> Hi,
>>
>> We've encountered a small "kernel feature" in XFS using Filestore. We
>> have a workaround, and would like to share in case others have the
>> same problem.
>>
>> Under high load, on slow storage, with lots of dirty buffers and low
>> memory, there's a design choice with unfortunate side-effects if you
>> have multiple XFS filesystems mounted, such as often is the case when
>> you have a JBOD full of drives. This results in network traffic
>> stalling, leading to OSDs failing heartbeats.
>>
>> In short, when the kernel needs to allocate memory for anything, it
>> first figures out how many pages it needs, then goes to each
>> filesystem and says "release N pages". In XFS, that's implemented as
>> follows:
>>
>> - For each AG (8 in our case):
>>   - Try to lock AG
>>   - Release unused buffers, up to N
>> - If this point is reached, and we didn't manage to release at least N
>> pages, try again, but this time wait for the lock.
>>
>> That last part is the problem; if the lock is currently held by, say,
>> another kernel thread that is currently flushing dirty buffers, then
>> the memory allocation stalls. However, we have 30 other XFS
>> filesystems that could release memory, and the kernel also has a lot
>> of non-filesystem memory that can be released.
>>
>> This manifests as OSDs going offline during high load, with other OSDs
>> claiming that the OSD stopped responding to health checks. This is
>> especially prevalent during cache tier flushing and large backfills,
>> which can put very heavy load on the write buffers, thus increasing
>> the probability of one of these events.
>> In reality, the OSD is stuck in the kernel, trying to allocate buffers
>> to build a TCP packet to answer the network message. As soon as the
>> buffers are flushed (which can take a while), the OSD recovers, but
>> now has to deal with being marked down in the monitor maps.
>>
>> The following systemtap changes the kernel behavior to not do the 
>> lock-waiting:
>>
>> probe module("xfs").function("xfs_reclaim_inodes_ag").call {
>>  $flags = $flags & 2
>> }
>>
>> Save it to a file, and run with 'stap -v -g -d kernel
>> --suppress-time-limits '. We've been running this for a
>> few weeks, and the issue is completely gone.
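>>
On the question above about identifying an affected host, a rough sketch (the
exact symbols depend on the kernel build, and the "blocked for more than"
messages only appear if the hung-task detector is enabled) is to look for
tasks stuck in uninterruptible sleep whose kernel stacks sit in the XFS
reclaim path:

  ps -eo state,pid,comm | awk '$1 == "D"'
  cat /proc/<pid>/stack            # <pid> from above; look for xfs_reclaim_inodes_ag
  dmesg | grep -i 'blocked for more than'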
>>
>> There was a writeup on the XFS mailing list a while ago about the same
>> issue ( http://www.spinics.net/lists/linux-xfs/msg01541.html ), but
>> unfortunately it didn't result in consensus on a patch. This problem
>> won't exist in BlueStore, so we consider the systemtap approach a
>> workaround until we're ready to deploy BlueStore.
>>
>> - Thorvald
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] virt-install into rbd hangs during Anaconda package installation

2017-02-08 Thread Shinobu Kinjo
If you are able to reproduce the issue intentionally under a particular
condition (which I have no idea about at the moment), that would be helpful.

There were some previous mailing-list threads regarding a *similar* issue.

 # google "libvirt rbd issue"


Regards,

On Tue, Feb 7, 2017 at 7:50 PM, Tracy Reed  wrote:
> On Tue, Feb 07, 2017 at 12:25:08AM PST, koukou73gr spake thusly:
>> On 2017-02-07 10:11, Tracy Reed wrote:
>> > Weird. Now the VMs that were hung in interruptable wait state have now
>> > disappeared. No idea why.
>>
>> Have you tried the same procedure but with local storage instead?
>
> Yes. I have local storage and iSCSI storage and they both install just
> fine.
>
> --
> Tracy Reed
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS read IO caching, where it is happining?

2017-02-07 Thread Shinobu Kinjo
On Wed, Feb 8, 2017 at 3:05 PM, Ahmed Khuraidah <abushi...@gmail.com> wrote:

> Hi Shinobu, I am using SUSE packages in scope of their latest SUSE
> Enterprise Storage 4 and following documentation (method of deployment:
> ceph-deploy)
> But, I was able reproduce this issue on Ubuntu 14.04 with Ceph
> repositories (also latest Jewel and ceph-deploy) as well.
>

The community Ceph packages are running on the Ubuntu box, right?
If so, please run `ceph -v` on the Ubuntu box.

And please also provide the details of the same issue as you hit it on the SUSE box.


>
> On Wed, Feb 8, 2017 at 3:03 AM, Shinobu Kinjo <ski...@redhat.com> wrote:
>
>> Are you using opensource Ceph packages or suse ones?
>>
>> On Sat, Feb 4, 2017 at 3:54 PM, Ahmed Khuraidah <abushi...@gmail.com>
>> wrote:
>>
>>> I Have opened ticket on http://tracker.ceph.com/
>>>
>>> http://tracker.ceph.com/issues/18816
>>>
>>>
>>> My client and server kernels are the same, here is info:
>>> # lsb_release -a
>>> LSB Version:n/a
>>> Distributor ID: SUSE
>>> Description:SUSE Linux Enterprise Server 12 SP2
>>> Release:12.2
>>> Codename:   n/a
>>> # uname -a
>>> Linux cephnode 4.4.38-93-default #1 SMP Wed Dec 14 12:59:43 UTC 2016
>>> (2d3e9d4) x86_64 x86_64 x86_64 GNU/Linux
>>>
>>>
>>> Thanks
>>>
>>> On Fri, Feb 3, 2017 at 1:59 PM, John Spray <jsp...@redhat.com> wrote:
>>>
>>>> On Fri, Feb 3, 2017 at 8:07 AM, Ahmed Khuraidah <abushi...@gmail.com>
>>>> wrote:
>>>> > Thank you guys,
>>>> >
>>>> > I tried to add option "exec_prerun=echo 3 > /proc/sys/vm/drop_caches"
>>>> as
>>>> > well as "exec_prerun=echo 3 | sudo tee /proc/sys/vm/drop_caches", but
>>>> > despite FIO corresponds that command was executed, there are no
>>>> changes.
>>>> >
>>>> > But, I caught very strange another behavior. If I will run my FIO test
>>>> > (speaking about 3G file case) twice, after the first run FIO will
>>>> create my
>>>> > file and print a lot of IOps as described already, but if- before
>>>> second
>>>> > run- drop cache (by root echo 3 > /proc/sys/vm/drop_caches) I broke
>>>> will end
>>>> > with broken MDS:
>>>> >
>>>> > --- begin dump of recent events ---
>>>> >  0> 2017-02-03 02:34:41.974639 7f7e8ec5e700 -1 *** Caught signal
>>>> > (Aborted) **
>>>> >  in thread 7f7e8ec5e700 thread_name:ms_dispatch
>>>> >
>>>> >  ceph version 10.2.4-211-g12b091b (12b091b4a40947aa43919e71a318e
>>>> d0dcedc8734)
>>>> >  1: (()+0x5142a2) [0x557c51e092a2]
>>>> >  2: (()+0x10b00) [0x7f7e95df2b00]
>>>> >  3: (gsignal()+0x37) [0x7f7e93ccb8d7]
>>>> >  4: (abort()+0x13a) [0x7f7e93aa]
>>>> >  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>> > const*)+0x265) [0x557c51f133d5]
>>>> >  6: (MutationImpl::~MutationImpl()+0x28e) [0x557c51bb9e1e]
>>>> >  7: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_relea
>>>> se()+0x39)
>>>> > [0x557c51b2ccf9]
>>>> >  8: (Locker::check_inode_max_size(CInode*, bool, bool, unsigned
>>>> long, bool,
>>>> > unsigned long, utime_t)+0x9a7) [0x557c51ca2757]
>>>> >  9: (Locker::remove_client_cap(CInode*, client_t)+0xb1)
>>>> [0x557c51ca38f1]
>>>> >  10: (Locker::_do_cap_release(client_t, inodeno_t, unsigned long,
>>>> unsigned
>>>> > int, unsigned int)+0x90d) [0x557c51ca424d]
>>>> >  11: (Locker::handle_client_cap_release(MClientCapRelease*)+0x1cc)
>>>> > [0x557c51ca449c]
>>>> >  12: (MDSRank::handle_deferrable_message(Message*)+0xc1c)
>>>> [0x557c51b33d3c]
>>>> >  13: (MDSRank::_dispatch(Message*, bool)+0x1e1) [0x557c51b3c991]
>>>> >  14: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x557c51b3dae5]
>>>> >  15: (MDSDaemon::ms_dispatch(Message*)+0xc3) [0x557c51b25703]
>>>> >  16: (DispatchQueue::entry()+0x78b) [0x557c5200d06b]
>>>> >  17: (DispatchQueue::DispatchThread::entry()+0xd) [0x557c51ee5dcd]
>>>> >  18: (()+0x8734) [0x7f7e95dea734]
>>>> >  19: (clone()+0x6d) [0x7f7e93d80d3d]
>>>> >  NOTE: a copy of the executable, or `objdump -rdS ` is

Re: [ceph-users] CephFS read IO caching, where it is happining?

2017-02-07 Thread Shinobu Kinjo
Are you using the open-source Ceph packages or the SUSE ones?

On Sat, Feb 4, 2017 at 3:54 PM, Ahmed Khuraidah <abushi...@gmail.com> wrote:

> I Have opened ticket on http://tracker.ceph.com/
>
> http://tracker.ceph.com/issues/18816
>
>
> My client and server kernels are the same, here is info:
> # lsb_release -a
> LSB Version:n/a
> Distributor ID: SUSE
> Description:SUSE Linux Enterprise Server 12 SP2
> Release:12.2
> Codename:   n/a
> # uname -a
> Linux cephnode 4.4.38-93-default #1 SMP Wed Dec 14 12:59:43 UTC 2016
> (2d3e9d4) x86_64 x86_64 x86_64 GNU/Linux
>
>
> Thanks
>
> On Fri, Feb 3, 2017 at 1:59 PM, John Spray <jsp...@redhat.com> wrote:
>
>> On Fri, Feb 3, 2017 at 8:07 AM, Ahmed Khuraidah <abushi...@gmail.com>
>> wrote:
>> > Thank you guys,
>> >
>> > I tried to add option "exec_prerun=echo 3 > /proc/sys/vm/drop_caches" as
>> > well as "exec_prerun=echo 3 | sudo tee /proc/sys/vm/drop_caches", but
>> > despite FIO corresponds that command was executed, there are no changes.
>> >
>> > But, I caught very strange another behavior. If I will run my FIO test
>> > (speaking about 3G file case) twice, after the first run FIO will
>> create my
>> > file and print a lot of IOps as described already, but if- before second
>> > run- drop cache (by root echo 3 > /proc/sys/vm/drop_caches) I broke
>> will end
>> > with broken MDS:
>> >
>> > --- begin dump of recent events ---
>> >  0> 2017-02-03 02:34:41.974639 7f7e8ec5e700 -1 *** Caught signal
>> > (Aborted) **
>> >  in thread 7f7e8ec5e700 thread_name:ms_dispatch
>> >
>> >  ceph version 10.2.4-211-g12b091b (12b091b4a40947aa43919e71a318e
>> d0dcedc8734)
>> >  1: (()+0x5142a2) [0x557c51e092a2]
>> >  2: (()+0x10b00) [0x7f7e95df2b00]
>> >  3: (gsignal()+0x37) [0x7f7e93ccb8d7]
>> >  4: (abort()+0x13a) [0x7f7e93aa]
>> >  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> > const*)+0x265) [0x557c51f133d5]
>> >  6: (MutationImpl::~MutationImpl()+0x28e) [0x557c51bb9e1e]
>> >  7: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_relea
>> se()+0x39)
>> > [0x557c51b2ccf9]
>> >  8: (Locker::check_inode_max_size(CInode*, bool, bool, unsigned long,
>> bool,
>> > unsigned long, utime_t)+0x9a7) [0x557c51ca2757]
>> >  9: (Locker::remove_client_cap(CInode*, client_t)+0xb1)
>> [0x557c51ca38f1]
>> >  10: (Locker::_do_cap_release(client_t, inodeno_t, unsigned long,
>> unsigned
>> > int, unsigned int)+0x90d) [0x557c51ca424d]
>> >  11: (Locker::handle_client_cap_release(MClientCapRelease*)+0x1cc)
>> > [0x557c51ca449c]
>> >  12: (MDSRank::handle_deferrable_message(Message*)+0xc1c)
>> [0x557c51b33d3c]
>> >  13: (MDSRank::_dispatch(Message*, bool)+0x1e1) [0x557c51b3c991]
>> >  14: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x557c51b3dae5]
>> >  15: (MDSDaemon::ms_dispatch(Message*)+0xc3) [0x557c51b25703]
>> >  16: (DispatchQueue::entry()+0x78b) [0x557c5200d06b]
>> >  17: (DispatchQueue::DispatchThread::entry()+0xd) [0x557c51ee5dcd]
>> >  18: (()+0x8734) [0x7f7e95dea734]
>> >  19: (clone()+0x6d) [0x7f7e93d80d3d]
>> >  NOTE: a copy of the executable, or `objdump -rdS ` is
>> needed to
>> > interpret this.
>>
>> Oops!  Please could you open a ticket on tracker.ceph.com, with this
>> backtrace, the client versions, any non-default config settings, and
>> the series of operations that led up to it.
>>
>> Thanks,
>> John
>>
>> > "
>> >
>> > On Thu, Feb 2, 2017 at 9:30 PM, Shinobu Kinjo <ski...@redhat.com>
>> wrote:
>> >>
>> >> You may want to add this in your FIO recipe.
>> >>
>> >>  * exec_prerun=echo 3 > /proc/sys/vm/drop_caches
>> >>
>> >> Regards,
>> >>
>> >> On Fri, Feb 3, 2017 at 12:36 AM, Wido den Hollander <w...@42on.com>
>> wrote:
>> >> >
>> >> >> Op 2 februari 2017 om 15:35 schreef Ahmed Khuraidah
>> >> >> <abushi...@gmail.com>:
>> >> >>
>> >> >>
>> >> >> Hi all,
>> >> >>
>> >> >> I am still confused about my CephFS sandbox.
>> >> >>
>> >> >> When I am performing simple FIO test into single file with size of
>> 3G I
>> >> >> have too many IOps:
>> >> >&

Re: [ceph-users] ceph df : negative numbers

2017-02-06 Thread Shinobu Kinjo
I've not been able to reproduce the issue with exactly the same version
as your cluster.

./ceph -v
ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)

./rados df | grep cephfs
cephfs_data_a 409600  10000
000  100   409600

./rados -p cephfs_data_a ls | wc -l
100

If you could reproduce the issue and share the procedure with us, that
would definitely help.

Will try again.

On Tue, Feb 7, 2017 at 2:01 AM, Florent B <flor...@coppint.com> wrote:
> On 02/06/2017 05:49 PM, Shinobu Kinjo wrote:
>> How about *pve01-rbd01*?
>>
>>  * rados -p pve01-rbd01 ls | wc -l
>>
>> ?
>
> # rados -p pve01-rbd01 ls | wc -l
> 871
>
> # ceph df
> GLOBAL:
> SIZE  AVAIL RAW USED %RAW USED
> 5173G 5146G   27251M  0.51
> POOLS:
> NAMEID USED   %USED MAX AVAIL OBJECTS
> data0   0 0 2985G   0
> metadata1  59178k 0 2985G 114
> pve01-rbd01 5   2572M  0.08 2985G 852
> cephfs016  39059k 0 2985G 120

Can you just run `df -h` on the box mounting cephfs, or `ceph -s` on
one of your MON hosts?

>
>
> And I saw there's a huge difference between data "used" and "raw used" :
> 27251M is not the sum of all pools (including copies)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph df : negative numbers

2017-02-06 Thread Shinobu Kinjo
How about *pve01-rbd01*?

 * rados -p pve01-rbd01 ls | wc -l

?

On Mon, Feb 6, 2017 at 9:40 PM, Florent B  wrote:
> On 02/06/2017 11:12 AM, Wido den Hollander wrote:
>>> Op 6 februari 2017 om 11:10 schreef Florent B :
>>>
>>>
>>> # ceph -v
>>> ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
>>>
>>> (officiel Ceph packages for Jessie)
>>>
>>>
>>> Yes I recently adjusted pg_num, but all objects were correctly rebalanced.
>>>
>>> Then a manually deleted some objects from this pool.
>> A scrub should correct this for you.
>>
>> Wido
>
> After scrub and deep-scrub every PG, it now show 120 objects on pool
> cephfs01, but there is only 1 object :
>
> # ceph df
> GLOBAL:
> SIZE  AVAIL RAW USED %RAW USED
> 6038G 6011G   27582M  0.45
> POOLS:
> NAMEID USED   %USED MAX AVAIL OBJECTS
> data0   0 0 2986G   0
> metadata1  59178k 0 2986G 114
> pve01-rbd01 5   2572M  0.07 2986G 852
> cephfs016  39059k 0 2986G 120
>
>
> # rados -p cephfs01 ls | wc -l
> 1
>
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph df : negative numbers

2017-02-04 Thread Shinobu Kinjo
On Sun, Feb 5, 2017 at 1:15 AM, John Spray  wrote:
> On Fri, Feb 3, 2017 at 5:28 PM, Florent B  wrote:
>> Hi everyone,
>>
>> On a Jewel test cluster I have :

please, `ceph -v`

>>
>> # ceph df
>> GLOBAL:
>> SIZE  AVAIL RAW USED %RAW USED
>> 6038G 6011G   27379M  0.44
>> POOLS:
>> NAMEID USED   %USED MAX AVAIL OBJECTS
>> data0   0 0 2986G   0
>> metadata1  58955k 0 2986G 115
>> pve01-rbd01 5   2616M  0.09 2986G 862
>> cephfs016 15E 0 2986G-315
>>
>>
>> # rados -p cephfs01 ls
>> 1034339.
>>
>>
>> Maybe I hit a bug ?
>
> I wonder if you had recently adjusted pg_num?  Those were the
> situations where we've seen this sort of issue before.
>
> John
>
>>
>> Flo
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS read IO caching, where it is happining?

2017-02-02 Thread Shinobu Kinjo
You may want to add this in your FIO recipe.

 * exec_prerun=echo 3 > /proc/sys/vm/drop_caches
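
For reference, a minimal job file sketch with that line added; the rw/bs/size
values are only assumed from the recipe name payloadrandread64k3G, and
dropping caches requires running fio as root:

  [randread64k]
  rw=randread
  bs=64k
  size=3g
  direct=1
  ioengine=libaio
  iodepth=2
  runtime=300
  exec_prerun=echo 3 > /proc/sys/vm/drop_caches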

Regards,

On Fri, Feb 3, 2017 at 12:36 AM, Wido den Hollander  wrote:
>
>> Op 2 februari 2017 om 15:35 schreef Ahmed Khuraidah :
>>
>>
>> Hi all,
>>
>> I am still confused about my CephFS sandbox.
>>
>> When I am performing simple FIO test into single file with size of 3G I
>> have too many IOps:
>>
>> cephnode:~ # fio payloadrandread64k3G
>> test: (g=0): rw=randread, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio,
>> iodepth=2
>> fio-2.13
>> Starting 1 process
>> test: Laying out IO file(s) (1 file(s) / 3072MB)
>> Jobs: 1 (f=1): [r(1)] [100.0% done] [277.8MB/0KB/0KB /s] [/0/0 iops]
>> [eta 00m:00s]
>> test: (groupid=0, jobs=1): err= 0: pid=3714: Thu Feb  2 07:07:01 2017
>>   read : io=3072.0MB, bw=181101KB/s, iops=2829, runt= 17370msec
>> slat (usec): min=4, max=386, avg=12.49, stdev= 6.90
>> clat (usec): min=202, max=5673.5K, avg=690.81, stdev=361
>>
>>
>> But if I will change size to file to 320G, looks like I skip the cache:
>>
>> cephnode:~ # fio payloadrandread64k320G
>> test: (g=0): rw=randread, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio,
>> iodepth=2
>> fio-2.13
>> Starting 1 process
>> Jobs: 1 (f=1): [r(1)] [100.0% done] [4740KB/0KB/0KB /s] [74/0/0 iops] [eta
>> 00m:00s]
>> test: (groupid=0, jobs=1): err= 0: pid=3624: Thu Feb  2 06:51:09 2017
>>   read : io=3410.9MB, bw=11641KB/s, iops=181, runt=300033msec
>> slat (usec): min=4, max=442, avg=14.43, stdev=10.07
>> clat (usec): min=98, max=286265, avg=10976.32, stdev=14904.82
>>
>>
>> For random write test such behavior not exists, there are almost the same
>> results - around 100 IOps.
>>
>> So my question: could please somebody clarify where this caching likely
>> happens and how to manage it?
>>
>
> The page cache of your kernel. The kernel will cache the file in memory and 
> perform read operations from there.
>
> Best way is to reboot your client between test runs. Although you can drop 
> kernel caches I always reboot to make sure nothing is cached locally.
>
> Wido
>
>> P.S.
>> This is latest SLES/Jewel based onenode setup which has:
>> 1 MON, 1 MDS (both data and metadata pools on SATA drive) and 1 OSD (XFS on
>> SATA and journal on SSD).
>> My FIO config file:
>> direct=1
>> buffered=0
>> ioengine=libaio
>> iodepth=2
>> runtime=300
>>
>> Thanks
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bluestore: v11.2.0 peering not happening when OSD is down

2017-01-31 Thread Shinobu Kinjo
 1.0
>>>> -6  65.49463 host ca-cn2
>>>>  1   5.45789 osd.1  down0  1.0
>>>>  8   5.45789 osd.8up  1.0  1.0
>>>> 13   5.45789 osd.13   up  1.0  1.0
>>>> 18   5.45789 osd.18   up  1.0  1.0
>>>> 22   5.45789 osd.22   up  1.0  1.0
>>>> 28   5.45789 osd.28   up  1.0  1.0
>>>> 31   5.45789 osd.31   up  1.0  1.0
>>>> 37   5.45789 osd.37   up  1.0  1.0
>>>> 39   5.45789 osd.39   up  1.0  1.0
>>>> 44   5.45789 osd.44   up  1.0  1.0
>>>> 48   5.45789 osd.48   up  1.0  1.0
>>>> 54   5.45789 osd.54   up  1.0  1.0
>>>>
>>>> health HEALTH_ERR
>>>> 69 pgs are stuck inactive for more than 300 seconds
>>>> 69 pgs incomplete
>>>> 69 pgs stuck inactive
>>>> 69 pgs stuck unclean
>>>> 512 requests are blocked > 32 sec
>>>>  monmap e2: 5 mons at
>>>> {ca-cn1=10.50.5.117:6789/0,ca-cn2=10.50.5.118:6789/0,ca-cn3=10.50.5.119:6789/0,ca-cn4=10.50.5.120:6789/0,ca-cn5=10.50.5.121:6789/0}
>>>> election epoch 8, quorum 0,1,2,3,4
>>>> ca-cn1,ca-cn2,ca-cn3,ca-cn4,ca-cn5
>>>> mgr active: ca-cn4 standbys: ca-cn2, ca-cn5, ca-cn3, ca-cn1
>>>>  osdmap e406: 60 osds: 59 up, 59 in; 69 remapped pgs
>>>> flags sortbitwise,require_jewel_osds,require_kraken_osds
>>>>   pgmap v23018: 1024 pgs, 1 pools, 3892 GB data, 7910 kobjects
>>>> 6074 GB used, 316 TB / 322 TB avail
>>>>  955 active+clean
>>>>   69 remapped+incomplete
>>>>
>>>> Thanks,
>>>> Muthu
>>>>
>>>>
>>>> On 31 January 2017 at 02:54, Gregory Farnum <gfar...@redhat.com> wrote:
>>>>>
>>>>> You might also check out "ceph osd tree" and crush dump and make sure
>>>>> they look the way you expect.
>>>>>
>>>>> On Mon, Jan 30, 2017 at 1:23 PM, Gregory Farnum <gfar...@redhat.com>
>>>>> wrote:
>>>>> > On Sun, Jan 29, 2017 at 6:40 AM, Muthusamy Muthiah
>>>>> > <muthiah.muthus...@gmail.com> wrote:
>>>>> >> Hi All,
>>>>> >>
>>>>> >> Also tried EC profile 3+1 on 5 node cluster with bluestore enabled  .
>>>>> >> When
>>>>> >> an OSD is down the cluster goes to ERROR state even when the cluster
>>>>> >> is n+1
>>>>> >> . No recovery happening.
>>>>> >>
>>>>> >> health HEALTH_ERR
>>>>> >> 75 pgs are stuck inactive for more than 300 seconds
>>>>> >> 75 pgs incomplete
>>>>> >> 75 pgs stuck inactive
>>>>> >> 75 pgs stuck unclean
>>>>> >>  monmap e2: 5 mons at
>>>>> >>
>>>>> >> {ca-cn1=10.50.5.117:6789/0,ca-cn2=10.50.5.118:6789/0,ca-cn3=10.50.5.119:6789/0,ca-cn4=10.50.5.120:6789/0,ca-cn5=10.50.5.121:6789/0}
>>>>> >> election epoch 10, quorum 0,1,2,3,4
>>>>> >> ca-cn1,ca-cn2,ca-cn3,ca-cn4,ca-cn5
>>>>> >> mgr active: ca-cn1 standbys: ca-cn4, ca-cn3, ca-cn5, ca-cn2
>>>>> >>  osdmap e264: 60 osds: 59 up, 59 in; 75 remapped pgs
>>>>> >> flags sortbitwise,require_jewel_osds,require_kraken_osds
>>>>> >>   pgmap v119402: 1024 pgs, 1 pools, 28519 GB data, 21548 kobjects
>>>>> >> 39976 GB used, 282 TB / 322 TB avail
>>>>> >>  941 active+clean
>>>>> >>   75 remapped+incomplete
>>>>> >>8 active+clean+scrubbing
>>>>> >>
>>>>> >> this seems to be an issue with bluestore , recovery not happening
>>>>> >> properly
>>>>> >> with EC .
>>>>> >
>>>>> > It's possible but it seems a lot more likely this is some kind of
&g

Re: [ceph-users] mon.mon01 store is getting too big! 18119 MB >= 15360 MB -- 94% avail

2017-01-31 Thread Shinobu Kinjo
On Wed, Feb 1, 2017 at 1:51 AM, Joao Eduardo Luis  wrote:
> On 01/31/2017 03:35 PM, David Turner wrote:
>>
>> If you do have a large enough drive on all of your mons (and always
>> intend to do so) you can increase the mon store warning threshold in the
>> config file so that it no longer warns at 15360 MB.
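>>
A minimal sketch of that override, assuming the option is mon_data_size_warn
(in bytes; its default corresponds to the 15360 MB in the warning):

  [mon]
  mon data size warn = 32212254720   # ~30 GB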
>
>
> And if you so decide to go that route, please be aware that the monitors are
> known to misbehave if their store grows too much.

Would you please elaborate on what *misbehave* means? Do you have any
pointers that describe it more specifically?

>
> Those warnings have been put in place to let the admin know that action may
> be needed, hopefully in time to avoid abhorrent behaviour.
>
>   -Joao
>
>
>> From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Wido
>> den Hollander [w...@42on.com]
>> Sent: Tuesday, January 31, 2017 2:35 AM
>> To: Martin Palma; CEPH list
>> Subject: Re: [ceph-users] mon.mon01 store is getting too big! 18119 MB
>>>
>>> = 15360 MB -- 94% avail
>>
>>
>>> Op 31 januari 2017 om 10:22 schreef Martin Palma :
>>>
>>>
>>> Hi all,
>>>
>>> our cluster is currently performing a big expansion and is in recovery
>>> mode (we doubled in size and osd# from 600 TB to 1,2 TB).
>>>
>>
>> Yes, that is to be expected. When not all PGs are active+clean the MONs
>> will not trim their datastore.
>>
>>> Now we get the following message from our monitor nodes:
>>>
>>> mon.mon01 store is getting too big! 18119 MB >= 15360 MB -- 94% avail
>>>
>>> Reading [0] it says that it is normal in a state of active data
>>> rebalance and after it is finished it will be compacted.
>>>
>>> Should we wait until the recovery is finished or should we perform
>>> "ceph tell mon.{id} compact" now during recovery?
>>>
>>
>> Mainly wait and make sure there is enough disk space. You can try a
>> compact, but that can take the mon offline temp.
>>
>> Just make sure you have enough diskspace :)
>>
>> Wido
>>
>>> Best,
>>> Martin
>>>
>>> [0] https://access.redhat.com/solutions/1982273
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Minimize data lost with PG incomplete

2017-01-30 Thread Shinobu Kinjo
First off, the followings, please.

 * ceph -s
 * ceph osd tree
 * ceph pg dump

and

 * what you actually did with exact commands.
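
In addition, the following helps narrow down which PGs are incomplete and why
(the pgid is a placeholder, taken from the health output):

  ceph health detail | grep incomplete
  ceph pg <pgid> query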

Regards,

On Tue, Jan 31, 2017 at 6:10 AM, José M. Martín  wrote:
> Dear list,
>
> I'm having some big problems with my setup.
>
> I was trying to increase the global capacity by changing some osds by
> bigger ones. I changed them without wait the rebalance process finished,
> thinking the replicas were saved in other buckets, but I found a lot of
> PGs incomplete, so replicas of a PG were placed in a same bucket. I have
> assumed I have lost data because I zapped the disks and used in other tasks.
>
> My question is: what should I do to recover as much data as possible?
> I'm using the filesystem and RBD.
>
> Thank you so much,
>
> --
>
> Jose M. Martín
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph monitoring

2017-01-29 Thread Shinobu Kinjo
There have been some related threads on this mailing list.

Google this:

 [ceph-users] Ceph Plugin for Collectd
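
The counters collectd-ceph publishes (asked about below) are simply the
daemons' perf counters read over the admin socket, so the authoritative
list for a given daemon comes from the daemon itself (daemon names are
placeholders):

 # ceph daemon osd.0 perf schema
 # ceph daemon mon.a perf schema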

On Sun, Jan 29, 2017 at 8:43 AM, Marc Roos  wrote:
>
>
> Is there a doc that describes all the parameters that are published by
> collectd-ceph?
>
> Is there maybe a default grafana dashboard for influxdb? I found
> something for graphite, and modifying those.
>
>
>
> -Original Message-
> From: Patrick McGarry [mailto:pmcga...@redhat.com]
> Sent: donderdag 26 januari 2017 16:56
> To: Ceph Devel; Ceph-User
> Subject: [ceph-users] Ceph Tech Talk in ~2 hrs
>
> Hey cephers,
>
> Just a reminder that the 'Getting Started with Ceph Development' Ceph
> Tech Talk [0] is start in about 2 hours. Sage is going to walk through
> the process from start to finish, so if you have coworkers, friends, or
> anyone that might be interested in getting started with Ceph, please
> send them our way!
>
> If you are already experienced in Ceph Development, feel free to use the
> Q to discuss ways of improving the experience for new developers and
> what you might want to see changed. Thanks!
>
> [0] http://ceph.com/ceph-tech-talks/
>
> --
>
> Best Regards,
>
> Patrick McGarry
> Director Ceph Community || Red Hat
> http://ceph.com  ||  http://community.redhat.com @scuttlemonkey || @ceph
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bluestore: v11.2.0 peering not happening when OSD is down

2017-01-20 Thread Shinobu Kinjo
`ceph pg dump` should show you something like:

 * active+undersized+degraded ... [NONE,3,2,4,1] 3 [NONE,3,2,4,1]

Sam,

Am I wrong? Or is it up to something else?
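
For the record, the min_size behaviour Greg describes below can be checked
and, with care, adjusted per pool (the pool name is a placeholder; running
an EC pool with min_size = k trades away the safety margin he mentions):

 # ceph osd pool get <ec-pool> min_size
 # ceph osd pool set <ec-pool> min_size 4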


On Sat, Jan 21, 2017 at 4:22 AM, Gregory Farnum  wrote:
> I'm pretty sure the default configs won't let an EC PG go active with
> only "k" OSDs in its PG; it needs at least k+1 (or possibly more? Not
> certain). Running an "n+1" EC config is just not a good idea.
> For testing you could probably adjust this with the equivalent of
> min_size for EC pools, but I don't know the parameters off the top of
> my head.
> -Greg
>
> On Fri, Jan 20, 2017 at 2:15 AM, Muthusamy Muthiah
>  wrote:
>> Hi ,
>>
>> We are validating kraken 11.2.0 with bluestore  on 5 node cluster with EC
>> 4+1.
>>
>> When an OSD is down , the peering is not happening and ceph health status
>> moved to ERR state after few mins. This was working in previous development
>> releases. Any additional configuration required in v11.2.0
>>
>> Following is our ceph configuration:
>>
>> mon_osd_down_out_interval = 30
>> mon_osd_report_timeout = 30
>> mon_osd_down_out_subtree_limit = host
>> mon_osd_reporter_subtree_level = host
>>
>> and the recovery parameters set to default.
>>
>> [root@ca-cn1 ceph]# ceph osd crush show-tunables
>>
>> {
>> "choose_local_tries": 0,
>> "choose_local_fallback_tries": 0,
>> "choose_total_tries": 50,
>> "chooseleaf_descend_once": 1,
>> "chooseleaf_vary_r": 1,
>> "chooseleaf_stable": 1,
>> "straw_calc_version": 1,
>> "allowed_bucket_algs": 54,
>> "profile": "jewel",
>> "optimal_tunables": 1,
>> "legacy_tunables": 0,
>> "minimum_required_version": "jewel",
>> "require_feature_tunables": 1,
>> "require_feature_tunables2": 1,
>> "has_v2_rules": 1,
>> "require_feature_tunables3": 1,
>> "has_v3_rules": 0,
>> "has_v4_buckets": 0,
>> "require_feature_tunables5": 1,
>> "has_v5_rules": 0
>> }
>>
>> ceph status:
>>
>>  health HEALTH_ERR
>> 173 pgs are stuck inactive for more than 300 seconds
>> 173 pgs incomplete
>> 173 pgs stuck inactive
>> 173 pgs stuck unclean
>>  monmap e2: 5 mons at
>> {ca-cn1=10.50.5.117:6789/0,ca-cn2=10.50.5.118:6789/0,ca-cn3=10.50.5.119:6789/0,ca-cn4=10.50.5.120:6789/0,ca-cn5=10.50.5.121:6789/0}
>> election epoch 106, quorum 0,1,2,3,4
>> ca-cn1,ca-cn2,ca-cn3,ca-cn4,ca-cn5
>> mgr active: ca-cn1 standbys: ca-cn2, ca-cn4, ca-cn5, ca-cn3
>>  osdmap e1128: 60 osds: 59 up, 59 in; 173 remapped pgs
>> flags sortbitwise,require_jewel_osds,require_kraken_osds
>>   pgmap v782747: 2048 pgs, 1 pools, 63133 GB data, 46293 kobjects
>> 85199 GB used, 238 TB / 322 TB avail
>> 1868 active+clean
>>  173 remapped+incomplete
>>7 active+clean+scrubbing
>>
>> MON log:
>>
>> 2017-01-20 09:25:54.715684 7f55bcafb700  0 log_channel(cluster) log [INF] :
>> osd.54 out (down for 31.703786)
>> 2017-01-20 09:25:54.725688 7f55bf4d5700  0 mon.ca-cn1@0(leader).osd e1120
>> crush map has features 288250512065953792, adjusting msgr requires
>> 2017-01-20 09:25:54.729019 7f55bf4d5700  0 log_channel(cluster) log [INF] :
>> osdmap e1120: 60 osds: 59 up, 59 in
>> 2017-01-20 09:25:54.735987 7f55bf4d5700  0 log_channel(cluster) log [INF] :
>> pgmap v781993: 2048 pgs: 1869 active+clean, 173 incomplete, 6
>> active+clean+scrubbing; 63159 GB data, 85201 GB used, 238 TB / 322 TB avail;
>> 21825 B/s rd, 163 MB/s wr, 2046 op/s
>> 2017-01-20 09:25:55.737749 7f55bf4d5700  0 mon.ca-cn1@0(leader).osd e1121
>> crush map has features 288250512065953792, adjusting msgr requires
>> 2017-01-20 09:25:55.744338 7f55bf4d5700  0 log_channel(cluster) log [INF] :
>> osdmap e1121: 60 osds: 59 up, 59 in
>> 2017-01-20 09:25:55.749616 7f55bf4d5700  0 log_channel(cluster) log [INF] :
>> pgmap v781994: 2048 pgs: 29 remapped+incomplete, 1869 active+clean, 144
>> incomplete, 6 active+clean+scrubbing; 63159 GB data, 85201 GB used, 238 TB /
>> 322 TB avail; 44503 B/s rd, 45681 kB/s wr, 518 op/s
>> 2017-01-20 09:25:56.768721 7f55bf4d5700  0 log_channel(cluster) log [INF] :
>> pgmap v781995: 2048 pgs: 47 remapped+incomplete, 1869 active+clean, 126
>> incomplete, 6 active+clean+scrubbing; 63159 GB data, 85201 GB used, 238 TB /
>> 322 TB avail; 20275 B/s rd, 72742 kB/s wr, 665 op/s
>>
>> Thanks,
>> Muthu
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph counters decrementing after changing pg_num

2017-01-20 Thread Shinobu Kinjo
What does `ceph -s` say?
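
If it is the PG statistics that look wrong after the pg_num/pgp_num change,
a scrub refreshes them; it runs automatically, as Wido notes below, but it
can be nudged along (ids are placeholders):

 # ceph pg scrub <pgid>
 # ceph osd deep-scrub <osd-id>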

On Sat, Jan 21, 2017 at 3:39 AM, Wido den Hollander  wrote:
>
>> On 20 January 2017 at 17:17, Kai Storbeck wrote:
>>
>>
>> Hello ceph users,
>>
>> My graphs of several counters in our Ceph cluster are showing abnormal
>> behaviour after changing the pg_num and pgp_num respectively.
>
> What counters exactly? Like pg information? It could be that it needs a scrub 
> on all PGs before that information is corrected. This scrub will trigger 
> automatically.
>
>>
>> We're using "http://eu.ceph.com/debian-hammer/ jessie/main".
>>
>>
>> Is this a bug, or will the counters stabilize at some time in the near
>> future? Or, is this otherwise fixable by "turning it off and on again"?
>>
>>
>> Regards,
>> Kai
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problems with http://tracker.ceph.com/?

2017-01-19 Thread Shinobu Kinjo
On Fri, Jan 20, 2017 at 2:54 AM, Brian Andrus
 wrote:
> Much of the Ceph project VMs (including tracker.ceph.com) is currently
> hosted on DreamCompute. The migration to our new service/cluster that was
> completed on 2017-01-17, the Ceph project was somehow enabled in our new
> OpenStack project without enabling a service in our billing system (this
> shouldn't be possible).
>
> Since tenant_deletes (started by customers leaving for example) often fail,
> we run daily audits that root out accounts without Service Instances in our
> billing system, and issue a tenant delete in OpenStack. In hindsight, it
> should probably look for accounts that are INACTIVE, and not non-existent. I
> have enabled a Service Instance for the DreamCompute service, so it should
> NOT happen again. This did happen yesterday as well, but we incorrectly
> assessed the situation and thus happened again today.
>
> The good news is the tenant delete failed. The bad news is we're looking for
> the tracker volume now, which is no longer present in the Ceph project.
>
> The Ceph project guys are understandably upset, and from the DreamHost side,
> we're currently looking to recover the tracker volume.

Yeah, pretty much...

>
> On Thu, Jan 19, 2017 at 8:51 AM, Sean Redmond 
> wrote:
>>
>> Looks like there maybe an issue with the ceph.com and tracker.ceph.com
>> website at the moment
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
>
> --
> Brian Andrus
> Cloud Systems Engineer
> DreamHost, LLC
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why would "osd marked itself down" will not recognised?

2017-01-12 Thread Shinobu Kinjo
Now I'm totally clear.
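
For anyone reading this later, "stop the OSDs first" on a colocated
mon/OSD node boils down to roughly the following on a systemd install
(setting noout is optional, but avoids rebalancing during a short reboot):

 # ceph osd set noout
 # systemctl stop ceph-osd.target   # stops every OSD on this node cleanly
 # reboot
 ... and once the node is back up:
 # ceph osd unset noout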

Regards,

On Fri, Jan 13, 2017 at 6:59 AM, Samuel Just  wrote:
> That would work.
> -Sam
>
> On Thu, Jan 12, 2017 at 1:40 PM, Gregory Farnum  wrote:
>> On Thu, Jan 12, 2017 at 1:37 PM, Samuel Just  wrote:
>>> Oh, this is basically working as intended.  What happened is that the
>>> mon died before the pending map was actually committed.  The OSD has a
>>> timeout (5s) after which it stops trying to mark itself down and just
>>> dies (so that OSDs don't hang when killed).  It took a bit longer than
>>> 5s for the remaining 2 mons to form a new quorum, so they never got
>>> the MOSDMarkMeDown message so we had to do it the slow way.  I would
>>> prefer this behavior to changing the mon shutdown process or making
>>> the OSDs wait longer, so I think that's it.  If you want to avoid
>>> disruption with colocated mons and osds, stop the osds first
>>
>> We can probably make our systemd scripts do this automatically? Or at
>> least, there's a Ceph super-task thingy and I bet we can order the
>> shutdown so it waits to kill the monitor until all the OSDs processes
>> have ended.
>>
>>> and then
>>> reboot.
>>
>>
>>
>>> -Sam
>>>
>>> On Thu, Jan 12, 2017 at 1:24 PM, Udo Lembke  wrote:
 Hi Sam,

 the webfrontend of an external ceph-dash was interrupted till the node
 was up again. The reboot took app. 5 min.

 But  the ceph -w output shows some IO much faster. I will look tomorrow
 at the output again and create an ticket.


 Thanks


 Udo


 On 12.01.2017 20:02, Samuel Just wrote:
> How long did it take for the cluster to recover?
> -Sam
>
> On Thu, Jan 12, 2017 at 10:54 AM, Gregory Farnum  
> wrote:
>> On Thu, Jan 12, 2017 at 2:03 AM,   wrote:
>>> Hi all,
>>> I had just reboot all 3 nodes (one after one) of an small Proxmox-VE
>>> ceph-cluster. All nodes are mons and have two OSDs.
>>> During reboot of one node, ceph stucks longer than normaly and I look 
>>> in the
>>> "ceph -w" output to find the reason.
>>>
>>> This is not the reason, but I'm wonder why "osd marked itself down" 
>>> will not
>>> recognised by the mons:
>>> 2017-01-12 10:18:13.584930 mon.0 [INF] osd.5 marked itself down
>>> 2017-01-12 10:18:13.585169 mon.0 [INF] osd.4 marked itself down
>>> 2017-01-12 10:18:22.809473 mon.2 [INF] mon.2 calling new monitor 
>>> election
>>> 2017-01-12 10:18:22.847548 mon.0 [INF] mon.0 calling new monitor 
>>> election
>>> 2017-01-12 10:18:27.879341 mon.0 [INF] mon.0@0 won leader election with
>>> quorum 0,2
>>> 2017-01-12 10:18:27.889797 mon.0 [INF] HEALTH_WARN; 1 mons down, quorum 
>>> 0,2
>>> 0,2
>>> 2017-01-12 10:18:27.952672 mon.0 [INF] monmap e3: 3 mons at
>>> {0=10.132.7.11:6789/0,1=10.132.7.12:6789/0,2=10.132.7.13:6789/0}
>>> 2017-01-12 10:18:27.953410 mon.0 [INF] pgmap v4800799: 392 pgs: 392
>>> active+clean; 567 GB data, 1697 GB used, 9445 GB / 11142 GB avail; 239 
>>> kB/s
>>> wr, 15 op/s
>>> 2017-01-12 10:18:27.953453 mon.0 [INF] fsmap e1:
>>> 2017-01-12 10:18:27.953787 mon.0 [INF] osdmap e2053: 6 osds: 6 up, 6 in
>>> 2017-01-12 10:18:29.013968 mon.0 [INF] pgmap v4800800: 392 pgs: 392
>>> active+clean; 567 GB data, 1697 GB used, 9445 GB / 11142 GB avail; 
>>> 73018 B/s
>>> wr, 12 op/s
>>> 2017-01-12 10:18:30.086787 mon.0 [INF] pgmap v4800801: 392 pgs: 392
>>> active+clean; 567 GB data, 1697 GB used, 9445 GB / 11142 GB avail; 59 
>>> B/s
>>> rd, 135 kB/s wr, 15 op/s
>>> 2017-01-12 10:18:34.559509 mon.0 [INF] pgmap v4800802: 392 pgs: 392
>>> active+clean; 567 GB data, 1697 GB used, 9445 GB / 11142 GB avail; 184 
>>> B/s
>>> rd, 189 kB/s wr, 7 op/s
>>> 2017-01-12 10:18:35.623838 mon.0 [INF] pgmap v4800803: 392 pgs: 392
>>> active+clean; 567 GB data, 1697 GB used, 9445 GB / 11142 GB avail
>>> 2017-01-12 10:18:39.580770 mon.0 [INF] pgmap v4800804: 392 pgs: 392
>>> active+clean; 567 GB data, 1697 GB used, 9445 GB / 11142 GB avail
>>> 2017-01-12 10:18:39.681058 mon.0 [INF] osd.4 10.132.7.12:6800/4064 
>>> failed (2
>>> reporters from different host after 21.222945 >= grace 20.388836)
>>> 2017-01-12 10:18:39.681221 mon.0 [INF] osd.5 10.132.7.12:6802/4163 
>>> failed (2
>>> reporters from different host after 21.222970 >= grace 20.388836)
>>> 2017-01-12 10:18:40.612401 mon.0 [INF] pgmap v4800805: 392 pgs: 392
>>> active+clean; 567 GB data, 1697 GB used, 9445 GB / 11142 GB avail
>>> 2017-01-12 10:18:40.670801 mon.0 [INF] osdmap e2054: 6 osds: 4 up, 6 in
>>> 2017-01-12 10:18:40.689302 mon.0 [INF] pgmap v4800806: 392 pgs: 392
>>> active+clean; 567 GB data, 1697 GB used, 9445 GB / 11142 GB avail
>>> 2017-01-12 10:18:41.730006 mon.0 [INF] osdmap e2055: 6 osds: 4 

Re: [ceph-users] RBD v1 image format ...

2017-01-12 Thread Shinobu Kinjo
It would be appreciated if QA evaluation results for the migration and
recovery tools were shared with users, both to avoid any disaster on
production environments and to get their agreement up front,

 e.g.,
 #1 the scenarios we test
 #2 the image specs we use
 and so on.

Does that make sense, or is it too much?

Regards,


On Thu, Jan 12, 2017 at 1:01 PM, Jason Dillaman <jdill...@redhat.com> wrote:
> On Wed, Jan 11, 2017 at 10:43 PM, Shinobu Kinjo <ski...@redhat.com> wrote:
>> +2
>>  * Reduce manual operation as much as possible.
>>  * A recovery tool in case that we break something which would not
>> appear to us initially.
>
> I definitely agree that this is an overdue tool and we have an
> upstream feature ticket for tracking a possible solution for this [1].
> We won't remove the support for interacting with v1 images before we
> provide a path for migration. The Ceph core development team would
> really like to drop internal support for tmap operations, which are
> only utilized by RBD v1.
>
> [1] http://tracker.ceph.com/issues/18430
>
> --
> Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why would "osd marked itself down" will not recognised?

2017-01-12 Thread Shinobu Kinjo
Sorry, I don't get your question.

Generally speaking, the MON maintains maps of the cluster state:

 * Monitor map
 * OSD map
 * PG map
 * CRUSH map
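
Each of those maps can be inspected directly if that helps, e.g.:

 # ceph mon dump
 # ceph osd dump
 # ceph pg dump | head
 # ceph osd getcrushmap -o /tmp/crush && crushtool -d /tmp/crush -o /tmp/crush.txt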

Regards,


On Thu, Jan 12, 2017 at 7:03 PM,   wrote:
> Hi all,
> I had just reboot all 3 nodes (one after one) of an small Proxmox-VE
> ceph-cluster. All nodes are mons and have two OSDs.
> During reboot of one node, ceph stucks longer than normaly and I look in the
> "ceph -w" output to find the reason.
>
> This is not the reason, but I'm wonder why "osd marked itself down" will not
> recognised by the mons:
> 2017-01-12 10:18:13.584930 mon.0 [INF] osd.5 marked itself down
> 2017-01-12 10:18:13.585169 mon.0 [INF] osd.4 marked itself down
> 2017-01-12 10:18:22.809473 mon.2 [INF] mon.2 calling new monitor election
> 2017-01-12 10:18:22.847548 mon.0 [INF] mon.0 calling new monitor election
> 2017-01-12 10:18:27.879341 mon.0 [INF] mon.0@0 won leader election with
> quorum 0,2
> 2017-01-12 10:18:27.889797 mon.0 [INF] HEALTH_WARN; 1 mons down, quorum 0,2
> 0,2
> 2017-01-12 10:18:27.952672 mon.0 [INF] monmap e3: 3 mons at
> {0=10.132.7.11:6789/0,1=10.132.7.12:6789/0,2=10.132.7.13:6789/0}
> 2017-01-12 10:18:27.953410 mon.0 [INF] pgmap v4800799: 392 pgs: 392
> active+clean; 567 GB data, 1697 GB used, 9445 GB / 11142 GB avail; 239 kB/s
> wr, 15 op/s
> 2017-01-12 10:18:27.953453 mon.0 [INF] fsmap e1:
> 2017-01-12 10:18:27.953787 mon.0 [INF] osdmap e2053: 6 osds: 6 up, 6 in
> 2017-01-12 10:18:29.013968 mon.0 [INF] pgmap v4800800: 392 pgs: 392
> active+clean; 567 GB data, 1697 GB used, 9445 GB / 11142 GB avail; 73018 B/s
> wr, 12 op/s
> 2017-01-12 10:18:30.086787 mon.0 [INF] pgmap v4800801: 392 pgs: 392
> active+clean; 567 GB data, 1697 GB used, 9445 GB / 11142 GB avail; 59 B/s
> rd, 135 kB/s wr, 15 op/s
> 2017-01-12 10:18:34.559509 mon.0 [INF] pgmap v4800802: 392 pgs: 392
> active+clean; 567 GB data, 1697 GB used, 9445 GB / 11142 GB avail; 184 B/s
> rd, 189 kB/s wr, 7 op/s
> 2017-01-12 10:18:35.623838 mon.0 [INF] pgmap v4800803: 392 pgs: 392
> active+clean; 567 GB data, 1697 GB used, 9445 GB / 11142 GB avail
> 2017-01-12 10:18:39.580770 mon.0 [INF] pgmap v4800804: 392 pgs: 392
> active+clean; 567 GB data, 1697 GB used, 9445 GB / 11142 GB avail
> 2017-01-12 10:18:39.681058 mon.0 [INF] osd.4 10.132.7.12:6800/4064 failed (2
> reporters from different host after 21.222945 >= grace 20.388836)
> 2017-01-12 10:18:39.681221 mon.0 [INF] osd.5 10.132.7.12:6802/4163 failed (2
> reporters from different host after 21.222970 >= grace 20.388836)
> 2017-01-12 10:18:40.612401 mon.0 [INF] pgmap v4800805: 392 pgs: 392
> active+clean; 567 GB data, 1697 GB used, 9445 GB / 11142 GB avail
> 2017-01-12 10:18:40.670801 mon.0 [INF] osdmap e2054: 6 osds: 4 up, 6 in
> 2017-01-12 10:18:40.689302 mon.0 [INF] pgmap v4800806: 392 pgs: 392
> active+clean; 567 GB data, 1697 GB used, 9445 GB / 11142 GB avail
> 2017-01-12 10:18:41.730006 mon.0 [INF] osdmap e2055: 6 osds: 4 up, 6 in
>
> Why trust the mon not the osd? In this case the osdmap will be right app. 26
> seconds earlier (the pgmap at 10:18:27.953410 is wrong).
>
> ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
>
>
> regards
>
> Udo
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD v1 image format ...

2017-01-11 Thread Shinobu Kinjo
On Thu, Jan 12, 2017 at 12:28 PM, Christian Balzer  wrote:
>
> Hello,
>
> On Wed, 11 Jan 2017 11:09:46 -0500 Jason Dillaman wrote:
>
>> I would like to propose that starting with the Luminous release of Ceph,
>> RBD will no longer support the creation of v1 image format images via the
>> rbd CLI and librbd.
>>
>> We previously made the v2 image format the default and deprecated the v1
>> format under the Jewel release. It is important to note that we would still
>> support the use of v1 images by the CLI, VMs, etc for Luminous.
>>
>> Any objections or concerns?
>>
> As with the "deprecation" of v1, a transparent and non-downtime conversion
> tool from v1 to v2 would significantly reduce the number of people who are
> likely to take issue with this

 +2
 * Reduce manual operation as much as possible.
 * A recovery tool in case that we break something which would not
appear to us initially.

>
> Christian
> --
> Christian BalzerNetwork/Systems Engineer
> ch...@gol.com   Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD v1 image format ...

2017-01-11 Thread Shinobu Kinjo
On Thu, Jan 12, 2017 at 2:41 AM, Ilya Dryomov <idryo...@gmail.com> wrote:
> On Wed, Jan 11, 2017 at 6:01 PM, Shinobu Kinjo <ski...@redhat.com> wrote:
>> It would be fine to not support v1 image format at all.
>>
>> But it would be probably friendly for users to provide them with more
>> understandable message when they face feature mismatch instead of just
>> displaying:
>>
>>  * rbd: map failed: (6) No such device or address
>>
>> For instance, show the following something in message:
>>
>>  * Execute rbd feature disable 
>> deep-flatten,fast-diff,object-map,exclusive-lock
>>
>>  or
>>
>>  * Set rbd default features to 3
>
> We already do that in jewel:
>
> $ sudo rbd map a
> rbd: sysfs write failed
> RBD image feature set mismatch. You can disable features unsupported
> by the kernel with "rbd feature disable".
> In some cases useful info is found in syslog - try "dmesg | tail" or so.
> rbd: map failed: (6) No such device or address
>
> Enumerating the features to be disabled is hard because that depends on
> the version of the rbd.ko module and there is no interface for querying
> the feature set.  The man page can definitely be improved though: move
> --image-feature paragraph into its own section, expand the descriptions
> and add kernel version info.  I'll put that on my TODO list...

That would reasonably make sense.

Regards,

>
> Thanks,
>
> Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD v1 image format ...

2017-01-11 Thread Shinobu Kinjo
It would be fine to drop support for the v1 image format entirely.

But it would probably be friendlier to users to give them a more
understandable message when they hit a feature mismatch, instead of just
displaying:

 * rbd: map failed: (6) No such device or address

For instance, show the following something in message:

 * Execute rbd feature disable <image>
   deep-flatten,fast-diff,object-map,exclusive-lock

 or

 * Set rbd default features to 3

What do you think?
Does it make sense to you?
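
Spelled out with a made-up pool/image name, the two suggestions above would
look something like this (on some releases deep-flatten cannot be toggled
after image creation, so it may have to be left out of the first command):

 # rbd feature disable rbd/myimage deep-flatten fast-diff object-map exclusive-lock

 or, in ceph.conf on the client side:

 [client]
 rbd_default_features = 3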

Regards,


On Thu, Jan 12, 2017 at 1:09 AM, Jason Dillaman  wrote:
> I would like to propose that starting with the Luminous release of Ceph, RBD
> will no longer support the creation of v1 image format images via the rbd
> CLI and librbd.
>
> We previously made the v2 image format the default and deprecated the v1
> format under the Jewel release. It is important to note that we would still
> support the use of v1 images by the CLI, VMs, etc for Luminous.
>
> Any objections or concerns?
>
> --
> Jason
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-11 Thread Shinobu Kinjo
Please refer to Jens's message.
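
As a side note, the getcrushmap / crushtool / setcrushmap round trip quoted
further down is the usual way to change choose_total_tries; a sketch with a
verification step added:

 # ceph osd getcrushmap -o /tmp/crush
 # crushtool -i /tmp/crush --set-choose-total-tries 100 -o /tmp/crush.new
 # crushtool -d /tmp/crush.new -o /tmp/crush.new.txt   # decompile and eyeball it
 # ceph osd setcrushmap -i /tmp/crush.new
 # ceph osd crush show-tunables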

Regards,

On Wed, Jan 11, 2017 at 8:53 PM, Marcus Müller <mueller.mar...@posteo.de> wrote:
> Ok, thank you. I thought I have to set ceph to a tunables profile. If I’m 
> right, then I just have to export the current crush map, edit it and import 
> it again, like:
>
> ceph osd getcrushmap -o /tmp/crush
> crushtool -i /tmp/crush --set-choose-total-tries 100 -o /tmp/crush.new
> ceph osd setcrushmap -i /tmp/crush.new
>
> Is this right or not?
>
> I started this cluster with these 3 nodes and each 3 osds. They are vms. I 
> knew that this cluster would expand very big, that’s the reason for my choice 
> for ceph. Now I can’t add more HDDs to the vm hypervisor and I want to 
> separate the nodes physically too. I bought a new node with these 4 drives 
> and now another node with only 2 drives. As I hear now from several people 
> this was not a good idea. For this reason, I bought now additional HDDs for 
> the new node, so I have two with the same amount of HDDs and size. In the 
> next 1-2 months I will get the third physical node and then everything should 
> be fine. But at this time I have no other option.
>
> May it help to solve this problem by adding the 2 new HDDs to the new ceph 
> node?
>
>
>
>> Am 11.01.2017 um 12:00 schrieb Brad Hubbard <bhubb...@redhat.com>:
>>
>> Your current problem has nothing to do with clients and neither does
>> choose_total_tries.
>>
>> Try setting just this value to 100 and see if your situation improves.
>>
>> Ultimately you need to take a good look at your cluster configuration
>> and how your crush map is configured to deal with that configuration
>> but start with choose_total_tries as it has the highest probability of
>> helping your situation. Your clients should not be affected.
>>
>> Could you explain the reasoning behind having three hosts with one ods
>> each, one host with two osds and one with four?
>>
>> You likely need to tweak your crushmap to handle this configuration
>> better or, preferably, move to a more uniform configuration.
>>
>>
>> On Wed, Jan 11, 2017 at 5:38 PM, Marcus Müller <mueller.mar...@posteo.de> 
>> wrote:
>>> I have to thank you all. You give free support and this already helps me.
>>> I’m not the one who knows ceph that good, but everyday it’s getting better
>>> and better ;-)
>>>
>>> According to the article Brad posted I have to change the ceph osd crush
>>> tunables. But there are two questions left as I already wrote:
>>>
>>> - According to
>>> http://docs.ceph.com/docs/master/rados/operations/crush-map/#tunables there
>>> are a few profiles. My needed profile would be BOBTAIL (CRUSH_TUNABLES2)
>>> wich would set choose_total_tries to 50. For the beginning better than 19.
>>> There I also see: "You can select a profile on a running cluster with the
>>> command: ceph osd crush tunables {PROFILE}“. My question on this is: Even if
>>> I run hammer, is it good and possible to set it to bobtail?
>>>
>>> - We can also read:
>>>  WHICH CLIENT VERSIONS SUPPORT CRUSH_TUNABLES2
>>>  - v0.55 or later, including bobtail series (v0.56.x)
>>>  - Linux kernel version v3.9 or later (for the file system and RBD kernel
>>> clients)
>>>
>>> And here my question is: If my clients use librados (version hammer), do I
>>> need to have this required kernel version on the clients or the ceph nodes?
>>>
>>> I don’t want to have troubles at the end with my clients. Can someone answer
>>> me this, before I change the settings?
>>>
>>>
>>> Am 11.01.2017 um 06:47 schrieb Shinobu Kinjo <ski...@redhat.com>:
>>>
>>>
>>> Yeah, Sam is correct. I've not looked at crushmap. But I should have
>>> noticed what troublesome is with looking at `ceph osd tree`. That's my
>>> bad, sorry for that.
>>>
>>> Again please refer to:
>>>
>>> http://www.anchor.com.au/blog/2013/02/pulling-apart-cephs-crush-algorithm/
>>>
>>> Regards,
>>>
>>>
>>> On Wed, Jan 11, 2017 at 1:50 AM, Samuel Just <sj...@redhat.com> wrote:
>>>
>>> Shinobu isn't correct, you have 9/9 osds up and running.  up does not
>>> equal acting because crush is having trouble fulfilling the weights in
>>> your crushmap and the acting set is being padded out with an extra osd
>>> which happens to have the data to keep you up to the right number of
>>> replicas.  Please refer back to Brad's post.
>>> -Sam

Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-10 Thread Shinobu Kinjo
Yeah, Sam is correct. I hadn't looked at the crushmap, but I should have
noticed what the trouble was just by looking at `ceph osd tree`. That's my
bad, sorry for that.

Again please refer to:

http://www.anchor.com.au/blog/2013/02/pulling-apart-cephs-crush-algorithm/

Regards,


On Wed, Jan 11, 2017 at 1:50 AM, Samuel Just <sj...@redhat.com> wrote:
> Shinobu isn't correct, you have 9/9 osds up and running.  up does not
> equal acting because crush is having trouble fulfilling the weights in
> your crushmap and the acting set is being padded out with an extra osd
> which happens to have the data to keep you up to the right number of
> replicas.  Please refer back to Brad's post.
> -Sam
>
> On Mon, Jan 9, 2017 at 11:08 PM, Marcus Müller <mueller.mar...@posteo.de> 
> wrote:
>> Ok, i understand but how can I debug why they are not running as they 
>> should? For me I thought everything is fine because ceph -s said they are up 
>> and running.
>>
>> I would think of a problem with the crush map.
>>
>>> Am 10.01.2017 um 08:06 schrieb Shinobu Kinjo <ski...@redhat.com>:
>>>
>>> e.g.,
>>> OSD7 / 3 / 0 are in the same acting set. They should be up, if they
>>> are properly running.
>>>
>>> # 9.7
>>> 
>>>>   "up": [
>>>>   7,
>>>>   3
>>>>   ],
>>>>   "acting": [
>>>>   7,
>>>>   3,
>>>>   0
>>>>   ],
>>> 
>>>
>>> Here is an example:
>>>
>>>  "up": [
>>>1,
>>>0,
>>>2
>>>  ],
>>>  "acting": [
>>>1,
>>>0,
>>>2
>>>   ],
>>>
>>> Regards,
>>>
>>>
>>> On Tue, Jan 10, 2017 at 3:52 PM, Marcus Müller <mueller.mar...@posteo.de> 
>>> wrote:
>>>>>
>>>>> That's not perfectly correct.
>>>>>
>>>>> OSD.0/1/2 seem to be down.
>>>>
>>>>
>>>> Sorry but where do you see this? I think this indicates that they are up:  
>>>>  osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs?
>>>>
>>>>
>>>>> Am 10.01.2017 um 07:50 schrieb Shinobu Kinjo <ski...@redhat.com>:
>>>>>
>>>>> On Tue, Jan 10, 2017 at 3:44 PM, Marcus Müller <mueller.mar...@posteo.de> 
>>>>> wrote:
>>>>>> All osds are currently up:
>>>>>>
>>>>>>health HEALTH_WARN
>>>>>>   4 pgs stuck unclean
>>>>>>   recovery 4482/58798254 objects degraded (0.008%)
>>>>>>   recovery 420522/58798254 objects misplaced (0.715%)
>>>>>>   noscrub,nodeep-scrub flag(s) set
>>>>>>monmap e9: 5 mons at
>>>>>> {ceph1=192.168.10.3:6789/0,ceph2=192.168.10.4:6789/0,ceph3=192.168.10.5:6789/0,ceph4=192.168.60.6:6789/0,ceph5=192.168.60.11:6789/0}
>>>>>>   election epoch 478, quorum 0,1,2,3,4
>>>>>> ceph1,ceph2,ceph3,ceph4,ceph5
>>>>>>osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs
>>>>>>   flags noscrub,nodeep-scrub
>>>>>> pgmap v9981077: 320 pgs, 3 pools, 4837 GB data, 19140 kobjects
>>>>>>   15070 GB used, 40801 GB / 55872 GB avail
>>>>>>   4482/58798254 objects degraded (0.008%)
>>>>>>   420522/58798254 objects misplaced (0.715%)
>>>>>>316 active+clean
>>>>>>  4 active+remapped
>>>>>> client io 56601 B/s rd, 45619 B/s wr, 0 op/s
>>>>>>
>>>>>> This did not chance for two days or so.
>>>>>>
>>>>>>
>>>>>> By the way, my ceph osd df now looks like this:
>>>>>>
>>>>>> ID WEIGHT  REWEIGHT SIZE   USEAVAIL  %USE  VAR
>>>>>> 0 1.28899  1.0  3724G  1699G  2024G 45.63 1.69
>>>>>> 1 1.57899  1.0  3724G  1708G  2015G 45.87 1.70
>>>>>> 2 1.68900  1.0  3724G  1695G  2028G 45.54 1.69
>>>>>> 3 6.78499  1.0  7450G  1241G  6208G 16.67 0.62
>>>>>> 4 8.3  1.0  7450G  1228G  6221G 16.49 0.61
>>>>>> 5 9.51500  1.0  7450G  1239G  6210G 16.64 0.62
>>>>>> 6 7.66499  1.0  7450G  1265G  6184G 16.99 0.63
>>>>>> 7 9.75499  1.0  7450G  2497G  

Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-09 Thread Shinobu Kinjo
e.g.,
OSD7 / 3 / 0 are in the same acting set. They should be up, if they
are properly running.

# 9.7
 
>"up": [
>7,
>3
>],
>"acting": [
>7,
>3,
>0
>],
 

Here is an example:

  "up": [
1,
0,
2
  ],
  "acting": [
1,
0,
2
   ],

Regards,


On Tue, Jan 10, 2017 at 3:52 PM, Marcus Müller <mueller.mar...@posteo.de> wrote:
>>
>> That's not perfectly correct.
>>
>> OSD.0/1/2 seem to be down.
>
>
> Sorry but where do you see this? I think this indicates that they are up:   
> osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs?
>
>
>> Am 10.01.2017 um 07:50 schrieb Shinobu Kinjo <ski...@redhat.com>:
>>
>> On Tue, Jan 10, 2017 at 3:44 PM, Marcus Müller <mueller.mar...@posteo.de> 
>> wrote:
>>> All osds are currently up:
>>>
>>> health HEALTH_WARN
>>>4 pgs stuck unclean
>>>recovery 4482/58798254 objects degraded (0.008%)
>>>recovery 420522/58798254 objects misplaced (0.715%)
>>>noscrub,nodeep-scrub flag(s) set
>>> monmap e9: 5 mons at
>>> {ceph1=192.168.10.3:6789/0,ceph2=192.168.10.4:6789/0,ceph3=192.168.10.5:6789/0,ceph4=192.168.60.6:6789/0,ceph5=192.168.60.11:6789/0}
>>>election epoch 478, quorum 0,1,2,3,4
>>> ceph1,ceph2,ceph3,ceph4,ceph5
>>> osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs
>>>flags noscrub,nodeep-scrub
>>>  pgmap v9981077: 320 pgs, 3 pools, 4837 GB data, 19140 kobjects
>>>15070 GB used, 40801 GB / 55872 GB avail
>>>4482/58798254 objects degraded (0.008%)
>>>420522/58798254 objects misplaced (0.715%)
>>> 316 active+clean
>>>   4 active+remapped
>>>  client io 56601 B/s rd, 45619 B/s wr, 0 op/s
>>>
>>> This did not chance for two days or so.
>>>
>>>
>>> By the way, my ceph osd df now looks like this:
>>>
>>> ID WEIGHT  REWEIGHT SIZE   USEAVAIL  %USE  VAR
>>> 0 1.28899  1.0  3724G  1699G  2024G 45.63 1.69
>>> 1 1.57899  1.0  3724G  1708G  2015G 45.87 1.70
>>> 2 1.68900  1.0  3724G  1695G  2028G 45.54 1.69
>>> 3 6.78499  1.0  7450G  1241G  6208G 16.67 0.62
>>> 4 8.3  1.0  7450G  1228G  6221G 16.49 0.61
>>> 5 9.51500  1.0  7450G  1239G  6210G 16.64 0.62
>>> 6 7.66499  1.0  7450G  1265G  6184G 16.99 0.63
>>> 7 9.75499  1.0  7450G  2497G  4952G 33.52 1.24
>>> 8 9.32999  1.0  7450G  2495G  4954G 33.49 1.24
>>>  TOTAL 55872G 15071G 40801G 26.97
>>> MIN/MAX VAR: 0.61/1.70  STDDEV: 13.16
>>>
>>> As you can see, now osd2 also went down to 45% Use and „lost“ data. But I
>>> also think this is no problem and ceph just clears everything up after
>>> backfilling.
>>>
>>>
>>> Am 10.01.2017 um 07:29 schrieb Shinobu Kinjo <ski...@redhat.com>:
>>>
>>> Looking at ``ceph -s`` you originally provided, all OSDs are up.
>>>
>>> osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs
>>>
>>>
>>> But looking at ``pg query``, OSD.0 / 1 are not up. Are they something
>>
>> That's not perfectly correct.
>>
>> OSD.0/1/2 seem to be down.
>>
>>> like related to ?:
>>>
>>> Ceph1, ceph2 and ceph3 are vms on one physical host
>>>
>>>
>>> Are those OSDs running on vm instances?
>>>
>>> # 9.7
>>> 
>>>
>>>  "state": "active+remapped",
>>>  "snap_trimq": "[]",
>>>  "epoch": 3114,
>>>  "up": [
>>>  7,
>>>  3
>>>  ],
>>>  "acting": [
>>>  7,
>>>  3,
>>>  0
>>>  ],
>>>
>>> 
>>>
>>> # 7.84
>>> 
>>>
>>>  "state": "active+remapped",
>>>  "snap_trimq": "[]",
>>>  "epoch": 3114,
>>> "up": [
>>>  4,
>>>  8
>>>  ],
>>>  "acting": [
>>>  4,
>>>  8,
>>>  1
>>>  ],
>>>
>>> 
>>>
>>> # 8.1b
>>> 
>>>
>>>  "state": "active+remapped",
>>>  "snap_trimq": "[]",
>>>  "epoch": 3114,
>>>  "up": [
>>>  4,
>>>  7
>>>  ],
>>>  "acting": [
>>>  4,
>>>  7,
>>>  2
>>>  ],
>>>
>>> 
>>>
>>> # 7.7a
>>> 
>>>
>>>  "state": "active+remapped",
>>>  "snap_trimq": "[]",
>>>  "epoch": 3114,
>>>  "up": [
>>>  7,
>>>  4
>>>  ],
>>>  "acting": [
>>>  7,
>>>  4,
>>>  2
>>>  ],
>>>
>>> 
>>>
>>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-09 Thread Shinobu Kinjo
On Tue, Jan 10, 2017 at 3:44 PM, Marcus Müller <mueller.mar...@posteo.de> wrote:
> All osds are currently up:
>
>  health HEALTH_WARN
> 4 pgs stuck unclean
> recovery 4482/58798254 objects degraded (0.008%)
> recovery 420522/58798254 objects misplaced (0.715%)
> noscrub,nodeep-scrub flag(s) set
>  monmap e9: 5 mons at
> {ceph1=192.168.10.3:6789/0,ceph2=192.168.10.4:6789/0,ceph3=192.168.10.5:6789/0,ceph4=192.168.60.6:6789/0,ceph5=192.168.60.11:6789/0}
> election epoch 478, quorum 0,1,2,3,4
> ceph1,ceph2,ceph3,ceph4,ceph5
>  osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs
> flags noscrub,nodeep-scrub
>   pgmap v9981077: 320 pgs, 3 pools, 4837 GB data, 19140 kobjects
> 15070 GB used, 40801 GB / 55872 GB avail
> 4482/58798254 objects degraded (0.008%)
> 420522/58798254 objects misplaced (0.715%)
>  316 active+clean
>4 active+remapped
>   client io 56601 B/s rd, 45619 B/s wr, 0 op/s
>
> This did not chance for two days or so.
>
>
> By the way, my ceph osd df now looks like this:
>
> ID WEIGHT  REWEIGHT SIZE   USEAVAIL  %USE  VAR
>  0 1.28899  1.0  3724G  1699G  2024G 45.63 1.69
>  1 1.57899  1.0  3724G  1708G  2015G 45.87 1.70
>  2 1.68900  1.0  3724G  1695G  2028G 45.54 1.69
>  3 6.78499  1.0  7450G  1241G  6208G 16.67 0.62
>  4 8.3  1.0  7450G  1228G  6221G 16.49 0.61
>  5 9.51500  1.0  7450G  1239G  6210G 16.64 0.62
>  6 7.66499  1.0  7450G  1265G  6184G 16.99 0.63
>  7 9.75499  1.0  7450G  2497G  4952G 33.52 1.24
>  8 9.32999  1.0  7450G  2495G  4954G 33.49 1.24
>   TOTAL 55872G 15071G 40801G 26.97
> MIN/MAX VAR: 0.61/1.70  STDDEV: 13.16
>
> As you can see, now osd2 also went down to 45% Use and „lost“ data. But I
> also think this is no problem and ceph just clears everything up after
> backfilling.
>
>
> Am 10.01.2017 um 07:29 schrieb Shinobu Kinjo <ski...@redhat.com>:
>
> Looking at ``ceph -s`` you originally provided, all OSDs are up.
>
> osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs
>
>
> But looking at ``pg query``, OSD.0 / 1 are not up. Are they something

That's not perfectly correct.

OSD.0/1/2 seem to be down.

> like related to ?:
>
> Ceph1, ceph2 and ceph3 are vms on one physical host
>
>
> Are those OSDs running on vm instances?
>
> # 9.7
> 
>
>   "state": "active+remapped",
>   "snap_trimq": "[]",
>   "epoch": 3114,
>   "up": [
>   7,
>   3
>   ],
>   "acting": [
>   7,
>   3,
>   0
>   ],
>
> 
>
> # 7.84
> 
>
>   "state": "active+remapped",
>   "snap_trimq": "[]",
>   "epoch": 3114,
>  "up": [
>   4,
>   8
>   ],
>   "acting": [
>   4,
>   8,
>   1
>   ],
>
> 
>
> # 8.1b
> 
>
>   "state": "active+remapped",
>   "snap_trimq": "[]",
>   "epoch": 3114,
>   "up": [
>   4,
>   7
>   ],
>   "acting": [
>   4,
>   7,
>   2
>   ],
>
> 
>
> # 7.7a
> 
>
>   "state": "active+remapped",
>   "snap_trimq": "[]",
>   "epoch": 3114,
>   "up": [
>   7,
>   4
>   ],
>   "acting": [
>   7,
>   4,
>   2
>   ],
>
> 
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-09 Thread Shinobu Kinjo
Looking at ``ceph -s`` you originally provided, all OSDs are up.

> osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs

But looking at ``pg query``, OSD.0 / 1 are not up. Are they something
like related to ?:

> Ceph1, ceph2 and ceph3 are vms on one physical host

Are those OSDs running on vm instances?

# 9.7
 
>"state": "active+remapped",
>"snap_trimq": "[]",
>"epoch": 3114,
>"up": [
>7,
>3
>],
>"acting": [
>7,
>3,
>0
>],
 

# 7.84
 
>"state": "active+remapped",
>"snap_trimq": "[]",
>"epoch": 3114,
>   "up": [
>4,
>8
>],
>"acting": [
>4,
>8,
>1
>],
 

# 8.1b
 
>"state": "active+remapped",
>"snap_trimq": "[]",
>"epoch": 3114,
>"up": [
>4,
>7
>],
>"acting": [
>4,
>7,
>2
>],
 

# 7.7a
 
>"state": "active+remapped",
>"snap_trimq": "[]",
>"epoch": 3114,
>"up": [
>7,
>4
>],
>"acting": [
>7,
>4,
>2
>],
 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-09 Thread Shinobu Kinjo
> pg 9.7 is stuck unclean for 512936.160212, current state active+remapped, 
> last acting [7,3,0]
> pg 7.84 is stuck unclean for 512623.894574, current state active+remapped, 
> last acting [4,8,1]
> pg 8.1b is stuck unclean for 513164.616377, current state active+remapped, 
> last acting [4,7,2]
> pg 7.7a is stuck unclean for 513162.316328, current state active+remapped, 
> last acting [7,4,2]

Please execute:

for pg in 9.7 7.84 8.1b 7.7a;do ceph pg $pg query; done

Regards,

On Tue, Jan 10, 2017 at 7:31 AM, Christian Wuerdig
 wrote:
>
>
> On Tue, Jan 10, 2017 at 10:22 AM, Marcus Müller 
> wrote:
>>
>> Trying google with "ceph pg stuck in active and remapped" points to a
>> couple of post on this ML typically indicating that it's a problem with the
>> CRUSH map and ceph being unable to satisfy the mapping rules. Your ceph -s
>> output indicates that your using replication of size 3 in your pools. You
>> also said you had a custom CRUSH map - can you post it?
>>
>>
>> I’ve sent the file to you, since I’m not sure if it contains sensitive
>> data. Yes I have replication of 3 and I did not customize the map by me.
>
>
> I received your map but I'm not familiar enough with the details to give any
> particular advise on this - I just suggested to post your map in case
> someone more familiar with the CRUSH details might be able to spot
> something. Brad just provided a pointer so that would be useful to try.
>
>>
>>
>>
>> I might be missing something here but I don't quite see how you come to
>> this statement. ceph osd df and ceph -s both show 16093 GB used and 39779 GB
>> out of 55872 GB available. The sum of the first 3 OSDs used space is, as you
>> stated, 6181 GB which is approx 38.4% so quite close to your target of 33%
>>
>>
>> Maybe I have to explain it another way:
>>
>> Directly after finishing the backfill I received this output:
>>
>>  health HEALTH_WARN
>> 4 pgs stuck unclean
>> recovery 1698/58476648 objects degraded (0.003%)
>> recovery 418137/58476648 objects misplaced (0.715%)
>> noscrub,nodeep-scrub flag(s) set
>>  monmap e9: 5 mons at
>> {ceph1=192.168.10.3:6789/0,ceph2=192.168.10.4:6789/0,ceph3=192.168.10.5:6789/0,ceph4=192.168.60.6:6789/0,ceph5=192.168.60.11:6789/0}
>> election epoch 464, quorum 0,1,2,3,4
>> ceph1,ceph2,ceph3,ceph4,ceph5
>>  osdmap e3086: 9 osds: 9 up, 9 in; 4 remapped pgs
>> flags noscrub,nodeep-scrub
>>   pgmap v9928160: 320 pgs, 3 pools, 4809 GB data, 19035 kobjects
>> 16093 GB used, 39779 GB / 55872 GB avail
>> 1698/58476648 objects degraded (0.003%)
>> 418137/58476648 objects misplaced (0.715%)
>>  316 active+clean
>>4 active+remapped
>>   client io 757 kB/s rd, 1 op/s
>>
>> # ceph osd df
>> ID WEIGHT  REWEIGHT SIZE   USEAVAIL  %USE  VAR
>>  0 1.28899  1.0  3724G  1924G  1799G 51.67 1.79
>>  1 1.57899  1.0  3724G  2143G  1580G 57.57 2.00
>>  2 1.68900  1.0  3724G  2114G  1609G 56.78 1.97
>>  3 6.78499  1.0  7450G  1234G  6215G 16.57 0.58
>>  4 8.3  1.0  7450G  1221G  6228G 16.40 0.57
>>  5 9.51500  1.0  7450G  1232G  6217G 16.54 0.57
>>  6 7.66499  1.0  7450G  1258G  6191G 16.89 0.59
>>  7 9.75499  1.0  7450G  2482G  4967G 33.33 1.16
>>  8 9.32999  1.0  7450G  2480G  4969G 33.30 1.16
>>   TOTAL 55872G 16093G 39779G 28.80
>> MIN/MAX VAR: 0.57/2.00  STDDEV: 17.54
>>
>> Here we can see, that the cluster is using 4809 GB data and has raw used
>> 16093GB. Or the other way, only 39779G available.
>>
>> Two days later I saw:
>>
>>  health HEALTH_WARN
>> 4 pgs stuck unclean
>> recovery 3486/58726035 objects degraded (0.006%)
>> recovery 420024/58726035 objects misplaced (0.715%)
>> noscrub,nodeep-scrub flag(s) set
>>  monmap e9: 5 mons at
>> {ceph1=192.168.10.3:6789/0,ceph2=192.168.10.4:6789/0,ceph3=192.168.10.5:6789/0,ceph4=192.168.60.6:6789/0,ceph5=192.168.60.11:6789/0}
>> election epoch 478, quorum 0,1,2,3,4
>> ceph1,ceph2,ceph3,ceph4,ceph5
>>  osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs
>> flags noscrub,nodeep-scrub
>>   pgmap v9969059: 320 pgs, 3 pools, 4830 GB data, 19116 kobjects
>> 15150 GB used, 40722 GB / 55872 GB avail
>> 3486/58726035 objects degraded (0.006%)
>> 420024/58726035 objects misplaced (0.715%)
>>  316 active+clean
>>4 active+remapped
>>
>> # ceph osd df
>> ID WEIGHT  REWEIGHT SIZE   USEAVAIL  %USE  VAR
>>  0 1.28899  1.0  3724G  1696G  2027G 45.56 1.68
>>  1 1.57899  1.0  3724G  1705G  2018G 45.80 1.69
>>  2 1.68900  1.0  3724G  1794G  1929G 48.19 1.78
>>  3 6.78499  1.0  7450G  1239G  6210G 16.64 0.61
>>  4 8.3  1.0  7450G  1226G  6223G 16.46 0.61
>>  5 9.51500  1.0  7450G  1237G  6212G 

Re: [ceph-users] Ceph pg active+clean+inconsistent

2017-01-09 Thread Shinobu Kinjo
According to the output you provided previously, OSD.51/90 might have an
unfound object. To make them re-peer and probe the object again, you
could do:

 # ceph osd set noout; ceph osd set nodown

 # systemctl restart ceph-osd@51

 * wait for OSD.51's process to be up

 # systemctl restart ceph-osd@90

 * wait for OSD.90's process to be up

 # ceph osd unset noout; ceph osd unset nodown

If your cluster is large, please do this very slowly.

>"might_have_unfound": [
>{
>"osd": "51",
>"status": "already probed"
>},
>{
>"osd": "90",
>"status": "already probed"
>}

Or you could also do to trigger scrubbing pg:

 # ceph pg scrub ${pg_id}
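
If the scrub then flags the PG as inconsistent again, the usual follow-up
is to see what exactly differs and only then repair (list-inconsistent-obj
needs a reasonably recent release, and on older releases repair simply
copies the primary's version over the replicas, so check which copy is the
bad one first):

 # rados list-inconsistent-obj ${pg_id} --format=json-pretty
 # ceph pg repair ${pg_id}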

Regards,


On Tue, Jan 10, 2017 at 4:25 AM, Andras Pataki
<apat...@simonsfoundation.org> wrote:
> Yes, it doesn't cause issues, but I don't see any way to "repair" the
> problem.  One possible idea that I might do eventually if no solution is
> found is to copy the CephFS files in question and remove the ones with
> inconsistencies (which should remove the underlying rados objects).  But
> it'd be perhaps good to do some searching on how/why this problem came about
> before doing this.
>
> andras
>
>
>
> On 01/07/2017 06:48 PM, Shinobu Kinjo wrote:
>>
>> Sorry for the late.
>>
>> Are you still facing inconsistent pg status?
>>
>> On Wed, Jan 4, 2017 at 11:39 PM, Andras Pataki
>> <apat...@simonsfoundation.org> wrote:
>>>
>>> # ceph pg debug unfound_objects_exist
>>> FALSE
>>>
>>> Andras
>>>
>>>
>>> On 01/03/2017 11:38 PM, Shinobu Kinjo wrote:
>>>>
>>>> Would you run:
>>>>
>>>># ceph pg debug unfound_objects_exist
>>>>
>>>> On Wed, Jan 4, 2017 at 5:31 AM, Andras Pataki
>>>> <apat...@simonsfoundation.org> wrote:
>>>>>
>>>>> Here is the output of ceph pg query for one of hte
>>>>> active+clean+inconsistent
>>>>> PGs:
>>>>>
>>>>> {
>>>>>   "state": "active+clean+inconsistent",
>>>>>   "snap_trimq": "[]",
>>>>>   "epoch": 342982,
>>>>>   "up": [
>>>>>   319,
>>>>>   90,
>>>>>   51
>>>>>   ],
>>>>>   "acting": [
>>>>>   319,
>>>>>   90,
>>>>>   51
>>>>>   ],
>>>>>   "actingbackfill": [
>>>>>   "51",
>>>>>   "90",
>>>>>   "319"
>>>>>   ],
>>>>>   "info": {
>>>>>   "pgid": "6.92c",
>>>>>   "last_update": "342982'41304",
>>>>>   "last_complete": "342982'41304",
>>>>>   "log_tail": "342980'38259",
>>>>>   "last_user_version": 41304,
>>>>>   "last_backfill": "MAX",
>>>>>   "last_backfill_bitwise": 0,
>>>>>   "purged_snaps": "[]",
>>>>>   "history": {
>>>>>   "epoch_created": 262553,
>>>>>   "last_epoch_started": 342598,
>>>>>   "last_epoch_clean": 342613,
>>>>>   "last_epoch_split": 0,
>>>>>   "last_epoch_marked_full": 0,
>>>>>   "same_up_since": 342596,
>>>>>   "same_interval_since": 342597,
>>>>>   "same_primary_since": 342597,
>>>>>   "last_scrub": "342982'41177",
>>>>>   "last_scrub_stamp": "2017-01-02 18:19:48.081750",
>>>>>   "last_deep_scrub": "342965'37465",
>>>>>   "last_deep_scrub_stamp": "2016-12-20 16:31:06.438823",
>>>>>   "last_clean_scrub_stamp": "2016-12-11 12:51:19.258816"
>>>>> 

Re: [ceph-users] Ceph Monitor cephx issues

2017-01-07 Thread Shinobu Kinjo
Good to know.
Sorry for the inconvenience, in any case.

Regards,

On Sun, Jan 8, 2017 at 2:19 PM, Alex Evonosky <alex.evono...@gmail.com>
wrote:

> Since this was a test lab, I totally purged the whole cluster and
> re-deployed..  working good now, thank you.
>
>
>
> Alex F. Evonosky
>
> <https://twitter.com/alexevon> <https://www.linkedin.com/in/alexevonosky>
>
> On Sat, Jan 7, 2017 at 9:14 PM, Alex Evonosky <alex.evono...@gmail.com>
> wrote:
>
>> Thank you..
>>
>>
>> After sending the post, I totally removed the mon and issued the build
>> with ceph-deploy:
>>
>>
>> In the logs now:
>>
>> 2017-01-07 21:12:38.113534 7fa9613fd700  0 cephx: verify_reply couldn't
>> decrypt with error: error decoding block for decryption
>> 2017-01-07 21:12:38.113546 7fa9613fd700  0 -- 10.10.10.138:6789/0 >>
>> 10.10.10.103:6789/0 pipe(0x55feb2e9 sd=12 :50266 s=1 pgs=0 cs=0 l=0
>> c=0x55feb2ca0a80).failed verifying authorize reply
>> 2017-01-07 21:12:38.114529 7fa95787b700  0 cephx: verify_reply couldn't
>> decrypt with error: error decoding block for decryption
>> 2017-01-07 21:12:38.114567 7fa95787b700  0 -- 10.10.10.138:6789/0 >>
>> 10.10.10.252:6789/0 pipe(0x55feb2e91400 sd=11 :38690 s=1 pgs=0 cs=0 l=0
>> c=0x55feb2ca0c00).failed verifying authorize reply
>> 2017-01-07 21:12:40.114522 7fa9613fd700  0 cephx: verify_reply couldn't
>> decrypt with error: error decoding block for decryption
>> 2017-01-07 21:12:40.114542 7fa9613fd700  0 -- 10.10.10.138:6789/0 >>
>> 10.10.10.103:6789/0 pipe(0x55feb2e9 sd=11 :50278 s=1 pgs=0 cs=0 l=0
>> c=0x55feb2ca0a80).failed verifying authorize reply
>> 2017-01-07 21:12:40.115706 7fa95787b700  0 cephx: verify_reply couldn't
>> decrypt with error: error decoding block for decryption
>> 2017-01-07 21:12:40.115721 7fa95787b700  0 -- 10.10.10.138:6789/0 >>
>> 10.10.10.252:6789/0 pipe(0x55feb2e91400 sd=12 :38702 s=1 pgs=0 cs=0 l=0
>> c=0x55feb2ca0c00).failed verifying authorize reply
>> 2017-01-07 21:12:41.621916 7fa956f79700  0 cephx: verify_authorizer could
>> not decrypt ticket info: error: NSS AES final round failed: -8190
>> 2017-01-07 21:12:41.621929 7fa956f79700  0 mon.alex-desktop@1(probing)
>> e0 ms_verify_authorizer bad authorizer from mon 10.10.10.103:6789/0
>> 2017-01-07 21:12:41.621944 7fa956f79700  0 -- 10.10.10.138:6789/0 >>
>> 10.10.10.103:6789/0 pipe(0x55feb2fb5400 sd=21 :6789 s=0 pgs=0 cs=0 l=0
>> c=0x55feb2ca1500).accept: got bad authorizer
>>
>>
>>
>> $ sudo ceph -s
>> cluster f5aba719-4856-4ae2-a5d4-f9ff0f614b60
>>  health HEALTH_WARN
>> 512 pgs degraded
>> 348 pgs stale
>> 512 pgs stuck unclean
>> 512 pgs undersized
>> 6 requests are blocked > 32 sec
>> recovery 25013/50026 objects degraded (50.000%)
>> mds cluster is degraded
>> 1 mons down, quorum 0,2 alpha,toshiba-laptop
>>  monmap e17: 3 mons at {alex-desktop=10.10.10.138:678
>> 9/0,alpha=10.10.10.103:6789/0,toshiba-laptop=10.10.10.252:6789/0}
>> election epoch 806, quorum 0,2 alpha,toshiba-laptop
>>   fsmap e201858: 1/1/1 up {0=1=up:replay}
>>  osdmap e200229: 3 osds: 2 up, 2 in; 85 remapped pgs
>> flags sortbitwise
>>   pgmap v4088774: 512 pgs, 4 pools, 50883 MB data, 25013 objects
>> 59662 MB used, 476 GB / 563 GB avail
>> 25013/50026 objects degraded (50.000%)
>>  348 stale+active+undersized+degraded
>>  164 active+undersized+degraded
>>
>>
>>
>> root@alex-desktop:/var/lib/ceph/mon/ceph-alex-desktop# ls -ls
>> total 8
>> 0 -rw-r--r-- 1 ceph ceph0 Jan  7 21:11 done
>> 4 -rw--- 1 ceph ceph   77 Jan  7 21:05 keyring
>> 4 drwxr-xr-x 2 ceph ceph 4096 Jan  7 21:10 store.db
>> 0 -rw-r--r-- 1 ceph ceph0 Jan  7 21:05 systemd
>>
>>
>>
>>
>> Very odd...  never seen this issue on the other monitor deployments...
>>
>>
>>
>>
>>
>>
>>
>>
>> Alex F. Evonosky
>>
>> <https://twitter.com/alexevon> <https://www.linkedin.com/in/alexevonosky>
>>
>> On Sat, Jan 7, 2017 at 8:54 PM, Shinobu Kinjo <ski...@redhat.com> wrote:
>>
>>> Using ``ceph-deploy`` will save your life:
>>>
>>>  # https://github.com/ceph/ceph/blob/master/doc/start/quick-cep
>>> h-deploy.rst
>>>   * Please look at: Adding Monitors
>>>
>>> If you are using

Re: [ceph-users] Ceph Monitor cephx issues

2017-01-07 Thread Shinobu Kinjo
Using ``ceph-deploy`` will save your life:

 # https://github.com/ceph/ceph/blob/master/doc/start/quick-ceph-deploy.rst
  * Please look at: Adding Monitors

If you are using centos or similar, the latest package is available here:

 #
http://download.ceph.com/rpm-jewel/el7/noarch/ceph-deploy-1.5.37-0.noarch.rpm
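
With ceph-deploy in place, adding the third monitor is then roughly
(the hostname is a placeholder):

 # ceph-deploy mon add <mon3-hostname>

The "verify_reply couldn't decrypt" / "bad authorizer" messages usually
point at a mon. keyring mismatch or clock skew, so comparing
/var/lib/ceph/mon/*/keyring and the NTP state on all three nodes is worth
doing first.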

Regards,


On Sun, Jan 8, 2017 at 9:53 AM, Alex Evonosky <alex.evono...@gmail.com>
wrote:

> Thank you for the reply!
>
> I followed this article:
>
> http://docs.ceph.com/docs/jewel/rados/operations/add-or-rm-mons/
>
>
> Under the section: ADDING A MONITOR (MANUAL)
>
>
>
> Alex F. Evonosky
>
> <https://twitter.com/alexevon> <https://www.linkedin.com/in/alexevonosky>
>
> On Sat, Jan 7, 2017 at 6:36 PM, Shinobu Kinjo <ski...@redhat.com> wrote:
>
>> How did you add a third MON?
>>
>> Regards,
>>
>> On Sun, Jan 8, 2017 at 7:01 AM, Alex Evonosky <alex.evono...@gmail.com>
>> wrote:
>> > Anyone see this before?
>> >
>> >
>> > 2017-01-07 16:55:11.406047 7f095b379700  0 cephx: verify_reply couldn't
>> > decrypt with error: error decoding block for decryption
>> > 2017-01-07 16:55:11.406053 7f095b379700  0 -- 10.10.10.138:6789/0 >>
>> > 10.10.10.252:6789/0 pipe(0x55cf8d028000 sd=11 :47548 s=1 pgs=0 cs=0 l=0
>> > c=0x55cf8ce28f00).failed verifying authorize reply
>> >
>> >
>> >
>> > Two monitors are up just fine, just trying to add a third and a quorum
>> > cannot be met.  NTP is running and no iptables running at all on
>> internal
>> > cluster.
>> >
>> >
>> > Thank you.
>> > -Alex
>> >
>> >
>> >
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Monitor cephx issues

2017-01-07 Thread Shinobu Kinjo
How did you add a third MON?

Regards,

On Sun, Jan 8, 2017 at 7:01 AM, Alex Evonosky  wrote:
> Anyone see this before?
>
>
> 2017-01-07 16:55:11.406047 7f095b379700  0 cephx: verify_reply couldn't
> decrypt with error: error decoding block for decryption
> 2017-01-07 16:55:11.406053 7f095b379700  0 -- 10.10.10.138:6789/0 >>
> 10.10.10.252:6789/0 pipe(0x55cf8d028000 sd=11 :47548 s=1 pgs=0 cs=0 l=0
> c=0x55cf8ce28f00).failed verifying authorize reply
>
>
>
> Two monitors are up just fine, just trying to add a third and a quorum
> cannot be met.  NTP is running and no iptables running at all on internal
> cluster.
>
>
> Thank you.
> -Alex
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: Is this a deadlock?

2017-01-04 Thread Shinobu Kinjo
On Wed, Jan 4, 2017 at 6:05 PM, 许雪寒  wrote:
> We've already restarted the OSD successfully.
> Now, we are trying to figure out why the OSD suicide itself

A network issue causing unstable communication with the other OSDs in the
same acting set is usually what triggers the suicide.
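
A quick way to confirm that from the logs (standard log path assumed):

 # grep -E 'heartbeat_check|suicide timeout' /var/log/ceph/ceph-osd.619.log

A burst of heartbeat_check "no reply" lines from many different peers right
before the abort usually points at the network rather than at the disk.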

>
> Re: [ceph-users] Is this a deadlock?
>
> Hi, thanks for the quick reply.
>
> We manually deployed this OSD, and it has been running for more than half a 
> year. The output last night should be the latter one that you metioned Last 
> night, one of our switch got some problem and made the OSD unconnected to 
> other peer, which in turn made the monitor to wrongly mark the OSD down.
>
> Thank you:-)
>
>
>
> On Wed, 4 Jan 2017 07:49:03 + 许雪寒 wrote:
>
>> Hi, everyone.
>>
>> Recently in one of our online ceph cluster, one OSD suicided itself after 
>> experiencing some network connectivity problem, and the OSD log is as 
>> follows:
>>
>
> Version of Ceph and all relevant things would help.
> Also "some network connectivity problem" is vague, if it were something like 
> a bad port or overloaded switch you'd think that more than one OSD would be 
> affected.
>
> [snip, I have nothing to comment on that part]
>>
>>
>
>> And by the way, when we first tried to restart OSD who committed suicide 
>> through “/etc/init.d/ceph start osd.619”, an error was reported, and it said 
>> something like “OSD.619 is not found”, which seemed that OSD.619 was never 
>> created in this cluster. We are really confused, please help us.
>>
> How did you create that OSD?
> Manually or with ceph-deploy?
> The fact that you're trying to use a SYS-V initscript suggests both and older 
> Ceph version and OS and thus more likely a manual install.
>
> In which case that OSD needs to be defined in ceph.conf on that node.
> Full output of that error message would have told us these things, like:
> ---
> root@ceph-04:~# /etc/init.d/ceph start osd.444
> /etc/init.d/ceph: osd.444 not found (/etc/ceph/ceph.conf defines mon.ceph-04 
> osd.25 osd.31 osd.30 osd.26 osd.29 osd.27 osd.28 osd.24 , /var/lib/ceph 
> defines mon.ceph-04 osd.25 osd.31 osd.30 osd.26 osd.29 osd.27 osd.28 osd.24)
> ---
> The above is the output from a Hammer cluster with OSDs deployed with 
> ceph-deploy.
> And incidentally the "ceph.conf" part of the output is a blatant lie and just 
> a repetition of what it gathered from /var/lib/ceph.
>
> This is a Hammer cluster with manually deployed OSDs:
> ---
> engtest03:~# /etc/init.d/ceph start osd.33
> /etc/init.d/ceph: osd.33 not found (/etc/ceph/ceph.conf defines mon.engtest03 
> mon.engtest04 mon.engtest05 mon.irt03 mon.irt04 mds.engtest03 osd.20 osd.21 
> osd.22 osd.23, /var/lib/ceph defines )
> ---
>
> Christian
> --
> Christian BalzerNetwork/Systems Engineer
> ch...@gol.com   Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph per-user stats?

2017-01-03 Thread Shinobu Kinjo
On Wed, Jan 4, 2017 at 4:33 PM, Henrik Korkuc  wrote:
> On 17-01-04 03:16, Gregory Farnum wrote:
>>
>> On Fri, Dec 23, 2016 at 12:04 AM, Henrik Korkuc  wrote:
>>>
>>> Hello,
>>>
>>> I wondered if Ceph can emit stats (via perf counters, statsd or in some
>>> other way) IO and bandwidth stats per Ceph user? I was unable to find
>>> such
>>> stats. I know that we can get at least some of these stats from RGW, but
>>> I'd
>>> like to have something like that for RBD and CephFS. Example usage could
>>> be
>>> figuring out who is hammering CephFS with IO requests.
>>>
>>> Maybe someone could provide basic guidance where to dig in if I'd like to
>>> make this feature myself? I there any plans/blueprints for such stats?
>>
>> This really isn't feasible right now as a starter project: doing it
>> "for real" requires all OSDs track all clients and IOs and then a
>> central location to receive that data and correlate it. We discussed a
>> sampling implementation for ceph-mgr recently in the Ceph Dev Monthly
>> and I think work is upcoming to enable it, but I'm not sure what kind
>> of timeline it's on.
>> -Greg
>
>
> I am thinking about implementing it myself. My high level idea is to have
> additional thread per OSD to collect these stats and periodically emit them
> to statsd for aggregation by metrics system. As I am not familiar with Ceph
> code base it may take a while to implement.. :)
>
> If there are some docs/plans for ceph-mgr metrics maybe I could contribute

It's not about metrics, but here is something related:

https://github.com/ceph/ceph/tree/master/doc/mgr
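
For what it's worth, the perf counters that exist today are per daemon,
not per user; you can see what is already tracked on a given daemon
with, e.g. (osd.0 is just a placeholder):

 # ceph daemon osd.0 perf dump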

> to that instead doing it my way? This way we would be more aligned and it
> would be easier to contribute it back to Ceph.
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph pg active+clean+inconsistent

2017-01-03 Thread Shinobu Kinjo
  "last_epoch_clean": 342613,
> "last_epoch_split": 0,
> "last_epoch_marked_full": 0,
> "same_up_since": 342596,
> "same_interval_since": 342597,
> "same_primary_since": 342597,
> "last_scrub": "342982'41177",
> "last_scrub_stamp": "2017-01-02 18:19:48.081750",
> "last_deep_scrub": "342965'37465",
> "last_deep_scrub_stamp": "2016-12-20 16:31:06.438823",
> "last_clean_scrub_stamp": "2016-12-11 12:51:19.258816"
> },
> "stats": {
> "version": "342589'15033",
> "reported_seq": "21478",
> "reported_epoch": "342596",
> "state": "remapped+peering",
> "last_fresh": "2016-11-01 16:21:20.584113",
> "last_change": "2016-11-01 16:21:20.295685",
> "last_active": "2016-11-01 16:14:02.694748",
> "last_peered": "2016-11-01 16:14:02.694748",
> "last_clean": "2016-11-01 15:26:23.393984",
> "last_became_active": "2016-11-01 16:05:44.990630",
> "last_became_peered": "2016-11-01 16:05:44.990630",
> "last_unstale": "2016-11-01 16:21:20.584113",
> "last_undegraded": "2016-11-01 16:21:20.584113",
> "last_fullsized": "2016-11-01 16:21:20.584113",
> "mapping_epoch": 342596,
> "log_start": "341563'12014",
> "ondisk_log_start": "341563'12014",
> "created": 262553,
> "last_epoch_clean": 342587,
> "parent": "0.0",
> "parent_split_bits": 0,
> "last_scrub": "342266'14514",
> "last_scrub_stamp": "2016-10-28 16:41:06.563820",
> "last_deep_scrub": "342266'14514",
> "last_deep_scrub_stamp": "2016-10-28 16:41:06.563820",
> "last_clean_scrub_stamp": "2016-10-28 16:41:06.563820",
> "log_size": 3019,
> "ondisk_log_size": 3019,
> "stats_invalid": false,
> "dirty_stats_invalid": false,
> "omap_stats_invalid": false,
> "hitset_stats_invalid": false,
> "hitset_bytes_stats_invalid": false,
> "pin_stats_invalid": true,
> "stat_sum": {
> "num_bytes": 12528581359,
> "num_objects": 3562,
> "num_object_clones": 0,
> "num_object_copies": 10686,
> "num_objects_missing_on_primary": 0,
> "num_objects_missing": 0,
> "num_objects_degraded": 0,
> "num_objects_misplaced": 0,
> "num_objects_unfound": 0,
> "num_objects_dirty": 3562,
> "num_whiteouts": 0,
> "num_read": 3678,
> "num_read_kb": 10197642,
> "num_write": 15656,
> "num_write_kb": 19564203,
> "num_scrub_errors": 0,
> "num_shallow_scrub_errors": 0,
> "num_deep_scrub_errors": 0,
> "num_objects_recovered": 5806,
> "num_bytes_recovered": 22687335556,
> "num_keys_recovered": 0,
> "num_objects_omap": 0,
> "num_objects_hit_set_archive": 0,
> "num_bytes_hit_set_archive": 0,
> "num_flush": 0,
> "num_flush_kb": 0,
> "num_evict": 0,
>

Re: [ceph-users] documentation

2017-01-03 Thread Shinobu Kinjo
The description of ``--pool=data`` is fine but just confuses users.

http://docs.ceph.com/docs/jewel/start/quick-ceph-deploy/

should be synced with

https://github.com/ceph/ceph/blob/master/doc/start/quick-ceph-deploy.rst

I would recommend referring to ``quick-ceph-deploy.rst`` because the docs
in git are well maintained.


On Mon, Jan 2, 2017 at 2:20 PM, Manuel Sopena Ballesteros
 wrote:
> Hi,
>
>
>
> Regarding this doc page à
> http://docs.ceph.com/docs/jewel/start/quick-ceph-deploy/
>
>
>
> I think the following text needs to be changed?
>
>
>
> rados put {object-name} {file-path} --pool=data
>
> to
>
> rados put {object-name} {file-path} --pool= {poolname}
>
> thank you
>
>
>
> Manuel Sopena Ballesteros | Big data Engineer
> Garvan Institute of Medical Research
> The Kinghorn Cancer Centre, 370 Victoria Street, Darlinghurst, NSW 2010
> T: + 61 (0)2 9355 5760 | F: +61 (0)2 9295 8507 | E: manuel...@garvan.org.au
>
>
>
> NOTICE
> Please consider the environment before printing this email. This message and
> any attachments are intended for the addressee named and may contain legally
> privileged/confidential/copyright information. If you are not the intended
> recipient, you should not read, use, disclose, copy or distribute this
> communication. If you have received this message in error please notify us
> at once by return email and then delete both messages. We accept no
> liability for the distribution of viruses or similar in electronic
> communications. This notice should not be removed.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] problem accessing docs.ceph.com

2017-01-03 Thread Shinobu Kinjo
Yeah, DreamHost seems to be having an internal issue, which is not great for us.
Sorry about that.

On Tue, Jan 3, 2017 at 5:41 PM, Rajib Hossen
 wrote:
> Hello, I can't browse docs.ceph.com for last 2/3 days. Google says it takes
> too many time to reload. I also couldn't ping the website.  I also check
> http://www.downforeveryoneorjustme.com/docs.ceph.com and it says it also
> down from other ends.
>
> Is there a problem in the server?
>
> Thanks.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Migrate cephfs metadata to SSD in running cluster

2017-01-02 Thread Shinobu Kinjo
I've never done a migration of cephfs_metadata from spindle disks to
SSDs. But logically you could achieve this in 2 phases (a rough sketch
of the commands follows below):

 #1 Configure a CRUSH hierarchy/rule that includes both the spindle disks and the SSDs
 #2 Switch the metadata pool to a CRUSH rule pointing only at the SSDs
  * This will cause massive data shuffling.
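
A very rough sketch of the commands involved, assuming a Jewel-era
cluster (no device classes; on newer releases the last step uses
"crush_rule" instead of "crush_ruleset") and a metadata pool named
cephfs_metadata; the rule name/id are placeholders you would take from
your own edited map:

 # ceph osd getcrushmap -o crushmap.bin
 # crushtool -d crushmap.bin -o crushmap.txt
   (edit crushmap.txt: add an ssd-only root/bucket hierarchy and a rule,
    e.g. "ssd_rule", that selects only the SSD OSDs)
 # crushtool -c crushmap.txt -o crushmap.new
 # ceph osd setcrushmap -i crushmap.new
 # ceph osd pool set cephfs_metadata crush_ruleset {ssd_rule id}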


On Mon, Jan 2, 2017 at 2:36 PM, Mike Miller  wrote:
> Hi,
>
> Happy New Year!
>
> Can anyone point me to specific walkthrough / howto instructions how to move
> cephfs metadata to SSD in a running cluster?
>
> How is crush to be modified step by step such that the metadata migrate to
> SSD?
>
> Thanks and regards,
>
> Mike
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unbalanced OSD's

2016-12-30 Thread Shinobu Kinjo
The best practice for reweighting OSDs is to run
test-reweight-by-utilization, which is a dry run of the reweighting,
before running reweight-by-utilization.
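
For example, with the same arguments as the cron script quoted below
(those numbers are that script's choices, not required values):

 # ceph osd test-reweight-by-utilization 103 .015 10
   (dry run: only reports what would change)
 # ceph osd reweight-by-utilization 103 .015 10
   (actually applies the change)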

On Sat, Dec 31, 2016 at 3:05 AM, Brian Andrus
 wrote:
> We have a set it and forget it cronjob setup once an hour to keep things a
> bit more balanced.
>
> 1 * * * * /bin/bash /home/briana/reweight_osd.sh 2>&1 | /usr/bin/logger -t
> ceph_reweight
>
> The script checks and makes sure cluster health is OK and no other
> rebalancing is going on. It will also check the reported STDDEV from `ceph
> osd df` and if outside acceptable ranges executes a gentle reweight.
>
>  ceph osd reweight-by-utilization 103 .015 10
>
> It's definitely an "over time" kind of thing, but after a week we are
> already seeing pretty good results. Pending OSD reboots, a few months from
> now our cluster should be seeing quite a bit less difference in utilization.
>
> The three parameters after the reweight-by-utilization are not well
> documented, but they are
>
> 103 - Select OSDs that are 3% above the average (default is 120 but we want
> a larger pool of OSDs to choose from to get an eventual tighter tolerance)
> .010 - don't reweight any OSD more than this increment (keeps the impact
> low)
> 10 - number of OSDs to select (to keep impact manageable)
>
> Hope that helps.
>
> On Fri, Dec 30, 2016 at 2:27 AM, Kees Meijs  wrote:
>>
>> Thanks, I'll try a manual reweight at first.
>>
>> Have a happy new year's eve (yes, I know it's a day early)!
>>
>> Regards,
>> Kees
>>
>> On 30-12-16 11:17, Wido den Hollander wrote:
>> > For this reason you can do a OSD reweight by running the 'ceph osd
>> > reweight-by-utilization' command or do it manually with 'ceph osd reweight 
>> > X
>> > 0-1'
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
> --
> Brian Andrus
> Cloud Systems Engineer
> DreamHost, LLC
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unbalanced OSD's

2016-12-30 Thread Shinobu Kinjo
On Fri, Dec 30, 2016 at 7:27 PM, Kees Meijs  wrote:
> Thanks, I'll try a manual reweight at first.

Great.

CRUSH will probably become more clever about this in the future anyway.

>
> Have a happy new year's eve (yes, I know it's a day early)!
>
> Regards,
> Kees
>
> On 30-12-16 11:17, Wido den Hollander wrote:
>> For this reason you can do a OSD reweight by running the 'ceph osd 
>> reweight-by-utilization' command or do it manually with 'ceph osd reweight X 
>> 0-1'
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unbalanced OSD's

2016-12-30 Thread Shinobu Kinjo
On Fri, Dec 30, 2016 at 7:17 PM, Wido den Hollander  wrote:
>
>> Op 30 december 2016 om 11:06 schreef Kees Meijs :
>>
>>
>> Hi Asley,
>>
>> We experience (using Hammer) a similar issue. Not that I have a perfect
>> solution to share, but I felt like mentioning a "me too". ;-)
>>
>> On a side note: we configured correct weight per drive as well.
>>
>
> Ceph will never balance 100% perfect for a few reasons:
>
> - CRUSH is not perfect nor will it ever be

This is not entirely true. No **distributed** storage system realizes
this capability.

> - Object sizes vary
> - The amount of Placement Groups matters
>
> For this reason you can do a OSD reweight by running the 'ceph osd 
> reweight-by-utilization' command or do it manually with 'ceph osd reweight X 
> 0-1'
>
> Wido
>
>> Regards,
>> Kees
>>
>> On 29-12-16 11:54, Ashley Merrick wrote:
>> >
>> > Hello,
>> >
>> >
>> >
>> > I currently have 5 servers within my CEPH Cluster
>> >
>> >
>> >
>> > 2 x (10 * 8TB Disks)
>> >
>> > 3 x (10 * 4TB Disks)
>> >
>> >
>> >
>> > Currently seeing a larger difference in OSD use across the two
>> > separate server types, as well as within the server itself.
>> >
>> >
>> >
>> > For example on one 4TB server I have an OSD at 64% and one at 84%,
>> > where on the 8TB servers the OSD range from 49% to 64%, where the
>> > highest used OSD’s are on the 4TB.
>> >
>> >
>> >
>> > Each drive has a weight set correctly for the drive size and each
>> > server has the correct weight set, below is my crush map. Apart from
>> > running the command to adjust the re-weight is there anything I am
>> > doing wrong or should change for better spread of data, not looking
>> > for near perfect but where the 8TB drives are sitting at 64% max and
>> > 4TB are sitting at 80%’s causes a big inbalance.
>> >
>> >
>> >
>> > # begin crush map
>> >
>> > tunable choose_local_tries 0
>> >
>> > tunable choose_local_fallback_tries 0
>> >
>> > tunable choose_total_tries 50
>> >
>> > tunable chooseleaf_descend_once 1
>> >
>> > tunable chooseleaf_vary_r 1
>> >
>> > tunable straw_calc_version 1
>> >
>> > tunable allowed_bucket_algs 54
>> >
>> >
>> >
>> > # buckets
>> >
>> > host sn1 {
>> >
>> > id -2   # do not change unnecessarily
>> >
>> > # weight 72.800
>> >
>> > alg straw2
>> >
>> > hash 0  # rjenkins1
>> >
>> > item osd.0 weight 7.280
>> >
>> > item osd.1 weight 7.280
>> >
>> > item osd.3 weight 7.280
>> >
>> > item osd.4 weight 7.280
>> >
>> > item osd.2 weight 7.280
>> >
>> > item osd.5 weight 7.280
>> >
>> > item osd.6 weight 7.280
>> >
>> > item osd.7 weight 7.280
>> >
>> > item osd.8 weight 7.280
>> >
>> > item osd.9 weight 7.280
>> >
>> > }
>> >
>> > host sn3 {
>> >
>> > id -6   # do not change unnecessarily
>> >
>> > # weight 72.800
>> >
>> > alg straw2
>> >
>> > hash 0  # rjenkins1
>> >
>> > item osd.10 weight 7.280
>> >
>> > item osd.11 weight 7.280
>> >
>> > item osd.12 weight 7.280
>> >
>> > item osd.13 weight 7.280
>> >
>> > item osd.14 weight 7.280
>> >
>> > item osd.15 weight 7.280
>> >
>> > item osd.16 weight 7.280
>> >
>> > item osd.17 weight 7.280
>> >
>> > item osd.18 weight 7.280
>> >
>> > item osd.19 weight 7.280
>> >
>> > }
>> >
>> > host sn4 {
>> >
>> > id -7   # do not change unnecessarily
>> >
>> > # weight 36.060
>> >
>> > alg straw2
>> >
>> > hash 0  # rjenkins1
>> >
>> > item osd.20 weight 3.640
>> >
>> > item osd.21 weight 3.640
>> >
>> > item osd.22 weight 3.640
>> >
>> > item osd.23 weight 3.640
>> >
>> > item osd.24 weight 3.640
>> >
>> > item osd.25 weight 3.640
>> >
>> > item osd.26 weight 3.640
>> >
>> > item osd.27 weight 3.640
>> >
>> > item osd.28 weight 3.640
>> >
>> > item osd.29 weight 3.300
>> >
>> > }
>> >
>> > host sn5 {
>> >
>> > id -8   # do not change unnecessarily
>> >
>> > # weight 36.060
>> >
>> > alg straw2
>> >
>> > hash 0  # rjenkins1
>> >
>> > item osd.30 weight 3.640
>> >
>> > item osd.31 weight 3.640
>> >
>> > item osd.32 weight 3.640
>> >
>> > item osd.33 weight 3.640
>> >
>> > item osd.34 weight 3.640
>> >
>> > item osd.35 weight 3.640
>> >
>> > item osd.36 weight 3.640
>> >
>> > item osd.37 weight 3.640
>> >
>> > item osd.38 weight 3.640
>> >
>> > item osd.39 weight 3.640
>> >
>> > }
>> >
>> > host sn6 {
>> >
>> > id -9   # do not change unnecessarily
>> >
>> > # weight 36.060
>> >
>> > alg straw2
>> >
>> > hash 0  # rjenkins1
>> >
>> > item osd.40 weight 3.640
>> >
>> > item osd.41 weight 3.640
>> >
>> > item osd.42 weight 

Re: [ceph-users] How to know if an object is stored in clients?

2016-12-30 Thread Shinobu Kinjo
You can track the activity of the acting set by using:

 # ceph daemon osd.${osd id} dump_ops_in_flight

On Fri, Dec 30, 2016 at 3:59 PM, Jaemyoun Lee 
wrote:

> Dear Wido,
> Is there a command to check the ACK? Or, may you tell me a source code
> function for the received ACK?
>
> Thanks,
> Jae
>
> On Thu, Dec 29, 2016 at 6:56 PM Wido den Hollander  wrote:
>
>>
>> > Op 28 december 2016 om 12:58 schreef Jaemyoun Lee <
>> jaemy...@hanyang.ac.kr>:
>> >
>> >
>> > Hello,
>> >
>> > I executed the RADOS tool to store an object as follows:
>> > ```
>> > user@ClientA:~$ rados put -p=rbd objectA a.txt
>> > ```
>> >
>> > I wonder how the client knows a completion of storing the object in some
>> > OSDs.
>> >
>>
>> When the primary OSD for a PG Acks to the client it knows that it is
>> stored on ALL replicas.
>>
>> RADOS/Ceph always writes synchronous.
>>
>> Wido
>>
>> > Thanks,
>> > Jae
>> >
>> > --
>> >   Jaemyoun Lee
>> >
>> >   CPS Lab. (Cyber-Physical Systems Laboratory in Hanyang University)
>> >   E-mail : jaemy...@hanyang.ac.kr
>> >   Website : http://cpslab.hanyang.ac.kr
>> >
>> >
>> > 
>> 
>> ---
>> >
>> > This e-mail is intended only for the named recipient.
>> > Dissemination, distribution, forwarding, or copying of this e-mail by
>> anyone other than the intended recipient is prohibited.
>> > If you have received it in error, please notify the sender by e-mail
>> and completely delete it. Thank you for your cooperation.
>> > 
>> 
>> ---___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>> --
>   Jaemyoun Lee
>
>   CPS Lab. (Cyber-Physical Systems Laboratory in Hanyang University)
>   E-mail : jaemy...@hanyang.ac.kr
>   Website : http://cpslab.hanyang.ac.kr
>
> --
>
>
>
>
> This e-mail is intended only for the named recipient.
>
> Dissemination, distribution, forwarding, or copying of this e-mail by
> anyone other than the intended recipient is prohibited.
>
> If you have received it in error, please notify the sender by e-mail and
> completely delete it. Thank you for your cooperation.
> --
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Crush - nuts and bolts

2016-12-29 Thread Shinobu Kinjo
First off, on both write and read, the client accesses the primary OSD
directly. More specifically, the client first contacts a MON to get the
cluster map, then talks to the primary OSD.

So there is no need to think of write and read differently. Basically the
only difference between those 2 operations is that, on write, the primary
OSD must wait for acks from the replica OSD(s) before it sends its own
ack back to the client.

How object locations are calculated is:

 #1 Hash the object name
 #2 Calculate the hash modulo the number of PGs
 #3 Get the pool id
 #4 Prepend the pool id to the result of #2 to get the PG

To find object location, what you need to do is:

 # ceph osd map {pool name} {object name}
  e.g.,
   ceph osd map rbd HOSTS
   osdmap e11 pool 'rbd' (0) object 'HOSTS' -> pg 0.bc5444d9 (0.1) ->
up ([2,0,1], p2) acting ([2,0,1], p2)

The actual location of the object on the OSD(s) is:
 e.g.,
 ls /var/lib/ceph/osd/ceph-2/current/0.1_head/
 __head_0001__0  HOSTS__head_BC5444D9__0

On Fri, Dec 30, 2016 at 8:55 AM, Ukko <ukkohakkarai...@gmail.com> wrote:
> Hi Shinobe,
>
> The documentation did not help me. I could not find the info on how the
> location for the object to be written gets selected, nor on the client side,
> how is the object's to be read location calculated.
>
> So in an environment of 10 storage nodes, 10 OSDs in each, size 3, 2 pools,
> 10 PGs each. How objectA (10 kB), objectB (10 MB), and objectC (10 GB) get
> located on write? How are the located by client on read? :)
>
> On Thu, Dec 29, 2016 at 2:01 PM, Ukko Hakkarainen
> <ukkohakkarai...@gmail.com> wrote:
>>
>> Shinobe,
>>
>> I'll re-check if the info I'm after is there, I recall not. I'll get back
>> to you later.
>>
>> Thanks!
>>
>> > Shinobu Kinjo <ski...@redhat.com> kirjoitti 29.12.2016 kello 5.28:
>> >
>> > Please see the following:
>> >
>> > http://docs.ceph.com/docs/giant/architecture/
>> >
>> > Everything you would want to know about is there.
>> >
>> > Regards,
>> >
>> >> On Thu, Dec 29, 2016 at 8:27 AM, Ukko <ukkohakkarai...@gmail.com>
>> >> wrote:
>> >> I'd be interested in CRUSH algorithm simplified in series of
>> >> pictures. How does a storage node write and client  read, and
>> >> how do they calculate what they're after? What gets where/Where is it
>> >> found/Why?
>> >>
>> >> I suggest over-simplified storage system of e.g. 1 monitor, 4 storage
>> >> nodes,
>> >> 2 OSDs/node,
>> >> 3 PGs/OSD, 2 pools?
>> >>
>> >> http://ceph.com/papers/weil-crush-sc06.pdf did not solve this for me.
>> >>
>> >>
>> >>
>> >> ___
>> >> ceph-users mailing list
>> >> ceph-users@lists.ceph.com
>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph program uses lots of memory

2016-12-29 Thread Shinobu Kinjo
And we may be interested in your cluster's configuration.

 # ceph --show-config > $(hostname).$(date +%Y%m%d).ceph_conf.txt

On Fri, Dec 30, 2016 at 7:48 AM, David Turner  wrote:

> Another thing that I need to make sure on is that your number of PGs in
> the pool with 90% of the data is a power of 2 (256, 512, 1024, 2048, etc).
>  If that is the case, then I need the following information.
>
> 1) Pool replica size
> 2) The number of the pool with the data
> 3) A copy of your osdmap (ceph osd getmap -o osd_map.bin)
> 4) Full output of (ceph osd tree)
> 5) Full output of (ceph osd df)
>
> With that I can generate a new crushmap that is balanced for your cluster
> to equalize all of the osds % used.
>
> Our clusters have more than 1k osds and the difference between the top
> used osd and the least used osd is within 2% in those clusters.  We have
> 99.9% of our data in 1 pool.
>
> --
>
>  David Turner | Cloud Operations Engineer | 
> StorageCraft
> Technology Corporation 
> 380 Data Drive Suite 300 | Draper | Utah | 84020
> Office: 801.871.2760 <(801)%20871-2760> | Mobile: 385.224.2943
> <(385)%20224-2943>
>
> --
>
> If you are not the intended recipient of this message or received it
> erroneously, please notify the sender and delete it, together with any
> attachments, and be advised that any dissemination or copying of this
> message is prohibited.
>
> --
>
> 
> From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Bryan
> Henderson [bry...@giraffe-data.com]
> Sent: Thursday, December 29, 2016 3:31 PM
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] ceph program uses lots of memory
>
>
> Does anyone know why the 'ceph' program uses so much memory?  If I run it
> with
> an address space rlimit of less than 300M, it usually dies with messages
> about
> not being able to allocate memory.
>
> I'm curious as to what it could be doing that requires so much address
> space.
>
> It doesn't matter what specific command I'm doing and it does this even
> with
> there is no ceph cluster running, so it must be something pretty basic.
>
> --
> Bryan Henderson   San Jose, California
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH - best books and learning sites

2016-12-29 Thread Shinobu Kinjo
I always tend to jump into:

 https://github.com/ceph

Everything is there.

On Fri, Dec 30, 2016 at 2:34 AM, Michael Hackett  wrote:
> Hello Andre,
>
> The Ceph site would be the best place to get the information you are looking
> for, specifically the docs section: http://docs.ceph.com/docs/master/.
>
> Karan Singh actually wrote two books which can be useful as initial
> resources as well
>
> Learning Ceph:
>
> https://www.amazon.com/Learning-Ceph-Karan-Singh/dp/1783985623/ref=sr_1_1?ie=UTF8=1483032651=8-1=ceph
>
> Ceph Cookbook:
>
> https://www.amazon.com/Ceph-Cookbook-Karan-Singh-ebook/dp/B0171UHJGY/ref=sr_1_2?ie=UTF8=1483032651=8-2=ceph
>
> These resources as well as the community are a good place to start.
>
> Thanks,
>
> Mike Hackett
> Sr Software Maintenance Engineer
> Red Hat Ceph Storage Product Lead
>
>
> On Thu, Dec 29, 2016 at 6:20 AM, Andre Forigato 
> wrote:
>>
>> Hello,
>>
>> I'm starting to study Ceph for implementation in our company.
>>
>> I need the help of the community.
>> I'm looking for Ceph's best books and learning sites.
>>
>> Are the people using Suse or Redhat distribution?
>> My question is what best Linux distribution should I use?
>>
>>
>> Thanks to the Ceph community.
>>
>> Andre
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Crush - nuts and bolts

2016-12-28 Thread Shinobu Kinjo
Please see the following:

 http://docs.ceph.com/docs/giant/architecture/

Everything you would want to know about is there.

Regards,

On Thu, Dec 29, 2016 at 8:27 AM, Ukko  wrote:
> I'd be interested in CRUSH algorithm simplified in series of
> pictures. How does a storage node write and client  read, and
> how do they calculate what they're after? What gets where/Where is it
> found/Why?
>
> I suggest over-simplified storage system of e.g. 1 monitor, 4 storage nodes,
> 2 OSDs/node,
> 3 PGs/OSD, 2 pools?
>
> http://ceph.com/papers/weil-crush-sc06.pdf did not solve this for me.
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recover VM Images from Dead Cluster

2016-12-24 Thread Shinobu Kinjo
On Sun, Dec 25, 2016 at 7:33 AM, Brad Hubbard  wrote:
> On Sun, Dec 25, 2016 at 3:33 AM, w...@42on.com  wrote:
>>
>>
>>> Op 24 dec. 2016 om 17:20 heeft L. Bader  het volgende 
>>> geschreven:
>>>
>>> Do you have any references on this?
>>>
>>> I searched for something like this quite a lot and did not find anything...
>>>
>>
>> No, saw it somewhere on the ML I think, but I  am not sure.
>>
>> I just know it is in development or on a todo somewhere.
>
> Already done I believe.
>
> http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds

That approach should work. But there are things which you *must* make
sure of before taking action:

 1. You must stop all OSDs on the hosts you're going to SSH to before
running ceph-objectstore-tool against them.
 2. You should carry out that approach *very* carefully.

So if you have any doubt, please let us know. A condensed outline of the
documented procedure is below.
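
A heavily condensed outline of what that linked procedure does, with
paths as placeholders only (please follow the document itself for the
real steps):

 (on each OSD host, with the OSDs stopped)
 for osd in /var/lib/ceph/osd/ceph-*; do
     ceph-objectstore-tool --data-path $osd --op update-mon-db --mon-store-path /tmp/mon-store
 done

 (then, after collecting /tmp/mon-store from every host onto one node)
 ceph-monstore-tool /tmp/mon-store rebuild -- --keyring /path/to/admin.keyring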

>
> HTH.
>
>>
>>>
 On 24.12.2016 14:55, w...@42on.com wrote:

> Op 24 dec. 2016 om 14:47 heeft L. Bader  het 
> volgende geschreven:
>
> Hello,
>
> I have a problem with our (dead) Ceph-Cluster: The configuration seems to 
> be gone (deleted / overwritten) and all monitors are gone aswell. 
> However, we do not have (up-to-date) backups for all VMs (used with 
> Proxmox) and we would like to recover them from "raw" OSDs only (we have 
> all OSDs mounted on one Storage Server). Restoring the cluster itself 
> seems impossible.
>
 Work is on it's way iirc to restore MONs from OSD data.

 You might want to search for that, the tracker or Github might help.

> To recover the VM images I tried to write a simple tool that:
> 1) searches all OSDs for udata files
> 2) Sorts them by Image ID
> 3) Sorts them by "position" / offset
> 4) Assembles the 4MB blocks to a single file using dd
>
> (See: https://gitlab.lbader.de/kryptur/ceph-recovery/tree/master )
>
> However, for many (nearly all) images there are missing blocks (empty 
> parts I guess). So I created a 4MB block of Null-Bytes for each missing 
> parts.
>
> The problem is that the created Image is not usable. fdisk detects 
> partitions correctly, but we cannot access the data in any way.
>
> Is there another way to recover the data without having any (working) 
> ceph tools?
>
> Greetings and Merry Christmas :)
>
> Lennart
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Cheers,
> Brad
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph pg active+clean+inconsistent

2016-12-23 Thread Shinobu Kinjo
Plus do this as well:

 # rados list-inconsistent-obj ${PG ID}
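
Adding the formatter flag makes the output much easier to read:

 # rados list-inconsistent-obj ${PG ID} --format=json-pretty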

On Fri, Dec 23, 2016 at 7:08 PM, Brad Hubbard <bhubb...@redhat.com> wrote:
> Could you also try this?
>
> $ attr -l 
> ./DIR_1/DIR_F/DIR_3/DIR_9/DIR_8/1000187bb70.0009__head_EED893F1__6
>
> Take note of any of ceph._, ceph._@1, ceph._@2, etc.
>
> For me on my test cluster it looks like this.
>
> $ attr -l 
> dev/osd1/current/0.3_head/benchmark\\udata\\urskikr.localdomain\\u16952\\uobject99__head_2969453B__0
> Attribute "cephos.spill_out" has a 2 byte value for
> dev/osd1/current/0.3_head/benchmark\udata\urskikr.localdomain\u16952\uobject99__head_2969453B__0
> Attribute "ceph._" has a 250 byte value for
> dev/osd1/current/0.3_head/benchmark\udata\urskikr.localdomain\u16952\uobject99__head_2969453B__0
> Attribute "ceph.snapset" has a 31 byte value for
> dev/osd1/current/0.3_head/benchmark\udata\urskikr.localdomain\u16952\uobject99__head_2969453B__0
> Attribute "ceph._@1" has a 53 byte value for
> dev/osd1/current/0.3_head/benchmark\udata\urskikr.localdomain\u16952\uobject99__head_2969453B__0
> Attribute "selinux" has a 37 byte value for
> dev/osd1/current/0.3_head/benchmark\udata\urskikr.localdomain\u16952\uobject99__head_2969453B__0
>
> Then dump out ceph._ to a file and append all ceph._@X attributes like so.
>
> $ attr -q -g ceph._
> dev/osd1/current/0.3_head/benchmark\\udata\\urskikr.localdomain\\u16952\\uobject99__head_2969453B__0
>> /tmp/attr1
> $ attr -q -g ceph._@1
> dev/osd1/current/0.3_head/benchmark\\udata\\urskikr.localdomain\\u16952\\uobject99__head_2969453B__0
>>> /tmp/attr1
>
> Note the ">>" on the second command to append the output, not
> overwrite. Do this for each ceph._@X attribute.
>
> Then display the file as an object_info_t structure and check the size value.
>
> $ bin/ceph-dencoder type object_info_t import /tmp/attr1 decode dump_json
> {
> "oid": {
> "oid": "benchmark_data_rskikr.localdomain_16952_object99",
> "key": "",
> "snapid": -2,
> "hash": 694764859,
> "max": 0,
> "pool": 0,
> "namespace": ""
> },
> "version": "9'19",
> "prior_version": "0'0",
> "last_reqid": "client.4110.0:100",
> "user_version": 19,
> "size": 4194304,
> "mtime": "2016-12-23 19:13:57.012681",
> "local_mtime": "2016-12-23 19:13:57.032306",
> "lost": 0,
> "flags": 52,
> "snaps": [],
> "truncate_seq": 0,
> "truncate_size": 0,
> "data_digest": 2293522445,
> "omap_digest": 4294967295,
> "expected_object_size": 4194304,
> "expected_write_size": 4194304,
> "alloc_hint_flags": 53,
> "watchers": {}
> }
>
> Depending on the output one method for fixing this may be to use a
> binary editing technique such a laid out in
> https://www.spinics.net/lists/ceph-devel/msg16519.html to set the size
> value to zero. Your target value is 1c.
>
> $ printf '%x\n' 1835008
> 1c
>
> Make sure you check it is right before injecting it back in with "attr -s"
>
> What version is this? Did you look for a similar bug on the tracker?
>
> HTH.
>
>
> --
> Cheers,
> Brad
>
> On Fri, Dec 23, 2016 at 4:27 PM, Shinobu Kinjo <ski...@redhat.com> wrote:
>> Would you be able to execute ``ceph pg ${PG ID} query`` against that
>> particular PG?
>>
>> On Wed, Dec 21, 2016 at 11:44 PM, Andras Pataki
>> <apat...@simonsfoundation.org> wrote:
>>> Yes, size = 3, and I have checked that all three replicas are the same zero
>>> length object on the disk.  I think some metadata info is mismatching what
>>> the OSD log refers to as "object info size".  But I'm not sure what to do
>>> about it.  pg repair does not fix it.  In fact, the file this object
>>> corresponds to in CephFS is shorter so this chunk shouldn't even exist I
>>> think (details are in the original email).  Although I may be understanding
>>> the situation wrong ...
>>>
>>> Andras
>>>
>>>
>>> On 12/21/2016 07:17 AM, Mehmet wrote:
>>>
>>> Hi Andras,
>>>
>>> Iam not the experienced User but i guess you could have a look on this
>>> object on each related osd for the pg, compare them and delete the Different

Re: [ceph-users] Ceph pg active+clean+inconsistent

2016-12-22 Thread Shinobu Kinjo
Would you be able to execute ``ceph pg ${PG ID} query`` against that
particular PG?

On Wed, Dec 21, 2016 at 11:44 PM, Andras Pataki
 wrote:
> Yes, size = 3, and I have checked that all three replicas are the same zero
> length object on the disk.  I think some metadata info is mismatching what
> the OSD log refers to as "object info size".  But I'm not sure what to do
> about it.  pg repair does not fix it.  In fact, the file this object
> corresponds to in CephFS is shorter so this chunk shouldn't even exist I
> think (details are in the original email).  Although I may be understanding
> the situation wrong ...
>
> Andras
>
>
> On 12/21/2016 07:17 AM, Mehmet wrote:
>
> Hi Andras,
>
> Iam not the experienced User but i guess you could have a look on this
> object on each related osd for the pg, compare them and delete the Different
> object. I assume you have size = 3.
>
> Then again pg repair.
>
> But be carefull iirc the replica will be recovered from the primary pg.
>
> Hth
>
> Am 20. Dezember 2016 22:39:44 MEZ, schrieb Andras Pataki
> :
>>
>> Hi cephers,
>>
>> Any ideas on how to proceed on the inconsistencies below?  At the moment
>> our ceph setup has 5 of these - in all cases it seems like some zero length
>> objects that match across the three replicas, but do not match the object
>> info size.  I tried running pg repair on one of them, but it didn't repair
>> the problem:
>>
>> 2016-12-20 16:24:40.870307 7f3e1a4b1700  0 log_channel(cluster) log [INF]
>> : 6.92c repair starts
>> 2016-12-20 16:27:06.183186 7f3e1a4b1700 -1 log_channel(cluster) log [ERR]
>> : repair 6.92c 6:34932257:::1000187bbb5.0009:head on disk size (0) does
>> not match object info size (3014656) adjusted for ondisk to (3014656)
>> 2016-12-20 16:27:35.885496 7f3e17cac700 -1 log_channel(cluster) log [ERR]
>> : 6.92c repair 1 errors, 0 fixed
>>
>>
>> Any help/hints would be appreciated.
>>
>> Thanks,
>>
>> Andras
>>
>>
>> On 12/15/2016 10:13 AM, Andras Pataki wrote:
>>
>> Hi everyone,
>>
>> Yesterday scrubbing turned up an inconsistency in one of our placement
>> groups.  We are running ceph 10.2.3, using CephFS and RBD for some VM
>> images.
>>
>> [root@hyperv017 ~]# ceph -s
>> cluster d7b33135-0940-4e48-8aa6-1d2026597c2f
>>  health HEALTH_ERR
>> 1 pgs inconsistent
>> 1 scrub errors
>> noout flag(s) set
>>  monmap e15: 3 mons at
>> {hyperv029=10.4.36.179:6789/0,hyperv030=10.4.36.180:6789/0,hyperv031=10.4.36.181:6789/0}
>> election epoch 27192, quorum 0,1,2
>> hyperv029,hyperv030,hyperv031
>>   fsmap e17181: 1/1/1 up {0=hyperv029=up:active}, 2 up:standby
>>  osdmap e342930: 385 osds: 385 up, 385 in
>> flags noout
>>   pgmap v37580512: 34816 pgs, 5 pools, 673 TB data, 198 Mobjects
>> 1583 TB used, 840 TB / 2423 TB avail
>>34809 active+clean
>>4 active+clean+scrubbing+deep
>>2 active+clean+scrubbing
>>1 active+clean+inconsistent
>>   client io 87543 kB/s rd, 671 MB/s wr, 23 op/s rd, 2846 op/s wr
>>
>> # ceph pg dump | grep inconsistent
>> 6.13f1  46920   0   0   0 16057314767 30873087
>> active+clean+inconsistent 2016-12-14 16:49:48.391572  342929'41011
>> 342929:43966 [158,215,364]   158 [158,215,364]   158 342928'40540
>> 2016-12-14 16:49:48.391511  342928'405402016-12-14 16:49:48.391511
>>
>> I tried a couple of other deep scrubs on pg 6.13f1 but got repeated
>> errors.  In the OSD logs:
>>
>> 2016-12-14 16:48:07.733291 7f3b56e3a700 -1 log_channel(cluster) log [ERR]
>> : deep-scrub 6.13f1 6:8fc91b77:::1000187bb70.0009:head on disk size (0)
>> does not match object info size (1835008) adjusted for ondisk to (1835008)
>> I looked at the objects on the 3 OSD's on their respective hosts and they
>> are the same, zero length files:
>>
>> # cd ~ceph/osd/ceph-158/current/6.13f1_head
>> # find . -name *1000187bb70* -ls
>> 6697380 -rw-r--r--   1 ceph ceph0 Dec 13 17:00
>> ./DIR_1/DIR_F/DIR_3/DIR_9/DIR_8/1000187bb70.0009__head_EED893F1__6
>>
>> # cd ~ceph/osd/ceph-215/current/6.13f1_head
>> # find . -name *1000187bb70* -ls
>> 5398156470 -rw-r--r--   1 ceph ceph0 Dec 13 17:00
>> ./DIR_1/DIR_F/DIR_3/DIR_9/DIR_8/1000187bb70.0009__head_EED893F1__6
>>
>> # cd ~ceph/osd/ceph-364/current/6.13f1_head
>> # find . -name *1000187bb70* -ls
>> 18814322150 -rw-r--r--   1 ceph ceph0 Dec 13 17:00
>> ./DIR_1/DIR_F/DIR_3/DIR_9/DIR_8/1000187bb70.0009__head_EED893F1__6
>>
>> At the time of the write, there wasn't anything unusual going on as far as
>> I can tell (no hardware/network issues, all processes were up, etc).
>>
>> This pool is a CephFS data pool, and the corresponding file (inode hex
>> 1000187bb70, decimal 1099537300336) looks like this:
>>
>> # ls -li chr4.tags.tsv
>> 1099537300336 -rw-r--r-- 1 

Re: [ceph-users] Ceph Import Error

2016-12-21 Thread Shinobu Kinjo
Can you share exact steps you took to build the cluster?

On Thu, Dec 22, 2016 at 3:39 AM, Aakanksha Pudipeddi
 wrote:
> I mean setup a Ceph cluster after compiling from source and make install. I 
> usually use the long form to setup the cluster. The mon setup is fine but 
> when I create an OSD using ceph osd create or even check the status using 
> ceph -s after the monitor is setup, I get this error. The PATH, 
> LD_LIBRARY_PATH and PYTHONPATH have been set accordingly.
>
> Thanks
> Aakanksha
> -Original Message-
> From: John Spray [mailto:jsp...@redhat.com]
> Sent: Wednesday, December 21, 2016 2:24 AM
> To: Aakanksha Pudipeddi
> Cc: ceph-users
> Subject: Re: [ceph-users] Ceph Import Error
>
> On Tue, Dec 20, 2016 at 11:32 PM, Aakanksha Pudipeddi 
>  wrote:
>> I am trying to setup kraken from source and I get an import error on
>> using the ceph command:
>>
>>
>>
>> Traceback (most recent call last):
>>
>>   File "/home/ssd/src/vanilla-ceph/ceph-install/bin/ceph", line 112,
>> in 
>>
>> from ceph_argparse import \
>>
>> ImportError: cannot import name descsort_key
>>
>>
>>
>> The python path is correctly pointing to the location of ceph_argparse
>> but I want to know if it is a known error. On looking online, I found
>> that this is generally a result of circular dependencies, although of
>> what, I do not know yet. Any help would be appreciated.
>
> Not sure what you mean by "setup from source" in this case -- are you trying 
> to install Ceph system wide, or are you trying to run it out of your source 
> tree?
>
> What is the overall command (including PYTHONPATH etc) that you are trying to 
> run?
>
> John
>
>
>>
>>
>> Thanks!
>>
>> Aakanksha
>>
>>
>>
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs quota

2016-12-14 Thread Shinobu Kinjo
Would you give us some outputs?

 # getfattr -n ceph.quota.max_bytes /some/dir

and

 # ls -l /some/dir
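
And just to double check that the quota was set the way the linked
document describes -- e.g. 100 MB would be:

 # setfattr -n ceph.quota.max_bytes -v 104857600 /some/dir

Also note that quota enforcement happens on the client side; as far as I
know, with Jewel only ceph-fuse enforces it, the kernel client does not.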

On Thu, Dec 15, 2016 at 4:41 PM, gjprabu  wrote:
>
> Hi Team,
>
>   We are using ceph version 10.2.4 (Jewel) and the data is mounted
> with the cephfs file system on Linux. We are trying to set quotas for directories
> and files, but it didn't work following the below document. I have set a 100 MB
> quota for a directory, but after reaching it, it keeps allowing me to put data in
> that location. Any help on this issue is highly appreciated.
>
> http://docs.ceph.com/docs/jewel/cephfs/quota/
>
>
> Regards
> Prabu GJ
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] can cache-mode be set to readproxy for tier cachewith ceph 0.94.9 ?

2016-12-13 Thread Shinobu Kinjo
> ps: When we first met this issue, restarting the mds could cure that. (but 
> that was ceph 0.94.1).

Is this still working?

Since you're using 0.94.9, the bug (#12551) you mentioned should already be fixed.

Can you do the following to see whether the object that appears to you as
ZERO size is actually there:
 # rados -p ${cache pool} ls
 # rados -p ${cache pool} get ${object} /tmp/file
 # ls -l /tmp/file

-- Original --
From:  "Shinobu Kinjo"<ski...@redhat.com>;
Date:  Tue, Dec 13, 2016 06:21 PM
To:  "JiaJia Zhong"<zhongjia...@haomaiyi.com>;
Cc:  "CEPH list"<ceph-users@lists.ceph.com>; "ukernel"<uker...@gmail.com>;
Subject:  Re: [ceph-users] can cache-mode be set to readproxy for tier
cachewith ceph 0.94.9 ?



On Tue, Dec 13, 2016 at 4:38 PM, JiaJia Zhong <zhongjia...@haomaiyi.com> wrote:
>
> hi cephers:
> we are using ceph hammer 0.94.9,  yes, It's not the latest ( jewel),
> with some ssd osds for tiering,  cache-mode is set to readproxy, 
> everything seems to be as expected,
> but when reading some small files from cephfs, we got 0 bytes.


Would you be able to share:

 #1 How small is the actual data?
 #2 Is the symptom reproducible with different data of the same size?
 #3 Can you share your ceph.conf (ceph --show-config)?

>
>
> I did some search and got the below link,
> 
> http://ceph-users.ceph.narkive.com/g4wcB8ED/cephfs-with-cache-tiering-reading-files-are-filled-with-0s
> that's almost the same as what we are suffering from except  the 
> cache-mode in the link is writeback, ours is readproxy.
>
> that bug shall have been FIXED in 0.94.9 
> (http://tracker.ceph.com/issues/12551)
> but we still can encounter with that occasionally :(
>
>enviroment:
>  - ceph: 0.94.9
>  - kernel client: 4.2.0-36-generic ( ubuntu 14.04 )
>  - any others needed ?
>
>Question:
>1.  does readproxy mode work on ceph0.94.9 ? since there are only 
> writeback and readonly in  the document for hammer.
>2.  any one with (Jewel or Hammer) met the same issue ?
>
>
> loop Yan, Zheng
>Quote from the link for convince.
>  """
> Hi,
>
> I am experiencing an issue with CephFS with cache tiering where the kernel
> clients are reading files filled entirely with 0s.
>
> The setup:
> ceph 0.94.3
> create cephfs_metadata replicated pool
> create cephfs_data replicated pool
> cephfs was created on the above two pools, populated with files, then:
> create cephfs_ssd_cache replicated pool,
> then adding the tiers:
> ceph osd tier add cephfs_data cephfs_ssd_cache
> ceph osd tier cache-mode cephfs_ssd_cache writeback
> ceph osd tier set-overlay cephfs_data cephfs_ssd_cache
>
> While the cephfs_ssd_cache pool is empty, multiple kernel clients on
> different hosts open the same file (the size of the file is small, <10k) at
> approximately the same time. A number of the clients from the OS level see
> the entire file being empty. I can do a rados -p {cache pool} ls for the
> list of files cached, and do a rados -p {cache pool} get {object} /tmp/file
> and see the complete contents of the file.
> I can repeat this by setting cache-mode to forward, rados -p {cache pool}
> cache-flush-evict-all, checking no more objects in cache with rados -p
> {cache pool} ls, resetting cache-mode to writeback with an empty pool, and
> doing the multiple same file opens.
>
> Has anyone seen this issue? It seems like what may be a race condition
> where the object is not yet completely loaded into the cache pool so the
> cache pool serves out an incomplete object.
> If anyone can shed some light or any suggestions to help debug this issue,
> that would be very helpful.
>
> Thanks,
> Arthur"""
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] can cache-mode be set to readproxy for tier cache with ceph 0.94.9 ?

2016-12-13 Thread Shinobu Kinjo
On Tue, Dec 13, 2016 at 4:38 PM, JiaJia Zhong 
wrote:

> hi cephers:
> we are using ceph hammer 0.94.9,  yes, It's not the latest ( jewel),
> with some ssd osds for tiering,  cache-mode is set to readproxy,
> everything seems to be as expected,
> but when reading some small files from cephfs, we got 0 bytes.
>

Would you be able to share:

 #1 How small is the actual data?
 #2 Is the symptom reproducible with different data of the same size?
 #3 Can you share your ceph.conf (ceph --show-config)?


>
> I did some search and got the below link,
> http://ceph-users.ceph.narkive.com/g4wcB8ED/cephfs-
> with-cache-tiering-reading-files-are-filled-with-0s
> that's almost the same as what we are suffering from except  the
> cache-mode in the link is writeback, ours is readproxy.
>
> that bug shall have been FIXED in 0.94.9 (http://tracker.ceph.com/
> issues/12551)
> but we still can encounter with that occasionally :(
>
>enviroment:
>  - ceph: 0.94.9
>  - kernel client: 4.2.0-36-generic ( ubuntu 14.04 )
>  - any others needed ?
>
>Question:
>1.  does readproxy mode work on ceph0.94.9 ? since there are only
> writeback and readonly in  the document for hammer.
>2.  any one with (Jewel or Hammer) met the same issue ?
>
>
> loop Yan, Zheng
>Quote from the link for convince.
>  """
> Hi, 
>
> I am experiencing an issue with CephFS with cache tiering where the kernel
> clients are reading files filled entirely with 0s.
>
> The setup:
> ceph 0.94.3
> create cephfs_metadata replicated pool
> create cephfs_data replicated pool
> cephfs was created on the above two pools, populated with files, then:
> create cephfs_ssd_cache replicated pool,
> then adding the tiers:
> ceph osd tier add cephfs_data cephfs_ssd_cache
> ceph osd tier cache-mode cephfs_ssd_cache writeback
> ceph osd tier set-overlay cephfs_data cephfs_ssd_cache
>
> While the cephfs_ssd_cache pool is empty, multiple kernel clients on
> different hosts open the same file (the size of the file is small, <10k) at
> approximately the same time. A number of the clients from the OS level see
> the entire file being empty. I can do a rados -p {cache pool} ls for the
> list of files cached, and do a rados -p {cache pool} get {object} /tmp/file
> and see the complete contents of the file.
> I can repeat this by setting cache-mode to forward, rados -p {cache pool}
> cache-flush-evict-all, checking no more objects in cache with rados -p
> {cache pool} ls, resetting cache-mode to writeback with an empty pool, and
> doing the multiple same file opens.
>
> Has anyone seen this issue? It seems like what may be a race condition
> where the object is not yet completely loaded into the cache pool so the
> cache pool serves out an incomplete object.
> If anyone can shed some light or any suggestions to help debug this issue,
> that would be very helpful.
>
> Thanks,
> Arthur"""
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] A question about io consistency in osd down case

2016-12-12 Thread Shinobu Kinjo
On Sat, Dec 10, 2016 at 11:00 PM, Jason Dillaman  wrote:
> I should clarify that if the OSD has silently failed (e.g. the TCP
> connection wasn't reset and packets are just silently being dropped /
> not being acked), IO will pause for up to "osd_heartbeat_grace" before

The number is how long an OSD will wait for a response from another
OSD before telling the MONs that it's not responding.
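
You can check the effective value on a running daemon with, for example
(osd.0 is only a placeholder):

 # ceph daemon osd.0 config get osd_heartbeat_grace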

> IO can proceed again.
>
> On Sat, Dec 10, 2016 at 8:46 AM, Jason Dillaman  wrote:
>> On Sat, Dec 10, 2016 at 6:11 AM, zhong-yan.gu  wrote:
>>> Hi Jason,
>>> sorry to bother you. A question about io consistency in osd down case :
>>> 1. a write op arrives primary osd A
>>> 2. osd A does  local write and sends out replica writes to osd B and C
>>> 3. B finishes write and return ACK to A. However C is down and has no chance
>>> to send out ACK.
>>>
>>> In this case A will not reply ACK to client. after a while cluster detects C
>>> is down and enters peering. after peering,  how will be the previous write
>>> op to be processed?
>>
>> AFAIK, assuming you have a replica size of 3 and a minimum replica
>> size of 2, losing one OSD within the PG set won't be enough to stall
>> the write operation assuming it wasn't the primary PG OSD that went
>> offline. Quorum was available so both online OSDs were able to log the
>> transaction to help recover the offline OSD when it becomes available
>> again. Once the offline OSD comes back, it can replay the log received
>> from its peers to get back in sync. There is actually lots of
>> available documentation on this process [1].
>>
>>> Does the client still have a chance to receive the ACK?
>>
>> Yup, the client will receive the ACK as soon as  PGs have
>> safely committed the IO.
>>
>>> The next time if client read the corresponding data, is it updated or not?
>>
>> If an IO has been ACKed back to the client, future reads to that
>> extent will return the committed data (we don't want to go backwards
>> in time).
>>
>>> For the case that both B and C are down before ack replied to A, is there
>>> any difference?
>>
>> Assuming you have min_size = 2 (the default), your IO would be
>> blocked until those OSDs come back online or until the mons have
>> detected those OSDs are dead and has remapped the affected PGs to new
>> (online) OSDs.
>>
>>>
>>> Is there any case in which ceph finished writes silently but no ack to
>>> clients?
>>
>> Sure, if your client dies before it receives the ACK from the OSDs.
>> However, your data is still crash consistent.
>>
>>> Zhongyan
>>>
>>
>> [1] 
>> https://github.com/ceph/ceph/blob/master/doc/dev/osd_internals/log_based_pg.rst
>>
>> --
>> Jason
>
>
>
> --
> Jason
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] I want to submit a PR - Can someone guide me

2016-11-18 Thread Shinobu Kinjo
On Sat, Nov 19, 2016 at 6:59 AM, Brad Hubbard  wrote:
> +ceph-devel
>
> On Fri, Nov 18, 2016 at 8:45 PM, Nick Fisk  wrote:
>> Hi All,
>>
>> I want to submit a PR to include fix in this tracker bug, as I have just 
>> realised I've been experiencing it.
>>
>> http://tracker.ceph.com/issues/9860
>>
>> I understand that I would also need to update the debian/ceph-osd.* to get 
>> the file copied, however I'm not quite sure where this
>> new file (/usr/lib/os-probes/10ceph) should sit in the Ceph source tree. Can 
>> someone advise me please?

It would be better to talk on the tracker and on IRC (#ceph-devel) to make
progress and keep track of it.

>>
>> Nick
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Cheers,
> Brad
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recovering full OSD

2016-08-08 Thread Shinobu Kinjo
So I am wondering ``was`` is the recommended way to fix this issue for
the cluster running Jewel release (10.2.2)?

So I am wondering ``what`` is the recommended way to fix this issue
for the cluster running Jewel release (10.2.2)?

typo?


On Mon, Aug 8, 2016 at 8:19 PM, Mykola Dvornik <mykola.dvor...@gmail.com> wrote:
> @Shinobu
>
> According to
> http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/
>
> "If you cannot start an OSD because it is full, you may delete some data by
> deleting some placement group directories in the full OSD."
>
>
> On 8 August 2016 at 13:16, Shinobu Kinjo <shinobu...@gmail.com> wrote:
>>
>> On Mon, Aug 8, 2016 at 8:01 PM, Mykola Dvornik <mykola.dvor...@gmail.com>
>> wrote:
>> > Dear ceph community,
>> >
>> > One of the OSDs in my cluster cannot start due to the
>> >
>> > ERROR: osd init failed: (28) No space left on device
>> >
>> > A while ago it was recommended to manually delete PGs on the OSD to let
>> > it
>> > start.
>>
>> Who recommended that?
>>
>> >
>> > So I am wondering was is the recommended way to fix this issue for the
>> > cluster running Jewel release (10.2.2)?
>> >
>> > Regards,
>> >
>> > --
>> >  Mykola
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>>
>>
>> --
>> Email:
>> shin...@linux.com
>> shin...@redhat.com
>
>
>
>
> --
>  Mykola
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Email:
shin...@linux.com
shin...@redhat.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recovering full OSD

2016-08-08 Thread Shinobu Kinjo
On Mon, Aug 8, 2016 at 8:01 PM, Mykola Dvornik  wrote:
> Dear ceph community,
>
> One of the OSDs in my cluster cannot start due to the
>
> ERROR: osd init failed: (28) No space left on device
>
> A while ago it was recommended to manually delete PGs on the OSD to let it
> start.

Who recommended that?

>
> So I am wondering was is the recommended way to fix this issue for the
> cluster running Jewel release (10.2.2)?
>
> Regards,
>
> --
>  Mykola
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Email:
shin...@linux.com
shin...@redhat.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSDs going down when we bring down some OSD nodes Or cut-off the cluster network link between OSD nodes

2016-08-07 Thread Shinobu Kinjo
On Sun, Aug 7, 2016 at 6:56 PM, Christian Balzer  wrote:
>
> [Reduced to ceph-users, this isn't community related]
>
> Hello,
>
> On Sat, 6 Aug 2016 20:23:41 +0530 Venkata Manojawa Paritala wrote:
>
>> Hi,
>>
>> We have configured single Ceph cluster in a lab with the below
>> specification.
>>
>> 1. Divided the cluster into 3 logical sites (SiteA, SiteB & SiteC). This is
>> to simulate that nodes are part of different Data Centers and having
>> network connectivity between them for DR.
>
> You might want to search the ML archives, this has been discussed plenty
> of times.
> While DR and multi-site replication certainly is desirable, it is also
> going to introduce painful latencies with Ceph, especially if your sites
> aren't relatively close to each other (Metro, less than 10km fiber runs).
>
> The new rbd-mirror feature may or may not help in this kind of scenario,
> see the posts about this just in the last few days.
>
> Since you didn't explicitly mentioned it, you do have custom CRUSH rules
> to distribute your data accordingly?
>
>> 2. Each site operates in a different subnet and each subnet is part of one
>> VLAN. We have configured routing so that OSD nodes in one site can
>> communicate to OSD nodes in the other 2 sites.
>> 3. Each site will have one monitor  node, 2  OSD nodes (to which we have
>> disks attached) and IO generating clients.
>
> You will want more monitors in a production environment and depending on
> the actual topology more "sites" to break ties.
>
> For example if you have triangle setup, give your primary site 3 MONs
> and the other sites 2 MONs each.
>
> Of course this means if you loose all network links between your sites,
> you still won't be able to reach quorum.
>
>> 4. We have configured 2 networks.
>> 4.1. Public network - To which all the clients, monitors and OSD nodes are
>> connected
>> 4.2. Cluster network - To which only the OSD nodes are connected for -
>> Replication/recovery/hearbeat traffic.
>>
> Unless actually needed, I (and others) tend to avoid split networks, since
> it can introduce "wonderful" failure scenarios, as you just found out.
>
> The only reason for such a split network setup in my book is if your
> storage nodes can write FASTER than the aggregate bandwidth of your
> network links to those nodes.
>
>> 5. We have 2 issues here.
>> 5.1. We are unable sustain IO for clients from individual sites when we
>> isolate the OSD nodes by bringing down ONLY the cluster network between
>> sites. Logically this will make the individual sites to be in isolation
>> with respect to the cluster network. Please note that the public network is
>> still connected between the sites.
>>
> See above, that's expected.
> Though in a real world setup I'd expect both networks to fail (common fiber
> trunk being severed) at the same time.
>
> Again, instead of 2 networks you'll be better off with as single, but
> fully redundant network.
>
>> 5.2. In a fully functional cluster, when we bring down 2 sites (shutdown
>> the OSD services of 2 sites - say Site A OSDs and Site B OSDs) then, OSDs
>> in the third site (Site C) are going down (OSD Flapping).
>>
>
> This is a bit unclear, if you only shut down the OSDs and MONs are still
> running and have connectivity the cluster should have a working quorum
> still (the thing you're thinking about below).
>
> OTOH, loosing 2/3rd of your OSDs with normal (min_size=2) replication
> settings will lock your cluster up anyway.
>
> Regards,
>
> Christian
>
>> We need workarounds/solutions to  fix the above 2 issues.
>>
>> Below are some of the parameters we have already mentioned in the Cenf.conf
>> to sustain the cluster for a longer time, when we cut-off the links between
>> sites. But, they were not successful.
>>
>> --
>> [global]
>> public_network = 10.10.0.0/16
>> cluster_network = 192.168.100.0/16,192.168.150.0/16,192.168.200.0/16
>> osd hearbeat address = 172.16.0.0/16
>>
>> [monitor]
>> mon osd report timeout = 1800
>>
>> [OSD}

Typo?
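
Besides the stray brace, I believe the section has to be spelled [osd]
(and [monitor] above should be [mon]), "hearbeat" is misspelled in a
couple of keys (and the option is "osd heartbeat addr", not "address"),
and I am not sure "osd mon act timeout" is a real option -- maybe you
meant "osd mon ack timeout"? Unknown sections and keys are silently
ignored as far as I know, so something along these lines is what I
would expect to take effect:

    [osd]
    osd heartbeat interval = 12
    osd heartbeat grace = 60
    osd mon heartbeat interval = 60
    osd mon report interval max = 300
    osd mon report interval min = 10
    osd mon ack timeout = 60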

>> osd heartbeat interval = 12
>> osd hearbeat grace = 60
>> osd mon heartbeat interval = 60
>> osd mon report interval max = 300
>> osd mon report interval min = 10
>> osd mon act timeout = 60
>> .
>> .
>> 
>>
>> We also confiured the parameter "osd_heartbeat_addr" and tried with the
>> values - 1) Ceph public network (assuming that when we bring down the
>> cluster network hearbeat should happen via public network). 2) Provided a
>> different network range altogether and had physical connections. But both
>> the options did not work.
>>
>> We have a total of 49 OSDs (14 in Site A, 14 in SiteB, 21 in SiteC) in the
>> cluster. One Monitor in each Site.
>>
>> We need to try the below two options.
>>
>> A) Increase the "mon osd min down reporters" value. Question is how much.
>> Say, if I give this value to 49, then will the client IO sustain when we
>> cut-off the cluster network links between sites. In this case one issue
>> would be that if the 

Re: [ceph-users] How to configure OSD heart beat to happen on public network

2016-08-02 Thread Shinobu Kinjo
osd_heartbeat_addr must be in the [osd] section.
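
For example (the address below is made up; from memory this option takes
a single IP on the host's public interface rather than a subnet, which
may be why setting the public subnet did not work, so it has to be set
per daemon or in each host's own ceph.conf):

    [osd.0]
    osd heartbeat addr = 10.10.1.11

and so on for each OSD, using the public-network address of the host
that daemon runs on.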

On Thu, Jul 28, 2016 at 4:31 AM, Venkata Manojawa Paritala
 wrote:
> Hi,
>
> I have configured the below 2 networks in Ceph.conf.
>
> 1. public network
> 2. cluster_network
>
> Now, the heart beat for the OSDs is happening thru cluster_network. How can
> I configure the heart beat to happen thru public network?
>
> I actually configured the property "osd heartbeat address" in the global
> section and provided public network's subnet, but it is not working out.
>
> Am I doing something wrong? Appreciate your quick responses, as I need to
> urgently.
>
>
> Thanks & Regards,
> Manoj
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Email:
shin...@linux.com
shin...@redhat.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] setting crushmap while creating pool fails

2016-07-15 Thread Shinobu Kinjo
Thank you for that report.
Sorry for that -;
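
For anyone else hitting this: to check which ruleset ids the compiled
map actually contains, something like this should do it (file names are
arbitrary):

    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    grep ruleset crushmap.txt

and after renumbering the rules so that a ruleset 0 exists:

    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new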

 shinobu

On Fri, Jul 15, 2016 at 4:47 PM, Oliver Dzombic <i...@ip-interactive.de>
wrote:

> Hi Shinobu,
>
> > osd_pool_default_crush_replicated_ruleset = 2
>
> Thats already set, and ignored.
>
> If your crushmap does not start with ruleset id 0 you will see this
> missbehaviour.
>
> Also your mon servers will crashing.
>
> See http://tracker.ceph.com/issues/16653
>
> --
> Mit freundlichen Gruessen / Best regards
>
> Oliver Dzombic
> IP-Interactive
>
> mailto:i...@ip-interactive.de
>
> Anschrift:
>
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
>
> HRB 93402 beim Amtsgericht Hanau
> Geschäftsführung: Oliver Dzombic
>
> Steuer Nr.: 35 236 3622 1
> UST ID: DE274086107
>
>
> Am 15.07.2016 um 06:22 schrieb Shinobu Kinjo:
> >> osd_pool_default_crush_replicated_ruleset = 2
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Email:
shin...@linux.com
shin...@redhat.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-04 Thread Shinobu Kinjo
Can you reproduce with debug client = 20?
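
The simplest way is probably to add this to ceph.conf on the client and
remount (the log path is just an example):

    [client]
    debug client = 20
    log file = /var/log/ceph/ceph-client.$name.$pid.log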

On Tue, Jul 5, 2016 at 10:16 AM, Goncalo Borges <
goncalo.bor...@sydney.edu.au> wrote:

> Dear All...
>
> We have recently migrated all our ceph infrastructure from 9.2.0 to 10.2.2.
>
> We are currently using ceph-fuse to mount cephfs in a number of clients.
>
> ceph-fuse 10.2.2 client is segfaulting in some situations. One of the
> scenarios where ceph-fuse segfaults is when a user submits a parallel (mpi)
> application requesting 4 hosts with 4 cores each (16 instances in total) .
> According to the user, each instance has its own dedicated inputs and
> outputs.
>
> Please note that if we go back to ceph-fuse 9.2.0 client everything works
> fine.
>
> The ceph-fuse 10.2.2 client segfault is the following (we were able to
> capture it mounting ceph-fuse in debug mode):
>
> 2016-07-04 21:21:00.074087 7f6aed92be40  0 ceph version 10.2.2
> (45107e21c568dd033c2f0a3107dec8f0b0e58374), process ceph-fuse, pid 7346
> ceph-fuse[7346]: starting ceph client
> 2016-07-04 21:21:00.107816 7f6aed92be40 -1 init, newargv = 0x7f6af8c12320
> newargc=11
> ceph-fuse[7346]: starting fuse
> *** Caught signal (Segmentation fault) **
>  in thread 7f69d7fff700 thread_name:ceph-fuse
>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>  1: (()+0x297ef2) [0x7f6aedbecef2]
>  2: (()+0x3b88c0f7e0) [0x7f6aec64b7e0]
>  3: (Client::get_root_ino()+0x10) [0x7f6aedaf0330]
>  4: (CephFuse::Handle::make_fake_ino(inodeno_t, snapid_t)+0x175)
> [0x7f6aedaee035]
>  5: (()+0x199891) [0x7f6aedaee891]
>  6: (()+0x15b76) [0x7f6aed50db76]
>  7: (()+0x12aa9) [0x7f6aed50aaa9]
>  8: (()+0x3b88c07aa1) [0x7f6aec643aa1]
>  9: (clone()+0x6d) [0x7f6aeb8d193d]
> 2016-07-05 10:09:14.045131 7f69d7fff700 -1 *** Caught signal (Segmentation
> fault) **
>  in thread 7f69d7fff700 thread_name:ceph-fuse
>
>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>  1: (()+0x297ef2) [0x7f6aedbecef2]
>  2: (()+0x3b88c0f7e0) [0x7f6aec64b7e0]
>  3: (Client::get_root_ino()+0x10) [0x7f6aedaf0330]
>  4: (CephFuse::Handle::make_fake_ino(inodeno_t, snapid_t)+0x175)
> [0x7f6aedaee035]
>  5: (()+0x199891) [0x7f6aedaee891]
>  6: (()+0x15b76) [0x7f6aed50db76]
>  7: (()+0x12aa9) [0x7f6aed50aaa9]
>  8: (()+0x3b88c07aa1) [0x7f6aec643aa1]
>  9: (clone()+0x6d) [0x7f6aeb8d193d]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed
> to interpret this.
>
> The full dump is quite long. Here are the very last bits of it. Let me
> know if you need the full dump.
>
> --- begin dump of recent events ---
>  -> 2016-07-05 10:09:13.956502 7f6a5700  3 client.464559
> _getxattr(137c789, "security.capability", 0) = -61
>  -9998> 2016-07-05 10:09:13.956507 7f6aa96fa700  3 client.464559 ll_write
> 0x7f6a08028be0 137c78c 20094~34
>  -9997> 2016-07-05 10:09:13.956527 7f6aa96fa700  3 client.464559 ll_write
> 0x7f6a08028be0 20094~34 = 34
>  -9996> 2016-07-05 10:09:13.956535 7f69d7fff700  3 client.464559 ll_write
> 0x7f6a100145f0 137c78d 28526~34
>  -9995> 2016-07-05 10:09:13.956553 7f69d7fff700  3 client.464559 ll_write
> 0x7f6a100145f0 28526~34 = 34
>  -9994> 2016-07-05 10:09:13.956561 7f6ac0dfa700  3 client.464559 ll_forget
> 137c78c 1
>  -9993> 2016-07-05 10:09:13.956569 7f6a5700  3 client.464559 ll_forget
> 137c789 1
>  -9992> 2016-07-05 10:09:13.956577 7f6a5ebfd700  3 client.464559 ll_write
> 0x7f6a94006350 137c789 22010~216
>  -9991> 2016-07-05 10:09:13.956594 7f6a5ebfd700  3 client.464559 ll_write
> 0x7f6a94006350 22010~216 = 216
>  -9990> 2016-07-05 10:09:13.956603 7f6aa8cf9700  3 client.464559
> ll_getxattr 137c78c.head security.capability size 0
>  -9989> 2016-07-05 10:09:13.956609 7f6aa8cf9700  3 client.464559
> _getxattr(137c78c, "security.capability", 0) = -61
>
> 
>
>   -160> 2016-07-05 10:09:14.043687 7f69d7fff700  3 client.464559
> _getxattr(137c78a, "security.capability", 0) = -61
>   -159> 2016-07-05 10:09:14.043694 7f6ac0dfa700  3 client.464559 ll_write
> 0x7f6a08042560 137c78b 11900~34
>   -158> 2016-07-05 10:09:14.043712 7f6ac0dfa700  3 client.464559 ll_write
> 0x7f6a08042560 11900~34 = 34
>   -157> 2016-07-05 10:09:14.043722 7f6ac17fb700  3 client.464559
> ll_getattr 11e9c80.head
>   -156> 2016-07-05 10:09:14.043727 7f6ac17fb700  3 client.464559
> ll_getattr 11e9c80.head = 0
>   -155> 2016-07-05 10:09:14.043734 7f69d7fff700  3 client.464559 ll_forget
> 137c78a 1
>   -154> 2016-07-05 10:09:14.043738 7f6a5ebfd700  3 client.464559 ll_write
> 0x7f6a140d5930 137c78a 18292~34
>   -153> 2016-07-05 10:09:14.043759 7f6a5ebfd700  3 client.464559 ll_write
> 0x7f6a140d5930 18292~34 = 34
>   -152> 2016-07-05 10:09:14.043767 7f6ac17fb700  3 client.464559 ll_forget
> 11e9c80 1
>   -151> 2016-07-05 10:09:14.043784 7f6aa8cf9700  3 client.464559 ll_flush
> 0x7f6a00049fe0 11e9c80
>   -150> 2016-07-05 10:09:14.043794 7f6aa8cf9700  3 client.464559
> ll_getxattr 137c78a.head security.capability size 0
>   -149> 2016-07-05 10:09:14.043799 7f6aa8cf9700  3 

Re: [ceph-users] Re: Fwd: how to fix the mds damaged issue

2016-07-04 Thread Shinobu Kinjo
Reproduce with 'debug mds = 20' and 'debug ms = 20'.
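
Either put those under [mds] in ceph.conf and restart the daemon, or
inject them into the running MDS, e.g. (use the daemon's rank or name
as appropriate):

    ceph tell mds.0 injectargs '--debug-mds 20 --debug-ms 20'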

 shinobu

On Mon, Jul 4, 2016 at 9:42 PM, Lihang <li.h...@h3c.com> wrote:

> Thank you very much for your advice. The command "ceph mds repaired 0"
> work fine in my cluster, my cluster state become HEALTH_OK and the cephfs
> state become normal also. but in the monitor or mds log file ,it just
> record the replay and recover process log without point out somewhere is
> abnormal . and I haven't the log when this issue happened . So I haven't
> found out the root cause of this issue. I'll try to reproduce this issue .
> thank you very much again!
> fisher
>
> -Original Message-
> From: John Spray [mailto:jsp...@redhat.com]
> Sent: 4 July 2016 17:49
> To: lihang 12398 (RD)
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Fwd: how to fix the mds damaged issue
>
> On Sun, Jul 3, 2016 at 8:06 AM, Lihang <li.h...@h3c.com> wrote:
> > root@BoreNode2:~# ceph -v
> >
> > ceph version 10.2.0
> >
> >
> >
> > From: lihang 12398 (RD)
> > Sent: 3 July 2016 14:47
> > To: ceph-users@lists.ceph.com
> > Cc: Ceph Development; 'uker...@gmail.com'; zhengbin 08747 (RD);
> > xusangdi
> > 11976 (RD)
> > Subject: how to fix the mds damaged issue
> >
> >
> >
> > Hi, my ceph cluster mds is damaged and the cluster is degraded after
> > our machines library power down suddenly. then the cluster is
> > “HEALTH_ERR” and cann’t be recovered to health by itself after my
> >
> > Reboot the storage node system or restart the ceph cluster yet. After
> > that I also use the following command to remove the damaged mds, but
> > the damaged mds be removed failed and the issue exist still. The
> > another two mds state is standby. Who can tell me how to fix this
> > issue and find out what happened in my cluter?
> >
> > the remove damaged mds process in my storage node as follows.
> >
> > 1> Execute ”stop ceph-mds-all” command  in the damaged mds node
> >
> > 2>  ceph mds rmfailed 0 --yes-i-really-mean-it
>
> rmfailed is not something you want to use in these circumstances.
>
> > 3>  root@BoreNode2:~# ceph  mds rm 0
> >
> > mds gid 0 dne
> >
> >
> >
> > The detailed status of my cluster as following:
> >
> > root@BoreNode2:~# ceph -s
> >
> >   cluster 98edd275-5df7-414f-a202-c3d4570f251c
> >
> >  health HEALTH_ERR
> >
> > mds rank 0 is damaged
> >
> > mds cluster is degraded
> >
> >  monmap e1: 3 mons at
> > {BoreNode2=172.16.65.141:6789/0,BoreNode3=172.16.65.142:6789/0,BoreNod
> > e4=172.16.65.143:6789/0}
> >
> > election epoch 1010, quorum 0,1,2
> > BoreNode2,BoreNode3,BoreNode4
> >
> >   fsmap e168: 0/1/1 up, 3 up:standby, 1 damaged
> >
> >  osdmap e338: 8 osds: 8 up, 8 in
> >
> > flags sortbitwise
> >
> >   pgmap v17073: 1560 pgs, 5 pools, 218 kB data, 32 objects
> >
> > 423 MB used, 3018 GB / 3018 GB avail
> >
> > 1560 active+clean
>
> When an MDS rank is marked as damaged, that means something invalid was
> found when reading from the pool storing metadata objects.  The next step
> is to find out what that was.  Look in the MDS log and in ceph.log from the
> time when it went damaged, to find the most specific error message you can.
>
> If you do not have the logs and want to have the MDS try operating again
> (to reproduce whatever condition caused it to be marked damaged), you can
> enable it by using "ceph mds repaired 0", then start the daemon and see how
> it is failing.
>
> John
>
> > root@BoreNode2:~# ceph mds dump
> >
> > dumped fsmap epoch 168
> >
> > fs_name TudouFS
> >
> > epoch   156
> >
> > flags   0
> >
> > created 2016-04-02 02:48:11.150539
> >
> > modified2016-04-03 03:04:57.347064
> >
> > tableserver 0
> >
> > root0
> >
> > session_timeout 60
> >
> > session_autoclose   300
> >
> > max_file_size   1099511627776
> >
> > last_failure0
> >
> > last_failure_osd_epoch  83
> >
> > compat  compat={},rocompat={},incompat={1=base v0.20,2=client
> > writeable ranges,3=default file layouts on dirs,4=dir inode in
> > separate object,5=mds uses versioned encoding,6=dirfrag is stored in
> > omap,8=file layout v2}
> >
> > max_mds 1
> >
> > in  0
> >
> > up  {}
> >
> > failed
> >
> > da

Re: [ceph-users] object size changing after a pg repair

2016-06-30 Thread Shinobu Kinjo
Thank you for your clarification.

On Thu, Jun 30, 2016 at 2:50 PM, Goncalo Borges <
goncalo.bor...@sydney.edu.au> wrote:

> Hi Shinobu
>
> > Sorry probably I don't understand your question properly.
> > Is what you're worry about that object mapped to specific pg could be
> overwritten on different osds?
>
> Not really. I was worried by seeing object sizes changing on the fly.
>
> I will try to clarify.
>
> We are enabling cephfs for our user community.
>
> My mind was set to a context where data is not changing. I got scared
> because I was not finding plausible for the object to change size and
> content. I thought this was a consequence of a bad repair.
>

> However, thinking it over, if we have some application overwritting the
> same file over and over again (which I think we have), that means that we
> will see the same objects change size and content over time. In cephfs, the
> name of the objects is directly related to the file inode and how it is
> stripped so the object names do not actually change if a file is
> overwritten. Right?!  So, in summary, in this scenario, it is normal for
> objects to change size and content all the time.
>
> A consequence of this is that the very fast overwrite of files / objects
> could raise some scrub errors if, by chance, ceph is scrubbing pgs with
> objects which are changing on the fly.
>

IMHO, in that situation, clients' write operations to RADOS will be
cancelled (maybe `cancelled` is not the right word here) until they have
the full epoch, before touching the same object again, since clients
must have the latest OSD map.

Does that make sense?

Anyway, in case I've been missing something, someone will add more.


>
> Does this make sense?
> G.
>
>
>
> 
> From: Shinobu Kinjo [shinobu...@gmail.com]
> Sent: 30 June 2016 15:10
> To: Goncalo Borges
> Cc: ceph-us...@ceph.com
> Subject: Re: [ceph-users] object size changing after a pg repair
>
> On Thu, Jun 30, 2016 at 1:48 PM, Goncalo Borges <
> goncalo.bor...@sydney.edu.au<mailto:goncalo.bor...@sydney.edu.au>> wrote:
>
> Hi Shinobu
>
> I've run
>
># ceph pg 6.263 query > 6.263query1.txt; sleep 30; ceph pg 6.263 query
> > 6.263query2.txt
>
> and I am sending the full 6.263query1.txt output as well as the results of
>
># diff -Nua  6.263query1.txt  6.263query2.txt
>
> Actually the sizes of the objects are fixed if the data isn't changed,
> right?! I am actually thinking in a scenario where a user application is
> overwriting the same file meaning that the objects will be overwritten all
> the time. If by change scrub runs in one of these objects, there is a
> chance that it finds differences depending on how fast the file / object
> are overwritten and replicated to the different osds.
>
> Sorry probably I don't understand your question properly.
> Is what you're worry about that object mapped to specific pg could be
> overwritten on different osds?
>
>
> Is this possible?
>
> Cheers
> Goncalo
>
>
> # cat 6.263query1.txt
> {
> "state": "active+clean",
> "snap_trimq": "[]",
> "epoch": 1005,
> "up": [
> 56,
> 39,
> 6
> ],
> "acting": [
> 56,
> 39,
> 6
> ],
> "actingbackfill": [
> "6",
> "39",
> "56"
> ],
> "info": {
> "pgid": "6.263",
> "last_update": "1005'2273061",
> "last_complete": "1005'2273061",
> "log_tail": "1005'227",
> "last_user_version": 2273061,
> "last_backfill": "MAX",
> "last_backfill_bitwise": 0,
> "purged_snaps": "[]",
> "history": {
> "epoch_created": 341,
> "last_epoch_started": 996,
> "last_epoch_clean": 996,
> "last_epoch_split": 0,
> "last_epoch_marked_full": 0,
> "same_up_since": 994,
> "same_interval_since": 995,
> "same_primary_since": 995,
> "last_scrub": "1005'2076134",
> "last_scrub_stamp": "2016-06-30 02:13:00.455256",
> "last_deep_scrub": "1005'2076134",
> "last_deep_scrub_stamp": "2016-06-30 02:13:00.455256",
>

Re: [ceph-users] object size changing after a pg repair

2016-06-29 Thread Shinobu Kinjo
On Thu, Jun 30, 2016 at 1:48 PM, Goncalo Borges <
goncalo.bor...@sydney.edu.au> wrote:

>
> Hi Shinobu
>
> I've run
>
># ceph pg 6.263 query > 6.263query1.txt; sleep 30; ceph pg 6.263 query
> > 6.263query2.txt
>
> and I am sending the full 6.263query1.txt output as well as the results of
>
># diff -Nua  6.263query1.txt  6.263query2.txt
>
> Actually the sizes of the objects are fixed if the data isn't changed,
> right?! I am actually thinking in a scenario where a user application is
> overwriting the same file meaning that the objects will be overwritten all
> the time. If by change scrub runs in one of these objects, there is a
> chance that it finds differences depending on how fast the file / object
> are overwritten and replicated to the different osds.
>

Sorry, probably I don't understand your question properly.
Is what you're worried about that an object mapped to a specific pg could be
overwritten on different OSDs?


>
> Is this possible?
>
> Cheers
> Goncalo
>
>
> # cat 6.263query1.txt
> {
> "state": "active+clean",
> "snap_trimq": "[]",
> "epoch": 1005,
> "up": [
> 56,
> 39,
> 6
> ],
> "acting": [
> 56,
> 39,
> 6
> ],
> "actingbackfill": [
> "6",
> "39",
> "56"
> ],
> "info": {
> "pgid": "6.263",
> "last_update": "1005'2273061",
> "last_complete": "1005'2273061",
> "log_tail": "1005'227",
> "last_user_version": 2273061,
> "last_backfill": "MAX",
> "last_backfill_bitwise": 0,
> "purged_snaps": "[]",
> "history": {
> "epoch_created": 341,
> "last_epoch_started": 996,
> "last_epoch_clean": 996,
> "last_epoch_split": 0,
> "last_epoch_marked_full": 0,
> "same_up_since": 994,
> "same_interval_since": 995,
> "same_primary_since": 995,
> "last_scrub": "1005'2076134",
> "last_scrub_stamp": "2016-06-30 02:13:00.455256",
> "last_deep_scrub": "1005'2076134",
> "last_deep_scrub_stamp": "2016-06-30 02:13:00.455256",
> "last_clean_scrub_stamp": "2016-06-30 02:13:00.455256"
> },
> "stats": {
> "version": "1005'2273061",
> "reported_seq": "2937682",
> "reported_epoch": "1005",
> "state": "active+clean",
> "last_fresh": "2016-06-30 04:38:13.270047",
> "last_change": "2016-06-30 02:13:00.455293",
> "last_active": "2016-06-30 04:38:13.270047",
> "last_peered": "2016-06-30 04:38:13.270047",
> "last_clean": "2016-06-30 04:38:13.270047",
> "last_became_active": "2016-06-27 04:57:36.949798",
> "last_became_peered": "2016-06-27 04:57:36.949798",
> "last_unstale": "2016-06-30 04:38:13.270047",
> "last_undegraded": "2016-06-30 04:38:13.270047",
> "last_fullsized": "2016-06-30 04:38:13.270047",
> "mapping_epoch": 994,
> "log_start": "1005'227",
> "ondisk_log_start": "1005'227",
> "created": 341,
> "last_epoch_clean": 996,
> "parent": "0.0",
> "parent_split_bits": 0,
> "last_scrub": "1005'2076134",
> "last_scrub_stamp": "2016-06-30 02:13:00.455256",
> "last_deep_scrub": "1005'2076134",
> "last_deep_scrub_stamp": "2016-06-30 02:13:00.455256",
> "last_clean_scrub_stamp": "2016-06-30 02:13:00.455256",
> "log_size": 3061,
> "ondisk_log_size": 3061,
> "stats_invalid&

Re: [ceph-users] object size changing after a pg repair

2016-06-29 Thread Shinobu Kinjo
What does `ceph pg 6.263 query` show you?
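
It can also help to see which file the object from the scrub error
belongs to; the part before the dot in the object name is the file's
inode number in hex, so on a mounted cephfs something like this should
find it (the mount point is just an example):

    find /cephfs -inum $((0x12a343d))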


On Thu, Jun 30, 2016 at 12:02 PM, Goncalo Borges <
goncalo.bor...@sydney.edu.au> wrote:

> Dear Cephers...
>
> Today our ceph cluster gave us a couple of scrub errors regarding
> inconsistent pgs. We just upgraded from 9.2.0 to 10.2.2 two days ago.
>
> # ceph health detail
> HEALTH_ERR 2 pgs inconsistent; 2 scrub errors; crush map has legacy
> tunables (require bobtail, min is firefly)
> pg 6.39c is active+clean+inconsistent, acting [2,60,32]
> pg 6.263 is active+clean+inconsistent, acting [56,39,6]
> 2 scrub errors
> crush map has legacy tunables (require bobtail, min is firefly); see
> http://ceph.com/docs/master/rados/operations/crush-map/#tunables
>
> We have started by looking to pg 6.263. Errors were only appearing in
> osd.56 logs but not in others.
>
> # cat  ceph-osd.56.log-20160629 | grep -Hn 'ERR'
> (standard input):8569:2016-06-29 08:09:50.952397 7fd023322700 -1
> log_channel(cluster) log [ERR] : scrub 6.263
> 6:c645f18e:::12a343d.:head on disk size (1836) does not match
> object info size (41242) adjusted for ondisk to (41242)
> (standard input):8602:2016-06-29 08:11:11.227865 7fd023322700 -1
> log_channel(cluster) log [ERR] : 6.263 scrub 1 errors
>
> So, we did a 'ceph pg repair  6.263'.
>
> Eventually, that pg went back to 'active+clean'
>
> # ceph pg dump | grep ^6.263
> dumped all in format plain
> 6.263   10845   0   0   0   0   39592671010 3037
> 3037active+clean2016-06-30 02:13:00.455293  1005'2126237
> 1005:2795768[56,39,6]   56  [56,39,6]   56
> 1005'20761342016-06-30 02:13:00.455256  1005'20761342016-06-30
> 02:13:00.455256
>
> However, in the logs i found
>
> 2016-06-30 02:03:03.992240 osd.56 192.231.127.226:6801/21569 278 :
> cluster [INF] 6.263 repair starts
> 2016-06-30 02:13:00.455237 osd.56 192.231.127.226:6801/21569 279 :
> cluster [INF] 6.263 repair ok, 0 fixed
>
> I did not like the '0 fixed'.
>
> Inspecting a bit more, I found that the object inside the pg in all
> involved osds are changing size. For example in osd.56 (but the same thing
> is true in 39 and 6) I found in consecutive 'ls -l' commands:
>
> # ls -l
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
> -rw-r--r-- 1 ceph ceph 8602 Jun 30 02:53
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
> [root@rccephosd8 ceph]# ls -l
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
> -rw-r--r-- 1 ceph ceph 170 Jun 30 02:53
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
>
> # ls -l
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
> -rw-r--r-- 1 ceph ceph 15436 Jun 30 02:53
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
>
> # ls -l
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
> -rw-r--r-- 1 ceph ceph 26044 Jun 30 02:53
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
>
> # ls -l
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
> -rw-r--r-- 1 ceph ceph 0 Jun 30 02:53
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
>
> # ls -l
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
> -rw-r--r-- 1 ceph ceph 14076 Jun 30 02:53
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
>
> # ls -l
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
> -rw-r--r-- 1 ceph ceph 31110 Jun 30 02:53
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
>
> # ls -l
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
> -rw-r--r-- 1 ceph ceph 0 Jun 30 02:53
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
>
> # ls -l
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
> -rw-r--r-- 1 ceph ceph 20230 Jun 30 02:53
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
>
> # ls -l
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
> -rw-r--r-- 1 ceph ceph 23392 Jun 30 02:53
> /var/lib/ceph/osd/ceph-56/current/6.263_head/DIR_3/DIR_6/DIR_2/DIR_A/12a343d.__head_718FA263__6
>
> # ls -l
> 

Re: [ceph-users] Blocked ops, OSD consuming memory, hammer

2016-05-25 Thread Shinobu Kinjo
What do the following commands show you?

ceph pg 12.258 list_unfound  // maybe hung...
ceph pg dump_stuck

and enable debugging on osd.4:

debug osd = 20
debug filestore = 20
debug ms = 1
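
If you don't want to restart the daemon, those can be injected into the
running osd.4, e.g.:

    ceph tell osd.4 injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1'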

But honestly, my best bet is to upgrade to the latest release. It would
save you a lot of trouble.

 - Shinobu

On Thu, May 26, 2016 at 5:25 AM, Heath Albritton <halbr...@harm.org> wrote:
> I fear I've hit a bug as well.  Considering an upgrade to the latest release 
> of hammer.  Somewhat concerned that I may lose those PGs.
>
>
> -H
>
>> On May 25, 2016, at 07:42, Gregory Farnum <gfar...@redhat.com> wrote:
>>
>>> On Tue, May 24, 2016 at 11:19 PM, Heath Albritton <halbr...@harm.org> wrote:
>>> Not going to attempt threading and apologies for the two messages on
>>> the same topic.  Christian is right, though.  3 nodes per tier, 8 SSDs
>>> per node in the cache tier, 12 spinning disks in the cold tier.  10GE
>>> client network with a separate 10GE back side network.  Each node in
>>> the cold tier has two Intel P3700 SSDs as a journal.  This setup has
>>> yielded excellent performance over the past year.
>>>
>>> The memory exhaustion comes purely from one errant OSD process.  All
>>> the remaining processes look fairly normal in terms of memory
>>> consumption.
>>>
>>> These nodes aren't particularly busy.  A random sampling shows a few
>>> hundred kilobytes of data being written and very few reads.
>>>
>>> Thus far, I've done quite a bit of juggling of OSDs.  Setting the
>>> cluster to noup.  Restarting the failed ones, letting them get to the
>>> current map and then clearing the noup flag and letting them rejoin.
>>> Eventually, they'll fail again and then a fairly intense recovery
>>> happens.
>>>
>>> here's ceph -s:
>>>
>>> https://dl.dropboxusercontent.com/u/90634073/ceph/ceph_dash_ess.txt
>>>
>>> Cluster has been in this state for a while.  There are 3 PGs that seem
>>> to be problematic:
>>>
>>> [root@t2-node01 ~]# pg dump | grep recovering
>>> -bash: pg: command not found
>>> [root@t2-node01 ~]# ceph pg dump | grep recovering
>>> dumped all in format plain
>>> 9.2f1 1353 1075 4578 1353 1075 9114357760 2611 2611
>>> active+recovering+degraded+remapped 2016-05-24 21:49:26.766924
>>> 8577'2611 8642:84 [15,31] 15 [15,31,0] 15 5123'2483 2016-05-23
>>> 23:52:54.360710 5123'2483 2016-05-23 23:52:54.360710
>>> 12.258 878 875 2628 0 0 4414509568 1534 1534
>>> active+recovering+undersized+degraded 2016-05-24 21:47:48.085476
>>> 4261'1534 8587:17712 [4,20] 4 [4,20] 4 4261'1534 2016-05-23
>>> 07:22:44.819208 4261'1534 2016-05-23 07:22:44.819208
>>> 11.58 376 0 1 2223 0 1593129984 4909 4909
>>> active+recovering+degraded+remapped 2016-05-24 05:49:07.531198
>>> 8642'409248 8642:406269 [56,49,41] 56 [40,48,62] 40 4261'406995
>>> 2016-05-22 21:40:40.205540 4261'406450 2016-05-21 21:37:35.497307
>>>
>>> pg 9.2f1 query:
>>> https://dl.dropboxusercontent.com/u/90634073/ceph/pg_9.21f.txt
>>>
>>> When I query 12.258 it just hangs
>>>
>>> pg 11.58 query:
>>> https://dl.dropboxusercontent.com/u/90634073/ceph/pg_11.58.txt
>>
>> Well, you've clearly had some things go very wrong. That "undersized"
>> means that the pg doesn't have enough copies to be allowed to process
>> writes, and I'm a little confused that it's also marked active but I
>> don't quite remember the PG state diagrams involved. You should
>> consider it down; it should be trying to recover itself though. I'm
>> not quite certain if the query is considered an operation it's not
>> allowed to service (which the RADOS team will need to fix, if it's not
>> done already in later releases) or if the query hanging is indicative
>> of yet another problem.
>>
>> The memory expansion is probably operations incoming on some of those
>> missing objects, or on the PG which can't take writes (but is trying
>> to recover itself to a state where it *can*). In general it shouldn't
>> be enough to exhaust the memory in the system, but you might have
>> mis-tuned things so that clients are allowed to use up a lot more
>> memory than is appropriate, or there might be a bug in v0.94.5.
>> -Greg
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Email:
shin...@linux.com
shin...@redhat.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mapping RBD On Ceph Cluster Node

2016-04-30 Thread Shinobu Kinjo
On Sat, Apr 30, 2016 at 7:00 PM, Oliver Dzombic <i...@ip-interactive.de> wrote:
> Hi,
>
> sure.
>
> http://tracker.ceph.com/issues/13643

Thanks!!

I've totally missed that -;

>
> --
> Mit freundlichen Gruessen / Best regards
>
> Oliver Dzombic
> IP-Interactive
>
> mailto:i...@ip-interactive.de
>
> Anschrift:
>
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
>
> HRB 93402 beim Amtsgericht Hanau
> Geschäftsführung: Oliver Dzombic
>
> Steuer Nr.: 35 236 3622 1
> UST ID: DE274086107
>
>
> Am 30.04.2016 um 10:38 schrieb Shinobu Kinjo:
>> On Sat, Apr 30, 2016 at 5:32 PM, Oliver Dzombic <i...@ip-interactive.de> 
>> wrote:
>>> Hi,
>>>
>>> there is a memory allocation bug, at least in hammer.
>>>
>>
>> Could you give us any pointer?
>>
>>> Mouting an rbd volume as a block device on a ceph node might run you
>>> into that. Then your mount wont work, and you will have to restart the
>>> OSD daemon(s).
>>>
>>> Its generally not a perfectly good idea.
>>>
>>> Better use a dedicated client for the mounts.
>>>
>>> --
>>> Mit freundlichen Gruessen / Best regards
>>>
>>> Oliver Dzombic
>>> IP-Interactive
>>>
>>> mailto:i...@ip-interactive.de
>>>
>>> Anschrift:
>>>
>>> IP Interactive UG ( haftungsbeschraenkt )
>>> Zum Sonnenberg 1-3
>>> 63571 Gelnhausen
>>>
>>> HRB 93402 beim Amtsgericht Hanau
>>> Geschäftsführung: Oliver Dzombic
>>>
>>> Steuer Nr.: 35 236 3622 1
>>> UST ID: DE274086107
>>>
>>>
>>> Am 29.04.2016 um 17:30 schrieb Edward Huyer:
>>>> This is more of a "why" than a "can I/should I" question.
>>>>
>>>> The Ceph block device quickstart says (if I interpret it correctly) not to 
>>>> use a physical machine as both a Ceph RBD client and a node for hosting 
>>>> OSDs or other Ceph services.
>>>>
>>>> Is this interpretation correct? If so, what is the reasoning? If not, what 
>>>> is it actually saying?
>>>>
>>>> Thanks in advance.
>>>> ___
>>>> ceph-users mailing list
>>>> ceph-users@lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Email:
shin...@linux.com
GitHub:
shinobu-x
Blog:
Life with Distributed Computational System based on OpenSource
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mapping RBD On Ceph Cluster Node

2016-04-30 Thread Shinobu Kinjo
On Sat, Apr 30, 2016 at 5:32 PM, Oliver Dzombic <i...@ip-interactive.de> wrote:
> Hi,
>
> there is a memory allocation bug, at least in hammer.
>

Could you give us any pointer?

> Mouting an rbd volume as a block device on a ceph node might run you
> into that. Then your mount wont work, and you will have to restart the
> OSD daemon(s).
>
> Its generally not a perfectly good idea.
>
> Better use a dedicated client for the mounts.
>
> --
> Mit freundlichen Gruessen / Best regards
>
> Oliver Dzombic
> IP-Interactive
>
> mailto:i...@ip-interactive.de
>
> Anschrift:
>
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
>
> HRB 93402 beim Amtsgericht Hanau
> Geschäftsführung: Oliver Dzombic
>
> Steuer Nr.: 35 236 3622 1
> UST ID: DE274086107
>
>
> Am 29.04.2016 um 17:30 schrieb Edward Huyer:
>> This is more of a "why" than a "can I/should I" question.
>>
>> The Ceph block device quickstart says (if I interpret it correctly) not to 
>> use a physical machine as both a Ceph RBD client and a node for hosting OSDs 
>> or other Ceph services.
>>
>> Is this interpretation correct? If so, what is the reasoning? If not, what 
>> is it actually saying?
>>
>> Thanks in advance.
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Email:
shin...@linux.com
GitHub:
shinobu-x
Blog:
Life with Distributed Computational System based on OpenSource
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Replace Journal

2016-04-21 Thread Shinobu Kinjo
This is a previous thread about journal disk replacement.

http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2014-May/039434.html

I hope this is helpful for you.
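
The list at the bottom of your mail, in command form, would be roughly
as follows (osd id and paths are examples only, please check them
against your own setup):

    ceph osd set noout
    systemctl stop ceph-osd@12        # or: service ceph stop osd.12
    ceph-osd -i 12 --flush-journal
    # replace the SSD and recreate the journal partition, then either fix
    # the 'journal' symlink in /var/lib/ceph/osd/ceph-12 or point
    # 'osd journal' in ceph.conf at the new partition
    ceph-osd -i 12 --mkjournal
    systemctl start ceph-osd@12
    ceph osd unset noout

As far as I know the 'journal' symlink is only created when the OSD was
prepared with ceph-disk; if you set 'osd journal' in ceph.conf instead,
you do not need it.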

Cheers,
S

- Original Message -
From: "Martin Wilderoth" 
To: ceph-us...@ceph.com
Sent: Friday, April 22, 2016 1:20:17 PM
Subject: [ceph-users] Replace Journal

I have a ceph cluster and I will change my journal devices to new SSDs.

In some instructions for doing this they refer to a journal file (a link to
the UUID of the journal).

In my OSD folder this journal doesn't exist.

Those instructions rename the UUID of the new device to the old UUID so as
not to break anything.

I was planning to use the command ceph-osd --mkjournal and update ceph.conf
accordingly.

Do I need to take care of my missing journal symlink? I think it is a symlink.
Why is it there, and how is it used?

I actually don't remember the command I used to create the disk, but it's some
years ago and I doubt I used ceph-disk.

I found the following process on this list, which seemed good. But it's still
not clear to me whether I can skip this journal link.

set noout
stop the osds
flush the journal
replace journal SSDs
recreate journal partitions
update ceph.conf to reflect new journal device names
recreate the journal (for the existing osds)
start the osds
unset noout




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


  1   2   3   >