Re: [ceph-users] v13.2.7 osds crash in build_incremental_map_msg

2019-12-04 Thread Neha Ojha
We'll get https://github.com/ceph/ceph/pull/32000 out in 13.2.8 as
quickly as possible.
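
(For anyone who wants to check whether their OSDs are affected in the meantime, a rough way to scan the logs — the strings below are simply taken from the crash report further down this thread:)

  grep -l 'build_incremental_map_msg missing incremental map' /var/log/ceph/ceph-osd.*.log
  grep -c 'Caught signal (Aborted)' /var/log/ceph/ceph-osd.*.log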

Neha

On Wed, Dec 4, 2019 at 6:56 AM Dan van der Ster  wrote:
>
> My advice is to wait.
>
> We built 13.2.7 with https://github.com/ceph/ceph/pull/26448 cherry-picked
> and the OSDs no longer crash.
>
> My vote would be for a quick 13.2.8.
>
> -- Dan
>
> On Wed, Dec 4, 2019 at 2:41 PM Frank Schilder  wrote:
> >
> > Is this issue now a no-go for updating to 13.2.7 or are there only some 
> > specific unsafe scenarios?
> >
> > Best regards,
> >
> > =
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
> > 
> > From: ceph-users  on behalf of Dan van 
> > der Ster 
> > Sent: 03 December 2019 16:42:45
> > To: ceph-users
> > Subject: Re: [ceph-users] v13.2.7 osds crash in build_incremental_map_msg
> >
> > I created https://tracker.ceph.com/issues/43106 and we're downgrading
> > our osds back to 13.2.6.
> >
> > -- dan
> >
> > On Tue, Dec 3, 2019 at 4:09 PM Dan van der Ster  wrote:
> > >
> > > Hi all,
> > >
> > > We're midway through an update from 13.2.6 to 13.2.7 and started
> > > getting OSDs crashing regularly like this [1].
> > > Does anyone obviously know what the issue is? (Maybe
> > > https://github.com/ceph/ceph/pull/26448/files ?)
> > > Or is it some temporary problem while we still have v13.2.6 and
> > > v13.2.7 osds running concurrently?
> > >
> > > Thanks!
> > >
> > > Dan
> > >
> > > [1]
> > >
> > > 2019-12-03 15:53:51.817 7ff3a3d39700 -1 osd.1384 2758889
> > > build_incremental_map_msg missing incremental map 2758889
> > > 2019-12-03 15:53:51.817 7ff3a453a700 -1 osd.1384 2758889
> > > build_incremental_map_msg missing incremental map 2758889
> > > 2019-12-03 15:53:51.817 7ff3a453a700 -1 osd.1384 2758889
> > > build_incremental_map_msg unable to load latest map 2758889
> > > 2019-12-03 15:53:51.822 7ff3a453a700 -1 *** Caught signal (Aborted) **
> > >  in thread 7ff3a453a700 thread_name:tp_osd_tp
> > >
> > >  ceph version 13.2.7 (71bd687b6e8b9424dd5e5974ed542595d8977416) mimic 
> > > (stable)
> > >  1: (()+0xf5f0) [0x7ff3c620b5f0]
> > >  2: (gsignal()+0x37) [0x7ff3c522b337]
> > >  3: (abort()+0x148) [0x7ff3c522ca28]
> > >  4: (OSDService::build_incremental_map_msg(unsigned int, unsigned int,
> > > OSDSuperblock&)+0x767) [0x555d60e8d797]
> > >  5: (OSDService::send_incremental_map(unsigned int, Connection*,
> > > std::shared_ptr&)+0x39e) [0x555d60e8dbee]
> > >  6: (OSDService::share_map_peer(int, Connection*,
> > > std::shared_ptr)+0x159) [0x555d60e8eda9]
> > >  7: (OSDService::send_message_osd_cluster(int, Message*, unsigned
> > > int)+0x1a5) [0x555d60e8f085]
> > >  8: (ReplicatedBackend::issue_op(hobject_t const&, eversion_t const&,
> > > unsigned long, osd_reqid_t, eversion_t, eversion_t, hobject_t,
> > > hobject_t, std::vector
> > > > const&, boost::optional&,
> > > ReplicatedBackend::InProgressOp*, ObjectStore::Transaction&)+0x452)
> > > [0x555d6116e522]
> > >  9: (ReplicatedBackend::submit_transaction(hobject_t const&,
> > > object_stat_sum_t const&, eversion_t const&,
> > > std::unique_ptr >&&,
> > > eversion_t const&, eversion_t const&, std::vector > > std::allocator > const&,
> > > boost::optional&, Context*, unsigned long,
> > > osd_reqid_t, boost::intrusive_ptr)+0x6f5) [0x555d6117ed85]
> > >  10: (PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*,
> > > PrimaryLogPG::OpContext*)+0xd62) [0x555d60ff5142]
> > >  11: (PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0xf12)
> > > [0x555d61035902]
> > >  12: (PrimaryLogPG::do_op(boost::intrusive_ptr&)+0x3679)
> > > [0x555d610397a9]
> > >  13: (PrimaryLogPG::do_request(boost::intrusive_ptr&,
> > > ThreadPool::TPHandle&)+0xc99) [0x555d6103d869]
> > >  14: (OSD::dequeue_op(boost::intrusive_ptr,
> > > boost::intrusive_ptr, ThreadPool::TPHandle&)+0x1b7)
> > > [0x555d60e8e8a7]
> > >  15: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr&,
> > > ThreadPool::TPHandle&)+0x62) [0x555d611144c2]
> > >  16: (OSD::ShardedOpWQ::_process(unsigned int,
> > > ceph::heartbeat_handle_d*)+0x592) [0x555d60eb25f2]
> > >  17: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x3d3)
> > > [0x7ff3c929f5b3]
> > >  18: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7ff3c92a01a0]
> > >  19: (()+0x7e65) [0x7ff3c6203e65]
> > >  20: (clone()+0x6d) [0x7ff3c52f388d]
> > >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> > > needed to interpret this.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Possibly a bug on rocksdb

2019-08-12 Thread Neha Ojha
Hi Samuel,

You can use https://tracker.ceph.com/issues/41211 to provide the
information that Brad requested. Along with debug_osd=20, setting
debug_rocksdb=20 and debug_bluestore=20 might be useful.
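
(For reference, a hedged example of one way to raise those debug levels on a running OSD — adjust the OSD id to the affected daemon; the later lines use the admin socket on the OSD host:)

  ceph tell osd.4 injectargs '--debug_osd 20 --debug_rocksdb 20 --debug_bluestore 20'
  ceph daemon osd.4 config set debug_osd 20
  ceph daemon osd.4 config set debug_rocksdb 20
  ceph daemon osd.4 config set debug_bluestore 20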

Thanks,
Neha



On Sun, Aug 11, 2019 at 4:18 PM Brad Hubbard  wrote:
>
> Could you create a tracker for this?
>
> Also, if you can reproduce this could you gather a log with
> debug_osd=20 ? That should show us the superblock it was trying to
> decode as well as additional details.
>
> On Mon, Aug 12, 2019 at 6:29 AM huxia...@horebdata.cn
>  wrote:
> >
> > Dear folks,
> >
> > I had an OSD down, not because of a bad disk, but most likely a bug hit in
> > RocksDB. Has anyone had a similar issue?
> >
> > I am using Luminous 12.2.12 version. Log attached below
> >
> > thanks,
> > Samuel
> >
> > **
> > [root@horeb72 ceph]# head -400 ceph-osd.4.log
> > 2019-08-11 07:30:02.186519 7f69bd020700  0 -- 192.168.10.72:6805/5915 >> 
> > 192.168.10.73:6801/4096 conn(0x56549cfc0800 :6805 
> > s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg 
> > accept connect_seq 15 vs existing csq=15 existing_state=STATE_STANDBY
> > 2019-08-11 07:30:02.186871 7f69bd020700  0 -- 192.168.10.72:6805/5915 >> 
> > 192.168.10.73:6801/4096 conn(0x56549cfc0800 :6805 
> > s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg 
> > accept connect_seq 16 vs existing csq=15 existing_state=STATE_STANDBY
> > 2019-08-11 07:30:02.242291 7f69bc81f700  0 -- 192.168.10.72:6805/5915 >> 
> > 192.168.10.71:6805/5046 conn(0x5654b93ed000 :6805 
> > s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg 
> > accept connect_seq 15 vs existing csq=15 existing_state=STATE_STANDBY
> > 2019-08-11 07:30:02.242554 7f69bc81f700  0 -- 192.168.10.72:6805/5915 >> 
> > 192.168.10.71:6805/5046 conn(0x5654b93ed000 :6805 
> > s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg 
> > accept connect_seq 16 vs existing csq=15 existing_state=STATE_STANDBY
> > 2019-08-11 07:30:02.260295 7f69bc81f700  0 -- 192.168.10.72:6805/5915 >> 
> > 192.168.10.73:6806/4864 conn(0x56544de16800 :6805 
> > s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg 
> > accept connect_seq 15 vs existing csq=15 
> > existing_state=STATE_CONNECTING_WAIT_CONNECT_REPLY
> > 2019-08-11 17:11:01.968247 7ff4822f1d80 -1 WARNING: the following dangerous 
> > and experimental features are enabled: bluestore,rocksdb
> > 2019-08-11 17:11:01.968333 7ff4822f1d80  0 ceph version 12.2.12 
> > (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable), process 
> > ceph-osd, pid 1048682
> > 2019-08-11 17:11:01.970611 7ff4822f1d80  0 pidfile_write: ignore empty 
> > --pid-file
> > 2019-08-11 17:11:01.991542 7ff4822f1d80 -1 WARNING: the following dangerous 
> > and experimental features are enabled: bluestore,rocksdb
> > 2019-08-11 17:11:01.997597 7ff4822f1d80  0 load: jerasure load: lrc load: 
> > isa
> > 2019-08-11 17:11:01.997710 7ff4822f1d80  1 bdev create path 
> > /var/lib/ceph/osd/ceph-4/block type kernel
> > 2019-08-11 17:11:01.997723 7ff4822f1d80  1 bdev(0x564774656c00 
> > /var/lib/ceph/osd/ceph-4/block) open path /var/lib/ceph/osd/ceph-4/block
> > 2019-08-11 17:11:01.998127 7ff4822f1d80  1 bdev(0x564774656c00 
> > /var/lib/ceph/osd/ceph-4/block) open size 858887553024 (0xc7f9b0, 
> > 800GiB) block_size 4096 (4KiB) non-rotational
> > 2019-08-11 17:11:01.998231 7ff4822f1d80  1 bdev(0x564774656c00 
> > /var/lib/ceph/osd/ceph-4/block) close
> > 2019-08-11 17:11:02.265144 7ff4822f1d80  1 bdev create path 
> > /var/lib/ceph/osd/ceph-4/block type kernel
> > 2019-08-11 17:11:02.265177 7ff4822f1d80  1 bdev(0x564774658a00 
> > /var/lib/ceph/osd/ceph-4/block) open path /var/lib/ceph/osd/ceph-4/block
> > 2019-08-11 17:11:02.265695 7ff4822f1d80  1 bdev(0x564774658a00 
> > /var/lib/ceph/osd/ceph-4/block) open size 858887553024 (0xc7f9b0, 
> > 800GiB) block_size 4096 (4KiB) non-rotational
> > 2019-08-11 17:11:02.266233 7ff4822f1d80  1 bdev create path 
> > /var/lib/ceph/osd/ceph-4/block.db type kernel
> > 2019-08-11 17:11:02.266256 7ff4822f1d80  1 bdev(0x564774589a00 
> > /var/lib/ceph/osd/ceph-4/block.db) open path 
> > /var/lib/ceph/osd/ceph-4/block.db
> > 2019-08-11 17:11:02.266812 7ff4822f1d80  1 bdev(0x564774589a00 
> > /var/lib/ceph/osd/ceph-4/block.db) open size 2759360 (0x6fc20, 
> > 27.9GiB) block_size 4096 (4KiB) non-rotational
> > 2019-08-11 17:11:02.266998 7ff4822f1d80  1 bdev create path 
> > /var/lib/ceph/osd/ceph-4/block type kernel
> > 2019-08-11 17:11:02.267015 7ff4822f1d80  1 bdev(0x564774659a00 
> > /var/lib/ceph/osd/ceph-4/block) open path /var/lib/ceph/osd/ceph-4/block
> > 2019-08-11 17:11:02.267412 7ff4822f1d80  1 bdev(0x564774659a00 
> > /var/lib/ceph/osd/ceph-4/block) open size 858887553024 (0xc7f9b0, 
> > 800GiB) block_size 4096 (4KiB) non-rotational
> > 2019-08-11 17:11:02.298355 7ff4822f1d80  0  set 

[ceph-users] mutable health warnings

2019-06-13 Thread Neha Ojha
Hi everyone,

There has been some interest in a feature that helps users to mute
health warnings. There is a trello card[1] associated with it and
we've had some discussion[2] in the past in a CDM about it. In
general, we want to understand a few things:

1. what is the level of interest in this feature
2. for how long should we mute these warnings - should the period be
decided by us or the user
3. possible misuse of this feature and negative impacts of muting some warnings

Let us know what you think.

[1] https://trello.com/c/vINMkfTf/358-mute-health-warnings
[2] https://pad.ceph.com/p/cephalocon-usability-brainstorming
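
(Purely as a strawman for the discussion — no such command exists at the time of writing; a mute interface might look roughly like this, with the health code and duration supplied by the operator:)

  ceph health mute OSD_NEARFULL 1h        # silence one warning for an hour
  ceph health mute POOL_NO_REDUNDANCY --sticky
  ceph health unmute OSD_NEARFULL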

Thanks,
Neha
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Some ceph config parameters default values

2019-02-18 Thread Neha Ojha
On Sat, Feb 16, 2019 at 12:44 PM Oliver Freyermuth
 wrote:
>
> Dear Cephalopodians,
>
> in some recent threads on this list, I have read about the "knobs":
>
>   pglog_hardlimit (false by default, available at least with 12.2.11 and 
> 13.2.5)
>   bdev_enable_discard (false by default, advanced option, no description)
>   bdev_async_discard  (false by default, advanced option, no description)
>
> I am wondering about the defaults for these settings, and why these settings 
> seem mostly undocumented.
>
> It seems to me that on SSD / NVMe devices, you would always want to enable 
> discard for significantly increased lifetime,
> or run fstrim regularly (which you can't with bluestore since it's a 
> filesystem of its own). From personal experience,
> I have already lost two eMMC devices in Android phones early due to trimming 
> not working fine.
> Of course, on first generation SSD devices, "discard" may lead to data loss 
> (which for most devices has been fixed with firmware updates, though).
>
> I would presume that async-discard is also advantageous, since it seems to 
> queue the discards and work on these in bulk later
> instead of issuing them immediately (that's what I grasp from the code).
>
> Additionally, it's unclear to me whether the bdev-discard settings also 
> affect WAL/DB devices, which are very commonly SSD/NVMe devices
> in the Bluestore age.
>
> Concerning the pglog_hardlimit, I read on that list that it's safe and limits 
> maximum memory consumption especially for backfills / during recovery.
> So it "sounds" like this is also something that could be on by default. But 
> maybe that is not the case yet to allow downgrades after failed upgrades?

This flag will be on by default in nautilus; it is off by default in
luminous and mimic in order to handle upgrades.
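
(For readers following along, a hedged example of how the flag is checked and enabled once every daemon runs a release that supports it — it shows up in the osdmap flags, and per the release notes it must not be unset again:)

  ceph osd dump | grep pglog_hardlimit
  ceph osd set pglog_hardlimit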
>
>
>
> So in the end, my question is:
> Is there a reason why these values are not on by default, and are also not 
> really mentioned in the documentation?
> Are they just "not ready yet" / unsafe to be on by default, or are the 
> defaults just like that because they have always been at this value,
> and defaults will change with the next major release (nautilus)?

We can certainly make this more explicit in our documentation.
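
(As an untested illustration only, using the option names discussed above — the kind of ceph.conf snippet one would experiment with on a test cluster; whether it also covers WAL/DB devices should be verified against your release:)

  [osd]
  bdev_enable_discard = true
  bdev_async_discard = true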

Thanks,
Neha

>
> Cheers,
> Oliver
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v12.2.11 Luminous released

2019-02-13 Thread Neha Ojha
On Wed, Feb 13, 2019 at 12:49 AM Siegfried Höllrigl <
siegfried.hoellr...@xidras.com> wrote:

> Hi !
>
> We have now successfully upgraded (from 12.2.10) to 12.2.11.
> Seems to be quite stable. (Using RBD, CephFS and RadosGW)

Great!

>
> Most of our OSDs are still on Filestore.
>
> Should we set the "pglog_hardlimit" (as it must not be unset anymore) ?
>
If you wish to keep the length of your pg logs within bounds, you should
set this flag.

>
> What exactly will this limit ?

This option will allow you to put a hard cap on the number of pg log
entries and thus memory consumed by the pg log, even during recovery and
backfill.


> Are there any risks ?
>
No

>
> Any pre-checks recommended ?
>
Since you have successfully upgraded to 12.2.11, you don't need anything
else.

>
> Br,

Thanks,
Neha

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v12.2.11 Luminous released

2019-02-07 Thread Neha Ojha
On Thu, Feb 7, 2019 at 10:50 AM Dan van der Ster  wrote:
>
> On Fri, Feb 1, 2019 at 10:18 PM Neha Ojha  wrote:
> >
> > On Fri, Feb 1, 2019 at 1:09 PM Robert Sander
> >  wrote:
> > >
> > > Am 01.02.19 um 19:06 schrieb Neha Ojha:
> > >
> > > > If you would have hit the bug, you should have seen failures like
> > > > https://tracker.ceph.com/issues/36686.
> > > > Yes, pglog_hardlimit is off by default in 12.2.11. Since you are
> > > running 12.2.9 (which has the patch that allows you to limit the length
> > > > of the pg log), you could follow the steps and upgrade to 12.2.11 and
> > > > set this flag.
> > >
> > > The question is: If I am now on 12.2.9 and see no issues, do I have to
> > > set this flag after upgrading to 12.2.11?
> > You don't have to.
> > This flag lets you restrict the length of your pg logs, so if you do
> > not want to use this functionality, no need to set this.
>
> I guess that a 12.2.11 cluster with pglog_hardlimit enabled cannot
> upgrade to mimic until 13.2.5 is released?

You can upgrade to a mimic version < 13.2.5, but you won't have the
feature in 13.2.1, and on 13.2.2-13.2.4 you might run the risk of hitting
http://tracker.ceph.com/issues/36686 (under rare conditions).
13.2.5 will have the feature along with the upgrade fix and should be
released soon, so it is best to wait and upgrade to 13.2.5.



>
>
> -- Dan
>
>
> >
> > >
> > > Regards
> > > --
> > > Robert Sander
> > > Heinlein Support GmbH
> > > Schwedter Str. 8/9b, 10119 Berlin
> > >
> > > http://www.heinlein-support.de
> > >
> > > Tel: 030 / 405051-43
> > > Fax: 030 / 405051-19
> > >
> > > Mandatory information per §35a GmbHG:
> > > HRB 93818 B / Amtsgericht Berlin-Charlottenburg (district court),
> > > Managing Director: Peer Heinlein -- Registered office: Berlin
> > >
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v12.2.11 Luminous released

2019-02-01 Thread Neha Ojha
On Fri, Feb 1, 2019 at 1:09 PM Robert Sander
 wrote:
>
> Am 01.02.19 um 19:06 schrieb Neha Ojha:
>
> > If you would have hit the bug, you should have seen failures like
> > https://tracker.ceph.com/issues/36686.
> > Yes, pglog_hardlimit is off by default in 12.2.11. Since you are
> > running 12.2.9 (which has the patch that allows you to limit the length
> > of the pg log), you could follow the steps and upgrade to 12.2.11 and
> > set this flag.
>
> The question is: If I am now on 12.2.9 and see no issues, do I have to
> set this flag after upgrading to 12.2.11?
You don't have to.
This flag lets you restrict the length of your pg logs, so if you do
not want to use this functionality, no need to set this.

>
> Regards
> --
> Robert Sander
> Heinlein Support GmbH
> Schwedter Str. 8/9b, 10119 Berlin
>
> http://www.heinlein-support.de
>
> Tel: 030 / 405051-43
> Fax: 030 / 405051-19
>
> Mandatory information per §35a GmbHG:
> HRB 93818 B / Amtsgericht Berlin-Charlottenburg (district court),
> Managing Director: Peer Heinlein -- Registered office: Berlin
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v12.2.11 Luminous released

2019-02-01 Thread Neha Ojha
On Fri, Feb 1, 2019 at 1:11 AM Mark Schouten  wrote:
>
> On Fri, Feb 01, 2019 at 08:44:51AM +0100, Abhishek wrote:
> > * This release fixes the pg log hard limit bug that was introduced in
> >   12.2.9, https://tracker.ceph.com/issues/36686.  A flag called
> >   `pglog_hardlimit` has been introduced, which is off by default. Enabling
> >   this flag will limit the length of the pg log.  In order to enable
> >   that, the flag must be set by running `ceph osd set pglog_hardlimit`
> >   after completely upgrading to 12.2.11. Once the cluster has this flag
> >   set, the length of the pg log will be capped by a hard limit. Once set,
> >   this flag *must not* be unset anymore.
>
> I'm confused about this. I have a cluster runnine 12.2.9, but should a
> just upgrade and be done with it, or should I execute the steps
> mentioned above? The pglog_hardlimit is off by default, which suggests I
> should not do anything. But since it is related to this bug which I may
> or may not be hitting, I'm not sure.

If you would have hit the bug, you should have seen failures like
https://tracker.ceph.com/issues/36686.
Yes, pglog_hardlimit is off by default in 12.2.11. Since you are
running 12.2.9 (which has the patch that allows you to limit the length
of the pg log), you could follow the steps and upgrade to 12.2.11 and
set this flag.
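
(Spelled out as a rough sequence, assuming every daemon already reports 12.2.11:)

  ceph versions                  # confirm mons, mgrs and osds are all on 12.2.11
  ceph osd set pglog_hardlimit   # from the release notes quoted below; cannot be unset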

>
> > * There have been fixes to RGW dynamic and manual resharding, which no
> > longer
> >   leaves behind stale bucket instances to be removed manually. For finding
> > and
> >   cleaning up older instances from a reshard a radosgw-admin command
> > `reshard
> >   stale-instances list` and `reshard stale-instances rm` should do the
> > necessary
> >   cleanup.
>
>
> Very happy about this! It will cleanup my cluster for sure! This also closes
> https://tracker.ceph.com/issues/23651 I think?
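
(For reference, those are plain radosgw-admin invocations, exactly as named in the release note above — list the stale instances first, then remove them:)

  radosgw-admin reshard stale-instances list
  radosgw-admin reshard stale-instances rm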
>
> --
> Mark Schouten  | Tuxis Internet Engineering
> KvK: 61527076  | http://www.tuxis.nl/
> T: 0318 200208 | i...@tuxis.nl
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Docubetter: New Schedule

2019-01-25 Thread Neha Ojha
Hi All,

Starting February, Docubetter meetings will be held twice a month, on
the second and fourth Wednesday of each month. We will alternate
meeting times to ensure that all time zones have the opportunity to
participate.

Second Wednesday: 12:30 ET (starting February 13)
Fourth Wednesday: 21:00 ET
Meeting: https://bluejeans.com/908675367?src=calendarLink

We will update the Ceph Community calendar to reflect the same.
Our next topic of discussion is going to be search engine optimization
for Ceph documentation.

Thanks,
Neha
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] logging of cluster status (Jewel vs Luminous and later)

2019-01-24 Thread Neha Ojha
Hi Matthew,

Some of the logging was intentionally removed because it used to
clutter up the logs. However, we are bringing some of the useful
stuff back and have a tracker ticket,
https://tracker.ceph.com/issues/37886, open for it.
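
In the meantime, a hedged pointer: the mons still write the cluster log to a file whose location is controlled by mon_cluster_log_file (by default under /var/log/ceph/), which can be checked with something like:

  ceph-conf --show-config-value mon_cluster_log_file
  ceph daemon mon.$(hostname -s) config get mon_cluster_log_file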

Thanks,
Neha


On Thu, Jan 24, 2019 at 12:13 PM Stefan Kooman  wrote:
>
> Quoting Matthew Vernon (m...@sanger.ac.uk):
> > Hi,
> >
> > On our Jewel clusters, the mons keep a log of the cluster status e.g.
> >
> > 2019-01-24 14:00:00.028457 7f7a17bef700  0 log_channel(cluster) log [INF] :
> > HEALTH_OK
> > 2019-01-24 14:00:00.646719 7f7a46423700  0 log_channel(cluster) log [INF] :
> > pgmap v66631404: 173696 pgs: 10 active+clean+scrubbing+deep, 173686
> > active+clean; 2271 TB data, 6819 TB used, 9875 TB / 16695 TB avail; 1313
> > MB/s rd, 236 MB/s wr, 12921 op/s
> >
> > This is sometimes useful after a problem, to see when thing started going
> > wrong (which can be helpful for incident response and analysis) and so on.
> > There doesn't appear to be any such logging in Luminous, either by mons or
> > mgrs. What am I missing?
>
> Our mons keep a log in /var/log/ceph/ceph.log (running luminous 12.2.8).
> Is that log present on your systems?
>
> Gr. Stefan
>
> --
> | BIT BV  http://www.bit.nl/   Chamber of Commerce (KvK) 09090351
> | GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [Ceph-maintainers] v13.2.4 Mimic released

2019-01-08 Thread Neha Ojha
When upgrading from 13.2.1 to 13.2.4, you should be careful about
http://tracker.ceph.com/issues/36686. It might be worth considering
the workaround mentioned here:
https://github.com/ceph/ceph/blob/master/doc/releases/mimic.rst#v1322-mimic.

Thanks,
Neha

On Tue, Jan 8, 2019 at 9:42 AM Patrick Donnelly  wrote:
>
> On Mon, Jan 7, 2019 at 7:10 AM Alexandre DERUMIER  wrote:
> >
> > Hi,
> >
> > >>* Ceph v13.2.2 includes a wrong backport, which may cause mds to go into
> > >>'damaged' state when upgrading Ceph cluster from previous version.
> > >>The bug is fixed in v13.2.3. If you are already running v13.2.2,
> > >>upgrading to v13.2.3 does not require special action.
> >
> > Any special action for upgrading from 13.2.1 ?
>
> No special actions for CephFS are required for the upgrade.
>
> --
> Patrick Donnelly
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph 12.2.9 release

2018-11-08 Thread Neha Ojha
On Thu, Nov 8, 2018 at 12:16 PM, Ricardo J. Barberis
 wrote:
> Hi Neha, thank you for the info.
>
> I'd like to clarify that we didn't actually upgrade to 12.2.9, we just
> installed 4 more OSD servers and those got 12.2.9, so we have a mixture
> of 12.2.9 and 12.2.8.
>
> Should we:
> - keep as is and wait for 12.2.10+ before proceeding?
> - downgrade our newest OSDs from 12.2.9 to 12.2.8?
> - upgrade everything to 12.2.9?

I think the best way to proceed is to upgrade the remaining OSDs and
restart all OSDs on 12.2.9, or to only upgrade when all PGs are
active+clean if a restart is not an option.

>
>
> Our current setup (we still have disks to add as OSDs):
>
> # ceph versions
> {
> "mon": {
> "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) 
> luminous (stable)": 5
> },
> "mgr": {
> "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) 
> luminous (stable)": 5
> },
> "osd": {
> "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) 
> luminous (stable)": 75,
> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) 
> luminous (stable)": 16
> },
> "mds": {},
> "overall": {
> "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) 
> luminous (stable)": 85,
> "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) 
> luminous (stable)": 16
> }
> }
>
>
> On Wednesday 07/11/2018 at 20:28, Neha Ojha wrote:
>> For those on 12.2.9 -
>>
>> If you have successfully upgraded to 12.2.9, there is no reason for
>> you to downgrade, since the bug appears while upgrading to 12.2.9 -
>> http://tracker.ceph.com/issues/36686. We suggest you not upgrade to
>> 12.2.10, which reverts the feature that caused this bug; 12.2.10
>> does not have much in store except for the revert. We are working on a
>> clean upgrade path for this feature and will announce it when it is
>> ready.
>>
>> For those who haven't upgraded to 12.2.9 -
>>
>> Please avoid this release and wait for 12.2.10.
>>
>> More information here:
>> https://www.spinics.net/lists/ceph-devel/msg43509.html,
>> https://www.spinics.net/lists/ceph-users/msg49112.html
>>
>> Again, sorry about the inconvenience and hope this helps!
>>
>> Thanks,
>> Neha
>>
>> On Wed, Nov 7, 2018 at 2:38 PM, Ricardo J. Barberis
>>
>>  wrote:
>> > On Wednesday 07/11/2018 at 10:58, Simon Ironside wrote:
>> >> On 07/11/2018 10:59, Konstantin Shalygin wrote:
>> >> >> I wonder if there is any release announcement for ceph 12.2.9 that I
>> >> >> missed. I just found the new packages on download.ceph.com, is this
>> >> >> an official release?
>> >> >
>> >> > This is because 12.2.9 has several bugs. You should avoid using
>> >> > this release and wait for 12.2.10
>> >>
>> >> Argh! What's it doing in the repos then?? I've just upgraded to it!
>> >> What are the bugs? Is there a thread about them?
>> >>
>> >> Simon
>> >
>> > Is it safe to downgrade from 12.2.9 to 12.2.8?
>> >
>> > Or should we just wait for 12.2.10?
>> >
>> > Thanks,
> --
> Ricardo J. Barberis
> Linux User No. 250625: http://counter.li.org/
> LFS User No. 5121: http://www.linuxfromscratch.org/
> Senior SysAdmin / IT Architect - www.DonWeb.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph 12.2.9 release

2018-11-07 Thread Neha Ojha
For those on 12.2.9 -

If you have successfully upgraded to 12.2.9, there is no reason for
you to downgrade, since the bug appears while upgrading to 12.2.9 -
http://tracker.ceph.com/issues/36686. We suggest you not upgrade to
12.2.10, which reverts the feature that caused this bug; 12.2.10
does not have much in store except for the revert. We are working on a
clean upgrade path for this feature and will announce it when it is
ready.

For those who haven't upgraded to 12.2.9 -

Please avoid this release and wait for 12.2.10.

More information here:
https://www.spinics.net/lists/ceph-devel/msg43509.html,
https://www.spinics.net/lists/ceph-users/msg49112.html

Again, sorry about the inconvenience and hope this helps!

Thanks,
Neha

On Wed, Nov 7, 2018 at 2:38 PM, Ricardo J. Barberis
 wrote:
> On Wednesday 07/11/2018 at 10:58, Simon Ironside wrote:
>> On 07/11/2018 10:59, Konstantin Shalygin wrote:
>> >> I wonder if there is any release announcement for ceph 12.2.9 that I
>> >> missed. I just found the new packages on download.ceph.com, is this an
>> >> official release?
>> >
>> > This is because 12.2.9 has several bugs. You should avoid using this
>> > release and wait for 12.2.10
>>
>> Argh! What's it doing in the repos then?? I've just upgraded to it!
>> What are the bugs? Is there a thread about them?
>>
>> Simon
>
> Is it safe to downgrade from 12.2.9 to 12.2.8?
>
> Or should we just wait for 12.2.10?
>
> Thanks,
> --
> Ricardo J. Barberis
> Linux User No. 250625: http://counter.li.org/
> LFS User No. 5121: http://www.linuxfromscratch.org/
> Senior SysAdmin / IT Architect - www.DonWeb.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fwd: pg log hard limit upgrade bug

2018-11-05 Thread Neha Ojha
+ ceph-users


-- Forwarded message --
From: Neha Ojha 
Date: Mon, Nov 5, 2018 at 9:50 AM
Subject: pg log hard limit upgrade bug
To: Ceph Development 
Cc: Nathan Cutler , Yuri Weinstein
, Josh Durgin 


Hi All,

We have discovered an issue with the pg log hard limit
patches (https://github.com/ceph/ceph/pull/23211,
https://github.com/ceph/ceph/pull/24308), where a partial upgrade
during backfill can cause the OSDs on the previous version to fail
with "assert(trim_to <= info.last_complete)". A full description of the
bug is here: http://tracker.ceph.com/issues/36686.

These changes are in 13.2.2 and 12.2.9, and a workaround for users is
to upgrade all OSDs to a version with the pg log hard limit and restart
them, or to only upgrade when all PGs are active+clean.
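
(A hedged way to check the second condition before restarting anything — all PGs reporting active+clean and no recovery or backfill in flight:)

  ceph pg stat     # e.g. "NNNN pgs: NNNN active+clean; ..."
  ceph status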

Until we add the capability for the pg log hard limit to work smoothly in
the upgrade case, we will be reverting these changes
(https://github.com/ceph/ceph/pull/24903) and releasing 12.2.10 as
early as possible.

We are also reverting https://github.com/ceph/ceph/pull/24902, which
addresses a low-impact bug but might cause issues in the field.

Sorry for any inconvenience caused by this.

Thanks,
Neha
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cluster broken and ODSs crash with failed assertion in PGLog::merge_log

2018-10-05 Thread Neha Ojha
Hi JJ,

In this case, the condition olog.head >= log.tail is not true, and
therefore it crashes. Could you please open a tracker
issue(https://tracker.ceph.com/) and attach the osd logs and the pg
dump output?
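
(A hedged example of gathering those — adjust the OSD id and paths to your setup:)

  ceph pg dump > pg_dump.txt                   # attach to the tracker issue
  cp /var/log/ceph/ceph-osd.4.log osd.4.log    # log of the crashing OSD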

Thanks,
Neha

On Thu, Oct 4, 2018 at 9:29 AM, Jonas Jelten  wrote:
> Hello!
>
> Unfortunately, our single-node "cluster" with 11 OSDs is broken because some
> OSDs crash when they start peering.
> I'm on Ubuntu 18.04 with Ceph Mimic (13.2.2).
>
> The problem was induced when RAM filled up and OSD processes then
> crashed because of memory allocation failures.
>
> No weird commands (e.g. force_create_pg) were used on this cluster and it was 
> set up with 13.2.1 initially.
> The affected pool seems to be a replicated pool with size=3 and min_size=2 
> (which haven't been changed).
>
> Crash log of osd.4 (only the crashed thread):
>
> 99424: -1577> 2018-10-04 13:40:11.024 7f3838417700 10 log is not dirty
> 99425: -1576> 2018-10-04 13:40:11.024 7f3838417700 10 osd.4 1433 
> queue_want_up_thru want 1433 <= queued 1433, currently 1426
> 99427: -1574> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process 
> 3.8 to_process <> waiting <>
> waiting_peering {}
> 99428: -1573> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process 
> OpQueueItem(3.8 PGPeeringEvent(epoch_sent:
> 1433 epoch_requested: 1433 MNotifyRec 3.8 from 2 notify: (query:1433 
> sent:1433 3.8( v 866'122691 (569'119300,866'122691]
> local-lis/les=1401/1402 n=54053 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 
> 1433/1433/1433)) features:
> 0x3ffddff8ffa4fffb ([859,1432] intervals=([1213,1215] acting 
> 0,2),([1308,1311] acting 4,10),([1401,1403] acting
> 2,10),([1426,1428] acting 2,4)) +create_info) prio 255 cost 10 e1433) queued
> 99430: -1571> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process 
> 3.8 to_process  PGPeeringEvent(epoch_sent: 1433 epoch_requested: 1433 MNotifyRec 3.8 from 2 
> notify: (query:1433 sent:1433 3.8( v
> 866'122691 (569'119300,866'122691] local-lis/les=1401/1402 n=54053 ec=126/126 
> lis/c 1401/859 les/c/f 1402/860/0
> 1433/1433/1433)) features: 0x3ffddff8ffa4fffb ([859,1432] 
> intervals=([1213,1215] acting 0,2),([1308,1311] acting
> 4,10),([1401,1403] acting 2,10),([1426,1428] acting 2,4)) +create_info) prio 
> 255 cost 10 e1433)> waiting <>
> waiting_peering {}
> 99433: -1568> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process 
> OpQueueItem(3.8 PGPeeringEvent(epoch_sent:
> 1433 epoch_requested: 1433 MNotifyRec 3.8 from 2 notify: (query:1433 
> sent:1433 3.8( v 866'122691 (569'119300,866'122691]
> local-lis/les=1401/1402 n=54053 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 
> 1433/1433/1433)) features:
> 0x3ffddff8ffa4fffb ([859,1432] intervals=([1213,1215] acting 
> 0,2),([1308,1311] acting 4,10),([1401,1403] acting
> 2,10),([1426,1428] acting 2,4)) +create_info) prio 255 cost 10 e1433) pg 
> 0x56013bc87400
> 99437: -1564> 2018-10-04 13:40:11.024 7f3838417700 10 osd.4 pg_epoch: 1433 
> pg[3.8( v 866'127774 (866'124700,866'127774]
> local-lis/les=859/860 n=56570 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 
> 1433/1433/1433) [4,2] r=0 lpr=1433
> pi=[859,1433)/4 crt=866'127774 lcod 0'0 mlcod 0'0 peering mbc={}] 
> do_peering_event: epoch_sent: 1433 epoch_requested:
> 1433 MNotifyRec 3.8 from 2 notify: (query:1433 sent:1433 3.8( v 866'122691 
> (569'119300,866'122691]
> local-lis/les=1401/1402 n=54053 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 
> 1433/1433/1433)) features:
> 0x3ffddff8ffa4fffb ([859,1432] intervals=([1213,1215] acting 
> 0,2),([1308,1311] acting 4,10),([1401,1403] acting
> 2,10),([1426,1428] acting 2,4)) +create_info
> 99440: -1561> 2018-10-04 13:40:11.024 7f3838417700  7 osd.4 pg_epoch: 1433 
> pg[3.8( v 866'127774 (866'124700,866'127774]
> local-lis/les=859/860 n=56570 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 
> 1433/1433/1433) [4,2] r=0 lpr=1433
> pi=[859,1433)/4 crt=866'127774 lcod 0'0 mlcod 0'0 peering mbc={}] 
> state: handle_pg_notify from osd.2
> 99444: -1557> 2018-10-04 13:40:11.024 7f3838417700 10 osd.4 pg_epoch: 1433 
> pg[3.8( v 866'127774 (866'124700,866'127774]
> local-lis/les=859/860 n=56570 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 
> 1433/1433/1433) [4,2] r=0 lpr=1433
> pi=[859,1433)/4 crt=866'127774 lcod 0'0 mlcod 0'0 peering mbc={}]  got dup 
> osd.2 info 3.8( v 866'122691
> (569'119300,866'122691] local-lis/les=1401/1402 n=54053 ec=126/126 lis/c 
> 1401/859 les/c/f 1402/860/0 1433/1433/1433),
> identical to ours
> 99445: -1556> 2018-10-04 13:40:11.024 7f3838417700 10 log is not dirty
> 99446: -1555> 2018-10-04 13:40:11.024 7f3838417700 10 osd.4 1433 
> queue_want_up_thru want 1433 <= queued 1433, currently 1426
> 99448: -1553> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process 
> 3.8 to_process <> waiting <>
> waiting_peering {}
> 99450: -1551> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process 
> OpQueueItem(3.8 PGPeeringEvent(epoch_sent:
> 1433 epoch_requested: 1433 MLogRec from 2 

Re: [ceph-users] Fwd: What's the fastest way to try out object classes?

2017-10-30 Thread Neha Ojha
Hi Zheyuan,

You can build Ceph from source and run make install. This should place
objclass.h in include/rados/ under your install prefix.
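
(A hedged sketch of that route, using the in-tree helper scripts and the default /usr/local install prefix — adjust to taste:)

  git clone https://github.com/ceph/ceph.git
  cd ceph
  ./install-deps.sh
  ./do_cmake.sh
  cd build
  make -j$(nproc)
  make install    # with the default prefix, the header lands in /usr/local/include/rados/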

Thanks,
Neha

On Mon, Oct 30, 2017 at 2:18 PM, Zheyuan Chen  wrote:
>
> -- Forwarded message --
> From: Zheyuan Chen 
> Date: Mon, Oct 30, 2017 at 2:16 PM
> Subject: What's the fastest way to try out object classes?
> To: ceph-users@lists.ceph.com
>
>
> Hi All,
>
> I'd like to try out object classes.
> http://docs.ceph.com/docs/master/rados/api/objclass-sdk/
> I used this docker image: https://hub.docker.com/r/ceph/demo/, but found the
> object class sdk is not included (couldn't find
> /usr/local/include/rados/objectclass.h) even after I installed
> librados-devel manually.
>
> Do I have to build from the source code if I want to have objectclass.h?
> What is the fastest way to set up the environment if I want to try out
> object classes?
>
> Thank you very much!
> Zheyuan
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com