Re: [ceph-users] New best practices for osds???

2019-07-26 Thread Anthony D'Atri
> This is worse than I feared, but very much in the realm of concerns I 
> had with using single-disk RAID0 setups. Thank you very much for 
> posting your experience! My money would still be on using *high write 
> endurance* NVMes for DB/WAL and whatever I could afford for block.


You're welcome. Of course there are all manner of use cases and constraints, so others 
will have different experiences. Perhaps with the freedom to avoid a certain HBA 
vendor things would have been somewhat better, but in said past life the practice cost 
hundreds of thousands of dollars.

I personally have a low tolerance for fuss, and the management / mapping of WAL/DB 
devices still seems like a lot of fuss, especially when drives fail or have to 
be replaced for other reasons.

For RBD clusters/pools at least, I really enjoy not having to mess with multiple 
devices; I'd rather run colocated WAL/DB on SATA SSDs than spinners with NVMe WAL+DB. 

- aad



Re: [ceph-users] Should I use "rgw s3 auth order = local, external"

2019-07-26 Thread Christian
Hi,

I found this (rgw s3 auth order = local, external) on the web:
https://opendev.org/openstack/charm-ceph-radosgw/commit/3e54b570b1124354704bd5c35c93dce6d260a479

This seems to be exactly what I need to avoid the higher latency that comes
with switching on Keystone authentication. In fact it even improves performance
slightly without Keystone authentication enabled, which strikes me as odd and
leads me to the conclusion that it disables some mechanism that usually takes time.
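
For reference, a minimal sketch of how the option could be set; the option name
comes from the linked charm commit rather than official documentation, so treat
the exact spelling and section as assumptions:

# ceph.conf on the RGW host
[client.rgw.gateway-1]
rgw s3 auth order = local, external

With "local" first, RGW would consult its locally stored keys before falling
back to the external (Keystone) authenticator.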

I could not find any official documentation for this option.
Does anyone have any experience with this?

Regards,
Christian

PS: Sorry for the resend, I used the wrong sending address.


Re: [ceph-users] How to add 100 new OSDs...

2019-07-26 Thread Peter Sabaini
On 26.07.19 15:03, Stefan Kooman wrote:
> Quoting Peter Sabaini (pe...@sabaini.at):
>> What kind of commit/apply latency increases have you seen when adding a
>> large number of OSDs? I'm nervous about how sensitive workloads might react
>> here, esp. with spinners.
> 
> You mean when there is backfilling going on? Instead of doing "a big

Yes, exactly. I usually tune down the max rebalance and max recovery active
knobs to lessen the impact, but I still find that the additional write load can
substantially increase I/O latencies. Not all workloads like this.
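
For reference, a sketch of the kind of throttling meant above; the option names
are the standard OSD backfill/recovery knobs and the values are only
illustrative:

ceph tell 'osd.*' injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'
ceph tell 'osd.*' injectargs '--osd_recovery_sleep 0.1'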

> bang" you can also use Dan van der Ster's trick with upmap balancer:
> https://github.com/cernceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py
> 
> See
> https://www.slideshare.net/Inktank_Ceph/ceph-day-berlin-mastering-ceph-operations-upmap-and-the-mgr-balancer

Thanks, that's interesting -- though I wish it weren't necessary.


cheers,
peter.


> So you would still have norebalance / nobackfill / norecover and ceph
> balancer off. Then you run the script as many times as necessary to get
> "HEALTH_OK" again (on clusters other than nautilus) and there a no more
> PGs remapped. Unset the flags and enable the ceph balancer ... now the
> balancer will slowly move PGs to the new OSDs.
> 
> We've used this trick to increase the number of PGs on a pool, and will
> use this to expand the cluster in the near future.
> 
> This only works if you can use the balancer in "upmap" mode. Note that
> using upmap requires that all clients be Luminous or newer. If you are
> using the CephFS kernel client it might report as not compatible (jewel), but
> recent Linux distributions work well (Ubuntu 18.04 / CentOS 7).
> 
> Gr. Stefan
> 



Re: [ceph-users] Upgrading and lost OSDs

2019-07-26 Thread Alfredo Deza
On Thu, Jul 25, 2019 at 7:00 PM Bob R  wrote:

> I would try 'mv /etc/ceph/osd{,.old}' then run 'ceph-volume  simple scan'
> again. We had some problems upgrading due to OSDs (perhaps initially
> installed as firefly?) missing the 'type' attribute and iirc the
> 'ceph-volume simple scan' command refused to overwrite existing json files
> after I made some changes to ceph-volume.
>

Ooof. I could swear that this issue was fixed already and it took me a
while to find out that it wasn't at all. We saw this a few months ago in
our Long Running Cluster used for dogfooding.

I've created a ticket to track this work at
http://tracker.ceph.com/issues/40987

But what you've done is exactly why we chose to persist the JSON files in
/etc/ceph/osd/*.json, so that an admin could tell if anything is missing
(or incorrect like in this case) and make the changes needed.
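
For anyone hitting the same thing, a sketch of the workaround Bob describes,
plus a quick sanity check of the regenerated JSON (the file name is the one
from earlier in this thread):

mv /etc/ceph/osd{,.old}        # keep the previously generated JSON around
ceph-volume simple scan        # regenerate /etc/ceph/osd/*.json
python -m json.tool /etc/ceph/osd/0-93fb5f2f-0273-4c87-a718-886d7e6db983.json
ceph-volume simple activate --all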



> Bob
>
> On Wed, Jul 24, 2019 at 1:24 PM Alfredo Deza  wrote:
>
>>
>>
>> On Wed, Jul 24, 2019 at 4:15 PM Peter Eisch 
>> wrote:
>>
>>> Hi,
>>>
>>>
>>>
>>> I appreciate the insistence that the directions be followed.  I wholly
>>> agree.  The only liberty I took was to do a ‘yum update’ instead of just
>>> ‘yum update ceph-osd’ and then reboot.  (Also my MDS runs on the MON hosts,
>>> so it got updated a step early.)
>>>
>>>
>>>
>>> As for the logs:
>>>
>>>
>>>
>>> [2019-07-24 15:07:22,713][ceph_volume.main][INFO  ] Running command:
>>> ceph-volume  simple scan
>>>
>>> [2019-07-24 15:07:22,714][ceph_volume.process][INFO  ] Running command:
>>> /bin/systemctl show --no-pager --property=Id --state=running ceph-osd@*
>>>
>>> [2019-07-24 15:07:27,574][ceph_volume.main][INFO  ] Running command:
>>> ceph-volume  simple activate --all
>>>
>>> [2019-07-24 15:07:27,575][ceph_volume.devices.simple.activate][INFO  ]
>>> activating OSD specified in
>>> /etc/ceph/osd/0-93fb5f2f-0273-4c87-a718-886d7e6db983.json
>>>
>>> [2019-07-24 15:07:27,576][ceph_volume.devices.simple.activate][ERROR ]
>>> Required devices (block and data) not present for bluestore
>>>
>>> [2019-07-24 15:07:27,576][ceph_volume.devices.simple.activate][ERROR ]
>>> bluestore devices found: [u'data']
>>>
>>> [2019-07-24 15:07:27,576][ceph_volume][ERROR ] exception caught by
>>> decorator
>>>
>>> Traceback (most recent call last):
>>>
>>>   File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py",
>>> line 59, in newfunc
>>>
>>> return f(*a, **kw)
>>>
>>>   File "/usr/lib/python2.7/site-packages/ceph_volume/main.py", line 148,
>>> in main
>>>
>>> terminal.dispatch(self.mapper, subcommand_args)
>>>
>>>   File "/usr/lib/python2.7/site-packages/ceph_volume/terminal.py", line
>>> 182, in dispatch
>>>
>>> instance.main()
>>>
>>>   File
>>> "/usr/lib/python2.7/site-packages/ceph_volume/devices/simple/main.py", line
>>> 33, in main
>>>
>>> terminal.dispatch(self.mapper, self.argv)
>>>
>>>   File "/usr/lib/python2.7/site-packages/ceph_volume/terminal.py", line
>>> 182, in dispatch
>>>
>>> instance.main()
>>>
>>>   File
>>> "/usr/lib/python2.7/site-packages/ceph_volume/devices/simple/activate.py",
>>> line 272, in main
>>>
>>> self.activate(args)
>>>
>>>   File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py",
>>> line 16, in is_root
>>>
>>> return func(*a, **kw)
>>>
>>>   File
>>> "/usr/lib/python2.7/site-packages/ceph_volume/devices/simple/activate.py",
>>> line 131, in activate
>>>
>>> self.validate_devices(osd_metadata)
>>>
>>>   File
>>> "/usr/lib/python2.7/site-packages/ceph_volume/devices/simple/activate.py",
>>> line 62, in validate_devices
>>>
>>> raise RuntimeError('Unable to activate bluestore OSD due to missing
>>> devices')
>>>
>>> RuntimeError: Unable to activate bluestore OSD due to missing devices
>>>
>>>
>>>
>>> (this is repeated for each of the 16 drives)
>>>
>>>
>>>
>>> Any other thoughts?  (I’ll delete/create the OSDs with ceph-deploy
>>> otherwise.)
>>>
>>
>> Try using `ceph-volume simple scan --stdout` so that it doesn't persist
>> data onto /etc/ceph/osd/, and check that the JSON produced captures
>> all the necessary details for the OSDs.
>>
>> Alternatively, I would look into the JSON files already produced in
>> /etc/ceph/osd/ and check if the details are correct. The `scan` sub-command
>> does a tremendous effort to cover all cases where ceph-disk
>> created an OSD (filestore, bluestore, dmcrypt, etc...) but it is possible
>> that it may be hitting a problem. This is why the tool makes these JSON
>> files available, so that they can be inspected and corrected if anything is wrong.
>>
>> The details of the scan sub-command can be found at
>> http://docs.ceph.com/docs/master/ceph-volume/simple/scan/ and the JSON
>> structure is described in detail below at
>> http://docs.ceph.com/docs/master/ceph-volume/simple/scan/#json-contents
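>>
>> A minimal sketch of that inspection workflow (the OSD data path is just an
>> example):
>>
>> ceph-volume simple scan --stdout /var/lib/ceph/osd/ceph-0
>> # check the output for the 'type', 'data', and 'block' entries described in
>> # the JSON-contents document linked above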
>>
>> In this particular case the tool is refusing to activate what seems to be
>> a bluestore OSD. Is it really a bluestore OSD? If so, then it can't find
>> where the data partition is. What does that partition

Re: [ceph-users] MDS / CephFS behaviour with unusual directory layout

2019-07-26 Thread Nathan Fish
Yes, definitely enable standby-replay. I saw sub-second failovers with
standby-replay, but when I restarted the new rank 0 (previously 0-s)
while the standby was syncing up to become 0-s, the failover took
several minutes. This was with ~30GiB of cache.
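
For anyone following along, a minimal sketch of turning this on in Nautilus,
where standby-replay became a per-filesystem setting (the filesystem name is a
placeholder):

ceph fs set cephfs allow_standby_replay true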

On Fri, Jul 26, 2019 at 12:41 PM Burkhard Linke
 wrote:
>
> Hi,
>
>
> one particularly interesting point in setups with a large number of active
> files/caps is the failover.
>
>
> If your MDS fails (assuming single MDS, multiple MDS with multiple
> active ranks behave in the same way for _each_ rank), the monitors will
> detect the failure and update the mds map. CephFS clients will be
> notified about the update, connect to the new MDS the rank has failed
> over to (hopefully within the connect timeout...). They will also
> re-request all their currently active caps from the MDS to allow it to
> recreate the state of the point in time before the failure.
>
>
> And this is where things can get "interesting". Assuming a cold standby
> MDS, the MDS will receive the information about all active files and
> capabilities assigned to the various clients. It also has to _stat_ all
> these files during the rejoin phase. And if millions of files have to be
> stat'ed, this may take time, put a lot of pressure on the metadata and
> data pools, and might even lead to timeouts and subsequent failure or
> failover to another MDS.
>
>
> We had some problems with this in the past, but it became better and
> less failure prone with every ceph release (great work, ceph
> developers!). Our current setup has up to 15 million cached inodes and
> several million caps in the worst case (during nightly backup). The caps
> per client limit in luminous/nautilus? helps a lot with reducing the
> number of active files and caps.
>
> Prior to nautilus we configured a secondary MDS as standby-replay, which
> allows it to cache the same inodes that were active on the primary.
> During rejoin the stat call can be served from cache, which makes the
> failover a lot faster and less demanding for the ceph cluster itself. In
> nautilus the setup for standby-replay has moved from a daemon feature to
> a filesystem feature (one spare MDS becomes designated standby-replay
> for a rank). But there are also other caveats like not selecting one of
> these as failover for another rank.
>
>
> So if you want to test CephFS for your use case, I would highly
> recommend testing failover, too, both a controlled failover and an
> unexpected one. You may also want to use multiple active MDS, but my
> experience with these setups is limited.
>
>
> Regards,
>
> Burkhard
>
>


Re: [ceph-users] MDS / CephFS behaviour with unusual directory layout

2019-07-26 Thread Burkhard Linke

Hi,


one particularly interesting point in setups with a large number of active 
files/caps is the failover.



If your MDS fails (assuming single MDS, multiple MDS with multiple 
active ranks behave in the same way for _each_ rank), the monitors will 
detect the failure and update the mds map. CephFS clients will be 
notified about the update, connect to the new MDS the rank has failed 
over to (hopefully within the connect timeout...). They will also 
re-request all their currently active caps from the MDS to allow it to 
recreate the state of the point in time before the failure.



And this is where things can get "interesting". Assuming a cold standby 
MDS, the MDS will receive the information about all active files and 
capabilities assigned to the various clients. It also has to _stat_ all 
these files during the rejoin phase. And if millions of files have to be 
stat'ed, this may take time, put a lot of pressure on the metadata and 
data pools, and might even lead to timeouts and subsequent failure or 
failover to another MDS.



We had some problems with this in the past, but it became better and 
less failure prone with every ceph release (great work, ceph 
developers!). Our current setup has up to 15 million cached inodes and 
several million caps in the worst case (during nightly backup). The caps 
per client limit in luminous/nautilus? helps a lot with reducing the 
number of active files and caps.
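
For reference, a sketch of the per-client caps limit referred to above; the
option name (mds_max_caps_per_client) and the value shown are assumptions on my
part, not a recommendation:

ceph config set mds mds_max_caps_per_client 500000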


Prior to nautilus we configured a secondary MDS as standby-replay, which 
allows it to cache the same inodes that were active on the primary. 
During rejoin the stat call can be served from cache, which makes the 
failover a lot faster and less demanding for the ceph cluster itself. In 
nautilus the setup for standby-replay has moved from a daemon feature to 
a filesystem feature (one spare MDS becomes designated standby-replay 
for a rank). But there are also other caveats like not selecting one of 
these as failover for another rank.



So if you want to test CephFS for your use case, I would highly 
recommend testing failover, too, both a controlled failover and an 
unexpected one. You may also want to use multiple active MDS, but my 
experience with these setups is limited.



Regards,

Burkhard




Re: [ceph-users] MDS / CephFS behaviour with unusual directory layout

2019-07-26 Thread Nathan Fish
Ok, great. Some numbers for you:
I have a filesystem of 50 million files, 5.4 TB.
The data pool is on HDD OSDs with Optane DB/WAL, size=3.
The metadata pool (Optane OSDs) has 17GiB "stored", 20GiB "used", at
size=3. 5.18M objects.
When doing parallel rsyncs, with ~14M inodes open, the MDS cache goes
to about 40GiB but it remains stable. MDS CPU usage goes to about 400%
(4 cores worth, spread across 6-8 processes). Hope you find this
useful.

On Fri, Jul 26, 2019 at 11:05 AM Stefan Kooman  wrote:
>
> Quoting Nathan Fish (lordci...@gmail.com):
> > MDS CPU load is proportional to metadata ops/second. MDS RAM cache is
> > proportional to # of files (including directories) in the working set.
> > Metadata pool size is proportional to total # of files, plus
> > everything in the RAM cache. I have seen that the metadata pool can
> > balloon 8x between being idle, and having every inode open by a
> > client.
> > The main thing I'd recommend is getting SSD OSDs to dedicate to the
> > metadata pools, and SSDs for the HDD OSD's DB/WAL. NVMe if you can. If
> > you put that much metadata on only HDDs, it's going to be slow.
>
> We use only SSDs for the OSD data pool and NVMe for the metadata pool, so that should be
> fine. Aside from the initial loading of that many files / directories, this
> workload shouldn't be a problem.
>
> Thanks for your feedback.
>
> Gr. Stefan
>
> --
> | BIT BV  https://www.bit.nl/Kamer van Koophandel 09090351
> | GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl


Re: [ceph-users] MDS / CephFS behaviour with unusual directory layout

2019-07-26 Thread Stefan Kooman
Quoting Nathan Fish (lordci...@gmail.com):
> MDS CPU load is proportional to metadata ops/second. MDS RAM cache is
> proportional to # of files (including directories) in the working set.
> Metadata pool size is proportional to total # of files, plus
> everything in the RAM cache. I have seen that the metadata pool can
> balloon 8x between being idle, and having every inode open by a
> client.
> The main thing I'd recommend is getting SSD OSDs to dedicate to the
> metadata pools, and SSDs for the HDD OSD's DB/WAL. NVMe if you can. If
> you put that much metadata on only HDDs, it's going to be slow.

We use only SSDs for the OSD data pool and NVMe for the metadata pool, so that should be
fine. Aside from the initial loading of that many files / directories, this
workload shouldn't be a problem.

Thanks for your feedback.

Gr. Stefan

-- 
| BIT BV  https://www.bit.nl/Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl


Re: [ceph-users] MDS / CephFS behaviour with unusual directory layout

2019-07-26 Thread Nathan Fish
MDS CPU load is proportional to metadata ops/second. MDS RAM cache is
proportional to # of files (including directories) in the working set.
Metadata pool size is proportional to total # of files, plus
everything in the RAM cache. I have seen that the metadata pool can
balloon 8x between being idle, and having every inode open by a
client.
The main thing I'd recommend is getting SSD OSDs to dedicate to the
metadata pools, and SSDs for the HDD OSD's DB/WAL. NVMe if you can. If
you put that much metadata on only HDDs, it's going to be slow.
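
A minimal sketch of what that separation can look like with CRUSH device
classes (rule and pool names are placeholders):

ceph osd crush rule create-replicated metadata-ssd default host ssd
ceph osd pool set cephfs_metadata crush_rule metadata-ssd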



On Fri, Jul 26, 2019 at 5:11 AM Stefan Kooman  wrote:
>
> Hi List,
>
> We are planning to move a filesystem workload (currently nfs) to CephFS.
> It's around 29 TB. The unusual thing here is the amount of directories
> in use to host the files. In order to combat a "too many files in one
> directory" scenario, a "let's make use of recursive directories" approach was taken.
> Not ideal either. This workload is supposed to be moved to (Ceph) S3
> sometime in the future, but until then, it has to go to a shared
> filesystem ...
>
> So what is unusual about this? The directory layout looks like this
>
> /data/files/00/00/[0-8][0-9]/[0-9]/ from this point on there will be 7
> directories created to store 1 file.
>
> Total amount of directories in a file path is 14. There are around 150 M
> files in 400 M directories.
>
> The working set won't be big. Most files will just sit around and will
> not be touched. The number of actively used files will be a few thousand.
>
> We are wondering if this kind of directory structure is suitable for
> CephFS. Might the MDS have difficulties keeping up with that many inodes
> / dentries, or doesn't it care at all?
>
> The amount of metadata overhead might be horrible, but we will test that
> out.
>
> Thanks,
>
> Stefan
>
>
> --
> | BIT BV  https://www.bit.nl/Kamer van Koophandel 09090351
> | GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl


Re: [ceph-users] pools limit

2019-07-26 Thread M Ranga Swami Reddy
We are using Ceph for RBD only.
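
Since this is RBD-only, the namespace suggestion from earlier in the thread
could look roughly like this on Nautilus (pool, namespace, and client names are
placeholders):

rbd namespace create --pool rbd --namespace project-a
rbd create --pool rbd --namespace project-a --size 100G vol1
ceph auth get-or-create client.project-a mon 'profile rbd' \
    osd 'profile rbd pool=rbd namespace=project-a'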

On Wed, Jul 24, 2019 at 12:55 PM Wido den Hollander  wrote:

>
>
> On 7/16/19 6:53 PM, M Ranga Swami Reddy wrote:
> > Thanks for your reply.
> > Here, new pool creations and PG autoscaling may cause rebalancing, which
> > impacts the ceph cluster's performance.
> >
> > Please share namespace details, e.g. how to use them.
> >
>
> Would it be RBD, Rados, CephFS? What would you be using on top of Ceph?
>
> Wido
>
> >
> >
> > On Tue, 16 Jul, 2019, 9:30 PM Paul Emmerich,  > > wrote:
> >
> > 100+ pools work fine if you can get the PG count right (auto-scaler
> > helps, there are some options that you'll need to tune for small-ish
> > pools).
> >
> > But it's not a "nice" setup. Have you considered using namespaces
> > instead?
> >
> >
> > Paul
> >
> > --
> > Paul Emmerich
> >
> > Looking for help with your Ceph cluster? Contact us at
> https://croit.io
> >
> > croit GmbH
> > Freseniusstr. 31h
> > 81247 München
> > www.croit.io 
> > Tel: +49 89 1896585 90
> >
> >
> > On Tue, Jul 16, 2019 at 4:17 PM M Ranga Swami Reddy
> > mailto:swamire...@gmail.com>> wrote:
> >
> > Hello - I have created 10 nodes ceph cluster with 14.x version.
> > Can you please confirm below:
> >
> > Q1 - Can I create 100+ pool (or more) on the cluster? (the
> > reason is - creating a pool per project). Any limitation on pool
> > creation?
> >
> > Q2 - In the above pools I would use 128 as the starting pg_num and
> > enable the PG autoscaler, so that based on the data in the
> > pool, pg_num will be increased by Ceph itself.
> >
> > Let me know if there are any limitations to the above, or any foreseeable
> issues?
> >
> > Thanks
> > Swami
> >
> >
> >
>


Re: [ceph-users] Error in ceph rbd mirroring(rbd::mirror::InstanceWatcher: C_NotifyInstanceRequestfinish: resending after timeout)

2019-07-26 Thread Jason Dillaman
On Fri, Jul 26, 2019 at 9:26 AM Mykola Golub  wrote:
>
> On Fri, Jul 26, 2019 at 04:40:35PM +0530, Ajitha Robert wrote:
> > Thank you for the clarification.
> >
> > But I was trying with OpenStack Cinder: when I load around 50 GB of data
> > into the volume, the image sync stops at 5% or somewhere below
> > 15%...  What could be the reason?
>
> I suppose you see image sync stop in mirror status output? Could you
> please provide an example? And I suppose you don't see any other
> messages in rbd-mirror log apart from what you have already posted?
> Depending on the configuration, rbd-mirror may write to several log files. Could
> you please try to find all of them? `lsof | grep 'rbd-mirror.*log'`
> may be useful for this.
>
> BTW, what rbd-mirror version are you running?

From the previous thread a few days ago (not sure why a new thread was
started on this same topic), to me it sounded like one or more OSDs
aren't reachable from the secondary site:

> > Scenario 2:
> > But when I create a 50 GB volume with another glance image, the volume gets 
> > created, and in the backend I can see the rbd images on both the primary and 
> > secondary clusters.
> >
> > From the rbd mirror image status I found that the secondary cluster starts copying, and 
> > syncing gets stuck at around 14%... It stays at 14% with no progress at 
> > all. Should I set any parameters for this, like a timeout?
> >
> > I manually ran "rbd --cluster primary object-map check ..".
> > No results came back for the objects and the command was hanging. That's why 
> > I got worried about the "failed to map object key" log message. I couldn't even rebuild the 
> > object map.

> It sounds like one or more of your primary OSDs are not reachable from
> the secondary site. If you run w/ "debug rbd-mirror = 20" and "debug
> rbd = 20", you should be able to see the last object it attempted to
> copy. From that, you could use "ceph osd map" to figure out the
> primary OSD for that object.
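
A minimal sketch of those debug settings and the follow-up lookup (the config
section is an assumption; the object name is whatever the rbd-mirror log shows
as the last object it worked on):

# ceph.conf on the host running rbd-mirror
[client]
    debug rbd = 20
    debug rbd_mirror = 20

# then map the stalled object to its primary OSD
ceph osd map <pool> <object-name>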



> --
> Mykola Golub



--
Jason


Re: [ceph-users] Error in ceph rbd mirroring(rbd::mirror::InstanceWatcher: C_NotifyInstanceRequestfinish: resending after timeout)

2019-07-26 Thread Mykola Golub
On Fri, Jul 26, 2019 at 04:40:35PM +0530, Ajitha Robert wrote:
> Thank you for the clarification.
> 
> But I was trying with OpenStack Cinder: when I load around 50 GB of data
> into the volume, the image sync stops at 5% or somewhere below
> 15%...  What could be the reason?

I suppose you see image sync stop in mirror status output? Could you
please provide an example? And I suppose you don't see any other
messages in rbd-mirror log apart from what you have already posted?
Depending on the configuration, rbd-mirror may write to several log files. Could
you please try to find all of them? `lsof | grep 'rbd-mirror.*log'`
may be useful for this.

BTW, what rbd-mirror version are you running?

-- 
Mykola Golub


Re: [ceph-users] How to add 100 new OSDs...

2019-07-26 Thread Stefan Kooman
Quoting Peter Sabaini (pe...@sabaini.at):
> What kind of commit/apply latency increases have you seen when adding a
> large numbers of OSDs? I'm nervous how sensitive workloads might react
> here, esp. with spinners.

You mean when there is backfilling going on? Instead of doing "a big
bang" you can also use Dan van der Ster's trick with upmap balancer:
https://github.com/cernceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py

See
https://www.slideshare.net/Inktank_Ceph/ceph-day-berlin-mastering-ceph-operations-upmap-and-the-mgr-balancer

So you would still have norebalance / nobackfill / norecover and ceph
balancer off. Then you run the script as many times as necessary to get
"HEALTH_OK" again (on clusters other than nautilus) and there a no more
PGs remapped. Unset the flags and enable the ceph balancer ... now the
balancer will slowly move PGs to the new OSDs.

We've used this trick to increase the number of PGs on a pool, and will
use this to expand the cluster in the near future.

This only works if you can use the balancer in "upmap" mode. Note that
using upmap requires that all clients be Luminous or newer. If you are
using the CephFS kernel client it might report as not compatible (jewel), but
recent Linux distributions work well (Ubuntu 18.04 / CentOS 7).
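
Putting the steps above together, a rough sketch of the command sequence (no
error handling; upmap-remapped.py is the script linked above):

ceph osd set norebalance && ceph osd set nobackfill && ceph osd set norecover
ceph balancer off
# ... create the new OSDs and wait for all PGs to peer ...
ceph osd set-require-min-compat-client luminous   # needed once for upmap
./upmap-remapped.py | sh        # repeat until no PGs are left remapped
ceph osd unset norebalance && ceph osd unset nobackfill && ceph osd unset norecover
ceph balancer mode upmap
ceph balancer on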

Gr. Stefan

-- 
| BIT BV  https://www.bit.nl/Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl


Re: [ceph-users] How to add 100 new OSDs...

2019-07-26 Thread Peter Sabaini
What kind of commit/apply latency increases have you seen when adding a
large number of OSDs? I'm nervous about how sensitive workloads might react
here, esp. with spinners.

cheers,
peter.

On 24.07.19 20:58, Reed Dier wrote:
> Just chiming in to say that this too has been my preferred method for
> adding [large numbers of] OSDs.
> 
> Set the norebalance nobackfill flags.
> Create all the OSDs, and verify everything looks good.
> Make sure my max_backfills, recovery_max_active are as expected.
> Make sure everything has peered.
> Unset flags and let it run.
> 
> One crush map change, one data movement.
> 
> Reed
> 
>>
>> That works, but with newer releases I've been doing this:
>>
>> - Make sure cluster is HEALTH_OK
>> - Set the 'norebalance' flag (and usually nobackfill)
>> - Add all the OSDs
>> - Wait for the PGs to peer. I usually wait a few minutes
>> - Remove the norebalance and nobackfill flag
>> - Wait for HEALTH_OK
>>
>> Wido
>>
> 
> 
> 



Re: [ceph-users] Adding block.db afterwards

2019-07-26 Thread Igor Fedotov

Hi Frank,

you can specify new db size in the following way:

CEPH_ARGS="--bluestore-block-db-size 107374182400" ceph-bluestore-tool 
bluefs-bdev-new-db --path <osd path> --dev-target <new db device>
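
For context, a rough sketch of the whole sequence for the OSD from the mail
quoted below (osd.35 and /dev/sdaa1 are Frank's values; stopping the OSD first
and the optional migrate step are assumptions about the workflow, not something
stated here):

systemctl stop ceph-osd@35
CEPH_ARGS="--bluestore-block-db-size 107374182400" \
  ceph-bluestore-tool bluefs-bdev-new-db --path /var/lib/ceph/osd/ceph-35 --dev-target /dev/sdaa1
# optionally move existing RocksDB data from the main device to the new DB device
ceph-bluestore-tool bluefs-bdev-migrate --path /var/lib/ceph/osd/ceph-35 \
  --devs-source /var/lib/ceph/osd/ceph-35/block --dev-target /var/lib/ceph/osd/ceph-35/block.db
systemctl start ceph-osd@35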



Thanks,

Igor

On 7/26/2019 2:49 PM, Frank Rothenstein wrote:

Hi,

I'm running a small (3 hosts) ceph cluster. At the moment I want to speed up my
cluster by adding separate block.db SSDs. The OSDs were created on pure
spinning HDDs, with no "--block.db /dev/sdxx" parameter, so there is no
block.db symlink in /var/lib/ceph/osd/ceph-xx/.
There is a "ceph-bluestore-tool bluefs-bdev-new-db ..." command which should
check for the existence of block.db and create a new one.
Documentation for this tool is fairly basic, and so is its error output, so I'm
stuck. (The only way seems to be destroying and recreating the OSD with the
block.db parameter.)

CLI:

ceph-bluestore-tool bluefs-bdev-new-db --path /var/lib/ceph/osd/ceph-35
--dev-target /dev/sdaa1
Output:

inferring bluefs devices from bluestore path
DB size isn't specified, please set Ceph bluestore-block-db-size config
parameter

for DB size I tried "bluestore_block_db_size = 32212254720" in
ceph.conf
for path I tried different versions

Any help on this would be appreciated.

Frank




Frank Rothenstein

Systemadministrator
Fon: +49 3821 700 125
Fax: +49 3821 700 190
Internet: www.bodden-kliniken.de
E-Mail: f.rothenst...@bodden-kliniken.de


_
BODDEN-KLINIKEN Ribnitz-Damgarten GmbH
Sandhufe 2
18311 Ribnitz-Damgarten

Telefon: 03821-700-0
Telefax: 03821-700-240

E-Mail: i...@bodden-kliniken.de
Internet: http://www.bodden-kliniken.de

Sitz: Ribnitz-Damgarten, Amtsgericht: Stralsund, HRB 2919, Steuer-Nr.: 
079/133/40188
Aufsichtsratsvorsitzende: Carmen Schröter, Geschäftsführer: Dr. Falko 
Milski, MBA; Dipl.-Kfm.(FH) Gunnar Bölke







[ceph-users] Adding block.db afterwards

2019-07-26 Thread Frank Rothenstein
Hi,

I'm running a small (3 hosts) ceph cluster. At the moment I want to speed up my
cluster by adding separate block.db SSDs. The OSDs were created on pure
spinning HDDs, with no "--block.db /dev/sdxx" parameter, so there is no
block.db symlink in /var/lib/ceph/osd/ceph-xx/.
There is a "ceph-bluestore-tool bluefs-bdev-new-db ..." command which should
check for the existence of block.db and create a new one.
Documentation for this tool is fairly basic, and so is its error output, so I'm
stuck. (The only way seems to be destroying and recreating the OSD with the
block.db parameter.)

CLI:

ceph-bluestore-tool bluefs-bdev-new-db --path /var/lib/ceph/osd/ceph-35 
--dev-target /dev/sdaa1
Output:

inferring bluefs devices from bluestore path
DB size isn't specified, please set Ceph bluestore-block-db-size config
parameter

for DB size I tried "bluestore_block_db_size = 32212254720" in
ceph.conf
for path I tried different versions 

Any help on this would be appreciated.

Frank




Frank Rothenstein 

Systemadministrator
Fon: +49 3821 700 125
Fax: +49 3821 700 190
Internet: www.bodden-kliniken.de
E-Mail: f.rothenst...@bodden-kliniken.de


_
BODDEN-KLINIKEN Ribnitz-Damgarten GmbH
Sandhufe 2
18311 Ribnitz-Damgarten

Telefon: 03821-700-0
Telefax: 03821-700-240

E-Mail: i...@bodden-kliniken.de 
Internet: http://www.bodden-kliniken.de

Sitz: Ribnitz-Damgarten, Amtsgericht: Stralsund, HRB 2919, Steuer-Nr.: 
079/133/40188
Aufsichtsratsvorsitzende: Carmen Schröter, Geschäftsführer: Dr. Falko Milski, 
MBA; Dipl.-Kfm.(FH) Gunnar Bölke



Re: [ceph-users] New best practices for osds???

2019-07-26 Thread Mark Nelson


On 7/25/19 9:27 PM, Anthony D'Atri wrote:

We run few hundred HDD OSDs for our backup cluster, we set one RAID 0 per HDD 
in order to be able
to use -battery protected- write cache from the RAID controller. It really 
improves performance, for both
bluestore and filestore OSDs.

Having run something like 6000 HDD-based FileStore OSDs with colo journals on 
RAID HBAs I’d like to offer some contrasting thoughts.

TL;DR:  Never again!  False economy.  ymmv.

Details:

* The implementation predated me and was carved in dogfood^H^H^H^H^H^H^Hstone, 
try as I might I could not get it fixed.

* Single-drive RAID0 VDs were created to expose the underlying drives to the 
OS.  When the architecture was conceived, the HBAs in question didn’t have 
JBOD/passthrough, though a firmware update shortly thereafter did bring that 
ability.  That caching was a function of VDs wasn’t known at the time.

* My sense was that the FBWC did offer some throughput benefit for at least 
some workloads, but at the cost of latency.

* Using a RAID-capable HBA in IR mode with FBWC meant having to monitor for the 
presence and status of the BBU/supercap

* The utility needed for that monitoring, when invoked with ostensibly 
innocuous parameters, would lock up the HBA for several seconds.

* Traditional BBUs are rated for lifespan of *only* one year.  FBWCs maybe for 
… three?  Significant cost to RMA or replace them:  time and karma wasted 
fighting with the system vendor CSO, engineer and remote hands time to take the 
system down and swap.  And then the connectors for the supercap were touchy; 
15% of the time the system would come up and not see it at all.

* The RAID-capable HBA itself + FBWC + supercap cost …. a couple three hundred 
more than an IT / JBOD equivalent

* There was a little-known flaw in secondary firmware that caused FBWC / 
supercap modules to be falsely reported bad.  The system vendor acted like I 
was making this up and washed their hands of it, even when I provided them the 
HBA vendors’ artifacts and documents.

* There were two design flaws that could and did result in cache data loss when 
a system rebooted or lost power.  There was a field notice for this, which 
required harvesting serial numbers and checking each.  The affected range of 
serials was quite a bit larger than what the validation tool admitted.  I had 
to manage the replacement of 302+ of these in production use, each needing 
engineer time time to manage Ceph, to do the hands work, and hassle with RMA 
paperwork.

* There was a firmware / utility design flaw that caused the HDD’s onboard 
volatile write cache to be silently turned on, despite an HBA config dump 
showing a setting that should have left it off.  Again data was lost when a 
node crashed hard or lost power.

* There was another firmware flaw that prevented booting if there was pinned / 
preserved cache data after a reboot / power loss if a drive failed or was 
yanked.  The HBA’s option ROM utility would block booting and wait for input on 
the console.  One could get in and tell it to discard that cache, but it would 
not actually do so, instead looping back to the same screen.  The only way to 
get the system to boot again was to replace and RMA the HBA.

* The VD layer lessened the usefulness of iostat data.  It also complicated OSD 
deployment / removal / replacement.  A smartctl hack to access SMART attributes 
below the VD layer would work on some systems but not others.

* The HBA model in question would work normally with a certain CPU generation, 
but not with slightly newer servers with the next CPU generation.  They would 
randomly, on roughly one boot out of five, negotiate PCIe gen3 which they 
weren’t capable of handling properly, and would silently run at about 20% of 
normal speed.  Granted this isn’t necessarily specific to an IR HBA.



Add it all up, and my assertion is that the money, time, karma, and user impact 
you save from NOT dealing with a RAID HBA *more than pays for* using SSDs for 
OSDs instead.



This is worse than I feared, but very much in the realm of concerns I 
had with using single-disk RAID0 setups.  Thank you very much for 
posting your experience!  My money would still be on using *high write 
endurance* NVMes for DB/WAL and whatever I could afford for block.  I 
still have vague hopes that in the long run we move away from the idea 
of distinct block/db/wal devices and toward pools of resources that 
the OSD makes its own decisions about.  I'd like to be able to hand the 
OSD a pile of hardware and say "go".  That might mean something like an 
internal caching scheme but with slow eviction and initial placement 
hints (IE L0 SST files should nearly always end up on fast storage).



If it were structured like the PriorityCacheManager, we'd have SSTs for 
different column family prefixes (OMAP, onodes, etc) competing for fast 
BlueFS device storage with bluestore at different priority levels (so 
for example onode L0 would be very

Re: [ceph-users] Error in ceph rbd mirroring(rbd::mirror::InstanceWatcher: C_NotifyInstanceRequestfinish: resending after timeout)

2019-07-26 Thread Ajitha Robert
Thank you for the clarification.

But I was trying with OpenStack Cinder. When I load around 50 GB of data into the
volume,
either it will say

ImageReplayer: 0x7f7264016c50 [17/244d1ab5-8147-45ed-8cd1-9b3613f1f104]
handle_shut_down: mirror image no longer exists

or

the image sync stops at 5% or somewhere below 15%...  What could be
the reason?


On Fri, Jul 26, 2019 at 4:40 PM Ajitha Robert 
wrote:

>
>
>
>
>
>
> On Fri, Jul 26, 2019 at 3:01 PM Mykola Golub 
> wrote:
>
>> On Fri, Jul 26, 2019 at 12:31:59PM +0530, Ajitha Robert wrote:
>> >  I have a rbd mirroring setup with primary and secondary clusters as
>> peers
>> > and I have a pool enabled image mode.., In this i created a rbd image ,
>> > enabled with journaling.
>> > But whenever i enable mirroring on the image,  I m getting error in
>> > rbdmirror.log and  osd.log.
>> > I have increased the timeouts.. nothing worked and couldnt traceout the
>> > error
>> > please guide me to solve this error.
>> >
>> > *Logs*
>> > http://paste.openstack.org/show/754766/
>>
>> What do you mean by "nothing worked"? According to mirroring status
>> the image is mirroring: it is in "up+stopped" state on the primary as
>> expected, and in "up+replaying" state on the secondary with 0 entries
>> behind master.
>>
>> The "failed to get omap key" error in the osd log is harmless, and
>> just a week ago the fix was merged upstream not to display it.
>>
>> The cause of "InstanceWatcher: ... resending after timeout" error in
>> the rbd-mirror log is not clear but if it is not repeating it is
>> harmless too.
>>
>> I see you were trying to map the image with krbd. It is expected to
>> fail as the krbd does not support "journaling" feature, which is
>> necessary for mirroring. You can access those images only with librbd
>> (e.g. mapping with rbd-nbd driver or via qemu).
>>
>> --
>> Mykola Golub
>>
>
>
> --
>
>
> *Regards,Ajitha R*
>


-- 


Regards,
Ajitha R


Re: [ceph-users] Error in ceph rbd mirroring(rbd::mirror::InstanceWatcher: C_NotifyInstanceRequestfinish: resending after timeout)

2019-07-26 Thread Ajitha Robert
Thank you for the clarification.

But I was trying with OpenStack Cinder: when I load around 50 GB of data
into the volume, the image sync stops at 5% or somewhere below
15%...  What could be the reason?






On Fri, Jul 26, 2019 at 3:01 PM Mykola Golub 
wrote:

> On Fri, Jul 26, 2019 at 12:31:59PM +0530, Ajitha Robert wrote:
> >  I have a rbd mirroring setup with primary and secondary clusters as
> peers
> > and I have a pool enabled image mode.., In this i created a rbd image ,
> > enabled with journaling.
> > But whenever i enable mirroring on the image,  I m getting error in
> > rbdmirror.log and  osd.log.
> > I have increased the timeouts.. nothing worked and couldnt traceout the
> > error
> > please guide me to solve this error.
> >
> > *Logs*
> > http://paste.openstack.org/show/754766/
>
> What do you mean by "nothing worked"? According to mirroring status
> the image is mirroring: it is in "up+stopped" state on the primary as
> expected, and in "up+replaying" state on the secondary with 0 entries
> behind master.
>
> The "failed to get omap key" error in the osd log is harmless, and
> just a week ago the fix was merged upstream not to display it.
>
> The cause of "InstanceWatcher: ... resending after timeout" error in
> the rbd-mirror log is not clear but if it is not repeating it is
> harmless too.
>
> I see you were trying to map the image with krbd. It is expected to
> fail as the krbd does not support "journaling" feature, which is
> necessary for mirroring. You can access those images only with librbd
> (e.g. mapping with rbd-nbd driver or via qemu).
>
> --
> Mykola Golub
>


-- 


Regards,
Ajitha R


Re: [ceph-users] Error in ceph rbd mirroring(rbd::mirror::InstanceWatcher: C_NotifyInstanceRequestfinish: resending after timeout)

2019-07-26 Thread Mykola Golub
On Fri, Jul 26, 2019 at 12:31:59PM +0530, Ajitha Robert wrote:
> I have an rbd mirroring setup with primary and secondary clusters as peers,
> and I have a pool with image-mode mirroring enabled. In this pool I created an
> rbd image with journaling enabled.
> But whenever I enable mirroring on the image, I get errors in
> rbd-mirror.log and osd.log.
> I have increased the timeouts; nothing worked and I couldn't trace out the
> error.
> Please guide me to solve this error.
> 
> *Logs*
> http://paste.openstack.org/show/754766/

What do you mean by "nothing worked"? According to mirroring status
the image is mirroring: it is in "up+stopped" state on the primary as
expected, and in "up+replaying" state on the secondary with 0 entries
behind master.

The "failed to get omap key" error in the osd log is harmless, and
just a week ago the fix was merged upstream not to display it.

The cause of "InstanceWatcher: ... resending after timeout" error in
the rbd-mirror log is not clear but if it is not repeating it is
harmless too.

I see you were trying to map the image with krbd. It is expected to
fail as the krbd does not support "journaling" feature, which is
necessary for mirroring. You can access those images only with librbd
(e.g. mapping with rbd-nbd driver or via qemu).
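
For example, a minimal sketch of accessing such an image without krbd (pool and
image names are placeholders):

rbd-nbd map mypool/myimage        # exposes the image as /dev/nbdX through librbd
qemu-img info rbd:mypool/myimage  # or inspect it via qemu/librbd directly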

-- 
Mykola Golub


[ceph-users] MDS / CephFS behaviour with unusual directory layout

2019-07-26 Thread Stefan Kooman
Hi List,

We are planning to move a filesystem workload (currently nfs) to CephFS.
It's around 29 TB. The unusual thing here is the amount of directories
in use to host the files. In order to combat a "too many files in one
directory" scenario a "let's make use of recursive directories" approach.
Not ideal either. This workload is supposed to be moved to (Ceph) S3
sometime in the future, but until then, it has to go to a shared
filesystem ...

So what is unusual about this? The directory layout looks like this

/data/files/00/00/[0-8][0-9]/[0-9]/ from this point on there will be 7
directories created to store 1 file.

Total amount of directories in a file path is 14. There are around 150 M
files in 400 M directories.

The working set won't be big. Most files will just sit around and will
not be touched. The active amount of files wil be a few thousand.

We are wondering if this kind of directory structure is suitable for
CephFS. Might the MDS get difficulties with keeping up that many inodes
/ dentries or doesn't it care at all?

The amount of metadata overhead might be horrible, but we will test that
out.

Thanks,

Stefan


-- 
| BIT BV  https://www.bit.nl/Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl


[ceph-users] loaded dup inode (but no mds crash)

2019-07-26 Thread Dan van der Ster
Hi all,

Last night we had 60 ERRs like this:

2019-07-26 00:56:44.479240 7efc6cca1700  0 mds.2.cache.dir(0x617)
_fetched  badness: got (but i already had) [inode 0x10006289992
[...2,head] ~mds2/stray1/10006289992 auth v14438219972 dirtyparent
s=116637332 nl=8 n(v0 rc2019-07-26 00:56:17.199090 b116637332 1=1+0)
(iversion lock) | request=0 lock=0 caps=0 remoteparent=0 dirtyparent=1
dirty=1 authpin=0 0x5561321eee00] mode 33188 mtime 2017-07-11
16:20:50.00
2019-07-26 00:56:44.479333 7efc6cca1700 -1 log_channel(cluster) log
[ERR] : loaded dup inode 0x10006289992 [2,head] v14437387948 at
~mds2/stray3/10006289992, but inode 0x10006289992.head v14438219972
already exists at ~mds2/stray1/10006289992

Looking through this ML this often corresponds to crashing MDS's and
needing a disaster recovery procedure to follow.
We haven't had any crash

Is there something we should do *now* to fix these before any assert
is triggered?
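
Not an answer to whether anything needs to be done pre-emptively, but for
reference a sketch of how the MDS's own damage/scrub view can be checked while
it is running (rank and daemon name are placeholders; 'repair' is deliberately
left out):

ceph tell mds.2 damage ls
ceph daemon mds.<name> scrub_path / recursive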

Thanks!

Dan


Re: [ceph-users] [Disarmed] Re: ceph-ansible firewalld blocking ceph comms

2019-07-26 Thread Nathan Harper
The firewalld service 'ceph' includes the range of ports required.

Not sure why it helped, but after a reboot of each OSD node the issue went
away!
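
For reference, a minimal sketch of relying on the predefined firewalld services
instead of individual ports (the 'ceph' and 'ceph-mon' service definitions ship
with recent firewalld releases; the zone is an assumption):

firewall-cmd --permanent --zone=public --add-service=ceph --add-service=ceph-mon
firewall-cmd --reload
firewall-cmd --zone=public --list-services    # should now include ceph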

On Thu, 25 Jul 2019 at 23:14,  wrote:

> Nathan;
>
> I'm not an expert on firewalld, but shouldn't you have a list of open
> ports?
>
>  ports: ?
>
> Here's the configuration on my test cluster:
> public (active)
>   target: default
>   icmp-block-inversion: no
>   interfaces: bond0
>   sources:
>   services: ssh dhcpv6-client
>   ports: 6789/tcp 3300/tcp 6800-7300/tcp 8443/tcp
>   protocols:
>   masquerade: no
>   forward-ports:
>   source-ports:
>   icmp-blocks:
>   rich rules:
> trusted (active)
>   target: ACCEPT
>   icmp-block-inversion: no
>   interfaces: bond1
>   sources:
>   services:
>   ports: 6789/tcp 3300/tcp 6800-7300/tcp 8443/tcp
>   protocols:
>   masquerade: no
>   forward-ports:
>   source-ports:
>   icmp-blocks:
>   rich rules:
>
> I use interfaces as selectors, but would think source selectors would work
> the same.
>
> You might start by adding the MON ports to the firewall on the MONs:
> firewall-cmd --zone=public --add-port=6789/tcp --permanent
> firewall-cmd --zone=public --add-port=3300/tcp --permanent
> firewall-cmd --reload
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director – Information Technology
> Perform Air International Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Nathan Harper
> Sent: Thursday, July 25, 2019 2:08 PM
> To: ceph-us...@ceph.com
> Subject: [Disarmed] Re: [ceph-users] ceph-ansible firewalld blocking ceph
> comms
>
> This is a new issue for us; we did not have the same problem running the
> same activity on our test system.
> Regards,
> Nathan
>
> On 25 Jul 2019, at 22:00, solarflow99  wrote:
> I used ceph-ansible just fine, never had this problem.
>
> On Thu, Jul 25, 2019 at 1:31 PM Nathan Harper 
> wrote:
> Hi all,
>
> We've run into a strange issue with one of our clusters managed with
> ceph-ansible.   We're adding some RGW nodes to our cluster, and so re-ran
> site.yml against the cluster.  The new RGWs were added successfully, but
>
> When we did, we started to get slow requests, effectively across the whole
> cluster.   Quickly we realised that the firewall was now (apparently)
> blocking Ceph communications.   I say apparently, because the config looks
> correct:
>
> [root@osdsrv05 ~]# firewall-cmd --list-all
> public (active)
>   target: default
>   icmp-block-inversion: no
>   interfaces:
>   sources: 172.20.22.0/24 172.20.23.0/24
>   services: ssh dhcpv6-client ceph
>   ports:
>   protocols:
>   masquerade: no
>   forward-ports:
>   source-ports:
>   icmp-blocks:
>   rich rules:
>
> If we drop the firewall everything goes back to healthy.   All the clients
> (OpenStack Cinder) are on the 172.20.22.0 network (172.20.23.0 is the
> replication network).  Has anyone seen this?
> --
> Nathan Harper // IT Systems Lead
>
>


-- 
*Nathan Harper* // IT Systems Lead

*e: *nathan.har...@cfms.org.uk   *t*: 0117 906 1104  *m*:  0787 551 0891
*w: *www.cfms.org.uk
CFMS Services Ltd // Bristol & Bath Science Park // Dirac Crescent // Emersons
Green // Bristol // BS16 7FR

CFMS Services Ltd is registered in England and Wales No 05742022 - a
subsidiary of CFMS Ltd
CFMS Services Ltd registered office // 43 Queens Square // Bristol // BS1
4QP


[ceph-users] Error in ceph rbd mirroring(rbd::mirror::InstanceWatcher: C_NotifyInstanceRequestfinish: resending after timeout)

2019-07-26 Thread Ajitha Robert
I have an rbd mirroring setup with primary and secondary clusters as peers,
and I have a pool with image-mode mirroring enabled. In this pool I created an
rbd image with journaling enabled.
But whenever I enable mirroring on the image, I get errors in
rbd-mirror.log and osd.log.
I have increased the timeouts; nothing worked and I couldn't trace out the
error.
Please guide me to solve this error.

*Logs*
http://paste.openstack.org/show/754766/

-- 


Regards,
Ajitha R